Category archives: Tuxedo
Discussions about Linux, Unix and related technologies
So today I started into the basic command-and-control part of the Listener system. The original tack I took was to create escaped commands inside the text stream using the "typing" substitutions (the stuff that converts ,comma into ','):
some text \action_do_something more text
But that got rather grotty rather fast when looking at corner cases (e.g. when you want to type \action to document that mechanism). So I reworked it to have two levels of operation: the first pass pre-processes the stream to find commands and splits out the text, so that you get a sequence of commands-and-text to interpret. That should allow ...
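Roughly what that first pass might look like, sketched in Python (the regex and names here are my own illustration, not Listener's actual code):

    import re

    # Hypothetical sketch: split the recognized text into (is_command, chunk)
    # segments before interpretation, instead of escaping commands in-line.
    COMMAND_PATTERN = re.compile(r'\\(?P<name>[a-zA-Z_]+)')

    def split_commands(stream):
        """Yield (is_command, chunk) pairs from the raw recognized text."""
        position = 0
        for match in COMMAND_PATTERN.finditer(stream):
            if match.start() > position:
                yield False, stream[position:match.start()]
            yield True, match.group('name')
            position = match.end()
        if position < len(stream):
            yield False, stream[position:]

    # list(split_commands('some text \\action_do_something more text'))
    # -> [(False, 'some text '), (True, 'action_do_something'), (False, ' more text')]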
So as of earlier this evening I've now got the Listener service hooked up such that I can dictate code into Eric (via an Eric Plug-in talking to the service over DBus). There's still an enormous amount that doesn't work. But that first-light moment has made me rather happy; instead of a collection of little tools/toys, there's something that has the rough shape of a working project.
The actual Eric code so far is about 150 lines, with a lot of that being boilerplate for an Eric plugin. I'll likely do quite a bit ...
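For context, the boilerplate shape of an Eric plug-in module is roughly the following; this is a sketch from memory of the Eric plug-in conventions, and the metadata values and class are illustrative rather than the actual Listener plug-in:

    # Module-level metadata that Eric reads when it scans the plug-in directory.
    name = "Listener Dictation"          # illustrative values only
    author = "..."
    autoactivate = True
    deactivateable = True
    version = "0.1"
    className = "ListenerPlugin"
    packageName = "ListenerPlugin"
    shortDescription = "Insert dictated text from the Listener service"
    longDescription = "Connects to the Listener DBus service and types results into the current editor."

    class ListenerPlugin(object):
        """Object Eric instantiates with a reference to its main UserInterface."""

        def __init__(self, ui):
            self.__ui = ui

        def activate(self):
            # Hook up to the view manager / DBus service here.
            return None, True

        def deactivate(self):
            # Disconnect signals, drop the DBus connection.
            pass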
[Update] I got it working by going down to the "connection" level and registering the callback there. The code below is updated with the working version... as of now I've got some basic "voice coding" possible, but lots more to do to make it practical and useful.
Plowing ahead with integrating Listener into Eric (my primary IDE). That seemed to be going swimmingly: I got a new plugin created, set it up to notice new editors, disconnect old ones, and (in theory) process new results by inserting the recognized and interpreted text at the current position. Then I tried to ...
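The "connection level" registration mentioned in the update above looks roughly like this with dbus-python; the bus, interface and signal names are placeholders rather than what Listener actually exposes:

    import dbus
    from dbus.mainloop.glib import DBusGMainLoop

    def on_result(text):
        # In the real plugin this would insert the interpreted text into
        # the current Eric editor at the cursor position.
        print('Recognized:', text)

    # A Qt application would use the Qt main-loop integration instead;
    # the GLib loop just keeps this sketch self-contained.
    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()

    # Register the callback directly on the connection rather than on a
    # proxy object, so it fires without first resolving the remote object.
    bus.add_signal_receiver(
        on_result,
        dbus_interface='com.example.Listener',   # placeholder interface name
        signal_name='FinalResult',               # placeholder signal name
    )

The host application's main loop then drives delivery of the signal to the callback.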
Sometimes as you're developing a project it's easy to lose sight of the end goal. I've spent so long on giving Listener context setup, code-to-speech converters, etc. that the actual point of the whole thing (i.e. letting you dictate text into an editor) hasn't actually been built. Today I started on that a bit more. I got the spike-test for a dbus service integrated into the main GUI application, so that you can actually get events from the same Listener that's showing you the app-tray icon and the results. I disabled the sending of ...
So I got tired of paying work this afternoon and decided I would work on getting a dbus service started for Listener. The idea here is that there will be a DBus service which does all the context management, microphone setup, playback, etc., and which client software (such as the main GUI and apps that want to allow voice coding without going through low-level-grotty simulated typing) can use to interact with it.
But how does one go about exposing objects on DBus in the DBus-ian way? It *seems* that object-paths should produce a REST-like hierarchy where each object I want to ...
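As far as I can tell, the dbus-python way of doing that looks roughly like this; the bus name and object paths below are placeholders for illustration:

    import dbus
    import dbus.service
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    class Context(dbus.service.Object):
        """One dictation context, exported at its own object path."""

        @dbus.service.method('com.example.Listener.Context',
                             in_signature='', out_signature='b')
        def start_listening(self):
            return True

        @dbus.service.signal('com.example.Listener.Context', signature='s')
        def final_result(self, text):
            pass   # decorated method body; calling it emits the signal

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    bus_name = dbus.service.BusName('com.example.Listener', bus)

    # REST-ish hierarchy: each context gets its own path under /contexts/.
    Context(bus_name, '/com/example/Listener/contexts/default')
    GLib.MainLoop().run()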
So at this point Listener can generate a language model from a (Python) project, and the dictation results from that are reasonable but by no means perfect; still, you can dictate text as long as you stick to the words and statements already used in the project. There are some obvious reworks now showing up:
- we need the dictionaries to chain (roughly the fall-through lookup sketched after this list), and we likely need to extract a base "Python" statement-set by processing a few hundred projects (yay open source)
- we need to have separate statement, dictionary, etc. storage for the automatically generated and actual user-generated stuff so that on ...
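Conceptually the chaining is just a fall-through lookup, something like the following (with stand-in dictionaries; the real storage format is still to be decided):

    from collections import ChainMap

    # Stand-in dictionaries: project-specific entries shadow the base
    # "Python" set, which in turn shadows the general-English set.
    project_words = {'dbus': 'D BUS', 'uinput': 'U INPUT'}
    python_base = {'def': 'DEF', 'self': 'SELF'}
    english_base = {'the': 'THE'}

    lookup = ChainMap(project_words, python_base, english_base)
    print(lookup['def'])   # falls through to the base Python set -> 'DEF'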
Since the point of Listener is to allow for coding, the language model needs to reflect how one would dictate code. Further, to get reasonable accuracy I'm hoping to tightly focus the language models so that your language model for project X reflects the identifiers (and operators, etc) you are using in that project. To kick-start that process, I'm planning to run a piece of code over each project and generate from it a language model where we guess what you would have typed to produce that piece of code.
Which immediately runs into issues; do you think ...
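The core of that guessing is breaking identifiers back into the words you would have spoken; a rough sketch of the idea (my own illustration, not the actual Listener code):

    import re

    def words_for_identifier(identifier):
        """Guess the spoken words behind an identifier like 'emit_click' or 'MainLoop'."""
        # Split on underscores, then pull camelCase runs apart.
        parts = []
        for chunk in identifier.split('_'):
            parts.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+', chunk))
        return [part.lower() for part in parts if part]

    # words_for_identifier('DBusGMainLoop') -> ['d', 'bus', 'g', 'main', 'loop']
    # words_for_identifier('emit_click')    -> ['emit', 'click']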
I spent the day working on Listener. The bulk of the work was just cleanup, getting TODO items set up, fixing the setup script, etc. The actual task of voice dictation didn't get moved very far (I got a trivial "correct that" event working, but it doesn't actually make anything happen).
I also started thinking about how to integrate with other applications (and the desktop). That will likely be done via DBus services: one for the "send keys" virtual keyboard and another for the per-session voice-dictation context.
So at some point I need the voice dictation client to be able to do basic interactions with applications on the desktop (think typing text and the like). So how do I go about doing that? I want to be compatible with Wayland when it shows up, but still work on X (since that's where I'm working now). That would seem to preclude using X event sending. What about making a "virtual keyboard" that actually sends the events through the Linux kernel event subsystem?
The resulting spike-test (using uinput) is checked into the listener project. It seems to ...
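For reference, the shape of that spike with python-uinput is roughly the following; the key list and timing are illustrative, and the version checked into the repository is the authoritative one:

    import time
    import uinput

    # Declare the keys the virtual keyboard is allowed to send.
    events = (uinput.KEY_H, uinput.KEY_E, uinput.KEY_L, uinput.KEY_O,
              uinput.KEY_LEFTSHIFT)

    device = uinput.Device(events)   # creates a /dev/uinput device; needs permissions
    time.sleep(1)                    # give the kernel/X a moment to register it

    # Type "Hello" one key-click at a time, going through the kernel event
    # subsystem rather than X, so it should keep working under Wayland.
    device.emit_combo((uinput.KEY_LEFTSHIFT, uinput.KEY_H))
    for key in (uinput.KEY_E, uinput.KEY_L, uinput.KEY_L, uinput.KEY_O):
        device.emit_click(key)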