So at some point I need the voice dictation client to be able to do basic interactions with applications on the desktop (think typing text and the like). So how do I go about doing that? I want to be compatible with Wayland when it shows up, but still work on X (since that's where I'm working now). That would seem to preclude using X event sending. What about making a "virtual keyboard" that actually sends the events through the Linux kernel event subsystem?
The resulting spike-test (using uinput) is checked into the listener project. It seems to work. Yay! I'm able to send text to every application I've tested so far (which is only 4 apps, but they include most of the apps I care about: Konsole, Eric and Chromium). The same mechanism should allow injecting mouse/pointer events as well, should I decide I need that.
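For anyone curious what "a virtual keyboard via uinput" looks like, here is a minimal sketch of the general technique. It is not the spike-test code from the listener project. It assumes a 64-bit Linux machine where the ioctl numbers and struct layouts below match the kernel headers, and write access to /dev/uinput (normally root). The device name, the tiny `KEYCODES` table and the function names are all made up for illustration.

```python
import fcntl
import os
import struct
import time

# Constants copied from <linux/input.h> and <linux/uinput.h>
# (values as found on x86/ARM; other architectures may differ).
EV_SYN, EV_KEY = 0x00, 0x01
SYN_REPORT = 0
UI_SET_EVBIT = 0x40045564
UI_SET_KEYBIT = 0x40045565
UI_DEV_CREATE = 0x5501
UI_DEV_DESTROY = 0x5502

EVENT_FORMAT = "llHHi"                    # struct input_event
USER_DEV_FORMAT = "80sHHHHI" + "64i" * 4  # struct uinput_user_dev

# Deliberately tiny, illustrative character-to-keycode table.
KEYCODES = {"h": 35, "i": 23, " ": 57, "\n": 28}


def pack_event(ev_type, code, value):
    """Pack one input_event; the timestamp is left as zero."""
    return struct.pack(EVENT_FORMAT, 0, 0, ev_type, code, value)


def key_events(code):
    """Press and release one key, each followed by a sync report."""
    return [
        (EV_KEY, code, 1), (EV_SYN, SYN_REPORT, 0),
        (EV_KEY, code, 0), (EV_SYN, SYN_REPORT, 0),
    ]


def type_text(text, device="/dev/uinput"):
    """Create a throwaway virtual keyboard and type `text` through it."""
    fd = os.open(device, os.O_WRONLY | os.O_NONBLOCK)
    try:
        # Declare which event types and keys the device can emit.
        fcntl.ioctl(fd, UI_SET_EVBIT, EV_KEY)
        for code in KEYCODES.values():
            fcntl.ioctl(fd, UI_SET_KEYBIT, code)
        # Name, bus type (0x03 = USB), vendor, product, version,
        # ff_effects_max, then the four unused absolute-axis arrays.
        os.write(fd, struct.pack(USER_DEV_FORMAT, b"voice-keyboard-sketch",
                                 0x03, 0x1, 0x1, 1, 0, *([0] * 256)))
        fcntl.ioctl(fd, UI_DEV_CREATE)
        time.sleep(0.5)  # give the desktop time to notice the new device
        for char in text:
            for event in key_events(KEYCODES[char]):
                os.write(fd, pack_event(*event))
        time.sleep(0.1)
        fcntl.ioctl(fd, UI_DEV_DESTROY)
    finally:
        os.close(fd)
```

Calling `type_text("hi\n")` as root should type into whichever window has focus, on X or anything else that reads kernel input devices. A real client would want the device to stay alive between utterances rather than being created and destroyed on each call, plus a full keymap with shift handling.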
The idea is that a (root) listener daemon will (after dropping privileges) create a named pipe that users in a given group (maybe plugdev) can connect to and dump formatted "play these keys" messages into. Obviously that might be an issue for multi-user systems, but I'm not really all that concerned about them. I see this as a "voice keyboard" driver, and giving a user access to talk to your machine is the same basic access as sitting at the keyboard. (What about web-sites etc. that play sound? I don't know what to do about that...)
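The pipe side of that plan could be sketched roughly as follows. Nothing here is settled: the message format (one JSON object per line with a "keys" field) is purely an assumption for illustration, as are the function names. The group change only works if the daemon's user belongs to that group (or is still root at that point).

```python
import grp
import json
import os


def make_pipe(path, group=None):
    """Create a FIFO that only its owner and group can read and write."""
    try:
        os.mkfifo(path, 0o660)
    except FileExistsError:
        pass  # reuse a pipe left over from an earlier run
    os.chmod(path, 0o660)  # mkfifo's mode is masked by the umask
    if group is not None:
        os.chown(path, -1, grp.getgrnam(group).gr_gid)


def parse_message(line):
    """Decode one hypothetical message such as {"keys": "ls\\n"}."""
    keys = json.loads(line)["keys"]
    if not isinstance(keys, str):
        raise ValueError("'keys' must be a string")
    return keys


def serve(path, play):
    """Read newline-delimited messages forever, passing each text to play()."""
    while True:
        # Opening a FIFO for reading blocks until a writer connects;
        # EOF means every writer has closed, so we simply reopen.
        with open(path, "r", encoding="utf-8") as pipe:
            for line in pipe:
                if not line.strip():
                    continue
                try:
                    play(parse_message(line))
                except (ValueError, KeyError, TypeError):
                    continue  # ignore malformed messages
```

A client in the right group would then just write a line such as `{"keys": "hello"}` to the pipe, and `play` would be whatever function feeds the text to the virtual keyboard.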
Of course, if we want particularly useful interactions with apps, we'll likely need at least the ability to send "start undo action" and "finish undo action" signals (and likely a far more involved API to provide context-sensitive language models)... but basics first.
I also looked at things such as pykeys, which uses X extensions/events to send messages, but since it's GPL licensed, didn't actually work when I tried it, and won't work in a post-X world, I went with rolling my own.
I could have used python-evdev as well, but for something as trivial as this it seemed like overkill.
[Updated] to point out that uinput is what's getting used...