Interpretation of Dictation

Sometimes, as you're developing a project, it's easy to lose sight of the end goal. I've spent so long on Listener's context setup, code-to-speech converters, etc. that the actual point of the whole thing (i.e. letting you dictate text into an editor) hasn't actually been built. Today I started on that a bit more. I got the spike-test DBus service integrated into the main GUI application, so that you can now get events from the same Listener that's showing you the app-tray icon and the results. I disabled the sending of "level" events across DBus: there's a ridiculous number of them, and client apps likely don't need them (they're only used to draw a level indicator during speech).
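The event filtering amounts to wrapping the signal-emitting callable in a type check. A minimal sketch (all names here are hypothetical, not the project's actual API): drop the high-frequency "level" events, which are only needed locally for the level indicator, and forward everything else to DBus clients:

```python
# Event types that are used locally but never relayed to DBus clients.
SUPPRESSED_TYPES = {"level"}

def make_relay(emit_signal):
    """Wrap a DBus signal-emitting callable in an event-type filter.

    `emit_signal` is whatever callable actually puts the event on the
    bus; `event` is assumed to be a dict with a "type" key.
    """
    def relay(event):
        if event.get("type") in SUPPRESSED_TYPES:
            return  # dozens of these per second; clients don't need them
        emit_signal(event)
    return relay

# Usage sketch:
sent = []
relay = make_relay(sent.append)
relay({"type": "level", "value": 0.4})      # dropped
relay({"type": "final", "text": "hello"})   # forwarded
```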

Then I started working on the actual "interpret the utterances" code, which will eventually be user-driven, but for now I'm just hard-coding the rules. The easiest set are the punctuation bits, where the dictation is generally something like ',comma' and maps directly to ',' with a few bits and bobs to get the spaces fixed up. The next easiest are rules such as "delete spaces between successive open-parens"... and that's where I stopped today. Currently this is all regex-based; eventually it's going to need a bit more intelligence (meta-dictation such as "cap" or "all-caps"), but I don't want to build *too* much intelligence into it. I voice-coded successfully for years with a system not much more involved than what I've got now, and I really should get to the "integrate the voice dictation into my editor and start really working with it to work out the kinks" phase soon.
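To make the regex approach concrete, here's a minimal sketch of the idea (the rule table and spoken-token spellings are illustrative guesses, not the project's actual data): map spoken-punctuation tokens to their symbols, then run a couple of regex passes to fix the spacing, including collapsing the space between successive open-parens:

```python
import re

# Hypothetical rule table: spoken-punctuation tokens -> symbols.
PUNCTUATION = {
    ",comma": ",",
    ".period": ".",
    "(open-paren": "(",
    ")close-paren": ")",
}

def interpret(words):
    """Turn a list of dictated words into text with sane spacing."""
    text = " ".join(PUNCTUATION.get(word, word) for word in words)
    # No space before a comma, period, or close-paren.
    text = re.sub(r"\s+([,.)])", r"\1", text)
    # No space after an open-paren (this also deletes the spaces
    # between successive open-parens).
    text = re.sub(r"\(\s+", "(", text)
    return text

# interpret(["hello", ",comma", "world"]) -> "hello, world"
```

A dict lookup plus a handful of `re.sub` passes is about as dumb as it gets, but it keeps each rule independently testable, which matters once the rules become user-editable.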

