Plumbing Life's Depths - Listener v2, using DeepSpeech for Coding on Linux

So a few years ago, as I was working on a project called Listener that used PocketSphinx to allow dictation into the Eric IDE, I got some feedback suggesting that the only practical solution these days would be to use a neural network based dictation engine. That led me down a rabbit hole from which I'm finally emerging.

Yes, the neural network based solution definitely is producing better results. I've also switched IDEs in the meantime, and that's made me realise that using Eric (or any IDE) as the integration point is probably the wrong approach (for me). Given that 90% of the code in Listener v1 assumed either the phoneme-based PocketSphinx model or the Eric APIs, the code was largely a write-off...

So Listener v2 is based on Mozilla DeepSpeech and integrates at the Linux desktop level, rather than at the IDE level. It is still very much a technical demo, but it does seem to work pretty well as a general service for typing by voice on Linux.

At the moment it can type into all of:

VSCode

Chrome

Kate

Konsole

and likely most other Qt or Gtk based programs, as it uses IBus for the text entry method. It has a backup UInput mechanism from the original project, but it's not yet hooked up.

The focus, for me, is to allow me to code by voice to reduce RSI strain, rather than to support completely keyboard-free operation. I'm not opposed to patches to support such behaviour; it just isn't my personal need.

Listener has a bunch of common dictation shortcuts (capitalisation, punctuation, some code-specific shortcuts, etc.) defined in text files (with the goal of making a GUI eventually). However, it is still using the generic (upstream) language model, so it often misses the commands because it thinks the hyper-common command-triggering patterns are extremely rare.
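To make the idea of text-file shortcuts a bit more concrete, here is a minimal sketch in Python. The `spoken phrase => typed text` format, the sample phrases, and the `parse_rules`/`apply_rules` names are all made up for illustration; Listener's actual rule files may well look different.

```python
# Hypothetical rule format: "spoken phrase => text to type", one rule per line.
# This format is an assumption for illustration, not Listener's real one.
SAMPLE_RULES = """
# punctuation
full stop => .
comma => ,
# code-specific
open paren => (
close paren => )
"""


def parse_rules(text):
    """Parse rule-file text into {tuple_of_spoken_words: replacement}."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        spoken, _, typed = line.partition("=>")
        rules[tuple(spoken.split())] = typed.strip()
    return rules


def apply_rules(words, rules):
    """Replace spoken phrases in a list of words, preferring the longest match."""
    longest = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase starting at position i first.
        for size in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in rules:
                out.append(rules[key])
                i += size
                break
        else:
            # No rule matched here, so keep the word as dictated.
            out.append(words[i])
            i += 1
    return out
```

With these sample rules, dictating "hello comma world full stop" would come out as the tokens `hello`, `,`, `world`, `.`, which only works if the recogniser actually transcribes the trigger phrases correctly in the first place.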

I'm currently processing a corpus of source code to produce a language model for coding by voice, which should help with that (but wow is it slow, taking 47 minutes so far to pre-process 400MB of source code, with about 5 times more code still to process in this corpus), but I'll have to look into contextualised/biased scoring to make it particularly robust and useful.
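As a rough picture of what biased scoring could mean, the sketch below re-ranks a list of candidate transcripts by adding a bonus for each known command phrase they contain. It assumes the recogniser can hand back several candidates with scores where higher is better; the `rescore` function, the bonus value, and the example candidates are hypothetical and are not part of DeepSpeech or Listener.

```python
def count_phrase(words, phrase):
    """Count how many times the word list `phrase` occurs within `words`."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def rescore(candidates, command_phrases, bonus=2.0):
    """Re-rank (text, score) candidates, boosting ones containing command phrases.

    `candidates` is a list of (text, score) pairs where a higher score is better
    (for example a log-probability). `bonus` is an arbitrary illustrative value.
    """
    phrases = [phrase.split() for phrase in command_phrases]
    rescored = []
    for text, score in candidates:
        words = text.split()
        hits = sum(count_phrase(words, phrase) for phrase in phrases)
        rescored.append((text, score + bonus * hits))
    # Best candidate first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

Here a generic model might rank "open parent x" above "open paren x", and the bonus for the known phrase "open paren" flips that ordering. A real solution would more likely bias the scoring inside the decoder rather than re-rank afterwards.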

I'm also thinking about how to easily let programs set contextual hints (e.g. this is a Python context, or here is the list of autocomplete choices), with the likely result being a DBus service that can be triggered by individual IDE plugins.

Written by Mike on June 7, 2020 in Snaking.
