Friday, May 04, 2007

Voice rec keyboard

From the Dept. of Ideas That Won't Go Away Until I Write Them Down: Can someone tell me why this idea won't work?





Basically, I'm proposing a peripheral that enumerates as a USB keyboard but uses voice recognition instead of keypresses for input. Put a good microphone, a hefty Blackfin, and some EEPROM (or an SD card slot, so dictionaries can be swapped in and out) on a board, perhaps with some (chordable) buttons for additional input. Load the open-source speech recognition engine PocketSphinx on the Blackfin. The processor is dedicated to speech recognition, taking the computational load off the laptop it's plugged into. Note that this is a technologically naive view: I'm not actually sure this is feasible without a lot of effort (I can find out, but don't have time to right now).



To use it, you'd speak into the microphone (or press a button while speaking into the microphone) when you'd normally type. The Blackfin would translate the speech to text and either push it out over USB directly or stream it to a PIC that chunks it into words, translates command words into keystrokes ("enter" would get turned into a newline, etc.), and sends the result out over USB.
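To make the word-to-command step concrete, here's a rough sketch in Python of what the translation might look like. The key codes are the standard USB HID keyboard usage IDs and the 8-byte report is the boot-protocol keyboard layout, but everything else is made up for illustration: the `COMMANDS` vocabulary, the function names, and the rule of putting a space between ordinary words are all assumptions, and real firmware on a PIC would be written in C rather than Python.

```python
# USB HID keyboard usage IDs (Keyboard/Keypad usage page 0x07).
KEY_A = 0x04          # 'a'..'z' map to 0x04..0x1D
KEY_ENTER = 0x28
KEY_BACKSPACE = 0x2A
KEY_TAB = 0x2B
KEY_SPACE = 0x2C

# Hypothetical command vocabulary: spoken word -> single key.
COMMANDS = {
    "enter": KEY_ENTER,
    "tab": KEY_TAB,
    "backspace": KEY_BACKSPACE,
}


def word_to_usages(word):
    """Spell a word as HID usage IDs, skipping anything outside a-z."""
    usages = []
    for ch in word.lower():
        if "a" <= ch <= "z":
            usages.append(KEY_A + ord(ch) - ord("a"))
    return usages


def translate(words):
    """Turn recognized words into a flat list of HID usage IDs.

    Command words become one key; other words are spelled out,
    with a space inserted between consecutive ordinary words.
    """
    out = []
    need_space = False
    for word in words:
        key = COMMANDS.get(word.lower())
        if key is not None:
            out.append(key)
            need_space = False
        else:
            if need_space:
                out.append(KEY_SPACE)
            out.extend(word_to_usages(word))
            need_space = True
    return out


def boot_report(usage, modifiers=0):
    """Build an 8-byte boot-protocol keyboard report for one key press."""
    return bytes([modifiers, 0, usage, 0, 0, 0, 0, 0])


RELEASE = bytes(8)  # an all-zero report means "no keys pressed"
```

For each usage ID from `translate(["hello", "world", "enter"])`, the device would send `boot_report(usage)` followed by `RELEASE`. The obvious hole is that a command word like "enter" can never be typed as literal text, which is one thing the chordable buttons could be used for.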



Okay. I think I can stop thinking about this now. Back to work.

3 comments:

nikki said...

I don't really know much about the rest, but you'd have to make some big strides in speech recognition and/or hyper-train your speech recognition system, which is a pain in the neck if multiple people are going to use it.

Anonymous said...

Though I wonder if we could make the problem easier by just trying to recognize individual keys instead.

Perhaps making the problem even easier by spacing things out; there's no reason why the sound for an n needs to be "enn", or that the sound for a semicolon needs to be "semi colon". For us hacker types we'd want ; to be something quick and easy and final sounding. :-)

I'd find it nice when I've spent hours coding and my wrists feel a little sore.

Anonymous said...

I'd also point out that the acoustic models that come with Sphinx are for native, non-impaired English, so ditto the bit about having to retrain it. Also, I'm kind of starting to lean away from fully embedded speech and toward client-server speech. Get Sphinx running on an Opteron server, let your Blackfin handle the up-front processing, then transmit cepstral coefficients one way and text the other. It wouldn't be a true USB keyboard... *unless* you're really sexy and you have the speech device use its own network capability (Bluetooth?) to circumvent the host and reach the server.