Voice Recognition on the iPhone

One of my projects could get a real usability boost from voice recognition. Voice recognition is the utopia of mobile user interfaces, and like every utopia, it turns out not to work as well as you’d like in real life. I spent some time looking into open source voice recognition packages. If you want to use them for what is called “Very Large Vocabulary” recognition, you are almost guaranteed to be disappointed. If you only need voice commands, the default performance may be acceptable.

Voice Recognition works by transforming each short slice of an audio stream, guessing what pronunciation that slice might represent, and in turn, what word it might represent. This involves an Acoustic Model, a Language Model to guide probabilities, and a number of algorithms that determine the result based on those probabilities. One key thing here is that bad results are usually not code bugs, but training deficiencies. The Acoustic Models are trained on hundreds of hours of recorded speech, but the free models are not as robust as commercial offerings. If you want to do something about it, go to VoxForge and contribute some audio. Restricting the language model goes a long way towards improving the results, as does training the Acoustic Model to your voice. For my project, restricting the language model was the only option, since my Acoustic Models need to be speaker independent.
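To make the point about restricting the language model concrete, here is a toy sketch in Python. It is not how Pocket Sphinx or Julius work internally; the acoustic scores, the word lists, and the function names (uniform_lm, decode) are all made up for illustration. It only shows the basic idea: the recognizer picks the word that maximizes acoustic score times language model probability, so shrinking the vocabulary removes confusable candidates.

```python
import math

# Hypothetical acoustic scores P(audio | word) for one utterance of "call".
# These numbers are invented for illustration, not output from a real model.
ACOUSTIC = {"call": 0.30, "cold": 0.35, "tall": 0.20, "coal": 0.15}

def uniform_lm(vocabulary):
    """Toy language model: every word in the vocabulary is equally likely."""
    p = 1.0 / len(vocabulary)
    return {word: p for word in vocabulary}

def decode(acoustic, language_model):
    """Return the word maximizing log P(audio | word) + log P(word)."""
    best_word, best_score = None, -math.inf
    for word, prior in language_model.items():
        likelihood = acoustic.get(word, 0.0)
        if likelihood <= 0.0 or prior <= 0.0:
            continue  # word is impossible under one of the two models
        score = math.log(likelihood) + math.log(prior)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# A larger vocabulary leaves the similar-sounding "cold" in play.
large_vocab = uniform_lm(["call", "cold", "tall", "coal"])
# A small command vocabulary rules the confusable words out entirely.
commands = uniform_lm(["call", "stop", "play"])

print(decode(ACOUSTIC, large_vocab))  # cold
print(decode(ACOUSTIC, commands))     # call
```

With the larger vocabulary the slightly better acoustic match for “cold” wins. With only three commands allowed, “call” is the only candidate the audio supports, so the same audio decodes correctly.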

I investigated two packages, Pocket Sphinx and Julius. Pocket Sphinx is the latest in a long line of Voice Recognition packages written in C and developed by people at CMU. Julius is developed most actively for the Japanese language, but is still language agnostic.

I had fun cleaning up the Mac experience for both of these projects. Julius built fine, but its CoreAudio driver was broken. Pocket Sphinx had a few compile errors under Xcode and no clear way to build iPhone friendly libraries. So I submitted a new Julius driver based on the Audio Queue technology, and a patch that lets Pocket Sphinx build Mac and iPhone friendly binaries. It was great to submit patches to both of these projects. I still have an Audio Queue driver to write for Pocket Sphinx though!

More on my project soon, but Pocket Sphinx is the package that I’m going to push forward with. This is largely because its default Acoustic Models appeared to perform better than the ones for Julius. I’m hoping to use it for aligning text and audio, not Voice to Text or Voice commands. My challenge now is to see if this process helps even if it is only 60% accurate. But this spike solution is done, time to flesh out the rest of the code!
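For a rough idea of why alignment can tolerate imperfect recognition, here is a small Python sketch. It is only an assumption about how such an alignment could be done, not the approach used in my project: it matches the known text against a noisy recognizer output using the standard library’s difflib, and keeps the timestamps of the words that agree. The align function, the sample sentence, and the timings are all hypothetical.

```python
from difflib import SequenceMatcher

def align(transcript_words, recognized):
    """Map transcript word positions to start times.

    transcript_words: the known text, as a list of words.
    recognized: hypothetical recognizer output, a list of
                (word, start_seconds) pairs that may contain errors.
    Returns {transcript_index: start_seconds} for words that matched.
    """
    hypothesis = [word for word, _ in recognized]
    matcher = SequenceMatcher(a=transcript_words, b=hypothesis, autojunk=False)
    anchors = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            anchors[block.a + offset] = recognized[block.b + offset][1]
    return anchors

transcript = "the quick brown fox jumps over the lazy dog".split()
# Made-up recognizer output with two misrecognized words ("crown", "a").
recognized = [
    ("the", 0.0), ("quick", 0.4), ("crown", 0.8), ("fox", 1.2),
    ("jumps", 1.6), ("over", 2.0), ("a", 2.3), ("lazy", 2.5), ("dog", 2.9),
]

anchors = align(transcript, recognized)
for index, word in enumerate(transcript):
    print(index, word, anchors.get(index, "unaligned"))
```

The misrecognized words simply stay unaligned, while the words on either side still pin the text to the audio. Times for the gaps could be estimated from the neighbouring anchors.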


8 Responses to “Voice Recognition on the iPhone”

  • Kevin Butler Says:

    Nice – I’m also interested in doing some voice recognition work on the iPhone, and had come to basically the same conclusion, though you’re much further along than I am – I’ve just been reading so far.

    Do you know if your code changes to Pocket Sphinx have been accepted? Would it make sense to toss them into a github project, or better to just keep with the sphinx sourceforge project?

    kb

  • admin Says:

    Hey Kevin,
    Glad you found this helpful. If you have any other questions I’d love to see if I have any answers. Here are the two patches I submitted. They said they patched it in but that was only a few days ago.

    http://dl.dropbox.com/u/2774167/pocketsphinx.iphone.diff
    http://dl.dropbox.com/u/2774167/sphinxbase.iphone.diff

  • Gerd Van Zegbroeck Says:

    If you ever try to build the latest (as of 2010/04/19) nightly builds and find that you get a lot of errors and no lib, try the following:

    make clean
    autoheader
    ./autogen.sh
    ./build_for_iphone.sh simulator
    ./build_for_iphone.sh device

    that should work!
    Thanks Brian!

  • Stephen Hu Says:

    great stuff, i think this is going to be a huge opportunity for future apps on iphone. wondering if there’s microphone capabilities for ipod touch…

    i was searching on google and stumbled across a project called Ceedvocal.

    not sure what library ceedvocal is based on, but ideally i only need a few voice patterns in a library, perhaps that could make the packaging smaller.

    will be checking in your progress.

  • admin Says:

    I have not tried out ceedvocal, so I can’t comment, but there’s a lot of value in the product they’ve created. lots of pieces to tie together, and having that behind a nice API sounds great.

    I’m curious about how their acoustic models are, because other than that it’s just wrapped up open source projects. But acoustic models are where the value really is so it could be great!

    Brian

  • Simon Burfield Says:

    Hello there, really interested in getting a voice to text example, would anyone be able to share their code? I want to see if I can mix this up with TTS :)
    Thanks Simon

  • admin Says:

    Hi Simon,
    I wrote a wrapper around pocket sphinx and flite that has the basics of Voice to text and TTS. One known issue: on Xcode 3.2.3 you need to set ‘architectures’ to armv6 for it to build correctly. The mailing list has more examples.

    http://kingsoftwaredesigns.com/2010/05/vocalkit-objective-c-wrapper-for-speech-recognition/
