How can I increase accuracy?


Viewing 6 posts - 1 through 6 (of 6 total)

  • Author
    Posts
  • #9637
    oganix
    Participant

    I’m new to OpenEars and trying to get the numbers 1-100 recognized very well. To be clear, these numbers make up my whole vocabulary. I have created a language model with the web tool and am getting OK results. How can I go about improving the accuracy? Can any of the following help me with that:

    1) Trying to create a better language model by using a different toolkit such as SRILM, MITLM, or IRSTLM
    2) Building an acoustic model with just the numbers 1-10
    3) Using LanguageModelGenerator
    4) Using JSGF instead of ARPA

    or is there any other thing I can try?

    Thanks in advance

    #9638
    Halle Winkler
    Politepix

    Hi,

    Yes, recognizing numbers in isolation seems to be a difficult task for speech recognition engines.

    1) Trying to create a better language model by using a different toolkit such as SRILM, MITLM, or IRSTLM

    3) Using LanguageModelGenerator

    Most language modeling software uses the same few underlying algorithms (or some subset of them), so I don’t think you need to do a lot of experimentation there. LanguageModelGenerator uses another good package, so you could probably just try out whether its output is preferable and then call it a day.
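    As an aside, the corpus for this particular vocabulary is small enough to generate programmatically rather than by hand. A minimal sketch in Python (this assumes the web tool accepts a plain-text corpus with one phrase per line; the exact spellings, e.g. “twenty one” vs. “twenty-one”, should match whatever your pronunciation dictionary uses):

    ```python
    # Spellings for 0-19 and the tens; index 0 entries are unused placeholders.
    ones = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen"]
    tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"]

    def number_to_words(n):
        """Spell out 1-100 as words, e.g. 21 -> 'twenty one'."""
        if n == 100:
            return "one hundred"
        if n < 20:
            return ones[n]
        word = tens[n // 10]
        if n % 10:
            word += " " + ones[n % 10]
        return word

    # One phrase per line, uppercased to match the usual dictionary conventions.
    corpus = [number_to_words(n).upper() for n in range(1, 101)]
    with open("numbers_corpus.txt", "w") as f:
        f.write("\n".join(corpus) + "\n")
    ```

    The resulting numbers_corpus.txt can then be submitted to the web tool, or used as the input for whichever language modeling toolkit you end up comparing.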

    2) Building an acoustic model with just the numbers 1-10

    Don’t you need 1-100? But you might want to investigate this approach and/or adapting the existing model with your new data: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    It seems like the task of creating an acoustic model that just recognizes 1-100 with a number of different voice contributors and accents is constrained enough to be feasible for an app project.

    4) Using JSGF instead of ARPA

    In my opinion, after some recent experimentation, JSGF is too slow for a good UX. Other developers do use it, so as I say, this is a matter of opinion. You can use the garbage loop approach for out-of-vocabulary rejection with ARPA just as with JSGF: http://sourceforge.net/p/cmusphinx/discussion/help/thread/cefe4df3 This could improve your results if the issue is too many false positives rather than too many false negatives or transposed recognitions.
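    For anyone weighing the two formats: where ARPA is a statistical model file, a JSGF grammar for this vocabulary would be a rule grammar along roughly these lines (a sketch only; the grammar and rule names are made up for illustration):

    ```jsgf
    #JSGF V1.0;

    grammar numbers;

    public <number> = <under_twenty> | <tens> [ <digit> ] | one hundred;

    <digit>        = one | two | three | four | five | six | seven | eight | nine;
    <under_twenty> = <digit> | ten | eleven | twelve | thirteen | fourteen
                   | fifteen | sixteen | seventeen | eighteen | nineteen;
    <tens>         = twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety;
    ```

    The optional `[ <digit> ]` after `<tens>` is what lets the same rule cover both “forty” and “forty one”.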

    #9643
    oganix
    Participant

    Thanks a lot for the quick response. Yes, above I meant 1-100, not 1-10.

    #9644
    Halle Winkler
    Politepix

    No problem. There is another potential complication that isn’t immediately obvious, but that I’ve been trying to make a point of mentioning more frequently here: a lot of developers specify apps with the idea that the device can be pretty far away from the user. This actually gives the device speech recognition task an additional disadvantage that a desktop speech recognition application would be unlikely to have, namely a big mismatch between the design of the available microphone and the use being made of it. You can even see this with Siri if you open Notes and do dictation from a distance; return time from the server will get slower and accuracy will decrease, because the iPhone mic is designed to be spoken into directly and to reject “background noise” — which might be your user, if they are far enough away and there are competing sounds.

    This isn’t as big a deal with command-and-control language models/grammars, but once you’re past 20 words or so you can start to see an impact. So another approach is to see if you can educate your users not to put too much distance between themselves and the device during app use.

    #9645
    oganix
    Participant

    Good point. Thanks again.

    #1021813
    Halle Winkler
    Politepix

    Just wanted to follow up here that there is now a great method for doing dynamic JSGF grammars built into OpenEars: https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/
