I am using OpenEars in an iOS app to recognize individual numbers (ONE, TWO, THREE, etc.) spoken by children, but I’m getting pretty poor recognition (using OpenEars 0.912, mostly on an iPad). About my only hope of getting this to work better is to put in additional entries in my .dic file for alternate pronunciations of the words (which I’ve done a little) in hopes of “teaching” it how a kid would say those words. However, I really don’t know what to put for the alternate pronunciations.
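In case it helps to see the format I'm using: alternate pronunciations in a CMU-style .dic file get a numbered suffix, like this (the second variants here are my own guesses at kid pronunciations, e.g. "free" for "three" — not measured from real speech):

```
ONE       W AH N
THREE     TH R IY
THREE(2)  F R IY
```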
Is there any way to see what set of phonemes OpenEars thought it heard? Then I could just show that somewhere as the app is being used and I could see what pronunciations I should add for each of the numbers.
Raw phoneme output is something that I think only Sphinx 3 does, and IIRC even there with several caveats. I believe that the task you are doing — recognizing children's speech — is known to be very difficult to get good results for. There is an acoustic model called tidigits in [OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/hmm/en/tidigits, with an accompanying language model in [OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/lm/en, that I think is specifically oriented toward recognizing numbers. You might want to try it instead of hub4wsj_sc_8k and your custom LM, although I've never used it myself so I can't make any promises.
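If you want to sanity-check tidigits outside the app first, pocketsphinx's command-line decoder can be pointed at those same paths. The LM/dict file names below are assumptions — check what actually ships in that lm/en directory:

```shell
pocketsphinx_continuous \
  -hmm "[OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/hmm/en/tidigits" \
  -lm "[OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/lm/en/tidigits.DMP" \
  -dict "[OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/lm/en/tidigits.dic"
```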
One rough workaround for getting phoneme-like output from pocketsphinx itself is a phoneme loop:

* Build a dictionary with 40 words, each word being just one of the CMU phonemes
* Build a language model or FSG in which each of those words can follow any other word
Fair warning, the results will be very strange.
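To make that concrete, here's a sketch of generating the two input files for a phoneme loop. The list below is the standard 39-phoneme ARPAbet set (add SIL if your setup wants 40); the output file names and the idea of feeding the corpus to lmtool/quick_lm.pl to get the "anything can follow anything" unigram LM are my assumptions, not something OpenEars provides:

```python
# Generate a phoneme-loop dictionary and LM training corpus for pocketsphinx.
# Each "word" is a CMU/ARPAbet phoneme whose pronunciation is itself, so the
# decoder's word-level output is effectively a phoneme string.

CMU_PHONEMES = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]

def phoneme_loop_dic():
    # One .dic line per phoneme: "word<TAB>pronunciation", both the phoneme.
    return "\n".join(f"{p}\t{p}" for p in CMU_PHONEMES) + "\n"

def phoneme_loop_corpus():
    # One phoneme per line; running this through an LM builder (e.g. the
    # CMU lmtool) yields a model where any phoneme can follow any other.
    return "\n".join(CMU_PHONEMES) + "\n"
```

You'd write `phoneme_loop_dic()` out as e.g. `phonemes.dic` and build the LM from `phoneme_loop_corpus()`, then point OpenEars at that pair instead of your numbers dictionary.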
tidigits is a very "clean" grammar; it works well with fluent speech. And, like most 8 kHz models, it's most effective on adult males. Kids work best with 16 kHz models (so do women). I'd suggest switching over to the VoxForge 0.4 model from the CMU site. To make this work really well, though, you need acoustic models trained on children's speech.