Way to see phonemes OpenEars heard

Tagged: dic, dictionary, language model, phonemes, phones

This topic has 3 replies, 3 voices, and was last updated 12 years, 7 months ago by Halle Winkler.

Viewing 4 posts - 1 through 4 (of 4 total)


Author

Posts
September 12, 2011 at 3:54 pm #7598

matth
Participant

I am using OpenEars in an iOS app to recognize individual numbers (ONE, TWO, THREE, etc.) spoken by children, but I’m getting pretty poor recognition (using OpenEars 0.912, mostly on an iPad). About my only hope of getting this to work better is to put in additional entries in my .dic file for alternate pronunciations of the words (which I’ve done a little) in hopes of “teaching” it how a kid would say those words. However, I really don’t know what to put for the alternate pronunciations.
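For reference, Sphinx-style .dic files list an alternate pronunciation by repeating the word with a parenthesized index, such as `THREE(2)`. Below is a minimal Python sketch of generating such entries. The helper name `format_entries` and the `F R IY` variant are hypothetical, chosen only to illustrate the format; real alternates would have to come from how the children actually pronounce each number.

```python
def format_entries(word, pronunciations):
    """Return .dic lines for one word; alternates get (2), (3), ... suffixes."""
    lines = []
    for index, phones in enumerate(pronunciations, start=1):
        # The first pronunciation uses the bare word; later ones are numbered.
        key = word if index == 1 else f"{word}({index})"
        lines.append(f"{key}\t{phones}")
    return lines


# Hypothetical alternates for illustration only, not measured pronunciations.
entries = format_entries("THREE", ["TH R IY", "F R IY"])
print("\n".join(entries))
```

The resulting lines can be pasted into the existing .dic file alongside the default entry for the word.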

Is there any way to see what set of phonemes OpenEars thought it heard? Then I could just show that somewhere as the app is being used and I could see what pronunciations I should add for each of the numbers.

Thanks in advance for any insights.

September 12, 2011 at 4:37 pm #7599

Halle Winkler
Politepix

Raw phoneme output is something that I think only Sphinx 3 provides, and IIRC with several caveats. I believe that the task you are doing is known to be very difficult to get good results for. There is an acoustic model called tidigits in [OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/hmm/en/tidigits with an accompanying language model in [OPENEARS]/CMULibraries/pocketsphinx-0.6.1/model/lm/en. I think it is specifically oriented towards recognizing numbers, so you might want to try it instead of hub4wsj_sc_8k and your custom LM, although I’ve never used it myself so I can’t make any promises.

September 13, 2011 at 3:18 pm #7604

Joseph S. Wisniewski
Participant

I do this as a diagnostic technique.

* Build a dictionary with 40 words, each word being just one of the CMU phonemes
* Build a language model or FSG where each of the words can follow any other word

Fair warning, the results will be very strange.
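The two steps above can be sketched roughly as follows, in Python. This is an illustration under assumptions rather than a tested recipe: the list is the 39-symbol ARPAbet set used by CMUdict (the count of 40 presumably includes one extra symbol that depends on the model), the function names are made up for this example, and a JSGF grammar is used for the "any word can follow any other word" part. Whether a given OpenEars version will load this grammar as-is is not something this sketch verifies.

```python
# The 39 ARPAbet phonemes used by the CMU Pronouncing Dictionary.
PHONEMES = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]


def build_dictionary(phonemes):
    """One dictionary entry per phoneme: the 'word' is pronounced as itself."""
    return "".join(f"{p}\t{p}\n" for p in phonemes)


def build_jsgf(phonemes, name="phones"):
    """JSGF grammar accepting one or more phoneme-words in any order."""
    alternatives = " | ".join(phonemes)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <{name}> = ( {alternatives} )+;\n"
    )


dictionary_text = build_dictionary(PHONEMES)
grammar_text = build_jsgf(PHONEMES)

# Show the start of each generated file; save them as .dic and .gram to use.
print(dictionary_text.splitlines()[0])
print(grammar_text.splitlines()[0])
```

With this pair loaded, each hypothesis comes back as a string of phoneme names, which can then be compared against the existing dictionary entries for the numbers.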

tidigits is a very “clean” grammar; it works well with fluent speech. And, like most 8k models, it’s most effective on adult males. Kids work best with 16k models (so do women). I’d suggest switching over to VoxForge 0.4, from the CMU site. To make this work well, though, you really need models built from kid speech.

You could try model adaptation.

September 13, 2011 at 4:04 pm #7606

Halle Winkler
Politepix

The issue with the VoxForge models is that they aren’t license-compatible with App Store distribution.