Best approach for single word recognition

This topic has 1 reply, 2 voices, and was last updated 8 years, 7 months ago by Halle Winkler.

Author

Posts
September 13, 2015 at 3:44 pm #1026788

oldprogrammer
Participant

I’ve been playing around with OpenEars and the Rejecto plugin to get to the right level of accuracy for single word recognition. The purpose is educational, so I would say the level of recognition I’m expecting is this: as a parent, would you think your child said the word correctly enough? If the child said KATE instead of CAT, that shouldn’t pass. If the child said HAF instead of HAV, that would be okay. I know that’s a difficult balance to achieve.

1. Is the default probabilistic model or JSGF model better for detecting a single word?

2. Should I be limiting the language model to a single word, or does the language model need additional words to compare against? I’m assuming that this is kind of what Rejecto is doing for me, though at the phoneme level.

3. Are there any benefits to reducing the size of the dictionary? If I’m only looking to match a single word, should the dictionary (with the phonemes) also be limited to the single word?

4. Does it help to add similar sounding words to the language model, and then reject them when the hypothesis is received?

5. The Rejecto “withWeight” parameter: does a lower number mean less rejection? The wording in the documentation seemed precise, but I didn’t quite follow.

6. I’ve read in other areas about training the speech detector. Is that something that can help? Any advice or pointers for going down that road?

A lot of questions, I know. Thanks for your help.

September 13, 2015 at 4:40 pm #1026789

Halle Winkler
Politepix

Welcome,

In my opinion, Pocketsphinx can’t really be used to qualitatively judge children’s pronunciation in the way you’re describing, sorry. Other factors, including but not limited to environmental circumstances, device proximity, and the big differences between speech at different developmental stages, will lead to results that are not clear enough to be helpful. Anyone considering a software project that gives any kind of automated feedback on developing speech needs to work very closely with an academic or professional expert in childhood speech development.

I can answer your questions in general terms that would apply to any app:

1. Is the default probabilistic model or JSGF model better for detecting a single word?

It doesn’t matter much for one word, but you can only use Rejecto with a language model.

2. Should I be limiting the language model to a single word, or does the language model need additional words to compare against? I’m assuming that this is kind of what Rejecto is doing for me, though at the phoneme level.

That’s right. If you want to detect one word, just use one word, and if you want to reject other words and sounds, Rejecto will handle that.

3. Are there any benefits to reducing the size of the dictionary? If I’m only looking to match a single word, should the dictionary (with the phonemes) also be limited to the single word?

Which dictionary? OpenEars creates your phonetic dictionary, so you don’t need to do anything about that dictionary. If you are asking about the master lookup dictionary, you can’t reduce its content, and there is no need to.

4. Does it help to add similar sounding words to the language model, and then reject them when the hypothesis is received?

It can help in cases where you, as the app designer, have a high degree of confidence about which specific other words will be detected, although that is not usually the case.
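
As a rough illustration of the idea in question 4, the accept/reject step on a received hypothesis might look something like the sketch below. This is plain Python rather than OpenEars or Rejecto code: `TARGET_WORD`, `DECOY_WORDS`, and `accept_hypothesis` are hypothetical names, the decoy list is made up, and the hypothesis string simply stands in for whatever text the recognizer delivers.

```python
# Hypothetical sketch: none of these names come from OpenEars or Rejecto.
TARGET_WORD = "CAT"

# Similar-sounding words assumed to have been added to the language model
# only so that they can be rejected when they show up in a hypothesis.
DECOY_WORDS = {"KATE", "CUT", "COT"}


def accept_hypothesis(hypothesis: str) -> bool:
    """Return True only when the hypothesis is exactly the target word."""
    words = hypothesis.strip().upper().split()
    # Any decoy word in the hypothesis means the utterance is rejected.
    if any(word in DECOY_WORDS for word in words):
        return False
    # Empty hypotheses and extra words are rejected as well.
    return words == [TARGET_WORD]


if __name__ == "__main__":
    for heard in ["cat", "KATE", "CAT KATE", ""]:
        print(repr(heard), accept_hypothesis(heard))
```

This only works as well as the decoy list does, which is the caveat in the answer above: it assumes you can predict which other words will actually be recognized.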

5. The Rejecto “withWeight” parameter: does a lower number mean less rejection? The wording in the documentation seemed precise, but I didn’t quite follow.

That’s right, higher weight means more will be rejected and vice versa.

6. I’ve read in other areas about training the speech detector. Is that something that can help? Any advice or pointers for going down that road?

In very general terms, it can be helpful to do adaptation when dealing with a userbase where you know in advance that they aren’t well-represented in the acoustic model data. However, in this case, adaptation is one of those things you would need to work on with an expert in childhood speech development in order to do it right.