Performance and Recognition Tips needed for small vocabularies


Viewing 4 posts - 1 through 4 (of 4 total)

    #1021677
    mikeh
    Participant

    Hello,
    I am hoping to use OpenEars + Rejecto. For my situation, when a user issues a command/letter/word, the options are pretty small. Can anyone shed some light on possible strategies to help with recognition performance:

    1. What are the tradeoffs of switching to different (but smaller) languages (word arrays) dynamically (stop listening, load the new language, start listening)? If we are in a situation where we know there are really only 3 options, is it worth the effort to switch to a smaller recognition set, and to continually do this as the user drills down the hierarchy of possible choices?

    2. How do we interpret the hypothesis array? Here’s the result of me saying the letter ‘H’: {
    Hypothesis = “DEE ___REJ_CH”;
    Score = “-32346”;
    },
    Does this mean DEE was the best guess and CH was a guess that was rejected? Sometimes I get a set of only rejected items.

    3. For recognizing letters, I seem to get better recognition if I use the spelling of the letter (‘EYE’ for I ; ‘PEA’ for P ; ‘CUE’ for Q ; etc). Is this just happenstance or should I always do this?

    If I have a single letter in my word set are the phonemes automatically added? I have no sound recognition experience, so I don’t know if the concepts of phonemes are even used here.

    thanks so much!

    mike

    #1021680
    Halle Winkler
    Politepix

    Hi Mike,

    1. What are the tradeoffs of switching to different (but smaller) languages (word arrays) dynamically (stop listening, load the new language, start listening)? If we are in a situation where we know there are really only 3 options, is it worth the effort to switch to a smaller recognition set, and to continually do this as the user drills down the hierarchy of possible choices?

    There are really no tradeoffs here. You don’t need to stop and start since OpenEars can switch language models while recognition is in progress so it doesn’t noticeably consume time or resources. If you have some kind of OOV rejection like Rejecto, it is always better to present the most appropriate vocabulary choices without anything extraneous.
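    This switch-as-you-drill-down approach can be sketched roughly as follows. This is a hedged sketch, not code from the thread: it assumes the OpenEars-1.x-era class names mentioned in this discussion (LanguageModelGenerator, AcousticModel, PocketsphinxController; later OpenEars versions prefix these with OE), and the file name “CurrentLevel” and the three-word array are placeholders.

    ```
    // Sketch: generate a small model for the current menu level, then hot-swap it
    // without stopping listening. Class/method names follow the OpenEars 1.x API
    // referenced in this thread; adjust them for your OpenEars version.
    NSArray *currentChoices = @[@"YES", @"NO", @"BACK"]; // placeholder 3-option vocabulary

    LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];
    NSError *error = [generator generateLanguageModelFromArray:currentChoices
                                                withFilesNamed:@"CurrentLevel"
                                        forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];

    if (error == nil) {
        NSString *lmPath  = [generator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"CurrentLevel"];
        NSString *dicPath = [generator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"CurrentLevel"];
        // Swap models while recognition is in progress -- no stop/start needed.
        [self.pocketsphinxController changeLanguageModelToFile:lmPath withDictionary:dicPath];
    }
    ```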

    2. How do we interpret the hypothesis array? Here’s the result of me saying the letter ‘H’: {
    Hypothesis = “DEE ___REJ_CH”;
    Score = “-32346”;
    },
    Does this mean DEE was the best guess and CH was a guess that was rejected? Sometimes I get a set of only rejected items.

    Well, this isn’t the format that OpenEars delivers this info in, and it doesn’t deliver the rejected phonemes at all unless you specifically request them via configuration. So I think there may be some confusion here between the way that OpenEars returns this information and the way this particular app has been implemented; that impression is underscored by your other question, in which some audio session behavior looks like it emerges from app elements that aren’t related to OpenEars. It is difficult to answer these kinds of questions, which are about the particular elements of this app’s OpenEars implementation, without knowing anything about that implementation. But the result shown above between the curly braces isn’t how hypotheses are returned out of the box using OpenEars with Rejecto.
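    For contrast, the out-of-the-box delivery path is the single-best hypothesis callback of the OpenEarsEventsObserver delegate. A minimal sketch, with the method signature as used in the OpenEars version discussed in this thread (newer versions use OE-prefixed class names):

    ```
    // Standard OpenEarsEventsObserverDelegate callback for the single best hypothesis.
    // With Rejecto, a low-confidence utterance simply never arrives here, so there
    // is no need to parse rejected tokens out of the hypothesis string yourself.
    - (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                            recognitionScore:(NSString *)recognitionScore
                                 utteranceID:(NSString *)utteranceID {
        NSLog(@"Heard \"%@\" with a score of %@ (utterance ID %@)", hypothesis, recognitionScore, utteranceID);
    }
    ```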

    3. For recognizing letters, I seem to get better recognition if I use the spelling of the letter (‘EYE’ for I ; ‘PEA’ for P ; ‘CUE’ for Q ; etc). Is this just happenstance or should I always do this?

    I think in this case the perceived differences are coincidental: both “EYE” and “I” pick up identical pronunciations from the master phonetic dictionary in the acoustic model. There should be no difference between using “P” and “PEA”, since LanguageModelGenerator will phonetically transcribe both as “P IY”, and this should be the case for all of the letter sounds (meaning that they are pulled from the same source, not that they have no alternate pronunciations; some may have them).

    The biggest recognition issue you are going to encounter is the nature of this single letter recognition task, which is extremely challenging for all speech recognition engines. If you search this forum for the keyword “letters” you can read a number of past discussions of it.

    #1021686
    mikeh
    Participant

    That hypothesis is coming from this method (via the sample project):
    - (void)pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray {
        // Pocketsphinx has an n-best hypothesis dictionary.
        NSLog(@"hypothesisArray is %@", hypothesisArray);
    }

    Thanks for the tip. After searching, I found references to being able to modify your dictionary (by hand?).

    1. It seems that this file is written each time the app starts. Is there a way to edit it and load it — rather than just write it?

    2. Where can I learn about the format of this file? The ‘IY’ suffix seems to be added to many of my words. What does IY mean?

    Thanks for handling my basic questions.

    #1021687
    Halle Winkler
    Politepix

    That hypothesis is coming from this method (via the sample project):
    - (void)pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray {
        // Pocketsphinx has an n-best hypothesis dictionary.
        NSLog(@"hypothesisArray is %@", hypothesisArray);
    }

    OK, thank you for clarifying. I don’t recommend using n-best in combination with Rejecto since n-best is designed to return most-confident hypotheses in order of confidence, while Rejecto deals with confidence differently, by not returning results for low-confidence recognitions. Using them together will tend to undermine both of their separate benefits so I’d pick one or the other for good results.

    1. It seems that this file is written each time the app starts. Is there a way to edit it and load it — rather than just write it?

    It would be better to edit the master phonetic dictionary so that it contains the phonetic transcriptions you want to have loaded into your dictionary file, and to continue to generate your language models dynamically so that they pick up the modifications from the master phonetic lookup dictionary. Here is some info about doing that, which should also shed some light on your second question:

    https://www.politepix.com/2012/12/04/openears-tips-and-tricks-5-customizing-the-master-phonetic-dictionary-or-using-a-new-one/
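    For readers landing on this thread with the same question about the file format: the master phonetic dictionary follows the CMU Pronouncing Dictionary convention, one word per line followed by its ARPAbet phoneme sequence. “IY” is the ARPAbet symbol for the “ee” vowel, which is why it ends so many letter names (P, B, D, and so on). A few illustrative entries, written out by hand rather than copied from any particular dictionary file:

    ```
    EYE  AY
    H    EY CH
    I    AY
    P    P IY
    PEA  P IY
    Q    K Y UW
    ```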
