Reply To: Performance and Recognition Tips needed for small vocabularies

June 15, 2014 at 10:34 am #1021680

Politepix

Hi Mike,

1. What are the tradeoffs of switching to different (but smaller) languages (word arrays) dynamically (stop listening, load the new language, start listening)? If we are in a situation where we know there are really only 3 options, is it worth the effort to switch to a smaller recognition set, and to keep doing this as the user drills down the hierarchy of possible choices?

There are really no tradeoffs here. You don’t need to stop and start since OpenEars can switch language models while recognition is in progress so it doesn’t noticeably consume time or resources. If you have some kind of OOV rejection like Rejecto, it is always better to present the most appropriate vocabulary choices without anything extraneous.
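To illustrate the drill-down idea, here is a minimal sketch in plain Python rather than OpenEars code. The menu tree, the words in it, and the function names are all made up for the example; the point is only that the word list handed to the recognizer at each step can be much smaller than the full set.

```python
# Hypothetical menu hierarchy: each spoken choice maps to its sub-menu.
MENU = {
    "MUSIC": {"PLAY": {}, "PAUSE": {}, "NEXT": {}},
    "PHONE": {"CALL": {}, "REDIAL": {}},
    "MAPS": {"HOME": {}, "WORK": {}},
}


def vocabulary_for(path, menu=MENU):
    """Return the sorted words that are valid at this point in the hierarchy."""
    node = menu
    for choice in path:
        node = node[choice]  # descend one level per choice already made
    return sorted(node)


def all_words(menu=MENU):
    """Return every word in the hierarchy (the single large vocabulary)."""
    words = set()
    for word, sub_menu in menu.items():
        words.add(word)
        words |= all_words(sub_menu)
    return words


if __name__ == "__main__":
    print(vocabulary_for([]))         # top-level choices only
    print(vocabulary_for(["MUSIC"]))  # choices after the user said MUSIC
    print(len(all_words()))           # size of the unrestricted vocabulary
```

Each list returned by `vocabulary_for` would be the word array used to generate the language model you switch to at that level.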

2. How do we interpret the hypothesis array: Here’s the result of me saying the letter ‘H’ {
Hypothesis = “DEE ___REJ_CH”;
Score = “-32346”;
},
Does this mean DEE was the best guess and CH was a guess that was rejected? Sometimes I get a set of only rejected items.

Well, this isn’t the format in which OpenEars delivers this information, and OpenEars doesn’t deliver the rejected phonemes at all unless you specifically configure it to do so. So I think there may be some confusion between how OpenEars returns hypotheses and how this particular app has been implemented. The other question points the same way, since the audio session behavior described there looks like it comes from app elements that aren’t related to OpenEars. It is difficult to answer questions about the specifics of this app’s OpenEars implementation without knowing anything about that implementation. What I can say is that the result shown above between the curly braces isn’t how hypotheses are returned out of the box by OpenEars with Rejecto.

3. For recognizing letters, I seem to get better recognition if I use the spelling of the letter (‘EYE’ for I ; ‘PEA’ for P ; ‘CUE’ for Q ; etc). Is this just happenstance or should I always do this?

I think the perceived differences in this case are coincidental: both “EYE” and “I” pick up identical pronunciations from the master phonetic dictionary in the acoustic model. I believe there should be no difference between using “P” and “PEA”, since LanguageModelGenerator will phonetically transcribe both as “P IY” with no alternate pronunciations. The same should hold for all letter sounds, in the sense that both spellings are pulled from the same source; it doesn’t mean that none of them have alternate pronunciations, since some may.
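If you want to check this for your own word list, a rough sketch of the comparison looks like the following. It is plain Python, not part of OpenEars, and the small lexicon is a hand-typed excerpt in the style of a CMU-type phonetic dictionary. The “Q”/“CUE” entries in particular are my assumption, so verify against the dictionary file that is actually generated for your vocabulary.

```python
# Hand-typed excerpt in the style of a CMU-type phonetic dictionary.
# Assumption: these entries match what your acoustic model's dictionary
# contains; check the generated .dic file to confirm.
LEXICON = {
    "I": ["AY"],
    "EYE": ["AY"],
    "P": ["P IY"],
    "PEA": ["P IY"],
    "Q": ["K Y UW"],
    "CUE": ["K Y UW"],
}


def same_pronunciations(word_a, word_b, lexicon=LEXICON):
    """True if both words map to the same set of phoneme strings."""
    return set(lexicon[word_a.upper()]) == set(lexicon[word_b.upper()])


if __name__ == "__main__":
    for letter, spelling in [("I", "EYE"), ("P", "PEA"), ("Q", "CUE")]:
        print(letter, spelling, same_pronunciations(letter, spelling))
```

If the two entries are identical, the recognizer has no way to tell the spellings apart, so any difference in results comes from something else.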

The biggest recognition issue you are going to encounter is the nature of this single letter recognition task, which is extremely challenging for all speech recognition engines. If you search this forum for the keyword “letters” you can read a number of past discussions of it.