February 17, 2013 at 9:02 pm #1015656
I’ve tried implementing the RapidEars demo. I did as instructed and it seems to work.
In the “rapidEarsDidDetectLiveSpeechAsWordArray” method I NSLog the phrase being spoken. Basically, in my app I need to recognise one word each time.
What I noticed is that this method is fired more than once for one word (i.e. if I say “WORD” and then stay silent, in the console I see this method fired 2–3 times).
How can I make it so it will fire once for each word being spoken?
Thanks
February 17, 2013 at 9:21 pm #1015657
RapidEars works a bit differently than stock OpenEars and needs a slightly different approach to programming. OpenEars waits until speech is complete and returns a single hypothesis, so you can assume a one-to-one relationship between receiving a hypothesis and your programmatic reaction to it. RapidEars, once it has started to detect speech, processes continuously, meaning that it will keep reasserting the present hypothesis until it changes or the utterance ends, because it is continuously re-scoring the hypothesis (checking whether it has become more confident in a different hypothesis or in the current one, whether more words have been spoken after the current hypothesis, etc.).
You can work with this style in a couple of ways. One way is just to react to your keyword the first time you catch it and not worry about the rest of the hypotheses, i.e. short-circuit the listening process when you get a match. Another way (I think this is closer to your request) would be to call a “keywordDetected:” method in the callback, but only call it for a new word (meaning you store the hypothesis and only forward the method if the new hypothesis doesn’t match the stored one, i.e. it is a different hypothesis). Does that make sense?
February 17, 2013 at 9:38 pm #1015659
Thanks for the quick reply.
I’m trying to build something where the user will have to repeat the same word more than once (for example, saying “WORD” once, and on a different occasion saying it two times: “WORD WORD”), but it’s always the same single word.
So my problem is that when this method is called, I have no idea whether it was called for one word or because the user said that word more than once…
Is there a method I can call so that, once that callback is entered, RapidEars stops listening and stops improving the current hypothesis?
Thanks
February 17, 2013 at 9:44 pm #1015660
A hypothesis should appear as “word word” if they say “word word” and just as “word” if they just say “word”. So in that case, I think you’d be fine just suspending listening when you catch and react to the hypothesis you are waiting for the first time. Would that fit your requirement?
February 17, 2013 at 9:46 pm #1015661
It doesn’t. If I say “WORD WORD” it prints out something like this (varies from time to time):
WORD
WORD WORD
February 17, 2013 at 10:01 pm #1015662
That’s what I would expect – first the first word in the utterance is spoken, then the second, and the hypothesis grows along with the number of words spoken. Your utterance starts with a single word and then a second is added, so the hypothesis matches that.
Something you could try if you want fewer callbacks is to experiment with the following settings in your initialization code for the PocketsphinxController+RapidEars object:
To use a slightly less “live” method, you can use these settings:
Alternately, you can ignore partial hypotheses (like your initial ones that just say “WORD” once) and wait for the final hypotheses in the rapidEarsDidDetectFinishedSpeechAsWordArray callback, but request them faster using these settings:
[self.pocketsphinxController TRUE];
February 17, 2013 at 10:06 pm #1015663
If I just wait for the final hypothesis then it’s too slow, and I love the live recognition effect; in that case I might as well just use OpenEars without the plugin. But that doesn’t serve my goal of having real-time speech… kind of a catch-22 situation…
February 17, 2013 at 10:16 pm #1015664
Understood, but I’ve also recommended a couple of other approaches — you could keep using partials but use the higher-quality algorithm which will return fewer times (but be more accurate) by setting:
Those will still return partials, so that isn’t the OpenEars-style approach.
Or you could just ignore partials that don’t match the utterance you are trying to detect, and short circuit your listening at the time that you receive a partial that matches the utterance you are trying to detect. There’s no requirement to display every partial to the user or run a method based on every one; you can also just check in the callback to see if it is the matching hypothesis and only react to the first one that is.
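As a sketch of that short-circuit approach: the delegate method name below is the one mentioned above, but its exact signature is assumed and may differ in your RapidEars version, and reactToMatch is a hypothetical method of your own.

```objectivec
// Sketch: react only to the first partial hypothesis that matches the target
// utterance, then suspend listening so no further callbacks arrive.
// The delegate signature is assumed; check your RapidEars version's header.
- (void) rapidEarsDidDetectLiveSpeechAsWordArray:(NSArray *)words {
    NSString *hypothesis = [[words componentsJoinedByString:@" "] lowercaseString];
    if ([hypothesis isEqualToString:@"word word"]) { // the utterance you are waiting for
        [self.pocketsphinxController suspendRecognition]; // short-circuit listening
        [self reactToMatch]; // hypothetical: your app's one-time reaction
    }
}
```

You can call resumeRecognition later when you are ready to listen for the next utterance.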
My other suggestion was to compare an incoming hypothesis to the previous one and only display/invoke new methods when it no longer matches the hypothesis that came before, i.e. throwing out any repeated hypotheses for logical purposes.
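A minimal sketch of that comparison, assuming the same word-array delegate callback as above and a lastHypothesis NSString property you add to your class (the property name is illustrative):

```objectivec
// Sketch: store the previous hypothesis and only forward genuinely new ones,
// throwing out repeated identical hypotheses.
- (void) rapidEarsDidDetectLiveSpeechAsWordArray:(NSArray *)words {
    NSString *hypothesis = [words componentsJoinedByString:@" "];
    if (![hypothesis isEqualToString:self.lastHypothesis]) {
        self.lastHypothesis = hypothesis;  // remember it so repeats are ignored
        [self keywordDetected:hypothesis]; // only invoked for changed hypotheses
    }
}
```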
So there are a few ways that you can get the results you are looking for — it’s really a question of what is the best approach for your application goals.