Optimization for short utterances

This topic has 9 replies, 4 voices, and was last updated 11 years, 7 months ago by Halle Winkler.

Author

Posts
August 2, 2012 at 10:21 am #10724

Elsa
Participant

Hello,
I’m using OpenEars to recognize mostly single words, one at a time, rather than complete sentences. As a result, I regularly have short utterances (less than 1-2 seconds), and sometimes pocketsphinx doesn’t go into the decoding process at all, or it isn’t very responsive (it starts decoding a bit late).
I’m aware that my use case is not the optimal one for pocketsphinx, but I was wondering whether it is possible to optimize it for this type of utterance?
I know that in earlier versions of OpenEars it was possible to set kSecondsOfSilenceToDetect so that pocketsphinx would get into decoding faster, but I can’t find it in the latest version.
Thank you for your help!

August 2, 2012 at 4:56 pm #10725

Halle Winkler
Politepix

Sure, check out the float property of PocketsphinxController “secondsOfSilenceToDetect”. I just moved it into the class so you could set it programmatically.

August 6, 2012 at 9:55 am #10775

Elsa
Participant

Cool, thank you! It is definitely faster now.
Do you have any other advice on optimizing for short utterances? Sometimes it’s hard to get Sphinx into the decoding process: I have to repeat the same word several times or speak very close to the microphone. Maybe it’s a microphone configuration issue?
My app runs on iPad.

August 6, 2012 at 12:15 pm #10776

Halle Winkler
Politepix

You could try RapidEars and see if it helps if you’re open to non-free solutions. If I recall correctly, your implementation isn’t a supported method, so you might have audio session problems.

August 6, 2012 at 1:35 pm #10778

Elsa
Participant

OK, thank you, I’ll give it a try!

August 26, 2012 at 10:30 pm #10884

woodyard
Participant

I’m doing something similar. What value would you recommend, and what values are acceptable? The default is one, correct? Can you use something like .5?

August 27, 2012 at 6:43 am #10887

Halle Winkler
Politepix

I would recommend reducing it and doing some user testing to see what the minimum is for your application before you have an issue with utterances being cut off.
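
To illustrate the cut-off risk mentioned above, here is a minimal sketch of silence-based endpointing. It is not the OpenEars or pocketsphinx implementation; the function name, the 100 ms frame size, and the per-frame speech flags are all assumptions made for the example. It only shows why a threshold shorter than a speaker's natural pause can split one utterance into two.

```python
def split_utterances(frames, frame_ms, silence_ms):
    """Split per-frame speech flags into utterances (hypothetical model).

    frames: list of booleans, True where a frame contains speech.
    An utterance is closed once the silence following speech reaches
    silence_ms. Returns (start, end_exclusive) frame index pairs.
    """
    utterances = []
    start = None        # first speech frame of the open utterance
    last_speech = None  # most recent speech frame seen
    silence = 0         # milliseconds of silence since last speech frame
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += frame_ms
            if silence >= silence_ms:
                utterances.append((start, last_speech + 1))
                start = None
                silence = 0
    if start is not None:
        # Input ended before enough silence accumulated.
        utterances.append((start, last_speech + 1))
    return utterances


# Assumed input: 300 ms of speech, a 400 ms pause, 300 ms of speech,
# then one second of silence, in 100 ms frames.
frames = [True] * 3 + [False] * 4 + [True] * 3 + [False] * 10

print(split_utterances(frames, 100, 700))  # one utterance
print(split_utterances(frames, 100, 330))  # split at the 400 ms pause
```

With the 700 ms threshold the pause is bridged and the whole phrase is one utterance; with 330 ms the same input is cut in two, which is the kind of issue user testing should catch.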

September 27, 2012 at 10:50 pm #11362

tarantoga
Participant

I was trying to lower secondsOfSilenceToDetect to very low values, but it doesn’t seem to work at all.
In the log there is always:
2012-09-27 23:47:18.423 TestOpenEars[1650:907] Pocketsphinx has detected a second of silence, concluding an utterance.
And I would really like to have only a half-second delay, or maybe even 0.33.
Is that possible? Or is the paid plugin needed to get it?

September 28, 2012 at 6:41 am #11367

Halle Winkler
Politepix

The log always says “a second of silence” because that’s just what an NSLog statement says in the sample app. It isn’t related to the functionality of the property secondsOfSilenceToDetect and the log statement doesn’t come from the framework.

secondsOfSilenceToDetect currently defaults to .7 seconds, and changing it makes the silence period shorter or longer. However, the difference between .7 seconds and, for instance, .33 seconds isn’t going to be a big perceptual difference, because you will still have the following sequence of events, all of which take time: the speech continuing to completion, the silence after the complete speech, and then the time to process the complete speech. A very short delay can also cause issues, since any intermittent noise followed by a pause can trigger recognition.
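
As a rough worked example of that sequence, the sketch below adds up the three stages in milliseconds. The 1000 ms of speech and 500 ms of processing are made-up illustrative figures, not measurements from OpenEars, and the function name is hypothetical; only the .7 and .33 second silence values come from this thread.

```python
def ms_until_result(speech_ms, silence_ms, processing_ms):
    """Time from the start of speech to a hypothesis, as a simple sum.

    Models the three stages: the speech itself, the silence that must
    elapse after it, and the time to process the completed utterance.
    """
    return speech_ms + silence_ms + processing_ms


speech_ms = 1000      # assumed length of a short utterance
processing_ms = 500   # assumed decoding time, purely illustrative

default_total = ms_until_result(speech_ms, 700, processing_ms)
shorter_total = ms_until_result(speech_ms, 330, processing_ms)

saved_ms = default_total - shorter_total
saved_percent = round(100 * saved_ms / default_total, 1)

print(default_total, shorter_total, saved_ms, saved_percent)
```

Under these assumed figures the shorter silence period saves 370 ms out of 2200 ms, about 17% of the total wait, which is consistent with the point that the change is noticeable but not dramatic.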

RapidEars doesn’t use a period of silence at all because it recognizes speech while the speech is in-progress rather than performing recognition on a completed statement (for instance, if you say “go right” it will first return the live hypotheses “go” and then “go right” as you are in the process of speaking the phrase — RapidEars doesn’t wait for a silence period to recognize). For your goal of using OpenEars-style speech recognition that only happens after a silence but with a shorter silence period it isn’t necessary for you to use RapidEars. But, since OpenEars defaults to a short period of silence out of the box, the differences from shortening it more than the default aren’t going to be dramatic; expect it to be a smaller change in the user experience.

September 28, 2012 at 1:17 pm #11371

Halle Winkler
Politepix

I’ve fixed the NSLog statement for the next version so the sample app doesn’t create confusion about the framework behavior and updated the online documentation and tutorial.