Optimization for short utterances

  • #10724
    Elsa
    Participant

    Hello,
    I’m using OpenEars to recognize single words one at a time rather than complete sentences. As a result, my utterances are regularly short (less than 1-2 seconds), and sometimes pocketsphinx doesn’t enter the decoding process at all, or it isn’t very responsive (it starts decoding a bit late).
    I’m aware that my use case is not the optimal one for pocketsphinx, but I was wondering whether it is possible to optimize it for this type of utterance?
    I know that in earlier versions of OpenEars it was possible to set kSecondsOfSilenceToDetect so that pocketsphinx would get into decoding faster, but I can’t find it in the latest version.
    Thank you for your help!

    #10725
    Halle Winkler
    Politepix

    Sure, check out the float property of PocketsphinxController “secondsOfSilenceToDetect”. I just moved it into the class so you could set it programmatically.

    #10775
    Elsa
    Participant

    Cool, thank you! It is definitely faster now.
    Do you have any other advice on optimizing for short utterances? Sometimes it’s hard to get Sphinx into the decoding process: I have to repeat the same word several times or speak very close to the microphone. Maybe it’s a microphone configuration issue?
    My app runs on iPad.

    #10776
    Halle Winkler
    Politepix

    You could try RapidEars and see if it helps if you’re open to non-free solutions. If I recall correctly, your implementation isn’t a supported method, so you might have audio session problems.

    #10778
    Elsa
    Participant

    OK, thank you, I’ll give it a try!

    #10884
    woodyard
    Participant

    I’m doing something similar. What value would you recommend, and what values are acceptable? The default is one, correct? Can you use something like .5?

    #10887
    Halle Winkler
    Politepix

    I would recommend reducing it and doing some user testing to see what the minimum is for your application before you have an issue with utterances being cut off.
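
    To illustrate what "utterances being cut off" can look like, here is a toy sketch in Python. It is not OpenEars or pocketsphinx code: the function name, the 0.1-second frame size, and the speech/silence flags are all made up for illustration. It only shows the general idea that a silence threshold shorter than a natural pause splits one utterance into two.

```python
def split_utterances(frames, frame_s, silence_to_detect_s):
    """Toy endpointer (hypothetical, not the OpenEars implementation).

    frames: list of booleans, True = frame contains speech.
    Returns (first_speech_frame, last_speech_frame) pairs, one per utterance.
    """
    # Number of consecutive silent frames that concludes an utterance.
    needed = max(1, round(silence_to_detect_s / frame_s))
    utterances = []
    start = None        # first speech frame of the utterance in progress
    last_speech = None  # most recent speech frame seen
    silent_frames = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silent_frames = 0
        elif start is not None:
            silent_frames += 1
            if silent_frames >= needed:
                utterances.append((start, last_speech))
                start = None
                silent_frames = 0
    if start is not None:
        # Audio ended before enough silence was seen.
        utterances.append((start, last_speech))
    return utterances


# Assumed input: 0.3 s of speech, a 0.4 s pause, 0.3 s of speech, 1 s of silence.
frames = [True] * 3 + [False] * 4 + [True] * 3 + [False] * 10

print(split_utterances(frames, 0.1, 0.7))  # pause is shorter than the threshold
print(split_utterances(frames, 0.1, 0.3))  # pause is longer than the threshold
```

    With the 0.7-second threshold the pause is bridged and the speech stays one utterance; with 0.3 seconds it is split at the pause. That is the kind of behavior user testing should catch when you lower the value.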

    #11362
    tarantoga
    Participant

    I was trying to lower secondsOfSilenceToDetect to very low values, but it doesn’t seem to work at all.
    In the log there is always:
    2012-09-27 23:47:18.423 TestOpenEars[1650:907] Pocketsphinx has detected a second of silence, concluding an utterance.
    I would really like to have only a half-second delay, or maybe even 0.33.
    Is that possible? Or is the paid plugin needed to get it?

    #11367
    Halle Winkler
    Politepix

    The log always says “a second of silence” because that’s just what an NSLog statement says in the sample app. It isn’t related to the functionality of the property secondsOfSilenceToDetect and the log statement doesn’t come from the framework.

    secondsOfSilenceToDetect currently defaults to .7 seconds, and if you change it the silence period will be shorter or longer. However, the difference between .7 seconds and, for instance, .33 isn’t going to be a big perceptual difference, because you will still have the following sequence of events, each of which takes time: the speech continuing to completion, the silence after the completed speech, and then the time to process the completed speech. A very short delay can also cause issues, since any intermittent noise followed by a pause can trigger recognition.
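
    As a rough worked example of that sequence, here is a small Python sketch. Only the .7 and .33 silence values come from this thread; the 1-second utterance and the 0.5-second processing time are assumed numbers chosen purely for illustration, and the function names are hypothetical.

```python
def delay_after_speech(silence_s, processing_s):
    """Time between the end of speech and the hypothesis arriving."""
    return silence_s + processing_s


def total_time(speech_s, silence_s, processing_s):
    """Speech, then the silence period, then processing."""
    return speech_s + delay_after_speech(silence_s, processing_s)


SPEECH_S = 1.0      # assumed length of a short utterance
PROCESSING_S = 0.5  # assumed recognition time, not a measured value

default_total = total_time(SPEECH_S, 0.7, PROCESSING_S)
short_total = total_time(SPEECH_S, 0.33, PROCESSING_S)
saved = default_total - short_total

print(f"default: {default_total:.2f} s, "
      f"shortened: {short_total:.2f} s, saved: {saved:.2f} s")
```

    Whatever the speech and processing times really are on your device, the saving is only the difference between the two silence values, about 0.37 seconds, while the other two stages are unchanged.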

    RapidEars doesn’t use a period of silence at all, because it recognizes speech while the speech is in progress rather than performing recognition on a completed statement. For instance, if you say “go right”, it will first return the live hypothesis “go” and then “go right” while you are still speaking the phrase. For your goal of using OpenEars-style speech recognition that only happens after a silence, but with a shorter silence period, it isn’t necessary for you to use RapidEars. But since OpenEars defaults to a short period of silence out of the box, the difference from shortening it below the default isn’t going to be dramatic; expect a smaller change in the user experience.

    #11371
    Halle Winkler
    Politepix

    I’ve fixed the NSLog statement for the next version so the sample app doesn’t create confusion about the framework behavior and updated the online documentation and tutorial.
