[Resolved] No recognition

    #1023616
    neshe9
    Participant

    I integrated my app with the new version of the OpenEars framework.
    Recognition after the upgrade is not as good as it was with the previous version.
    Even though the microphone is on, the received hypothesis is always null.
    I have set ‘setActive’ to ‘TRUE’.
    I have followed all the steps in the upgrade guide and even double-checked.
    I cannot seem to pinpoint the problem.

    #1023617
    Halle Winkler
    Politepix

    Welcome,

    There is no issue with the new version which causes only null hypotheses, so this is liable to be an issue with the upgrade process. Please check out the post “Please read before you post – how to troubleshoot and provide logging info” to see how to turn on and share the logging that provides troubleshooting information for this kind of issue.
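
    For reference, turning the logging on looks roughly like this (a sketch; both calls go in before starting to listen):

    [OELogging startOpenEarsLogging]; // Turns on OpenEars logging output.
    [OEPocketsphinxController sharedInstance].verbosePocketSphinx = TRUE; // Adds verbose Pocketsphinx logging as well.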

    Another good first step would be to try out the sample app (on a real device, of course) in order to see that it works, or alternatively a new app made from the tutorial, and then compare it with your app (also on a real device) to see what is different in the setup.

    #1023625
    neshe9
    Participant

    Now I see that I was suspending recognition in the ‘pocketsphinxDidCompleteCalibration’ method.
    Is there any similar method in v2.0?
    I didn’t find any method replacing the one mentioned above.

    #1023637
    Halle Winkler
    Politepix

    Correct, there is no more calibration in 2.0, so there is no longer a calibration callback. Take a look at the OEEventsObserver header, docs or sample app to see the current delegate methods for OEEventsObserver – there should be one that is suitable.
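
    For example, pocketsphinxDidStartListening fires once the engine is live, which is the closest analogue to the old calibration-complete callback. A minimal sketch, assuming your class retains the observer as a property and adopts OEEventsObserverDelegate:

    self.openEarsEventsObserver = [[OEEventsObserver alloc] init]; // Must be a retained property, or no callbacks will arrive.
    [self.openEarsEventsObserver setDelegate:self];

    - (void) pocketsphinxDidStartListening {
        NSLog(@"Pocketsphinx is now listening."); // A safe point to enable your speech UI.
    }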

    Speaking in terms of the new architecture, I wouldn’t recommend starting the engine and immediately suspending it as a way of “priming” the engine so that it is (sort of) ready to go whenever the user wants to speak to the app. This was never a safe approach for recognition quality, since it meant that the calibration applied to a different timeframe than the one in which speech was actually allowed to enter the system, and it would probably also result in clipped speech. Now it has no upside at all: without calibration to wait for, starting the engine is fast enough to just start it when you want to start listening.
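
    In other words, just start on demand, roughly like this (a sketch; lmPath and dicPath stand in for whatever paths your OELanguageModelGenerator run returned):

    NSError *error = nil;
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:&error]; // Required once in 2.x before using the controller.
    [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE];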

    #1023785
    neshe9
    Participant

    In the language array, I have around 20 strings, including some phrases and small sentences.
    When I was using the previous version, there were no problems with recognition.
    Is this a limitation of version 2.0, that it recognizes only words?

    #1023786
    Halle Winkler
    Politepix

    No, there is no limitation of that kind. I’m not clear on the issue you’re having.
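
    Words, phrases and small sentences can all be mixed in the same array and generated exactly as before. A sketch, with hypothetical example phrases and file name:

    OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
    NSArray *phrases = @[@"START", @"CHANGE MODEL", @"GO TO NEXT SLIDE", @"CAN YOU PLEASE REPEAT"]; // Single words and multi-word phrases can be mixed freely.
    NSError *error = [generator generateLanguageModelFromArray:phrases withFilesNamed:@"MyLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
    if(!error) {
        NSString *lmPath = [generator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"MyLanguageModel"];
        NSString *dicPath = [generator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"MyLanguageModel"];
        // Pass lmPath and dicPath to startListeningWithLanguageModelAtPath: as usual.
    }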

    The initial report was that there were only null hypotheses, which turned out to be due to a dependency in your app on the old calibration callbacks, as far as I understand it. Then we discussed where to see the other callbacks, and I explained that you can’t immediately suspend after starting the engine – it was always bad for recognition and now has no point, so it is going to have a negative effect on recognition quality.

    Is this a new issue or a different issue, or the same issue deriving from starting and immediately suspending? Can you elaborate on it for me so I can understand what we’re specifically troubleshooting?

    #1023787
    Halle Winkler
    Politepix

    The only big difference which has the potential to affect overall recognition behavior, and potentially provide some surprises between 1.x and 2.x, is the new voice activity detection threshold setting, vadThreshold. Most issues with new recognition behavior will be solved by adjusting this threshold until it meets your expectations for what is recognized and what is ignored. Because the voice activity detection is more discerning now, it is also very likely that extremely low secondsOfSilence settings like .1 will cause more interruptions of speech in progress: a setting that low was always likely to interrupt user speech, but it is even more likely to now, because noise suppression is also being applied during silence periods.

    So, for any recognition issues where behavior seems very different from 1.x, the most important steps are to set a realistic secondsOfSilence (.5 is realistic, .7 is better) and then adjust vadThreshold higher or lower within the range of 1.8 to 3.9 until recognition behavior matches the old behavior (if you really want it to match the old behavior exactly, which should probably be freshly evaluated against the new engine behavior).
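
    Concretely, both are properties that are set before starting to listen – a sketch (in code, the silence setting is the secondsOfSilenceToDetect property):

    [OEPocketsphinxController sharedInstance].secondsOfSilenceToDetect = 0.7; // A realistic pause length; .5 is the low end.
    [OEPocketsphinxController sharedInstance].vadThreshold = 2.5; // Raise toward 3.9 to ignore more noise, lower toward 1.8 to accept quieter speech.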

    Make sure that any quality-impacting workarounds for now-defunct features, such as suspending immediately after starting, are 100% removed from your app design, since they will definitely harm recognition quality now (although they were subtly harming it previously as well).

    #1023998
    davestrand
    Participant

    So, you are saying that we should no longer use suspend or resume? I looked in the sample app to see how those situations should be handled, and I found [[OEPocketsphinxController sharedInstance] suspendRecognition]; and [[OEPocketsphinxController sharedInstance] resumeRecognition];

    Still, I have since removed suspend and resume in my app in favor of startListeningWithLanguageModelAtPath and stopListening; however, I do find that response time seemed quicker with suspend and resume. Starting listening seems to take about a second to actually start the listening loop… which is pretty fast, but not as fast as it used to be.

    My app does a lot of switching back and forth between playing audio and expecting voice commands.

    Any suggestions?

    #1023999
    Halle Winkler
    Politepix

    So, you are saying that we should no longer use suspend or resume?

    Not at all – please keep using suspend and resume. As far as I know, the recognition issue reported here was occurring when suspend was used immediately after starting listening, as a way of “priming” or avoiding calibration time to give the user the impression of an instant start. That is always going to be bad for a voice activity detection system that is either calibrating or using ongoing sampling of noise levels, since it starts and then immediately removes the data that is needed to establish the starting speech/silence threshold, and then later suddenly performs recognition on speech that is separated from the original input by some time gap. I’ve always recommended against doing this as far as I can recall, but now it doesn’t even have the upside of giving the impression of a faster start, because the startup time no longer includes a calibration period.

    It’s 100% fine to use suspend and resume intermittently for their intended purpose: suspending and resuming already-started recognition to prevent recognition from being performed when it isn’t desired, such as during audio playback or TTS output. For a potentially very long period of suspension that isn’t intermittent (the user is entering some part of the interface where they might be working for a while without any need for a speech UI, for instance) you may see better VAD results by fully stopping and then starting again later on demand, and I think the startup timeframe is short enough that this isn’t onerous as a UX. Whether it is advantageous should become clear with a little bit of testing.
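
    For the intermittent case the pattern is just this (a sketch, assuming your own playback-start and playback-finished callbacks):

    [[OEPocketsphinxController sharedInstance] suspendRecognition]; // Call before starting mp3 or TTS playback; mic input is ignored while suspended.
    // ...play your audio...
    [[OEPocketsphinxController sharedInstance] resumeRecognition]; // Call from your playback-finished callback; recognition picks up again immediately.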

    #1024000
    davestrand
    Participant

    Ok great. Thank you.

    #1024001
    davestrand
    Participant

    Would it be safe to begin suspending and resuming listening after the recognition loop has begun? I mean, is waiting for the loop to begin the proper timing before suspending listening?

    #1024002
    Halle Winkler
    Politepix

    Why do you need to suspend as the recognition loop begins?

    #1024005
    davestrand
    Participant

    Well, when the view loads there is audio (mp3) playing almost instantly. I don’t want the device to use voice recognition on any (mp3) audio it happens to hear at that point. I could delay startListening until the first time it is needed, but that would mean the first use of voice recognition might not work until the loop has begun, hence a small delay the first time it fires up. It’s a minor detail, but I wouldn’t want to give a bad first impression.

    #1024007
    Halle Winkler
    Politepix

    Startup is less than half a second for me – is it longer for you?

    #1024009
    davestrand
    Participant

    That’s pretty fast. I’ll experiment with starting listening right when it’s needed. Thanks, the voice recognition does seem to be much more responsive and forgiving in 2.0. Great job.

    #1024010
    Halle Winkler
    Politepix

    Thank you! I’m glad you’re getting good results with it.

    #1024019
    neshe9
    Participant

    Sorry for the confusion.
    Yes, initially it was about the null hypothesis. That is solved now.
    Later, the issue was slow recognition. Sometimes it takes as long as 2-3 minutes to return a hypothesis (which leads to a memory warning followed by the app crashing).

    So I replaced my language array in the sample app and tried. Recognition is slow there too, so I asked if it was a matter of the size of the vocabulary.

    As you suggested, I will try varying the vadThreshold and the secondsOfSilence to see if performance changes.
    Also, I have removed the ‘suspendRecognition’ call now.
    Thank you. I will keep you posted.

    #1024028
    Halle Winkler
    Politepix

    OK, I think this is probably going to be due to the secondsOfSilence being set to a value that is much shorter than an actual pause in speech, combined with vadThreshold being at a setting which is oversensitive for your application – this guarantees that speech is being endlessly recognized and recognition is being endlessly triggered.

    So I replaced my language array in the sample app and tried. Recognition is slow there too, so I asked if it was a matter of the size of the vocabulary.

    You didn’t let me know about that, so that is a new piece of information. This is what you said:

    Is this a limitation of version 2.0, that it recognizes only words?

    You didn’t mention any of your experiences with the sample app, or the idea that it might be due to a particularly large vocabulary – just that you had sentences in a relatively small vocabulary, as you described it, and wanted to know whether sentences specifically were no longer supported.

    It is extremely surprising to me that you’re seeing a 3-minute recognition of an utterance from a 20-sentence vocabulary in the sample app, without a single other change to it besides the vocabulary.

    If this is the case, please send me the vocabulary and a recording of you saying the statement that takes 3 minutes to return in the sample app. Just put them up somewhere for me to download and send me a link via the contact form, thank you.

    #1024081
    neshe9
    Participant

    Hi,
    I was able to reproduce the same scenario.
    Please find the log below:

    2015-01-06 10:07:48.634 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx is now listening.
    2015-01-06 10:07:48.638 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx started.
    2015-01-06 10:07:49.116 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.
    2015-01-06 10:07:52.392 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    2015-01-06 10:07:52.490 OpenEarsSampleApp[194:3912] Local callback: The received hypothesis is START with a score of -1208 and an ID of 0
    2015-01-06 10:07:52.670 OpenEarsSampleApp[194:3912] Local callback: Flite has started speaking
    2015-01-06 10:07:52.675 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has suspended recognition.
    2015-01-06 10:07:54.717 OpenEarsSampleApp[194:3912] Local callback: Flite has finished speaking
    2015-01-06 10:07:54.723 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has resumed recognition.
    2015-01-06 10:07:55.211 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.
    2015-01-06 10:08:01.346 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    2015-01-06 10:08:01.459 OpenEarsSampleApp[194:3912] Local callback: The received hypothesis is CHANGE MODEL with a score of -17461 and an ID of 1
    2015-01-06 10:08:01.678 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx is now using the following language model:
    /var/mobile/Containers/Data/Application/0AF6B9CF-9B3B-4D17-AD08-FC2EF042CFE1/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP and the following dictionary: /var/mobile/Containers/Data/Application/0AF6B9CF-9B3B-4D17-AD08-FC2EF042CFE1/Library/Caches/SecondOpenEarsDynamicLanguageModel.dic
    2015-01-06 10:08:01.678 OpenEarsSampleApp[194:3912] Local callback: Flite has started speaking
    2015-01-06 10:08:01.683 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has suspended recognition.
    2015-01-06 10:08:01.736 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.
    2015-01-06 10:08:04.190 OpenEarsSampleApp[194:3912] Local callback: Flite has finished speaking
    2015-01-06 10:08:04.195 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has resumed recognition.
    2015-01-06 10:08:04.421 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.
    2015-01-06 10:08:17.086 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    2015-01-06 10:08:47.247 OpenEarsSampleApp[194:3912] Local callback: The received hypothesis is TO TO CAN YOU PLEASE REPEAT with a score of -14960 and an ID of 2
    2015-01-06 10:08:47.469 OpenEarsSampleApp[194:3912] Local callback: Flite has started speaking
    2015-01-06 10:08:47.474 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has suspended recognition.
    2015-01-06 10:08:50.353 OpenEarsSampleApp[194:3912] Local callback: Flite has finished speaking
    2015-01-06 10:08:50.359 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has resumed recognition.
    2015-01-06 10:08:50.862 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.
    2015-01-06 10:09:21.102 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    2015-01-06 10:11:19.424 OpenEarsSampleApp[194:3912] Received memory warning. — HERE MEMORY USAGE WAS AROUND 550 MB
    2015-01-06 10:12:43.160 OpenEarsSampleApp[194:3912] Local callback: The received hypothesis is GO TO NEXT SLIDE with a score of -12202 and an ID of 3
    2015-01-06 10:12:43.394 OpenEarsSampleApp[194:3912] Local callback: Flite has started speaking
    2015-01-06 10:12:43.400 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has suspended recognition.
    2015-01-06 10:12:46.079 OpenEarsSampleApp[194:3912] Local callback: Flite has finished speaking
    2015-01-06 10:12:46.084 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has resumed recognition.
    2015-01-06 10:12:48.519 OpenEarsSampleApp[194:3912] Local callback: Pocketsphinx has detected speech.

    I will get the recording and vocabulary to you soon.

    #1024091
    Halle Winkler
    Politepix

    Hello,

    I don’t have the recording from you, so I haven’t been able to test your case, but please check out OpenEars version 2.01, out today, which has brought improvements for the other two submitted test cases with similar symptoms, and let me know if it helps with the situation you’re seeing.
