Recognition lag

Viewing 22 posts - 1 through 22 (of 22 total)

  • #1023425
    nrbrook
    Participant

    I’ve upgraded to 2.0 in a new branch so I can easily switch between versions. I haven’t changed any implementation details other than those required to support the API changes. I’m having a number of problems which could all be related; they are hard to pin down. Here’s one problem I keep getting:

    An utterance appears to be stuck in listening mode. Exiting stuck utterance.

    Before this is logged, speech recognition hangs for quite a few seconds.

    #1023428
    Halle Winkler
    Politepix

    Slow recognitions have been reported a couple of times and both times it was due to not all the steps in the upgrade guide having been completed.

    #1023429
    nrbrook
    Participant

    Which steps? I have followed all the steps, and apart from steps 1, 2 and 5, the project would not compile if I had not made the changes successfully.

    #1023430
    Halle Winkler
    Politepix

    For this issue, my best guess would be that your app is still linking to the old acoustic models. As you mentioned, there are a lot of steps that won’t lead to compilation errors if you skip them, so it has to be done pretty carefully.

    #1023431
    nrbrook
    Participant

    I replaced them again with the versions from the archive. I’m still getting the message 25 seconds after I start listening, every time.

    Step 1 and 2 are replacing the bundles and frameworks. I’ve definitely done this.

    Step 5 is architectural changes. I have changed the Pocketsphinx usage; I’m not generating a language model. I am using OEEventsObserver correctly.

    Thanks

    #1023432
    Halle Winkler
    Politepix

    If you can replicate it with isolated changes to the sample app I can take a look at it. There’s no known bug which does this, so without any logging info or device/version info or code to run I don’t have a way of helping out.

    As I mentioned in the other thread, Rejecto is not designed to create language models for import without Rejecto running in the app, so that should be fixed in order to remove it as a potential issue, and I would be concerned about the switching branches back and forth thing since 2.0 isn’t just an API change but also changes in assets. That ought to work fine in theory, but it sounds like a source of debugging complexity right now.

    I think that if I was seeing weird behavior like this, my steps would be as follows:

    1. Use Rejecto as designed or use an OELanguageModelGenerator to create your LM if you aren’t using Rejecto,
    2. Attempt to replicate in the sample app using the smallest-possible changes, and
    3. If it doesn’t replicate, compare the two apps to see what’s different, and if it replicates, share the minimal code alterations and device and iOS version that causes it to replicate with the sample app with me.
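
    For reference, step 1 without Rejecto would use OELanguageModelGenerator’s plain generation method. A minimal sketch, assuming the standard OpenEars 2.0 API and the bundled English acoustic model (the phrase array and file name here are just illustrative):

```objc
// Generate the language model and dictionary dynamically at runtime,
// rather than importing files that were generated elsewhere.
OELanguageModelGenerator *languageModelGenerator = [[OELanguageModelGenerator alloc] init];

NSError *error = [languageModelGenerator generateLanguageModelFromArray:@[@"BACK", @"NEXT", @"REPEAT"]
                                                         withFilesNamed:@"FirstOpenEarsDynamicLanguageModel"
                                                 forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];

if (error) {
    NSLog(@"Language model generation error: %@", error);
} else {
    // These paths are then handed to OEPocketsphinxController when listening starts.
    NSString *lmPath = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
    NSString *dicPath = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
}
```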

    #1023444
    nrbrook
    Participant

    I have a sample App which exhibits the problem, where should I send it?

    #1023445
    Halle Winkler
    Politepix

    Great. I only want the changed lines, not the entire app, but you can send them to the email address that you received your framework license email from.

    #1023446
    Halle Winkler
    Politepix

    Remember to also send me your device and iOS version where this replicates, thanks!

    #1023485
    Halle Winkler
    Politepix

    OK, the issues I see in your sample app version are that the vadThreshold is too low to operate (I would leave it on the current default of 2.0), which is going to create oversensitivity in combination with your lowered secondsOfSilenceToDetect, and that it is using Rejecto in an unsupported way (see above, and our previous discussion). If fixing those issues doesn’t improve the slow search, it’s fine to send me a sample app version which uses Rejecto to dynamically generate your LM to demonstrate the issue.
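
    The two settings mentioned above are properties on OEPocketsphinxController’s shared instance. A hedged sketch of the recommended configuration (2.0 being the stated default for vadThreshold in this version; leaving secondsOfSilenceToDetect at its default avoids compounding the oversensitivity):

```objc
// Keep vadThreshold at the 2.0 default; lowering it makes voice activity
// detection oversensitive and can delay recognition.
[OEPocketsphinxController sharedInstance].vadThreshold = 2.0;

// Don't also lower secondsOfSilenceToDetect below its default, which
// compounds the oversensitivity described above.
```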

    #1023489
    nrbrook
    Participant

    OK, so with vadThreshold adjusted, my other problem is that ‘BACK’ is very rarely recognised. It worked much more often in v1 of the framework.

    I’m not sure how Rejecto is being used in an unsupported way, as in the sample it just creates the same files on the fly that I am using in the edited version. It’s just not generating them every time.

    #1023490
    nrbrook
    Participant

    It would only not be valid if Rejecto generated different files each time?

    #1023493
    Halle Winkler
    Politepix

    It is designed (and supported) to be used to generate the files, not to create a file which is then imported.

    #1023494
    nrbrook
    Participant

    I am not sure what the difference is there, but I have changed it.

    So if you take the new project, drag in rejecto, import it, and change the language model generation line to

    NSError *error = [languageModelGenerator generateRejectingLanguageModelFromArray:@[@"BACK",@"NEXT",@"REPEAT"]
                                                                      withFilesNamed:@"FirstOpenEarsDynamicLanguageModel"
                                                              withOptionalExclusions:nil
                                                                     usingVowelsOnly:NO
                                                                          withWeight:nil
                                                              forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];

    it rarely recognises ‘BACK’.

    #1023496
    Halle Winkler
    Politepix

    Thank you for changing it. Let’s first finish troubleshooting your initial report about the lag. Is that fixed now that you’ve changed the Rejecto usage?

    #1023497
    nrbrook
    Participant

    Yes. I would recommend adding a warning or error for the threshold bounds: you said elsewhere that the default was 1.5 and that lowering it would reduce the noise threshold, which is what I was trying because it wouldn’t recognise ‘BACK’.

    #1023498
    nrbrook
    Participant

    The error I initially reported in the first post could have hinted at the problem.

    #1023499
    Halle Winkler
    Politepix

    That’s great to hear, and thanks for the info about what was causing the issue we’ve been investigating. The error above can indicate different issues, but I can point out in the instructions that very low vadThreshold values can cause overactive engine work, which can result in delayed recognition.

    To troubleshoot the “BACK” issue we probably want to work from a recording that has been added to pathToTestFile so we can both observe the same fixed input for troubleshooting/replication simplicity. For your own troubleshooting, I would start by checking whether you have the same issue when not using Rejecto at all, since that is going to be the first important variable in recognition.

    The way voice activity detection works has significantly changed in this version, so it may be necessary to refine the use of Rejecto (maybe it needs a weight change or a change of phoneme set to get similar results, we’ll see). To level-set, first stop using it and see if you have similar results, using a recording that you will be able to share with me. I’ll be asking you to make the most-minor-possible changes to the sample app that replicates the issue with your audio and to share them with me here since we aren’t working with a secret language model. You can post the audio file somewhere for me to download.

    #1023543
    Halle Winkler
    Politepix

    Hi,

    You can test an audio file programmatically by setting the pathToTestFile property of OEPocketsphinxController to your file (there is a commented-out example in the sample app) right before starting recognition. You can read more about creating a file in the required format in OEPocketsphinxController’s header file and/or the docs for its pathToTestFile property. The short version is that it needs to be a 16k/16-bit/mono PCM WAV, and there are instructions on converting to that format in the property documentation.
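
    As a sketch of the test-file setup described above (the recording name here is hypothetical, and lmPath/dicPath are assumed to come from your language model generation step):

```objc
// Point recognition at a fixed 16k/16-bit/mono PCM WAV instead of the mic,
// so both sides of the discussion observe the same input.
[OEPocketsphinxController sharedInstance].pathToTestFile =
    [[NSBundle mainBundle] pathForResource:@"test_recording" ofType:@"wav"]; // hypothetical file name

NSError *activationError = nil;
[[OEPocketsphinxController sharedInstance] setActive:YES error:&activationError];

// Then start listening as usual; recognition runs against the test file.
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                dictionaryAtPath:dicPath
                                                             acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                             languageModelIsJSGF:NO];
```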

    I’ll hold on to the file you sent me if that is the one you’d like to test against, but in order to troubleshoot this further, please create a version of the sample app for yourself that replicates the issue for you using your file and the pathToTestFile property, and then show me here what your changes to the sample app are for getting the same results (along with device and iOS version) so we have a reliable and very minimal replication case we can both see and refer to in the course of further discussion. Thanks!

    #1023544
    Halle Winkler
    Politepix

    Thanks, I’ll check it out when there is time and I’ll check back in once I have some info. This might involve some discussion with other related projects, so it might not be a very fast follow-up.

    #1023545
    nrbrook
    Participant

    Ok thanks.

    #1024087
    Halle Winkler
    Politepix

    OK, can you take a look at the new OpenEars 2.01 out today with a significant VAD behavior fix and see if it improves your issue? When I ran your test case, 2.01 correctly detected the speech that wasn’t simultaneous with the sample app TTS voice speech (a couple of the utterances in the recording occur during TTS speech so they wouldn’t be detected in either case). Let me know if you’re seeing an improvement.
