December 11, 2014 at 7:21 pm #1023425
I’ve upgraded to 2.0 in a new branch so I can easily switch between versions. I haven’t changed any implementation details other than those required to support the API changes. I’m having a number of problems which could all be related, and they are hard to pin down. Here’s one problem I keep getting:
An utterance appears to be stuck in listening mode. Exiting stuck utterance.
Before this is logged, speech recognition hangs for quite a few seconds.
December 11, 2014 at 7:36 pm #1023428
Slow recognitions have been reported a couple of times, and both times it was because not all the steps in the upgrade guide had been completed.
December 11, 2014 at 7:39 pm #1023429
Which steps? I have followed all of them, and apart from steps 1, 2, and 5, the project would not compile if I had not made the changes successfully.
December 11, 2014 at 7:44 pm #1023430
For this issue, my best guess would be that your app is still linking to the old acoustic models. As you mentioned, there are a lot of steps that won’t lead to compilation errors if you skip them, so it has to be done pretty carefully.
December 11, 2014 at 7:50 pm #1023431
I replaced them again with the versions from the archive, and I’m still getting the message 25 seconds after I start listening, every time.
Step 1 and 2 are replacing the bundles and frameworks. I’ve definitely done this.
Step 5 covers the architectural changes. I have changed the pocketsphinx usage, I’m not generating a language model, and I am using OEEventsObserver correctly.
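For reference, my OEEventsObserver wiring follows the documented pattern, roughly like this (a sketch from memory, with the delegate method names as I understand the 2.0 headers; the logging is illustrative):

```objc
// Keep a strong reference; callbacks stop if the observer is deallocated.
@property (strong, nonatomic) OEEventsObserver *openEarsEventsObserver;

// In setup (the containing class conforms to OEEventsObserverDelegate):
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
self.openEarsEventsObserver.delegate = self;

// Hypothesis callback:
- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    NSLog(@"Heard: %@ (score %@)", hypothesis, recognitionScore);
}
```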
Thanks.
December 11, 2014 at 8:05 pm #1023432
If you can replicate it with isolated changes to the sample app I can take a look at it. There’s no known bug which does this, so without any logging info or device/version info or code to run I don’t have a way of helping out.
As I mentioned in the other thread, Rejecto is not designed to create language models for import without Rejecto running in the app, so that should be fixed in order to remove it as a potential issue, and I would be concerned about the switching branches back and forth thing since 2.0 isn’t just an API change but also changes in assets. That ought to work fine in theory, but it sounds like a source of debugging complexity right now.
I think that if I was seeing weird behavior like this, my steps would be as follows:
1. Use Rejecto as designed or use an OELanguageModelGenerator to create your LM if you aren’t using Rejecto,
2. Attempt to replicate in the sample app using the smallest-possible changes, and
3. If it doesn’t replicate, compare the two apps to see what’s different; if it does replicate, share with me the minimal code alterations, along with the device and iOS version, that cause it to replicate with the sample app.
December 11, 2014 at 8:55 pm #1023444
I have a sample app which exhibits the problem; where should I send it?
December 11, 2014 at 8:56 pm #1023445
Great. I only want the changed lines, not the entire app, but you can send them to the email address that you received your framework license email from.
December 11, 2014 at 8:59 pm #1023446
Remember to also send me your device and iOS version where this replicates, thanks!
December 11, 2014 at 11:38 pm #1023485
OK, the issues I see in your sample app version are that the vadThreshold is too low to operate (I would leave it at the current default of 2.0), which is going to create oversensitivity in combination with your lowered secondsOfSilenceToDetect, and that it is using Rejecto in an unsupported way (see above, and our previous discussion). If fixing those issues doesn’t improve the slow search, it’s fine to send me a sample app version which uses Rejecto to dynamically generate your LM to demonstrate the issue.
December 12, 2014 at 1:17 am #1023489
OK, so if you adjust vadThreshold, my other problem is that ‘BACK’ is very rarely recognised. It worked much more often in v1 of the framework.
I’m not sure how Rejecto is being used in an unsupported way, as in the sample it just creates on-the-fly the same files that I am using in the edited version; it’s just not generating them every time.
December 12, 2014 at 1:18 am #1023490
It would only be invalid if Rejecto generated different files each time?
December 12, 2014 at 9:05 am #1023493
It is designed (and supported) to be used to generate the files, not to create a file which is then imported.
December 12, 2014 at 9:45 am #1023494
I am not sure what the difference is there, but I have changed it.
So if you take the new project, drag in Rejecto, import it, and change the language model generation line to
NSError *error = [languageModelGenerator generateRejectingLanguageModelFromArray:@[@"BACK",@"NEXT",@"REPEAT"] withFilesNamed:@"FirstOpenEarsDynamicLanguageModel" withOptionalExclusions:nil usingVowelsOnly:NO withWeight:nil forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
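I then start listening with the generated files in the usual way, roughly like this (a sketch, assuming the standard OpenEars 2.0 method names as I read them in the headers; error handling omitted):

```objc
// Sketch: retrieve the paths Rejecto/OELanguageModelGenerator wrote, then start listening.
NSString *lmPath = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
NSString *dicPath = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];

[[OEPocketsphinxController sharedInstance] setActive:YES error:nil];
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                dictionaryAtPath:dicPath
                                                             acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                             languageModelIsJSGF:NO];
```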
It rarely recognises ‘BACK’.
December 12, 2014 at 9:49 am #1023496
Thank you for changing it. Let’s first finish troubleshooting your initial report about the lag. Is that fixed now that you’ve changed the Rejecto usage?
December 12, 2014 at 9:54 am #1023497
Yes. I would recommend adding a warning or error for the threshold bounds, since you said elsewhere that the default was 1.5 and that lowering it would reduce the noise threshold, which is what I was trying because it wouldn’t recognise ‘BACK’.
December 12, 2014 at 9:55 am #1023498
The error I initially reported in the first post could suggest the problem.
December 12, 2014 at 10:44 am #1023499
That’s great to hear, and thanks for the info about what was causing the issue we’ve been investigating. The error above can indicate different issues, but I can point out in the instructions that very low vadThreshold values can cause overactive engine work, which can result in delayed recognition.
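For anyone following along, the settings in question live on the shared controller and can be tuned before listening starts, roughly like this (a sketch; 2.0 is the default vadThreshold per this thread, and the secondsOfSilenceToDetect value here is purely illustrative):

```objc
// Sketch: the voice activity detection settings discussed above.
OEPocketsphinxController *controller = [OEPocketsphinxController sharedInstance];
controller.vadThreshold = 2.0;            // the current default; setting it much lower causes
                                          // overactive engine work and delayed recognition
controller.secondsOfSilenceToDetect = 0.7; // illustrative; lowering this as well compounds oversensitivity
```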
To troubleshoot the “BACK” issue we probably want to work from a recording that has been added to pathToTestFile so we can both observe the same fixed input for troubleshooting/replication simplicity. For your own troubleshooting, I would start by checking whether you have the same issue when not using Rejecto at all, since that is going to be the first important variable in recognition.
The way voice activity detection works has significantly changed in this version, so it may be necessary to refine the use of Rejecto (maybe it needs a weight change or a change of phoneme set to get similar results; we’ll see). To level-set, first stop using it and see if you have similar results, using a recording that you will be able to share with me. I’ll be asking you to make the most-minor-possible changes to the sample app that replicate the issue with your audio and to share them with me here, since we aren’t working with a secret language model. You can post the audio file somewhere for me to download.
December 14, 2014 at 6:37 pm #1023543
You can test an audio file programmatically by setting the pathToTestFile property of OEPocketsphinxController to your file (there is a commented-out example in the sample app) right before starting recognition. You can read more about creating a file in the required format in OEPocketsphinxController’s header file and/or the docs for its pathToTestFile property. The short version is that it needs to be a 16k/16-bit/mono PCM WAV, and there are instructions on converting to that format in the property documentation.
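Roughly, that looks like this (a sketch; the commented-out example in the sample app is the canonical reference, and the filename here is hypothetical):

```objc
// Sketch: drive recognition from a fixed recording instead of the microphone.
// The file must be a 16kHz, 16-bit, mono PCM WAV.
[OEPocketsphinxController sharedInstance].pathToTestFile =
    [[NSBundle mainBundle] pathForResource:@"test_recording" // hypothetical bundled file
                                    ofType:@"wav"];
// ...then start listening as usual; recognition will run against the file,
// giving both sides the same fixed input for replication.
```

A command-line tool such as sox can convert an existing recording to that format, e.g. `sox in.wav -r 16000 -b 16 -c 1 out.wav`.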
I’ll hold on to the file you sent me if that is the one you’d like to test against, but in order to troubleshoot this further, please create a version of the sample app for yourself that replicates the issue for you using your file and the pathToTestFile property, and then show me here what your changes to the sample app are for getting the same results (along with device and iOS version) so we have a reliable and very minimal replication case we can both see and refer to in the course of further discussion. Thanks!
December 14, 2014 at 6:50 pm #1023544
Thanks, I’ll check it out when there is time and I’ll check back in once I have some info. This might involve some discussion with other related projects, so it might not be a very fast follow-up.
December 14, 2014 at 6:51 pm #1023545
OK, thanks.
January 6, 2015 at 7:03 pm #1024087
OK, can you take a look at the new OpenEars 2.01 out today with a significant VAD behavior fix and see if it improves your issue? When I ran your test case, 2.01 correctly detected the speech that wasn’t simultaneous with the sample app TTS voice speech (a couple of the utterances in the recording occur during TTS speech so they wouldn’t be detected in either case). Let me know if you’re seeing an improvement.