Reply To: First long phrase missed

August 22, 2014 at 12:57 pm #1022330

Politepix

OK, to clarify in case a test app is needed again in the future: I was asking for a test app which used pathToTestFile along with your normal startListeningWithLanguageModelAtPath method call, not runRecognitionOnWavFileAtPath (you can read more about this in the docs for pathToTestFile). I went ahead and made a new test app so that it was possible to recreate the results using your test recording with startListeningWithLanguageModelAtPath.

I’ve seen your issue. To work around this bad outcome on the first utterance, you can change the #define value kExcessiveUtterancePeriod in ContinuousModel.m in OpenEars.xcodeproj to something a bit longer than 13 seconds.

kExcessiveUtterancePeriod is very important: it prevents your app from ever getting into a state in which the voice activity detection is stuck for the rest of the app session because background levels changed drastically in one direction or the other, too quickly for the voice activity detection to adjust smoothly. Its value is based on the longest sentence that is likely to be recognized satisfactorily in the field. kExcessiveUtterancePeriod is not applied to every utterance, but it is applied to any utterance which causes a rescaling of backpointer size in pocketsphinx, i.e. utterances with a particularly large search space. In this case that is happening with the first utterance only, for reasons that are unknown and need more looking into; it isn’t expected.

My advice is to increase kExcessiveUtterancePeriod a few seconds at a time until it is large enough not to interfere with your maximum utterance length, plus a little bit of buffer for slow speakers, but no larger than that, so that results in the field remain as good as possible in cases with abrupt and significant changes of background level. Something along the lines of 20 seconds may be right for your app. Then recompile the framework project and this issue shouldn’t show up in your app.
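As a rough sketch, the edit would look something like the line below. I'm assuming here that the value is written in plain seconds; check how the existing line in your copy of ContinuousModel.m is written and keep the same unit and format. The 20 is only the example figure from above, not a recommended value.

```c
/* ContinuousModel.m, in OpenEars.xcodeproj */
/* Assumption: the value is in seconds. Keep the unit and format of the existing line in your copy. */
/* 20 is an example only: your longest expected utterance plus a small buffer for slow speakers. */
#define kExcessiveUtterancePeriod 20
```

After changing it, rebuild the framework and link your app against the new build, otherwise the old value is still in effect.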

At the moment I’m developing the next version of OpenEars, which uses a different voice activity detector that is meant to be more accurate and more noise-robust. Ideally that will mean that kExcessiveUtterancePeriod disappears altogether, because the situation in which it is needed as a failsafe will no longer occur.