Reply To: [Resolved] Clarifications on the improved background noise cancellation feature
I checked in with the CMU Sphinx project and verified that this is correct (I’ll probably post the response later if I get permission to quote). Recalibration is definitely happening as you’ve seen, and 3.0 is probably the highest threshold value that makes sense to use. The VAD is designed to adapt to changing environments, but it expects stationary noise, i.e. no dramatic oscillations that it has to react to in very short timeframes. (That was also the case that would get the old VAD stuck, so it’s an improvement that nothing gets stuck anymore, even if recognition is sub-optimal in that situation.)
It might be possible to change the VAD timeslice, although it’s probably risky, or at least pointless, to optimize in that area while it continues to be actively developed by the Sphinx project.
If you feel like recompiling the framework, there are some config settings you can look at in OEPocketsphinxRunConfig.h related to VAD activity:
// #define kVAD_PRESPEECH //"-vad_prespeech", int, default ARG_STRINGIFY(DEFAULT_PRESPCH_STATE_LEN), Num of speech frames to trigger vad from silence to speech.
// #define kVAD_POSTSPEECH //"-vad_postspeech", int, default ARG_STRINGIFY(DEFAULT_POSTSPCH_STATE_LEN), Num of speech frames to trigger vad from speech to silence.
Or, if the issue is that recognition is getting stuck, you can also reduce this check for a stuck utterance in OEContinuousModel.m to something lower than 25 seconds:
if(([NSDate timeIntervalSinceReferenceDate] - self.stuckUtterance) > 25.0)
Remember that the framework project has to be archived rather than just built, or it won’t produce a universal framework and you’ll get missing-object errors on either a device or the Simulator, depending on which slice was built.
Question: does your app play back audio, or does it only take in mic audio?