Minimum volume threshold to detect speech

This topic has 5 replies, 2 voices, and was last updated 11 years, 5 months ago by Halle Winkler.

Viewing 6 posts - 1 through 6 (of 6 total)


Author

Posts
October 31, 2012 at 8:23 am #11770

nvrtd frst
Participant

Hi,

I am having a problem where, in a very quiet area, Pocketsphinx starts detecting very slight noises as speech (things like the hum of the fridge). It then gets into a loop of detecting these noises as speech, which obviously hinders good recognition.

Is there a way to set a minimum absolute volume threshold just so it doesn’t pick up these things like the hum of a fridge?

Thanks

I am using OpenEars 1.2.2. NOTE: I have tweaked the build to use VoiceProcessingIO and have added code to set the audio session mode to “Video Recording.”

October 31, 2012 at 4:08 pm #11776

Halle Winkler
Politepix

Hiya,

Sorry, there is no trivial way to do this. You can only attempt to hack ContinuousADModule.mm, with the proviso that wrong values forced into the VAD usually cause crashes. You might have better results going with the slightly less-sensitive default settings, because the noise reduction might be causing an artificially wide differential between minor noises and noise-suppressed quiet.
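To make the idea of a "minimum absolute volume threshold" concrete, here is a minimal sketch of an energy gate over a buffer of 16-bit PCM samples. It is not OpenEars or Pocketsphinx code: the function name `buffer_passes_gate` and the `min_rms` parameter are hypothetical, and where such a check could safely be placed in ContinuousADModule.mm (if anywhere) is not something this thread establishes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of OpenEars or Pocketsphinx.
   Returns 1 if the RMS amplitude of the buffer is at least min_rms
   (expressed in raw 16-bit sample units, 0..32767), otherwise 0.
   The comparison is done on squared values, so no sqrt() is needed. */
static int buffer_passes_gate(const int16_t *samples, size_t count, double min_rms) {
    if (count == 0) {
        return 0; /* an empty buffer never counts as speech */
    }
    double sum_squares = 0.0;
    for (size_t i = 0; i < count; i++) {
        double s = (double)samples[i];
        sum_squares += s * s;
    }
    double mean_square = sum_squares / (double)count;
    return mean_square >= (min_rms * min_rms);
}
```

A caller would test each incoming buffer against a threshold measured in the target environment and treat buffers that fail as silence. How the VAD reacts to buffers being withheld or silenced this way is untested here, so treat it as a starting point for experimentation rather than a fix.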

October 31, 2012 at 10:27 pm #11779

nvrtd frst
Participant

Hey Halle,

Thanks for the response. What do you mean by “less sensitive default settings?”

Thanks!

October 31, 2012 at 10:40 pm #11780

Halle Winkler
Politepix

Hiya,

I meant not overriding the audio session or audio unit settings. I think the OpenEars defaults are less likely to zero out low-noise buffers, so there will be less of a difference between a low-noise buffer and a not-that-much-noise buffer, which theoretically might help the VAD not to overreact to not-that-much-noise buffers.

October 31, 2012 at 10:52 pm #11781

nvrtd frst
Participant

Thanks Halle. The reason I set the audio session mode to video recording is because I use voiceProcessingIO so users can give voice input while sound is playing, but this significantly decreases the output volume (when compared to remoteIO). I’ve found that by setting the session mode to “video recording,” the sound output volume is much better. But now I realize that there are some disadvantages as well.

November 1, 2012 at 10:03 am #11785

Halle Winkler
Politepix

It’s tricky: having noise cancellation and noise suppression is obviously a good thing, but the VAD in OpenEars is partial to non-noise-suppressed sources. I personally don’t use VoiceProcessingIO because it doesn’t seem to get the same degree of QA as RemoteIO (or it just has a lot more options that need QA attention), and I’ve had it stop working in a couple of minor OS updates on a couple of devices, which is a little bit too much of a needle-in-a-haystack situation for maintaining a framework.

Actually, thinking of the VAD and its issues with noise-suppressed sources, I wonder if any of the command-line options in PocketsphinxRunConfig.h would help you. It might be worth a quick look in there to see if there is anything relevant you can turn on (maybe AGC or dithering or something).
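As a rough illustration of why dithering might matter for noise-suppressed input, the sketch below adds about one least-significant bit of random noise to each 16-bit sample, so that a buffer that has been zeroed out no longer looks like perfect digital silence. This is a simplified picture of the general concept, not how Pocketsphinx implements its option; `add_dither` and `next_random` are hypothetical names, and the small built-in generator is used only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so the sketch has no dependencies.
   Hypothetical helper, not taken from any library. */
static uint32_t next_random(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Adds -1, 0 or +1 to every sample in place, clamping to the int16 range.
   Hypothetical helper illustrating dithering in general. */
static void add_dither(int16_t *samples, size_t count, uint32_t seed) {
    uint32_t state = seed;
    for (size_t i = 0; i < count; i++) {
        /* Use the upper bits of the generator, which are better distributed. */
        int32_t noise = (int32_t)((next_random(&state) >> 16) % 3u) - 1;
        int32_t value = (int32_t)samples[i] + noise;
        if (value > INT16_MAX) value = INT16_MAX;
        if (value < INT16_MIN) value = INT16_MIN;
        samples[i] = (int16_t)value;
    }
}
```

The amount of noise is far below anything audible, so the point is only to reduce the contrast between zeroed buffers and faint real noise. Whether that actually calms the VAD in this setup would need to be checked on a device.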

Something I’ve noticed is that as a developer you will tend to try to find a test space with the least ambient noise possible because it’s unproductive to test speech applications in an uncontrolled environment, but real user environments are almost always noisier, so you might not need to worry too much about the corner case of an extremely quiet environment that has a slightly-less-quiet noise in it. I used to have a very similar issue when testing AllEars (which ended up providing the OpenEars code) that a sparrow would visit my balcony and the relatively quiet cheeping would totally ruin recognition, but I haven’t received user reports of similar scenarios.