Reply To: [Resolved] Noise problem even setting setVadThreshold openEars 2.0

January 7, 2015 at 8:14 pm #1024126

Politepix

Anyway, regarding your phrase “a vadThreshold as high as 3.5 will suppress actual user speech in testing” is enough to know that something is really wrong with 2.0.1 because here my test and you will see the results:

But this recording is not of noises; it is a recording of continuous, human-comprehensible speech without crosstalk. That speech precedes the louder speech by a long time in the recording, so it becomes the engine’s reference point for distinguishing between speech and silence. It does make sense that it is being detected as speech rather than noise, since it is single-speaker speech which is clear enough to be understood by a human listener. A user using an app from a distance could have their speech detected at this power level.

I agree with you that the behavior has changed from 1.70 (or more specifically Pocketsphinx 0.8), but I don’t see this as a sign of something being really wrong, since it is speech at a volume which could be user speech, and the recording begins with the quieter speech and carries on for long enough that it makes sense that it is recognized. It is a bit strange that it isn’t recognized at all in 1.70.

In any case, that isn’t the behavior you want for your app, which is reasonable. I did a large amount of testing today on your case using the old and new Pocketsphinx on some Ubuntu VMs and the old and new OpenEars, and I did notice that for Spanish recognition, the vadThreshold values would be more useful if they went higher, as you requested earlier. In 2.01 I had similar results to 1.70 when I used a vadThreshold of 4.4 or 4.3, which behaves more like lower values do with the English model (although I had better accuracy with 2.01). It seems possible that the ideal vadThreshold values have some relationship to the acoustic models. When I’ve had more time to test it I will check in with the Sphinx project and see if they have some ideas about whether that is the case, at which point I may add some kind of vadThreshold multiplier to OpenEars.

For now, so you can get on with things, I’ve uploaded OpenEars 2.02 which has a maximum vadThreshold of 5.0. When I set it to 4.4 in my test of your audio, it recognized “MADRID” and then “ROMA” and nothing else. I hope this is helpful.
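
As a rough sketch of the setting described above, this is roughly what applying the 4.4 value looks like in app code. It assumes the OpenEars 2.x names `OEPocketsphinxController`, `sharedInstance`, `setActive:error:` and the `vadThreshold` property as they appear in the 2.x headers; please check them against the headers in your own copy of 2.02. The lines belong inside whichever method starts listening in your app.

```objc
#import <OpenEars/OEPocketsphinxController.h>

// Assumption: OpenEars 2.x API names; verify against your installed headers.
// The controller is made active before any of its properties are set.
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];

// 4.4 is the value used in the test of the Spanish audio described above.
// OpenEars 2.02 accepts vadThreshold values up to 5.0.
[OEPocketsphinxController sharedInstance].vadThreshold = 4.4f;
```

The right value is likely to depend on the acoustic model and the recording conditions, so it is worth trying values close to 4.4 (for instance 4.3) against your own test recordings.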