Background noise causing OpenEars missing words

This topic has 13 replies, 3 voices, and was last updated 8 years, 3 months ago by Halle Winkler.

Viewing 14 posts - 1 through 14 (of 14 total)


Author

Posts
September 3, 2015 at 3:30 am #1026693

billylo
Participant

Hi, I am looking for ideas to solve this problem.

OpenEars works 100% of the time when there is no background noise. When it is used in a moving car, the road noise causes OpenEars to miss some words (especially the second word in a phrase).

I have created a minimal use case here using SaveThatWave. I tried different combinations of the vadThreshold, removeSilence, and removeNoise settings and am not having much luck. The console output with verbose Pocketsphinx logging is available here.

thanks,

September 3, 2015 at 8:39 am #1026694

Halle Winkler
Politepix

Hi,

Thanks, but a minimal use case is only necessary when there is a potential bug that I can’t replicate, so it isn’t needed in this case. I usually ask for one in response to something that I’m not seeing in my own testbed, but it isn’t required in order to open a question here (I do ask for logging to be shared in the question when something buggy is happening, although this isn’t really such a case).

Speech recognition from a distance with a high level of persistent background noise is very difficult to perform accurately, so this is a known limitation rather than something that can be considered a bug.

There has been a fair amount of discussion in these forums about noise so the most productive step will probably be to do some searching here. You can also look into trying the VoiceChat audio mode (please read up here and in the documentation about its shortcomings), using smaller models for more accuracy, and checking to make sure that it isn’t Rejecto rejecting intended speech rather than the vadThreshold. removeNoise and removeSilence don’t relate to this issue so it isn’t necessary or advisable to make any changes to them.

One approach is to start by reducing vadThreshold until all speech and noise is getting processed, and then adding in Rejecto with whatever weight successfully suppresses the noise and incidental speech.
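The two-stage tuning described above can be sketched roughly as follows. This is only an illustration of the search order, not OpenEars or Rejecto code: `tune`, `first_passing`, and the two check functions are hypothetical names, and each check stands in for running your own recorded test audio through the app and judging the result by hand or by script. The candidate values in the usage note are placeholders, not recommended settings.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_passing(candidates: Iterable[float],
                  passes: Callable[[float], bool]) -> Optional[float]:
    """Return the first candidate for which passes(candidate) is true, else None."""
    for value in candidates:
        if passes(value):
            return value
    return None


def tune(vad_candidates: Iterable[float],
         weight_candidates: Iterable[float],
         all_audio_processed: Callable[[float], bool],
         noise_rejected: Callable[[float, float], bool],
         ) -> Optional[Tuple[float, Optional[float]]]:
    """Hypothetical two-stage search: vadThreshold first, then Rejecto weight."""
    # Stage 1: walk vadThreshold downward until speech and noise are all processed.
    vad = first_passing(sorted(vad_candidates, reverse=True), all_audio_processed)
    if vad is None:
        return None  # no candidate threshold let everything through
    # Stage 2: with that threshold fixed, try Rejecto weights in the order given
    # and keep the first one that suppresses noise and incidental speech.
    weight = first_passing(weight_candidates, lambda w: noise_rejected(vad, w))
    return vad, weight
```

For example, with stand-in checks such as `lambda t: t <= 2.5` and `lambda t, w: w >= 1.2`, calling `tune([1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.1, 1.2, 1.3], ...)` settles on the highest threshold that still passes and then the first weight that passes. In practice each check would be a listening session with real road-noise recordings.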

In general, you’re encountering a practical limit of speech recognition, so in this case you may wish to encourage your users to use a headset or similar so that there isn’t a combination of distant speech (speech into the phone mic but not from a phone conversation distance) and high environmental noise.

September 3, 2015 at 4:14 pm #1026706

billylo
Participant

You are quick!

I should have added some more context to my question. In general, OpenEars works really well even with road noise. This specific production scenario has puzzled me for weeks.

1. An example of a successful recognition (with significant road noise, iPhone 6 Plus). This consistently works.
2. An unsuccessful one (I’m not sure what caused the issue; this user has no luck when she is on the road, iPhone 6). This consistently has trouble.

If you can shed some light on this, it would be great.

thx,

September 3, 2015 at 4:23 pm #1026707

Halle Winkler
Politepix

Ah, OK, now I get it, thanks for clarifying. I can take a look at your case and let you know if I notice anything special about it; maybe we can figure it out. I haven’t listened to it yet, but when it comes down to a specific speaker under challenging conditions it can sometimes be related to gender (depending on whether the speaking frequency is well represented in the speech database the model is created from) or accent. I’ll get back to you after I’ve had a chance to hear it.

September 21, 2015 at 1:39 pm #1026839

darasan
Participant

Hi Halle,

On a related note: is it possible to detect the threshold at which the level of background noise may start to impact on recognition, and then warn the user of that? At least then they would know that they should try to move to a quieter place if possible.

Of course we could just monitor the mic level in the background and find the threshold (in dB) based on trial and error, but I was wondering if you might have some tips or guidelines.
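A minimal sketch of the trial-and-error level check mentioned above, assuming the app can get at buffers of 16-bit PCM samples from the mic. `rms_dbfs` and `too_noisy` are hypothetical helper names, not OpenEars API, and the threshold has to be found empirically for the specific app and device.

```python
import math
from typing import Sequence

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit PCM sample


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale (dBFS)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 10*log10 of a power ratio is the same as 20*log10 of the RMS amplitude ratio.
    return 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def too_noisy(samples: Sequence[int], threshold_dbfs: float) -> bool:
    """True when the buffer's level is above the empirically chosen threshold."""
    return rms_dbfs(samples) > threshold_dbfs
```

The idea would be to call `too_noisy` on buffers captured while nobody is speaking and show the warning only if the level stays above the threshold for a while, so that a single door slam does not trigger it.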

Thanks,

Daire

September 21, 2015 at 1:44 pm #1026840

Halle Winkler
Politepix

Hello,

No, I would expect that this would be too specific to the individual implementation to be able to generalize in this way. Have you tried using the VoiceChat audio mode?

September 21, 2015 at 2:03 pm #1026843

darasan
Participant

OK, thanks, that makes sense.

No, I’m not using VoiceChat mode at the moment, because it reduces the playback volume of other sounds in my app, as described here: https://www.politepix.com/forums/topic/openears-finch/

I can experiment with turning it on to see if it improves recognition with background noise, and if so then try to work around the low volume issue…

September 25, 2015 at 8:20 pm #1026866

billylo
Participant

Hi Halle, have you had a chance to listen to or test these audio files, and can you offer some guidance on next steps? Thanks,

September 25, 2015 at 8:33 pm #1026867

Halle Winkler
Politepix

Hi Billy,

I’m sorry for the delay but this is also in the queue for after the next update – I will check it out and tell you about it just as soon as that is out.

November 3, 2015 at 4:21 am #1027160

billylo
Participant

How’s your next update progressing? Could really use your guidance here. Some users continue to struggle with road noise. Billy.

November 3, 2015 at 3:24 pm #1027185

Halle Winkler
Politepix

Hi Billy,

Thank you very much for your patience (and to everyone else with a case for me to check out as well). I apologize for the delay. I’ve decided to break it out into two updates so I can take a look at these issues without more delay. With some luck I will get the first part of the two-part update out this week, so I can then take some time to catch up with issues before starting on the second part.

December 29, 2015 at 7:47 pm #1027640

Halle Winkler
Politepix

Hi Billy,

Apologies again for the very long delay and thank you again for your patience. This has been an unusual situation with a large series of prerequisites to being able to make a release and a rare case in which smaller releases were not possible. So, I have been able to run your case. When I use this audio file as the test file with RapidEars, I get roughly the results I would expect: mostly correct recognitions, but sometimes there are dropped recognitions when the same word is repeated and sometimes there are wrong recognitions due to the noise level (and probably also because, I suspect, a high female voice is underrepresented in the data used to generate the acoustic model).

I have the impression that my results differ from yours because you said above that you had absolutely no success, so presumably no correct recognitions. I noticed a couple of things about your case that may have affected this. The first is that your example alternates between using the stock OEPocketsphinxController listening method and RapidEars, so it is possible that you were expecting a result in a RapidEars callback but it was being returned in a normal and unlogged OEPocketsphinxController callback. The other is that there is a missing comma in the language model array in your case, which could affect your vocabulary, since I think it causes the two strings with no comma between them to be appended to each other as a single word.
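To illustrate the missing-comma effect described above: Python applies the same kind of rule to adjacent string literals, so it can serve as a stand-in here (the array in the actual case would be Objective-C, and the phrases below are made up for the example rather than taken from that project).

```python
# Intended vocabulary: three separate entries.
intended = ["TURN LEFT", "TURN RIGHT", "STOP"]

# Same list with the comma after the first entry left out. Adjacent string
# literals are joined by the parser, so the list silently loses an entry and
# gains one run-together phrase that nobody will ever say.
missing_comma = ["TURN LEFT" "TURN RIGHT", "STOP"]

print(len(intended), len(missing_comma))  # prints: 3 2
print(missing_comma[0])                   # prints: TURN LEFTTURN RIGHT
```

Nothing fails at compile time or run time in a case like this, which is why logging the vocabulary array and checking its count against the expected number of entries is a cheap sanity check.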

I would imagine due to the long delay that you are no longer working on this issue and have other workarounds, however, you can let me know if that is not the case. If your results were similar to mine and you saw a mix of working and non-working recognitions, I would probably have to say that this is the situation I mentioned above: noisy recognition is quite difficult and it is probably expected to have compromised recognition, particularly for cases where there is a voice that is particularly high, or may have an accent, or other outlying characteristics when compared with the majority of the acoustic model data. One way of addressing this could be by doing acoustic model adaptation; search the forums and site for more info on this.

December 30, 2015 at 11:43 am #1027647

billylo
Participant

Let me give it a shot after the Christmas break. We have chosen to disable the voice recognition function for now until we have a better user experience. Thanks.

December 30, 2015 at 11:54 am #1027648

Halle Winkler
Politepix

Thanks, just let me know if/when you want to look into it further.