Mimic Pocketsphinx's handling of background noise

Tagged: automatically detected keyword, Background noise

This topic has 9 replies, 3 voices, and was last updated 4 years, 6 months ago by Halle Winkler.

May 10, 2017 at 5:03 pm #1031836

JeroenNX
Participant

Dear Halle,

I am using OpenEars (with Rejecto) for keyword-spotting; the app is only listening for 1 single word; once this single word is detected, OpenEars stops listening, fires a series of actions, and then starts keyword-spotting again.
In a completely quiet environment this works great (after purchasing Rejecto, because without it almost every sound/word was detected as the ‘trigger-word’).
However, when I introduce the tiniest bit of background noise, detection drops to absolute zero.

Here is my setup:

– I have 1 Raspberry Pi 3 with CMUSphinx/Pocketsphinx and some Python code for keyword spotting; I use all the default settings/code based on their example, except that I changed the kws_threshold to 1e-12: self.config.set_float('-kws_threshold', 1e-12)

– 1 iPad or iPhone with an app with OpenEars + Rejecto, English acoustic model, all settings default, 1 word (same word as on Raspberry Pi)

I now put both devices 2 feet away from me, side by side and utter the ‘trigger-word’: both devices consistently detect the phrase/wake up almost simultaneously (the Pi version is slightly quicker).

However, now I put an iPhone on my desk, playing music very very softly, 3 feet away from the mic of the iPad and the Pi; the Pi version now still detects the trigger word every single time; the iPad however seems completely deaf and never detects the phrase anymore, no matter how loud or close I speak. Even when I move the iPhone playing music much closer to the microphones and turn up the volume much louder, the Pi version keeps recognizing the trigger-word in a reliable fashion.
Right after turning off the music the iPad starts recognizing again as well.

I have tried playing around with the following settings:
1. [[OEPocketsphinxController sharedInstance] setRemovingNoise:NO];
2. [[OEPocketsphinxController sharedInstance] setVadThreshold:2.0];
3. [[OEPocketsphinxController sharedInstance] setRemovingSilence:NO];
And any combination of those; unfortunately to no avail.

1. This does not fix the problem. However, it does sometimes lead to a situation where, after I say the trigger word 10 times in a row, it suddenly detects all 10 at once. For example, if the trigger word is Caroline, it will not respond (with very soft music in the background), but after I say it a number of times it is sometimes randomly detected, and I then see in the log “The received hypothesis is Caroline Caroline Caroline Caroline Caroline Caroline”, whereas normally, under quiet conditions, it would fire immediately after recognizing it once.
2. Values of 2.0, 2.5, 3.0, 3.5 have no effect on the described problem.
3. When I set this to NO, detection stops working completely, also in complete silence.

Now my question: is there any way, using options/settings, to mimic the behavior of Pocketsphinx (C/python/Pi) as closely as possible on iPhone/iPad/iOS?

Thanks for your reply,

Jeroen
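
For reference, the Pi-side setup described in the post above might look roughly like the following. This is only a sketch, assuming the 2017-era pocketsphinx-python bindings (the `Decoder.default_config()` / `set_string` / `set_float` style that the quoted `set_float` call suggests). The helper names, the lowercase keyphrase, and the model paths passed in are illustrative assumptions and are not taken from the poster's code.

```python
# Sketch only: keyword-spotting options in the legacy pocketsphinx-python style.
# Helper names and paths are hypothetical; only '-kws_threshold' = 1e-12
# comes from the post itself.

KEYPHRASE = "caroline"    # example trigger word used in this thread
KWS_THRESHOLD = 1e-12     # value quoted in the post


def kws_options(hmm_dir, dict_path, keyphrase=KEYPHRASE, threshold=KWS_THRESHOLD):
    """Return (setter_name, option, value) triples for keyphrase search."""
    return [
        ("set_string", "-hmm", hmm_dir),          # acoustic model directory (assumed)
        ("set_string", "-dict", dict_path),       # pronunciation dictionary (assumed)
        ("set_string", "-keyphrase", keyphrase),  # the single phrase to spot
        ("set_float", "-kws_threshold", threshold),
    ]


def apply_options(config, options):
    """Apply each option through the config object's own setter method."""
    for setter, option, value in options:
        getattr(config, setter)(option, value)
    return config


def make_decoder(hmm_dir, dict_path):
    """Build a keyword-spotting decoder; requires the pocketsphinx package."""
    from pocketsphinx.pocketsphinx import Decoder  # legacy import path

    config = apply_options(Decoder.default_config(), kws_options(hmm_dir, dict_path))
    return Decoder(config)
```

Note that this path uses Pocketsphinx's keyphrase search and does not use a generated language model, so its options have no direct one-to-one equivalent among the OpenEars settings listed above.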

May 11, 2017 at 8:45 am #1031837

Halle Winkler
Politepix

Hello,

Yes, I have heard of a similar quiet-noise issue with iPads before with the version of the pocketsphinx VAD used in OpenEars. Please don’t use setRemovingNoise/setRemovingSilence in this case. Which language is this with? Please also share your Rejecto settings.

May 11, 2017 at 8:38 pm #1031838

JeroenNX
Participant

Thanks for your reply.

Language: English.

Rejecto settings/relevant code snippets:
```
OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
NSString *name = @"NameIWantForMyLanguageModelFiles";
NSArray *words = [NSArray arrayWithObjects:@"CAROLINE", nil];
NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words withFilesNamed:name withOptionalExclusions:nil usingVowelsOnly:FALSE withWeight:nil forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
if (err == nil)
{
    lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
    dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
        
} else
{
    NSLog(@"Error: %@",[err localizedDescription]);
}
    
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
[self.openEarsEventsObserver setDelegate:self];
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:NO];
```
May 12, 2017 at 9:37 am #1031839

Halle Winkler
Politepix

OK, I recommend temporarily removing Rejecto, turning the vadThreshold up to 4.5 and reducing it by increments of .1 until you find the highest value which perceives your word and doesn’t react to the music. Once this is established, re-add Rejecto to reject OOV human speech.
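
The sweep suggested above can be sketched as a small search. This is an illustration in Python, not OpenEars code: `hears_word` and `reacts_to_music` are hypothetical callbacks standing in for a manual trial on the device at each setting, and the lower bound of 1.5 is an assumption, since the post only gives the starting value (4.5) and the step (0.1).

```python
# Sketch only: walk vadThreshold candidates downward from 4.5 in steps of 0.1
# and return the highest one that hears the word but ignores the music.

def sweep_values(start_tenths=45, stop_tenths=15):
    """List thresholds from 4.5 down to 1.5; integer tenths avoid float drift."""
    return [t / 10 for t in range(start_tenths, stop_tenths - 1, -1)]


def highest_usable_threshold(hears_word, reacts_to_music, values=None):
    """Return the first (highest) value that passes both checks, or None."""
    candidates = sweep_values() if values is None else values
    for value in candidates:
        if hears_word(value) and not reacts_to_music(value):
            return value
    return None
```

In practice each callback would correspond to setting the value with `setVadThreshold:` (with Rejecto removed), speaking the trigger word or playing the music, and noting the outcome by hand. A result of `None` means no value in the range satisfied both conditions.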

May 13, 2017 at 3:50 am #1031840

JeroenNX
Participant

I have tried as you suggested, but unfortunately this does not provide a solution.
I have tried every value between 4.5 and 1.5 for vadThreshold, but this does not change anything. For values above 4.1 it never perceives my word, and for values below 4.2 it properly perceives my word, but only in very quiet surroundings.
Maybe you misinterpreted the issue in my original post, but the problem is not that it reacts to music; the problem is that it never perceives/reacts to my word anymore once I introduce very soft background noise (for example music, or wind, or anything else; music was just an example/easy to reproduce); so it does not react to the music and it also does not react to my voice/the trigger-word.

In summary: without any background noise (for example very soft music) my trigger-word is detected (almost) every time; however, as soon as I introduce a very soft background noise (for example music playing from an iPhone 3 feet away on very low volume), the iPad/OpenEars becomes completely deaf and never detects my trigger-word anymore, regardless of how loud or close I get.
I do not have this issue when I try the exact same thing with a Raspberry Pi with Pocketsphinx (see opening post); even when I move the iPhone playing music much closer to the mic and turn up the volume, my trigger-word is still detected when I say it.

May 13, 2017 at 11:01 am #1031841

Halle Winkler
Politepix

Hi,

Sorry, to clarify, the spotting of a single trigger word is not actually an intended feature of OpenEars or described as a goal here or in its documentation – this would use a newer part of the Sphinx project which hasn’t been implemented in OpenEars. It does get used that way and I try to help with this situation when it comes up, but Rejecto was designed with the intention to reject OOV for vocabularies with multiple words. Pocketsphinx uses its own keyword spotting API so it isn’t an unexpected result that the outcomes are different. This may be a case in which you’d prefer to simply use Pocketsphinx built for an iOS target, which is supported by that project to the best of my knowledge.

Regardless, I’m happy to continue to try to help you get better results. It isn’t clear to me from your response whether you took my first requested step of not using Rejecto while troubleshooting the VAD. It isn’t an expected result that a word found in your vocabulary that is significantly louder than the background isn’t heard at any VAD setting when Rejecto isn’t on. Is it possible that you’re using a different acoustic model with the pi version?

May 16, 2017 at 7:16 pm #1031844

JeroenNX
Participant

Hi Halle,

Yes I did temporarily disable Rejecto like you asked, by replacing this:
```
NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words withFilesNamed:name
                                                 withOptionalExclusions:nil
                                                        usingVowelsOnly:FALSE
                                                             withWeight:nil
                                                 forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
```
With this:
```
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
```

The difference is noticeable instantly: if the trigger word is, for example, ‘Caroline’, then (in silence) with Rejecto the app (mostly) only responds to ‘Caroli’ and ‘Caroline’, but without Rejecto it also responds to ‘Care’, ‘Caro’, ‘Carol’, etc. However, in both cases, with and without Rejecto, as soon as a bit of background noise (for example music) is introduced, it stops triggering completely.

For both pocketsphinx (Pi) and OpenEars (iOS) I am using the default acoustic model that comes with the package.

I’ll see whether I can reproduce the issue with the OpenEarsSampleApp.
May 17, 2017 at 9:25 am #1031845

Halle Winkler
Politepix

Hi,

I would need the audio and the unaltered code settings with which the audio was recorded (with my request that VAD is set to the maximum limit where it still can perceive the trigger word when Rejecto is off), according to the instructions in this post:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

In your case I would also need to know the distance from the iPad mic to the human speaker and to the music source.

As mentioned, this is not necessarily something where the result is going to be the same behavior between those two implementations because they are not using the same API methods in Sphinx, but I don’t mind taking a look and seeing if there is something to suggest.

October 15, 2019 at 9:43 am #1032978

amitsingh
Participant

Hello,

I am facing an issue with background noise. OpenEars automatically detects the specific keyword in background noise even though the keyword was not actually spoken. I have set vadThreshold to 3. Please let me know how we can reduce keyword detection from background noise.

October 15, 2019 at 9:44 am #1032979

Halle Winkler
Politepix

Hi Amit, this is in the FAQ; it will help you to give it a read, thanks: https://www.politepix.com/openears/support

The topic ‘Mimic Pocketsphinx's handling of background noise’ is closed to new replies.