Mimic Pocketsphinx's handling of background noise

  • #1031836
    JeroenNX
    Participant

    Dear Halle,

    I am using OpenEars (with Rejecto) for keyword spotting. The app listens for a single word; once that word is detected, OpenEars stops listening, fires a series of actions, and then starts keyword spotting again.
    In a completely quiet environment this works great (after purchasing Rejecto, because without it almost every sound or word was detected as the ‘trigger-word’).
    However, when I introduce the tiniest bit of background noise, detection drops to zero.

    Here is my setup:

    – I have 1 Raspberry Pi 3 with CMUSphinx/pocketsphinx and some Python code for keyword spotting; I use all the default settings/code based on their example, except that I changed kws_threshold to 1e-12: self.config.set_float('-kws_threshold', 1e-12)

    – 1 iPad or iPhone with an app with OpenEars + Rejecto, English acoustic model, all settings default, 1 word (same word as on the Raspberry Pi)

    I now put both devices 2 feet away from me, side by side and utter the ‘trigger-word’: both devices consistently detect the phrase/wake up almost simultaneously (the Pi version is slightly quicker).

    However, when I put an iPhone on my desk playing music very softly, 3 feet away from the mics of the iPad and the Pi, the Pi version still detects the trigger word every single time. The iPad, however, seems completely deaf and never detects the phrase anymore, no matter how loudly or how close I speak. Even when I move the iPhone playing music much closer to the microphones and turn the volume up much higher, the Pi version keeps recognizing the trigger-word reliably.
    Right after I turn off the music, the iPad starts recognizing again as well.

    I have tried playing around with the following settings:
    1. [[OEPocketsphinxController sharedInstance] setRemovingNoise:NO];
    2. [[OEPocketsphinxController sharedInstance] setVadThreshold:2.0];
    3. [[OEPocketsphinxController sharedInstance] setRemovingSilence:NO];
    I also tried every combination of these, unfortunately to no avail. The results for each setting, numbered as above:

    1. This does not fix the problem, but it sometimes leads to a situation where, after I say the trigger word 10 times in a row, it suddenly detects all 10 at once. For example, if the trigger-word is Caroline, it will not respond (with very soft music in the background), but after I say it some number of times it is occasionally detected, and I then see in the log “The received hypothesis is Caroline Caroline Caroline Caroline Caroline Caroline”; under quiet conditions it would normally fire immediately after recognizing it once.
    2. Values of 2.0, 2.5, 3.0, and 3.5 have no effect on the described problem.
    3. When I set this to NO, detection stops working completely, even in complete silence.

    Now my question: is there any way, using options/settings, to mimic the behavior of Pocketsphinx (C/python/Pi) as closely as possible on iPhone/iPad/iOS?

    Thanks for your reply,

    Jeroen

    #1031837
    Halle Winkler
    Politepix

    Hello,

    Yes, I have heard of a similar quiet-noise issue with iPads before with the version of the pocketsphinx VAD used in OpenEars. Please don’t use setRemovingNoise/setRemovingSilence in this case. Which language is this with? Please also share your Rejecto settings.

    #1031838
    JeroenNX
    Participant

    Thanks for your reply.

    Language: English.

    Rejecto settings/relevant code snippets:

    OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
    NSString *name = @"NameIWantForMyLanguageModelFiles";
    NSArray *words = [NSArray arrayWithObjects:@"CAROLINE", nil];
    NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words withFilesNamed:name withOptionalExclusions:nil usingVowelsOnly:FALSE withWeight:nil forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
    if (err == nil)
    {
        lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
            
    } else
    {
        NSLog(@"Error: %@",[err localizedDescription]);
    }
        
    self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
    [self.openEarsEventsObserver setDelegate:self];
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
    [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:NO];

    #1031839
    Halle Winkler
    Politepix

    OK, I recommend temporarily removing Rejecto, turning the vadThreshold up to 4.5, and reducing it in increments of 0.1 until you find the highest value that perceives your word and doesn’t react to the music. Once this is established, re-add Rejecto to reject OOV human speech.
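
    The search described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not OpenEars code: `hears_word` and `reacts_to_noise` are hypothetical callbacks standing in for a manual listening test at each vadThreshold value, and the lower bound of 1.5 is an assumption chosen for the example.

```python
from typing import Callable, List, Optional


def candidate_thresholds(start: float = 4.5, stop: float = 1.5,
                         step: float = 0.1) -> List[float]:
    """Return thresholds from start down to stop in fixed steps.

    Values are rounded so repeated float subtraction does not drift.
    """
    count = int(round((start - stop) / step)) + 1
    return [round(start - i * step, 6) for i in range(count)]


def highest_usable_threshold(
    hears_word: Callable[[float], bool],       # hypothetical: word perceived at this value?
    reacts_to_noise: Callable[[float], bool],  # hypothetical: music triggers at this value?
    thresholds: List[float],
) -> Optional[float]:
    """Walk the thresholds from highest to lowest and return the first one
    that perceives the word without reacting to the background noise."""
    for value in thresholds:
        if hears_word(value) and not reacts_to_noise(value):
            return value
    return None


# Example with made-up outcomes: the word is perceived at 4.1 and below,
# and the music only causes reactions below 2.0.
best = highest_usable_threshold(
    hears_word=lambda v: v <= 4.1,
    reacts_to_noise=lambda v: v < 2.0,
    thresholds=candidate_thresholds(),
)
print(best)
```

    In practice each callback would be a manual test run with that vadThreshold set, with Rejecto re-added only after a usable value has been found.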

    #1031840
    JeroenNX
    Participant

    I have tried what you suggested, but unfortunately it does not provide a solution.
    I have tried every value between 4.5 and 1.5 for vadThreshold, but this does not change anything. For values of 4.2 and above it never perceives my word; for values of 4.1 and below it perceives my word properly, but only in very quiet surroundings.
    Maybe you misinterpreted the issue in my original post: the problem is not that it reacts to music. The problem is that it never perceives or reacts to my word once I introduce very soft background noise (music, wind, or anything else; music was just an example that is easy to reproduce). So it does not react to the music, and it also does not react to my voice/the trigger-word.

    In summary: without any background noise (for example very soft music) my trigger-word is detected (almost) every time; however, as soon as I introduce a very soft background noise (for example music playing from an iPhone 3 feet away on very low volume), the iPad/OpenEars becomes completely deaf and never detects my trigger-word anymore, regardless of how loud or close I get.
    I do not have this issue when I try the exact same thing with a Raspberry Pi with Pocketsphinx (see opening post); even when I move the iPhone playing music much closer to the mic and turn up the volume, my trigger-word is still detected when I say it.

    #1031841
    Halle Winkler
    Politepix

    Hi,

    Sorry, to clarify, the spotting of a single trigger word is not actually an intended feature of OpenEars or described as a goal here or in its documentation – this would use a newer part of the Sphinx project which hasn’t been implemented in OpenEars. It does get used that way and I try to help with this situation when it comes up, but Rejecto was designed with the intention to reject OOV for vocabularies with multiple words. Pocketsphinx uses its own keyword spotting API so it isn’t an unexpected result that the outcomes are different. This may be a case in which you’d prefer to simply use Pocketsphinx built for an iOS target, which is supported by that project to the best of my knowledge.

    Regardless, I’m happy to continue to try to help you get better results. It isn’t clear to me from your response whether you took my first requested step of not using Rejecto while troubleshooting the VAD. It isn’t an expected result that a word found in your vocabulary that is significantly louder than the background isn’t heard at any VAD setting when Rejecto isn’t on. Is it possible that you’re using a different acoustic model with the pi version?

    #1031844
    JeroenNX
    Participant

    Hi Halle,

    Yes, I did temporarily disable Rejecto as you asked, by replacing this:

    NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words withFilesNamed:name
                                                     withOptionalExclusions:nil
                                                            usingVowelsOnly:FALSE
                                                                 withWeight:nil
                                                     forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];

    With this:
    NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];

    The difference is instantly noticeable: if the trigger word is, for example, ‘Caroline’, then (in silence) with Rejecto the app (mostly) only responds to ‘Caroli’ and ‘Caroline’, but without Rejecto it also responds to ‘Care’, ‘Caro’, ‘Carol’, etc. However, in both cases, with and without Rejecto, as soon as a bit of background noise (for example music) is introduced, it stops triggering completely.

    For both pocketsphinx (Pi) and OpenEars (iOS) I am using the default acoustic model that comes with the package.

    I’ll see whether I can reproduce the issue with the OpenEarsSampleApp.

    #1031845
    Halle Winkler
    Politepix

    Hi,

    I would need the audio and the unaltered code settings with which the audio was recorded (with my request that VAD is set to the maximum limit where it still can perceive the trigger word when Rejecto is off), according to the instructions in this post:

    https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

    In your case I would also need to know the distance from the iPad mic to the human speaker and to the music source.

    As mentioned, this is not necessarily something where the result is going to be the same behavior between those two implementations because they are not using the same API methods in Sphinx, but I don’t mind taking a look and seeing if there is something to suggest.

    #1032978
    amitsingh
    Participant

    Hello,

    I am facing an issue with background noise. OpenEars detects the keyword from background noise even when the keyword has not actually been spoken. I have vadThreshold set to 3. Please let me know how we can reduce keyword detections triggered by background noise.

    #1032979
    Halle Winkler
    Politepix

    Hi Amit, this is covered in the FAQ; it will help you to give it a read, thanks: https://www.politepix.com/openears/support

  • The topic ‘Mimic Pocketsphinx's handling of background noise’ is closed to new replies.