wfilleman

Forum Replies Created

Viewing 48 posts - 1 through 48 (of 48 total)

  • in reply to: Muting audio from background apps #1026632
    wfilleman
    Participant

    Hi Halle,

It’s Wes again :) It’s been a while, but my users are still getting great value out of OE.

Upgraded to 2.041 to address the memory growth issue some of my users were seeing, and I got bitten by something I “fixed” in previous releases. Not sure if I ever posted about this, but in relation to this post about how the AVAudioSession works on init, there’s an undesirable side effect in OEContinuousAudioUnit.m when the app resumes from the background.

Namely, what happens on resume from the background is that OE sees an audio session interruption of type Ended, which could be due to something in my code (not sure, but it doesn’t matter). The problem is that even if the user has never turned on OE (voice recognition), OE just goes ahead and sets the audio session active in the handleInterruption call of OEContinuousAudioUnit.m.

    What happens is that the background music the user was playing is killed and they are confused because they never turned on voice recognition.

How I fixed it for myself, and I’m presenting it here because I think this should be merged into the main release, is to check whether self.audioUnitState is kAudioUnitIsStarted before setting the AVAudioSession to active. This prevents OE from stepping on the audio session if it’s not actually turned on.

Here’s the code. Note that I surrounded the additions with /* MOBILINC ADDITIONS */ comments:

- (void)handleInterruption:(NSNotification *)notification {

    NSInteger interruptionPhase = [[[notification userInfo] valueForKey:AVAudioSessionInterruptionTypeKey] intValue];

    if (interruptionPhase == AVAudioSessionInterruptionTypeBegan) {
        if (openears_logging == 1) NSLog(@"The Audio Session was interrupted.");

        NSDictionary *userInfoDictionary = @{@"OpenEarsNotificationType": @"AudioSessionInterruptionDidBegin"}; // Send notification to OEEventsObserver.
        NSNotification *notification = [NSNotification notificationWithName:@"OpenEarsNotification" object:nil userInfo:userInfoDictionary];
        [[NSNotificationCenter defaultCenter] performSelectorOnMainThread:@selector(postNotification:) withObject:notification waitUntilDone:YES];
    }

    if (interruptionPhase == AVAudioSessionInterruptionTypeEnded) {

        if (openears_logging == 1) NSLog(@"The Audio Session interruption is over.");

        NSDictionary *userInfoDictionary = @{@"OpenEarsNotificationType": @"AudioSessionInterruptionDidEnd"}; // Send notification to OEEventsObserver.
        NSNotification *notification = [NSNotification notificationWithName:@"OpenEarsNotification" object:nil userInfo:userInfoDictionary];
        [[NSNotificationCenter defaultCenter] performSelectorOnMainThread:@selector(postNotification:) withObject:notification waitUntilDone:YES];

        /*
         * MOBILINC ADDITIONS: only reactivate the audio session if OE's audio unit is actually running.
         */
        if (self.audioUnitState == kAudioUnitIsStarted) {

            NSError *error = nil;

            error = [self setAllAudioSessionSettings];

            [[AVAudioSession sharedInstance] setActive:YES error:&error];

            if (error != nil) NSLog(@"AVAudioSession set active failed with error: %@", error);

        }
        /*
         * END MOBILINC ADDITIONS
         */

    }
}

    wfilleman
    Participant

    Thanks Halle!

    Wes

    wfilleman
    Participant

    Sure, nothing speaks against it now since it should be semantically and behaviorally very similar to the mixing opt-out. I’ll add a ticket and barring any unexpected discoveries related to the feature that would cause me to revise my idea that it is straightforward, I will add it at the same time.

Thanks Halle! My code is super straightforward to add in the BOOL. The painful part for me is that I have to touch several layers of OE code to get the BOOL where it needs to go each time. Glad I won’t have to do that anymore and can use the OE code as-is with this feature addition.

    Wes

    wfilleman
    Participant

    Thanks Halle, I actually tried that and ducking to see if it made any difference. Unfortunately there wasn’t any difference in volume. Still very low.

    I agree, I suspect it’s the SDK that’s doing this. It looks like the volume output is more consistent across devices, but it’s just VERY low on VoiceChat. So, looks like I’ll need to go back and stick with the Default setting unless you have any other ideas to try.

Since you are going to be adding new APIs here, can I request an API to disable Bluetooth? I have to add my custom code into the OE source to do this each time there’s a new OE release. It’s just a BOOL to set whether the audio session should include the AVAudioSessionCategoryOptionAllowBluetooth flag.
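To make the request concrete, something along these lines is all I’m after (disableBluetooth is just my hypothetical flag name, not existing OE API, and the category/options shown are only an illustration of where it would plug in):

    AVAudioSessionCategoryOptions options = AVAudioSessionCategoryOptionDefaultToSpeaker;
    if (!self.disableBluetooth) { // hypothetical opt-out BOOL
        options |= AVAudioSessionCategoryOptionAllowBluetooth;
    }
    NSError *categoryError = nil;
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord
                                     withOptions:options
                                           error:&categoryError];
    if (categoryError != nil) NSLog(@"Setting the audio session category failed: %@", categoryError);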

    Wes

    wfilleman
    Participant

    Ah, excellent. Ok, then glad I’m not just “hearing” what I wanted to hear.

BTW: I went back to “Default” for the audioMode because it looks like VoiceChat was killing the volume output too much for my users’ taste (more so than the older 1.7x OE release). Not sure if there’s a way to adjust this, but thought I’d mention it.

    Keep up the great work!
    Wes

    wfilleman
    Participant

    Hi Halle,

Just had a chance to check out 2.03. Looks good to me! In fact, it seems to fix a problem I was seeing where the voice recognition loop was taking a while to exit and then failing to deliver a valid hypothesis. I’m seeing better recognition overall, and in those cases where the recognition pauses for a bit, it now exits with a more reliable hypothesis.

Could be in my head (and limited testing), but it looks like an improvement. I’ll be rolling this out to my users over the next couple of weeks in the next update.

    Thanks Halle!
    Wes

    wfilleman
    Participant

    Thanks Halle,

    I’ll check it out!

    Wes

    wfilleman
    Participant

    Sure thing Halle!

Agreed, it’s not ideal, but it does work. Hey, at the very least, 2.0 didn’t break anything I did for the 1.x line. I can’t say that for other libraries whose major upgrades I’ve had to work with ;)

    Wes

    wfilleman
    Participant

    Hi Halle,

    Yup, I gave this a try again with iOS 8 + OE 2.0. Same problem I had before under OE 1.x and iOS 7. I couldn’t play system sounds using the system sound player. I had to fall back to use AVAudioPlayer. This combo may be fine for some folks. I just couldn’t use it because of the inherent delay in AVAudioPlayer. The only way I can get fast sound output (system sound path) with OE is to place OE into the VoiceChat mode. This is true for OE 1.x and 2.0 under iOS 7 and iOS 8.

    Wes

    wfilleman
    Participant

    Thanks Halle, my apologies on not mentioning that earlier.

    Your comment did give me an idea to try. I went back to setting the audio mode as Default, but actually, I didn’t notice any difference with regards to the VAD or the background noise root issue that I’m investigating.

The reason I need to use the audioMode value of “VoiceChat” is so I can issue system sounds while the mic is on. When using the “Default” value for the audioMode, I have to use an AVAudioPlayer object, which causes a lag or delay in playing the sound that won’t work in my application.

    My IP Camera streaming code uses AudioUnits and that works regardless of which audioMode I use.

    Regardless of which option I use, the behavior of the voice recognition and treatment of the background noise seems to be the same to me.

I ended up leaving the VAD threshold at the default of 2.0 and giving the user the option to increase it up to 3.5 for more variable noise environments, based on feedback and my testing.

I think I’ve exhausted the options at this point and it’s as good as it’s going to get with the current implementation from CMU. Overall this is an improvement, especially since I can offer the user a way to tune for their variable noise environment…something we didn’t have access to before.

    I’m considering this topic closed at this point. If I find anything of interest I’ll post back. Otherwise, looking forward to any continual feature enhancements!

    Wes

    wfilleman
    Participant

    Hi Halle,

    I’m actually using the VoiceChat mode almost exclusively since the app can also stream IP Camera video/audio. That seems to work as it did before.

    I tried adjusting the kVAD_PRESPEECH value from 10 to 20, 50, 100 and it seems to really only change the time it takes to recognize speech. I didn’t notice anything different with the noise suppression, but based on the notes in the header file, I didn’t really expect it to.

This didn’t help me since I need to respond to short one-word commands, but for others who are looking to activate and track longer voice patterns, it might help, since it would force OpenEars to listen longer for a voice before triggering. This could be beneficial to avoid frequent “Listening” states.

Also I’m finding that the default vadThreshold of 2.0 is still too sensitive for me, as it’s triggering on the slightest noise in the room. I’m starting at 2.5 and giving my users the option to scale up to 3.5 if they have more variable noise in their environments. This seems to be working better for me in my tests.

    I’ll post back if I discover anything else of interest.

    wfilleman
    Participant

    Thanks Halle,

Based on their response, how would you expect the pre- and post-speech values to change, and what impact would that have on overall speech detection? I’m not sure I’m following the link between the code change suggestion and the CMU response. It sounds to me like their sliding 5-second average is fixed?

    Wes

    wfilleman
    Participant

    Thanks Halle,

    Ok, that’s good. That all makes sense with what I’m seeing. No problem with the framework rebuild. I already rebuilt it yesterday to add back in a custom feature I need in my app to be able to disable the bluetooth input option with OE via a BOOL variable on the PocketSphinxController.

    I’ll play around with these settings and post back with what I find. The 25 seconds needs to come down for my use cases. Thanks for pointing me in the right direction there.

    Yes, my app plays audio as well. One of the features is an IP Camera streaming option that can play video/audio from IP Cameras. While not in use 100% of the time, it’s possible a user could have voice recognition ON while watching their camera. I did look at this yesterday and it appeared to work like the 1.x framework. So, no concerns from me on that front.

    Wes

    wfilleman
    Participant

    I’m looking at this a little deeper and I *think* what I’m actually seeing is the OE framework adjusting to the different volume levels quite rapidly. For example, if I have a steady tone as background noise, OE pretty quickly sees this as noise and ignores it. I can then issue speech and it does pretty well.

    If I’m playing music with various beat levels, I see OE struggle a little bit trying to determine what to ignore as noise since it’s seeing the threshold cross all over the place with the beat of the music.

    I’m wondering if there’s a way to level this auto-adjustment out by increasing the number of frames OE considers for the “noise level” if that makes sense. For example, if OE only looks at a few frames, then the “noise” level would be rapidly changing from low to high and back. If OE looks at a larger group of frames as a moving average, then these intermediate spikes of noise could be leveled out and ignored.
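To make the idea concrete, here’s a minimal sketch (plain C, not OE’s actual code) of what I mean by averaging over a larger window before comparing against the threshold; the window size is just an illustrative value:

    #define kNoiseWindowFrames 32 // hypothetical window size

    static float noiseWindow[kNoiseWindowFrames];
    static int noiseIndex = 0;

    // Fold the latest frame level into a circular buffer and return the average,
    // so a single loud frame can't spike the estimated noise floor.
    static float smoothedNoiseLevel(float currentFrameLevel) {
        noiseWindow[noiseIndex] = currentFrameLevel;
        noiseIndex = (noiseIndex + 1) % kNoiseWindowFrames;
        float sum = 0.0f;
        for (int i = 0; i < kNoiseWindowFrames; i++) {
            sum += noiseWindow[i];
        }
        return sum / kNoiseWindowFrames;
    }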

Just my guess, but I think that’s what I’m actually seeing. Adjusting the vadThreshold is a way to work around this issue by forcing a larger discrepancy, but a single frame of louder noise (which is what I suspect is happening) punches through the vadThreshold, since the low/high detection appears to be pretty tight in terms of the number of frames it analyzes.

    There’s no easy answer here as what I’m suggesting would have other tradeoffs as well if I’m even close to the issue.

    Wes

    wfilleman
    Participant

    Thanks Halle,

I was digging into the framework when I saw that vadThreshold works the way you describe it, as a relative speech/silence threshold. That’s good to know.

I’ve set the vadThreshold to 3.0 and run a couple of tests with background music from the radio at different volume levels. Overall it seems to be a little better. Now that I know what I’m looking for, I can see that it is indeed adjusting to the various sound levels. When the music is louder than what I would call just background music, it’s really tough to get OE to process the speech, but then again, I’m asking a lot of the engine to throw out music louder than background level and pull out my speech.

    You are right, it’s a fine balance between upping the threshold and keeping it within speech detecting tolerance.

I may offer my users an option to say whether they are installing this in a noisy room. If YES, I can set the vadThreshold to 3.0; if NO, leave it at the default. What do you think?
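Something like this is what I have in mind on my side, assuming OE 2.x’s vadThreshold setting and a preference key of my own (both the key name and the exact controller access are just illustrative):

    // Hypothetical app preference set by the user during setup.
    BOOL noisyRoom = [[NSUserDefaults standardUserDefaults] boolForKey:@"InstalledInNoisyRoom"];
    // Raise the relative speech/silence threshold only for noisy installs.
    [OEPocketsphinxController sharedInstance].vadThreshold = noisyRoom ? 3.0f : 2.0f;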

    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1023307
    wfilleman
    Participant

    Great work Halle! Bluetooth works for me perfectly in OE 2.0.

    I’ve got an unrelated question that I’ll start a new topic on.

    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1022716
    wfilleman
    Participant

    Hi Halle,

    I just tried this test on iOS 8.1 beta 2. Same results as my previous post. I had heard there were some BT bugs in iOS 8, fixed in 8.1, but the fixes didn’t change what I’m seeing in the OpenEars sample app. I was hoping that would have been the answer, but unfortunately it’s not going to be that easy.

    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1022645
    wfilleman
    Participant

    Thanks Halle, two more data points for you:

    1. I tried these tests again on iOS 8.0.2. Same results. iOS 8.0.2 didn’t fix what I’m seeing.

2. I noticed something this morning that I’m now fairly sure is why getDecibels is going to inf. In the AudioUnitRenderCallback, inNumberFrames is usually 2048, except with Bluetooth in the failure scenarios:

    – Using the internal mic, I see 2048 frames in each callback. Everything works as intended.

– With the Bluetooth headset connected, on initial startup I see 4096 frames in the AudioUnitRenderCallback UNTIL Flite says “Welcome to OpenEars”. Then I see 2048 frames in each callback. I can then say “CHANGE” into the Bluetooth headset and have it recognized. After the speech is recognized and the AudioUnitRenderCallback starts firing continuously again, I see the number of frames jump back to 4096 and getDecibels goes to inf. This is the failure scenario.

Hopefully this helps, but there is a correlation between 4096 inNumberFrames in the AudioUnitRenderCallback and failing to recognize speech. When the AudioUnitRenderCallback is producing 2048 inNumberFrames, everything works fine.

Also, just to confirm, when Flite speaks, I’ll see inNumberFrames go back to 2048 from 4096. So there is something Flite is doing that positively impacts the number of frames going into the AudioUnitRenderCallback when a Bluetooth headset is connected.

    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1022634
    wfilleman
    Participant

    Yes, it is quite odd.

    Summary of tests:
    Test Case 1:
    Comment out all cases of “say”.
    Start sample app with NO bluetooth connected mic.
    Wait for “listening”
    Say “change”, app recognizes “change”
    Say “model”, app recognizes “model”.

    Test Case 2:
    Comment out all cases of “say”.
    Start sample app WITH bluetooth connected mic.
    Wait for “listening”
    Say “change”, nothing happens. Decibel value is -120.
    Say “model”, nothing happens. Decibel value is -120.

    Test Case 3:
    Comment out all cases of “say” except for the first “Welcome to OpenEars”
    Start sample app WITH bluetooth connected mic.
    Wait for “listening”
    Say “change”, app recognizes “change”.
    Say “model”, nothing happens. Decibel value is -120.
    Decibel value stays at -120 (internally it’s inf).

    Test Case 4:
    Leave in all cases of “say”.
    Start sample app WITH bluetooth connected mic.
    Wait for “listening”
    Say “change”, app recognizes “change”.
    Say “model”, nothing happens. Decibel value is -120.
    Decibel value stays at -120 (internally it’s inf).

    in reply to: [Resolved] Small bug when running on iOS 8 #1022632
    wfilleman
    Participant

    It’ll recognize any of the words in the sample app. Example: “MODEL” or “CHANGE”

    in reply to: [Resolved] Small bug when running on iOS 8 #1022628
    wfilleman
    Participant

    Ok, more results:

Interestingly enough, it looks like there’s some behind-the-scenes communication between Flite and Pocketsphinx when Flite is speaking, as the suspend call isn’t coming from the ViewController.

Anyway, I took out all the calls to Flite to speak text, and while Pocketsphinx now never gets suspended, the result was that the mic stream never configures (always inf for decibel values) when using a Bluetooth headset. The built-in mic works fine.

When I added back in the initial Flite speech (@"Welcome to OpenEars."), the Bluetooth-connected mic configures, but then fails as described above: the first recognition seems to work, and after that the decibel values go to inf.

    So, it seems like it’s necessary to have some audio output to get the mic to configure. That’s quite strange. Not sure what to make of that result.

    in reply to: [Resolved] Small bug when running on iOS 8 #1022627
    wfilleman
    Participant

    Hold on…Checking now.

    in reply to: [Resolved] Small bug when running on iOS 8 #1022625
    wfilleman
    Participant

    Yes, it does. The first one through works. After that, it dies as described in my previous post. That’s what’s odd. It does absolutely initially work. But after the first one it fails.

    in reply to: [Resolved] Small bug when running on iOS 8 #1022622
    wfilleman
    Participant

    Sure thing:

In the sample app, when you speak, the Pocketsphinx Input Levels will stop while Flite is speaking the recognized speech. After Flite is done speaking, I’ll see the Pocketsphinx Input Levels bounce around according to the dB levels of the mic input.

This all looks normal. We don’t want to try to recognize Flite’s own speech.

With the Bluetooth mic attached, after Flite is done speaking on the first recognition, the Pocketsphinx Input Levels go to -120 dB and stay there. Meanwhile, under the hood my custom debug statements are showing “inf” for the decibel levels.

    in reply to: [Resolved] Small bug when running on iOS 8 #1022620
    wfilleman
    Participant

    Thanks Halle, but unfortunately this didn’t have any effect.

    I replaced the code as instructed, rebuilt and tested, but the result was the same.

    What’s a little odd is that it works initially so that leads me to believe that the initial setup is correct or on the right track. It’s after pausing during the recognition state where it usually doesn’t come back (I get the inf values in getDecibels).

    As soon as Apple gets iOS 8.0.1 figured out I’ll test on 8.0.1 to see if the issue persists.

    in reply to: [Resolved] Small bug when running on iOS 8 #1022610
    wfilleman
    Participant

    Ok, got some info for you.

    The first failure occurs in the find_thresh(cont_ad_t * r) function.

The issue here is that the detected max input levels are way above the defined max_noise level of 70. The Bluetooth-connected headset is coming in at 98 or so. So the first thing I did was raise CONT_AD_MAX_NOISE from 70 to 100.
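For reference, the change itself was just bumping the ceiling in the sphinxbase continuous audio detection header (I’m going from memory on the exact location, so treat the file as approximate):

    /* Was 70; raised so calibration doesn't reject the hotter input levels
       coming from the Bluetooth headset. */
    #define CONT_AD_MAX_NOISE 100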

That got me through the sound calibration, but there’s another problem, and this one I have no idea how to solve.

    The first vocal input seems to work, but after the first recognition, something happens to the input stream from the mic. The function getDecibels in ContinuousAudioUnit.m starts reporting that the sampleDB value is “-inf”. Can’t say I’ve seen that before.

The logic in getDecibels is specifically filtering out inf values, so someone thought of this or has seen it before.
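Roughly, the guard works like this in spirit (a sketch of the idea, not OE’s actual code): a silent or invalid buffer gives an RMS of 0, log10f(0) is -inf, and the clamp keeps that from propagating.

    #include <math.h>

    static float decibelsFromRMS(float rms) {
        float sampleDB = 20.0f * log10f(rms);
        if (isinf(sampleDB) || isnan(sampleDB)) {
            return -120.0f; // report the floor instead of an inf/NaN value
        }
        return sampleDB;
    }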

    If I turn off the headset everything goes back to normal and works.

    My assumption here is the inf value indicates that the mic data is trashed and shouldn’t be used. So, the big question is, any ideas on why that’s happening?

    I’ve tried this on an iPhone and an iPad Mini running iOS 8.0. Same results.

    Thanks Halle,
    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1022605
    wfilleman
    Participant

I agree, it’s not a useful log dump. Luckily the failure seems to be limited to the cont_ad_calib call in the framework. When I get some time I’ll see if I can dig into this function and figure it out.

    “I unfortunately don’t have a device that replicates it”
    – Does this mean your bluetooth test device works fine? Or does this mean, you don’t have a bluetooth headset to test with?

    From the user reports and my testing, I believe any bluetooth headset will expose the issue.

    Wes

    in reply to: [Resolved] Small bug when running on iOS 8 #1022603
    wfilleman
    Participant

    Thanks Halle,

Don’t know if you’d prefer an email on this, so just let me know, but here’s the entire log output with OpenEars 1.7, with OpenEarsLogging and verbosePocketSphinx turned on, and with a Bluetooth headset connected to an iPhone running iOS 8:

    2014-09-22 10:11:40.085 OpenEarsSampleApp[197:5624] Starting OpenEars logging for OpenEars version 1.7 on 32-bit device: iPhone running iOS version: 8.000000
    2014-09-22 10:11:40.090 OpenEarsSampleApp[197:5624] acousticModelPath is /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle
    2014-09-22 10:11:40.123 OpenEarsSampleApp[197:5624] Starting dynamic language model generation
    2014-09-22 10:11:40.131 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.corpus for reading
    2014-09-22 10:11:40.133 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel_pipe.txt for writing
    2014-09-22 10:11:40.133 OpenEarsSampleApp[197:5624] Starting text2wfreq_impl
    2014-09-22 10:11:40.142 OpenEarsSampleApp[197:5624] Done with text2wfreq_impl
    2014-09-22 10:11:40.142 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel_pipe.txt for reading.
    2014-09-22 10:11:40.144 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.vocab for reading.
    2014-09-22 10:11:40.144 OpenEarsSampleApp[197:5624] Starting wfreq2vocab
    2014-09-22 10:11:40.147 OpenEarsSampleApp[197:5624] Done with wfreq2vocab
    2014-09-22 10:11:40.148 OpenEarsSampleApp[197:5624] Starting text2idngram
    2014-09-22 10:11:40.163 OpenEarsSampleApp[197:5624] Done with text2idngram
    2014-09-22 10:11:40.169 OpenEarsSampleApp[197:5624] Starting idngram2lm

    2014-09-22 10:11:40.183 OpenEarsSampleApp[197:5624] Done with idngram2lm
    2014-09-22 10:11:40.183 OpenEarsSampleApp[197:5624] Starting sphinx_lm_convert
    2014-09-22 10:11:40.190 OpenEarsSampleApp[197:5624] Finishing sphinx_lm_convert
    2014-09-22 10:11:40.193 OpenEarsSampleApp[197:5624] Done creating language model with CMUCLMTK in 0.069508 seconds.
    2014-09-22 10:11:40.239 OpenEarsSampleApp[197:5624] I’m done running performDictionaryLookup and it took 0.034399 seconds
    2014-09-22 10:11:40.246 OpenEarsSampleApp[197:5624] I’m done running dynamic language model generation and it took 0.156091 seconds
    2014-09-22 10:11:40.247 OpenEarsSampleApp[197:5624] Dynamic language generator completed successfully, you can find your new files FirstOpenEarsDynamicLanguageModel.DMP
    and
    FirstOpenEarsDynamicLanguageModel.dic
    at the paths
    /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
    and
    /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
    2014-09-22 10:11:40.247 OpenEarsSampleApp[197:5624] acousticModelPath is /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle
    2014-09-22 10:11:40.253 OpenEarsSampleApp[197:5624] Starting dynamic language model generation
    2014-09-22 10:11:40.260 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel.corpus for reading
    2014-09-22 10:11:40.262 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel_pipe.txt for writing
    2014-09-22 10:11:40.262 OpenEarsSampleApp[197:5624] Starting text2wfreq_impl
    2014-09-22 10:11:40.271 OpenEarsSampleApp[197:5624] Done with text2wfreq_impl
    2014-09-22 10:11:40.271 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel_pipe.txt for reading.
    2014-09-22 10:11:40.273 OpenEarsSampleApp[197:5624] Able to open /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel.vocab for reading.
    2014-09-22 10:11:40.273 OpenEarsSampleApp[197:5624] Starting wfreq2vocab
    2014-09-22 10:11:40.276 OpenEarsSampleApp[197:5624] Done with wfreq2vocab
    2014-09-22 10:11:40.277 OpenEarsSampleApp[197:5624] Starting text2idngram
    2014-09-22 10:11:40.293 OpenEarsSampleApp[197:5624] Done with text2idngram
    2014-09-22 10:11:40.311 OpenEarsSampleApp[197:5624] Starting idngram2lm

    2014-09-22 10:11:40.323 OpenEarsSampleApp[197:5624] Done with idngram2lm
    2014-09-22 10:11:40.323 OpenEarsSampleApp[197:5624] Starting sphinx_lm_convert
    2014-09-22 10:11:40.328 OpenEarsSampleApp[197:5624] Finishing sphinx_lm_convert
    2014-09-22 10:11:40.330 OpenEarsSampleApp[197:5624] Done creating language model with CMUCLMTK in 0.076958 seconds.
    2014-09-22 10:11:40.373 OpenEarsSampleApp[197:5624] The word QUIDNUNC was not found in the dictionary /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
    2014-09-22 10:11:40.373 OpenEarsSampleApp[197:5624] Now using the fallback method to look up the word QUIDNUNC
    2014-09-22 10:11:40.373 OpenEarsSampleApp[197:5624] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
    2014-09-22 10:11:40.377 OpenEarsSampleApp[197:5624] Using convertGraphemes for the word or phrase QUIDNUNC which doesn’t appear in the dictionary
    2014-09-22 10:11:40.409 OpenEarsSampleApp[197:5624] I’m done running performDictionaryLookup and it took 0.072901 seconds
    2014-09-22 10:11:40.420 OpenEarsSampleApp[197:5624] I’m done running dynamic language model generation and it took 0.172638 seconds
    2014-09-22 10:11:40.421 OpenEarsSampleApp[197:5624] Dynamic language generator completed successfully, you can find your new files SecondOpenEarsDynamicLanguageModel.DMP
    and
    SecondOpenEarsDynamicLanguageModel.dic
    at the paths
    /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP
    and
    /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/SecondOpenEarsDynamicLanguageModel.dic
    2014-09-22 10:11:40.421 OpenEarsSampleApp[197:5624]

    Welcome to the OpenEars sample project. This project understands the words:
    BACKWARD,
    CHANGE,
    FORWARD,
    GO,
    LEFT,
    MODEL,
    RIGHT,
    TURN,
    and if you say “CHANGE MODEL” it will switch to its dynamically-generated model which understands the words:
    CHANGE,
    MODEL,
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY,
    SATURDAY,

    SUNDAY,
    QUIDNUNC
    2014-09-22 10:11:40.430 OpenEarsSampleApp[197:5624] User gave mic permission for this app.
    2014-09-22 10:11:40.430 OpenEarsSampleApp[197:5624] Leaving sample rate at the default of 16000.
    2014-09-22 10:11:40.431 OpenEarsSampleApp[197:5624] The audio session has never been initialized so we will do that now.
    2014-09-22 10:11:40.431 OpenEarsSampleApp[197:5624] Checking and resetting all audio session settings.
    2014-09-22 10:11:40.432 OpenEarsSampleApp[197:5624] audioCategory is incorrect, we will change it.
    2014-09-22 10:11:40.432 OpenEarsSampleApp[197:5624] audioCategory is now on the correct setting of kAudioSessionCategory_PlayAndRecord.
    2014-09-22 10:11:40.432 OpenEarsSampleApp[197:5624] bluetoothInput is incorrect, we will change it.
    2014-09-22 10:11:40.433 OpenEarsSampleApp[197:5624] bluetooth input is now on the correct setting of 1.
    2014-09-22 10:11:40.434 OpenEarsSampleApp[197:5624] Output Device: HeadsetBT.
    2014-09-22 10:11:40.435 OpenEarsSampleApp[197:5624] preferredBufferSize is incorrect, we will change it.
    2014-09-22 10:11:40.435 OpenEarsSampleApp[197:5624] PreferredBufferSize is now on the correct setting of 0.128000.
    2014-09-22 10:11:40.435 OpenEarsSampleApp[197:5624] preferredSampleRateCheck is incorrect, we will change it.
    2014-09-22 10:11:40.436 OpenEarsSampleApp[197:5624] preferred hardware sample rate is now on the correct setting of 16000.000000.
    2014-09-22 10:11:40.454 OpenEarsSampleApp[197:5624] AudioSessionManager startAudioSession has reached the end of the initialization.
    2014-09-22 10:11:40.454 OpenEarsSampleApp[197:5624] Exiting startAudioSession.
    2014-09-22 10:11:40.458 OpenEarsSampleApp[197:5683] setSecondsOfSilence value of 0.000000 was too large or too small or was NULL, using default of 0.700000.
    2014-09-22 10:11:40.459 OpenEarsSampleApp[197:5683] Project has these words or phrases in its dictionary:
    BACKWARD
    CHANGE
    FORWARD
    GO
    LEFT
    MODEL
    RIGHT
    TURN
    2014-09-22 10:11:40.459 OpenEarsSampleApp[197:5683] Recognition loop has started
    INFO: file_omitted(0): Parsing command line:
    \
    -lm /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP \
    -beam 1e-66 \
    -bestpath yes \
    -dict /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic \
    -hmm /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle \
    -lw 6.500000 \
    -samprate 16000

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-66
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
    -lmctl
    -lmname default default
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: file_omitted(0): Parsing command line:
    \
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 47 \
    -varnorm no

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 47
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.000000e+00
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 20
    -remove_dc no yes
    -round_filters yes no
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 4.000000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.500000e-02

    INFO: file_omitted(0): Parsed model-specific feature parameters from /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/feat.params
    INFO: file_omitted(0): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: file_omitted(0): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: file_omitted(0): Using subvector specification 0-12/13-25/26-38
    INFO: file_omitted(0): Reading model definition: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/mdef
    INFO: file_omitted(0): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: file_omitted(0): Reading binary model definition: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/mdef
    2014-09-22 10:11:40.488 OpenEarsSampleApp[197:5678] Audio route has changed for the following reason:
    2014-09-22 10:11:40.495 OpenEarsSampleApp[197:5678] There has been a change of category
    2014-09-22 10:11:40.495 OpenEarsSampleApp[197:5678] The previous audio route was HeadphonesBT
    2014-09-22 10:11:40.496 OpenEarsSampleApp[197:5678] This is not a case in which OpenEars performs a route change voluntarily. At the close of this function, the audio route is HeadsetBT
    INFO: file_omitted(0): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
    INFO: file_omitted(0): Reading HMM transition probability matrices: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: file_omitted(0): Attempting to use SCHMM computation module
    INFO: file_omitted(0): Reading mixture gaussian parameter: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/means
    INFO: file_omitted(0): 1 codebook, 3 feature, size:
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): Reading mixture gaussian parameter: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/variances
    INFO: file_omitted(0): 1 codebook, 3 feature, size:
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): 256×13
    INFO: file_omitted(0): 0 variance values floored
    INFO: file_omitted(0): Loading senones from dump file /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/sendump
    INFO: file_omitted(0): BEGIN FILE FORMAT DESCRIPTION
    INFO: file_omitted(0): Using memory-mapped I/O for senones
    INFO: file_omitted(0): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: file_omitted(0): Allocating 4115 * 20 bytes (80 KiB) for word entries
    INFO: file_omitted(0): Reading main dictionary: /var/mobile/Containers/Data/Application/CC4883AD-BF78-460E-A31A-91D93BECC7BD/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
    INFO: file_omitted(0): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: file_omitted(0): 8 words read
    INFO: file_omitted(0): Reading filler dictionary: /private/var/mobile/Containers/Bundle/Application/16FB9F4F-0683-499B-A759-FB6928A35CFC/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/noisedict
    INFO: file_omitted(0): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: file_omitted(0): 11 words read
    INFO: file_omitted(0): Building PID tables for dictionary
    INFO: file_omitted(0): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
    2014-09-22 10:11:40.537 OpenEarsSampleApp[197:5624] Pocketsphinx is starting up.
    INFO: file_omitted(0): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: file_omitted(0): Allocated 30200 bytes (29 KiB) for single-phone word triphones
    INFO: file_omitted(0): No \data\ mark in LM file
    INFO: file_omitted(0): Will use memory-mapped I/O for LM file
    INFO: file_omitted(0): ngrams 1=10, 2=16, 3=8
    INFO: file_omitted(0): 10 = LM.unigrams(+trailer) read
    INFO: file_omitted(0): 16 = LM.bigrams(+trailer) read
    INFO: file_omitted(0): 8 = LM.trigrams read
    INFO: file_omitted(0): 3 = LM.prob2 entries read
    INFO: file_omitted(0): 3 = LM.bo_wt2 entries read
    INFO: file_omitted(0): 2 = LM.prob3 entries read
    INFO: file_omitted(0): 1 = LM.tseg_base entries read
    INFO: file_omitted(0): 10 = ascii word strings read
    INFO: file_omitted(0): 8 unique initial diphones
    INFO: file_omitted(0): 0 root, 0 non-root channels, 12 single-phone words
    INFO: file_omitted(0): Creating search tree
    INFO: file_omitted(0): before: 0 root, 0 non-root channels, 12 single-phone words
    INFO: file_omitted(0): after: max nonroot chan increased to 145
    INFO: file_omitted(0): after: 8 root, 17 non-root channels, 11 single-phone words
    INFO: file_omitted(0): fwdflat: min_ef_width = 4, max_sf_win = 25
    2014-09-22 10:11:40.579 OpenEarsSampleApp[197:5683] Starting openAudioDevice on the device.
    2014-09-22 10:11:40.579 OpenEarsSampleApp[197:5683] Audio unit wrapper successfully created.
    2014-09-22 10:11:40.591 OpenEarsSampleApp[197:5683] Set audio route to HeadsetBT
    2014-09-22 10:11:40.593 OpenEarsSampleApp[197:5683] There is no CMN plist so we are using the fresh CMN value 47.000000.
    2014-09-22 10:11:40.594 OpenEarsSampleApp[197:5683] Checking and resetting all audio session settings.
    2014-09-22 10:11:40.595 OpenEarsSampleApp[197:5683] audioCategory is correct, we will leave it as it is.
    2014-09-22 10:11:40.596 OpenEarsSampleApp[197:5683] bluetoothInput is correct, we will leave it as it is.
    2014-09-22 10:11:40.596 OpenEarsSampleApp[197:5683] Output Device: HeadsetBT.
    2014-09-22 10:11:40.597 OpenEarsSampleApp[197:5683] preferredBufferSize is incorrect, we will change it.
    2014-09-22 10:11:40.599 OpenEarsSampleApp[197:5683] PreferredBufferSize is now on the correct setting of 0.128000.
    2014-09-22 10:11:40.600 OpenEarsSampleApp[197:5683] preferredSampleRateCheck is correct, we will leave it as it is.
    2014-09-22 10:11:40.600 OpenEarsSampleApp[197:5683] Setting the variables for the device and starting it.
    2014-09-22 10:11:40.601 OpenEarsSampleApp[197:5683] Looping through ringbuffer sections and pre-allocating them.
    2014-09-22 10:11:42.219 OpenEarsSampleApp[197:5683] Started audio output unit.
    2014-09-22 10:11:42.220 OpenEarsSampleApp[197:5683] Calibration has started
    2014-09-22 10:11:42.220 OpenEarsSampleApp[197:5624] Pocketsphinx calibration has started.
    2014-09-22 10:11:44.423 OpenEarsSampleApp[197:5683] cont_ad_calib failed, stopping.
    2014-09-22 10:11:44.425 OpenEarsSampleApp[197:5624] Setting up the continuous recognition loop has failed for some reason, please turn on [OpenEarsLogging startOpenEarsLogging] in OpenEarsConfig.h to learn more.

    wfilleman
    Participant

    Hi Halle,

    I rebuilt my app and ran a few test cases (including an overnight test) and I can confirm that 1.66 is looking great!

    Just for documentation completeness, I see the CPU bounce between 10-20 percent when in the presence of loud repetitive background noise. As soon as the background noise stops, I see the CPU drop back to 0%.

    Thank you for staying on top of this issue and, more importantly, finding a solution!

    Wes

    wfilleman
    Participant

    Hi Halle,

    Ok, I’ve sent over the demo with a WAV file that when played into the sample app will show you the 100% CPU issue.

    FYI: due to the nature of getting the 100% CPU issue to show itself, play the embedded 30 second WAV file through a set of speakers so the sample app can hear it when running on an iOS device. After a couple of cycles between recognizing speech and listening you should see the CPU ramp up to 100% CPU. In my tests it only takes about 10 seconds to show itself.

    Wes

    wfilleman
    Participant

    while the one that resolves in 10-15 seconds could be a VAD correction working as designed, since that’s about the timeframe in which that happens.

    Ah, ok. I will pay attention to that then.

    Is the CPU expected to hit 100% continuously during repetitive background noise or is this not what is supposed to happen?

    Still working on getting a proper demo setup to send over to you.

    wfilleman
    Participant

    Thanks Halle, I will do that.

    Ok, I did as you suggested with the #define and there is a different behavior now.

    I can still reliably get the CPU to peg at 100% in my background noise reproduction, but the initial impression I’m seeing is soon after the background repetitive noise stops (maybe 10 seconds later) the CPU does drop back to 0%. Before it was pretty easy to get it to sit at 100% indefinitely.

To further clarify: With the #define uncommented, the CPU will sit at 100% while the repetitive background noise is in effect. It will sometimes stay at 100% for up to 10-15 seconds after the background noise has stopped.

    So, not sure if this is fixed in your mind, but it does appear to be slightly better in a sense.

    Ok, next up is to get you a test build that can reproduce it…

    Wes

    wfilleman
    Participant

    Hi Halle,

    I don’t think I tried that. I’ve always done the dynamic generation of language files and that’s what I stuck with in the sample app to show the issue.

    I can try adding my words to the built in set and retry.

    Wes

    wfilleman
    Participant

    Hi Halle,

    Yes, changing the model was necessary to load in my set of words. With my set of words and the background noise it would consistently peg the CPU to 100% for extended periods of time.

    I’ll take a look at the #define today and give it a try and let you know what happens, thanks.

    Wes

    in reply to: [Resolved] Seeing an issue with long-term voice recognition #1020394
    wfilleman
    Participant

    Hi Halle,

    A little more background: a delay due to suddenly-increasing background noise is expected behavior, because that means that the voice activity detection doesn’t have a way of distinguishing the speech/silence transition anymore, since the calibration values became irrelevant inside of a single utterance. Under these conditions, it should notice that this happened and sort itself out in about 14 seconds (this can be made a bit shorter but there are other tradeoffs to doing so, so if it is an uncommon occurrence this timeframe is probably about right).

Ok, thanks. I can’t say if I’ve seen this or not. Possibly; there were times I didn’t know why it wasn’t responding to my commands, and then it sorted itself out. I’ll pay more attention here.

    Sometimes completely normal searches can take 1-2 seconds and use 99% CPU, so just seeing a strenuous search isn’t a bug on its own.

    Agreed. 1-2 seconds is fine.

    Ok, got good news for you. I’ve got it 100% reproducible in the sample app and it’s exposing itself with Rejecto and the word set I’m using.

    Test Cases:
    – I tried just the OpenEars beta and the stock words. Can’t get it to fail with my background noise.
    – I loaded up my set of words in my app in the sample app (“CHANGE MODEL”). Can’t get it to fail with my background noise.
    – Loaded the Rejecto demo and used the stock words. Can’t get it to fail with my background noise.
    – Loaded the Rejecto demo and switched over to my set of words from my app (“CHANGE MODEL”). I can easily get the CPU to peg 100% for 10-20 seconds. Sometimes it’ll go for a minute or never exit until the Rejecto demo times out.

    So, there’s a combination between Rejecto and my set of words where my background noise causes the CPU peg to occur.

    The background noise I’m doing is my spoon in my coffee mug mixing up my coffee + sugar in the morning :) Total fluke that I happened to spot the correlation.

    I’ve got the sample project with my test wav files zipped up and I’ll be sending you the link here soon.

One thing I had a hard time with: even though I could record the WAV files, playing them back through the path directive didn’t show the bug, because I had to say “CHANGE MODEL” to switch over to my set of words to expose the problem, and I can’t do that with the path directive. Hopefully this is enough info for you to debug with.

    in reply to: [Resolved] Seeing an issue with long-term voice recognition #1020390
    wfilleman
    Participant

    OK, so my understanding is that the beta represents an improvement for you because it means you aren’t permanently losing the reactivity of the UI, or at least you are not seeing any new manifestations of that behavior, but it is also (obviously) not optimal yet because of the remaining issue with the long searches, is that correct?

    Yes, so far. I’ve been running for 3+ hours and OpenEars is still going strong. I couldn’t get this far before, so, yes, at least in my single test case the beta appears to have addressed what I saw earlier in 1.65 as noted at the top of this post.

    Ah, great! I didn’t know about SaveThatWave. I was thinking about how I’d accomplish that. Ok, let me experiment around with it and see if I can capture the session that pegs the CPU to 100%.

    What’s the best way to send you the sample project with the wav file?

    Wes

    in reply to: [Resolved] Seeing an issue with long-term voice recognition #1020388
    wfilleman
    Participant

    Thanks Halle,

    I can see about capturing the background noise that’s causing the high CPU if you are interested. I seem to have a way of reproducing it.

    I agree, this is a different behavior than what I saw on 1.65. I’m not sure it was “stuck” in ngram_search_lattice(), but it definitely didn’t proceed to ngram_search_bestpath().

Xcode could have been misleading on the CPU usage, but the iPad wasn’t hot the way it would be if it had been running at 100% for hours on end.

    So far, so good, minus the high CPU on occasion. Let me know if you’d like me to capture the reproducible noise that causes the CPU to peg.

    Wes

    in reply to: [Resolved] Seeing an issue with long-term voice recognition #1020386
    wfilleman
    Participant

    Hi Halle,

    Early report: Been running for about an hour and observing sound in the room and how OpenEars is handling it.

I’m seeing cases where long, sustained background noise does cause OpenEars to take a long time to process, where the CPU shoots to 100%. I’ve seen it take as long as 60-90 seconds before it eventually returns to listening and all is normal again (0% CPU and listening normally).

This is slightly different from what I was seeing on 1.65, where OpenEars wasn’t returning in my test case but the CPU was at 0% (I never saw it peak at 100%).

    Is this consistent with what you see on your end?
    Is this considered “normal”?
    Obviously having the CPU shoot to 100% for long periods of time isn’t desirable, but wanted your comments on this.

    Leaving my test case running. Will check back in later.

    Wes

    in reply to: [Resolved] Seeing an issue with long-term voice recognition #1020384
    wfilleman
    Participant

    Hehe, I’ll expect the same from OpenEars as my customers do from me. Total 100% reliable operation. :)

    Actually, from the debug logs, it looks quite clean and is working well, but yes, there is a weird bug here at play. Sounds like you might be on to it.

    To answer your points, I’m using Rejecto, but only in the default mode. Meaning, I’m not changing the weighting. I left it as the default (presumably 1.0).

    I have no data points without Rejecto, as Rejecto was critical in my application.

    It is happening because the search space on these searches is too big for some reason (my early impression is that the reason is a very long utterance due to some persistent noises being taken for an extended speech utterance

    This *sounds* right to me. I could see this happening over long periods of listening where OpenEars is exposed to the environment sound for too long and extended speech is triggered.

    I’ll try out the beta release and let it run for hours and see if it behaves any differently and let you know later today.

    Thanks!
    Wes

    in reply to: With OpenEars on and listening, Airplay is disabled #1020340
    wfilleman
    Participant

    Thanks Halle,

I did a little experimenting, and AVAudioPlayer works both on the local iPad device and when AirPlaying to another source; however, AVAudioPlayer has a bit of a delay (maybe 0.2 seconds) in playing the audio even after calling prepareToPlay. Seems like that’s the nature of this mechanism.

In my case that wasn’t acceptable for converting over 100%, so I’ve gone back to using AudioServicesCreateSystemSoundID, which gives a nice immediate sound effect when playing only on the iOS device (no AirPlay).

The workaround, for anyone else reading, is that I look at the number of screens connected to the iOS device by calling [UIScreen screens]. If there is more than one (AirPlay), I configure PocketSphinx with the “Default” audioMode and use AVAudioPlayer to play my sounds. If there is only one screen, I configure PocketSphinx with the “VoiceChat” audioMode and use the AudioServicesCreateSystemSoundID method to create and play sounds.
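For anyone who wants the concrete version of that check, it’s roughly this (the controller property name other than audioMode, and the sound-playing details, are my own app code and just stand-ins here):

    BOOL airPlayMirroringActive = ([UIScreen screens].count > 1);

    if (airPlayMirroringActive) {
        // AirPlay mirroring is active, so use the Default mode (which keeps AirPlay
        // working) and accept AVAudioPlayer's slight startup delay for sounds.
        self.pocketsphinxController.audioMode = @"Default";
    } else {
        // Local-only playback: VoiceChat lets the fast system-sound path work.
        self.pocketsphinxController.audioMode = @"VoiceChat";
    }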

    Not a perfect solution, but I figure that a little delay on the rarely used AirPlay mirroring is acceptable. 99% of the time the users would never run into this case and they get the optimal sound performance.

    Wes

    in reply to: With OpenEars on and listening, Airplay is disabled #1020309
    wfilleman
    Participant

    Hi Halle,

    Ok, found a partial workaround from: https://developer.apple.com/library/ios/qa/qa1803/_index.html

Looks like iOS doesn’t let us do AirPlay if we’re using the “VoiceChat” OpenEars option. Any of the other 4 options will allow AirPlay, but then I lose the ability to play sounds from the app using the AudioServicesCreateSystemSoundID method. Specifically, iOS disables AirPlay when the audio session mode is AVAudioSessionModeVoiceChat.

    For now this will work for my demo video. The sound from the app isn’t necessary.

If you happen to know of a combination of settings for the .audioMode property that allows speech recognition + sound playback + AirPlay, I’m all ears, so to speak. My guess is that I’d need to change how I’m playing sounds and move away from the AudioServicesCreateSystemSoundID method.

    Thanks Halle,
    Wes

    wfilleman
    Participant

    Sounds reasonable. I’m happy to have stumbled into it. :)

    wfilleman
    Participant

    UPDATE:

    For clarity, I did have to set the .audioMode flag of the PocketsphinxController object to @”VoiceChat” in order to get iOS to play and record from the mic.

    Wes

    wfilleman
    Participant

    Hi Halle,

    Perfect, thanks. I can confirm that using AVAudioPlayer works as well as another method I found where you make a call to:

    OSStatus error = AudioServicesCreateSystemSoundID((CFURLRef)aFileURL, &aSoundID);

    with your sound URL to create a system sound ID that can be played back later using:

AudioServicesPlaySystemSound(aSoundID);

    Just for completeness, while both worked, I found that the AudioServices method worked the best in my scenario.
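Putting the two calls together, a minimal self-contained version looks like this (the file name and variable names are just for illustration; the __bridge cast is needed under ARC):

    #import <AudioToolbox/AudioToolbox.h>

    // Load the sound once, e.g. during setup.
    NSURL *soundURL = [[NSBundle mainBundle] URLForResource:@"beep" withExtension:@"caf"];
    SystemSoundID soundID = 0;
    OSStatus error = AudioServicesCreateSystemSoundID((__bridge CFURLRef)soundURL, &soundID);
    if (error != kAudioServicesNoError) NSLog(@"Couldn't create the system sound: %d", (int)error);

    // Play it later with effectively no latency.
    AudioServicesPlaySystemSound(soundID);

    // Dispose of it when it's no longer needed.
    AudioServicesDisposeSystemSoundID(soundID);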

    Thanks Halle!
    Wes

    in reply to: arm64 Framework Slice? #1019179
    wfilleman
    Participant

    Great work Halle!

    I’m checking it out now and at first glance it appears to be more snappy and is doing a better job at recognizing speech.

    I’m now moving forward on integrating this with my app. Again, great job. I’m sure the arm64 conversion wasn’t too much fun.

    Wes

    in reply to: arm64 Framework Slice? #1018669
    wfilleman
    Participant

    No argument here. I’m sure every project will be different. Here were my stats:

    Main app source:
    400+ source files
    ~170k LOC

    COTS library to support real-time video streaming decoding (I have to compile this on my own):
    Thousands of source files
    ??? LOC. Easily 400k+ LOC

    I didn’t find any issues with the wider bit definitions for integers, etc. In fact I could have just done a recompile and left the hundreds of warnings in place and the app worked perfectly. The majority of the time spent was cleaning up the warnings (mostly NSInteger casting).

Now, if you are doing some intense bit-shifting work where you need to detect the end of the 32-bit register, then I could see where moving to 64-bit might cause you some headaches.

    Hopefully it won’t be too bad for you; I’m sure every project will have a different experience.

    Good luck! (and I’m eagerly anticipating arm64 support :)
    Wes

    in reply to: arm64 Framework Slice? #1018667
    wfilleman
    Participant

    Hi Halle,

Agreed. armv7/armv7s (32-bit) still works on the iPhone 5S. However, taking the time to convert to arm64 really impressed me with how easy it was to do (maybe 2 days tops) and the performance gains I got from running native 64-bit code on the new iPhone.

    Hopefully the OpenEars code base won’t give you too much trouble to compile as arm64.

    Wes

    in reply to: arm64 Framework Slice? #1018665
    wfilleman
    Participant

    Hi Halle,

    Thanks for the response and congrats on your new iPhone 5S!

Yes, agreed on all points. I could remove arm64 from the valid architectures list and release; however, here are my reasons for supporting the arm64 slice:

    – All current and future high-end iPhones and iPads (maybe even the mini at some point) will be using the new 64-bit chip.
– 64-bit uses the new, redesigned 64-bit iOS frameworks. Object creation takes about half the time, and there are many fixes (read: fixed memory leaks) for issues that still affect the 32-bit frameworks.
– If your app supports 64-bit, iOS doesn’t have to load the 32-bit frameworks. This leaves more memory available for your app.
– My app deals with heavy real-time video processing from multiple sources. The performance boost I’m seeing with arm64 is very significant.
– My user base is already at > 75% iOS 7. iOS 5 is less than 2%. My next app update will only support iOS 7. Older users can always download the latest version available for their device, but any new features (like voice) will be an iOS 7-only feature for my app.

I do use a host of other libraries, and most have converted to offering an arm64 slice (however, I am still waiting on one other library to make the conversion). I’ve seen these other libraries do it in a multitude of ways. Some have found a way to support back to 4.3 in one library.a file. Others have opted to release 2 frameworks, one that targets iOS 5 and below and another that targets iOS 6 and up (including arm64).

    With arm64, there are some great reasons to support it, but I do sympathize and agree that supporting a new slice with the backwards compatibility restrictions can be difficult.

Thanks for all that you do. I’ve been playing with the demos (including Rejecto) and I’m VERY impressed with what this on-device speech recognizer can do, and I’m really looking forward to offering this as an upsell feature in my apps.

    Thanks!
    Wes
