AVCaptureSession audio combined with speech recognition


Viewing 13 posts - 1 through 13 (of 13 total)


    Hi there. I’m evaluating OpenEars with PocketsphinxController for my app (iOS 7.1, iPhone 4S).

    I’m recording video and audio with AVFoundation’s AVCaptureSession (based on the AVCam sample code), and in parallel I use PocketsphinxController to recognise voice commands for starting and stopping the video recording.

    The video/audio recording starts just fine, but after a few seconds, while I’m speaking commands, the audio suddenly drops out of the video capture and doesn’t come back unless I restart the video capture session.

    Recognition with PocketsphinxController works just fine.

    Here are my settings (I’m not using Flite or any other audio playback):
    pocketsphinxController.returnNbest = YES;
    pocketsphinxController.nBestNumber = 5;
    self.pocketsphinxController.audioSessionMixing = YES;
    self.pocketsphinxController.outputAudio = NO;

    I also tried self.pocketsphinxController.audioMode = @"VoiceChat"; but with no success.

    This is the debug log. The audio seems to break at approximately 2014-08-13 01:35:53.954 [5950:8707]:


    2014-08-13 01:35:53.884 [5950:60b] Validating ROAD
    2014-08-13 01:35:53.892 [5950:60b] Command ROAD Not Found in Keywords
    2014-08-13 01:35:53.888 [5950:8707] audioCategory is correct, we will leave it as it is.
    2014-08-13 01:35:53.894 [5950:60b] Hypothesis 1: ROAD with a score of -29048
    2014-08-13 01:35:53.894 [5950:8707] bluetoothInput is incorrect, we will change it.
    2014-08-13 01:35:53.896 [5950:60b] Hypothesis 2: EVIDENCE with a score of -29137
    2014-08-13 01:35:53.902 [5950:60b] Hypothesis 3: SPEED with a score of -49907
    2014-08-13 01:35:53.904 [5950:60b] Hypothesis 4: with a score of -50079
    2014-08-13 01:35:53.910 [5950:60b] Hypothesis 5: STOP with a score of -50234
    2014-08-13 01:35:53.927 [5950:8707] bluetooth input is now on the correct setting of 1.
    2014-08-13 01:35:53.927 [5950:8707] bluetooth input is now on the correct setting of 1.
    2014-08-13 01:35:53.954 [5950:8707] Output Device: SpeakerAndMicrophone.
    2014-08-13 01:35:53.960 [5950:8707] categoryDefaultToSpeaker is correct, we will leave it as it is.
    2014-08-13 01:35:53.965 [5950:8707] OverrideCategoryMixWithOthers is incorrect, we will change it.
    2014-08-13 01:35:53.970 [5950:8707] OverrideCategoryMixWithOthers is now on the correct setting of 1.
    2014-08-13 01:35:53.978 [5950:8707] preferredBufferSize is incorrect, we will change it.
    2014-08-13 01:35:53.982 [5950:8707] PreferredBufferSize is now on the correct setting of 0.128000.
    2014-08-13 01:35:53.989 [5950:8707] preferredSampleRateCheck is incorrect, we will change it.
    2014-08-13 01:35:54.736 [5950:8707] preferred hardware sample rate is now on the correct setting of 16000.000000.
    2014-08-13 01:35:54.738 [5950:8707] Setting the variables for the device and starting it.
    2014-08-13 01:35:54.740 [5950:8707] Looping through ringbuffer sections and pre-allocating them.
    2014-08-13 01:35:54.853 [5950:8707] Started audio output unit.
    2014-08-13 01:35:54.862 [5950:8707] Listening.

    I suspect that the audio session adjustments and the restart of the audio unit are what breaks the audio capture, but why would that happen?


    Halle Winkler


    Yes, this is a known issue: AV objects have audio session requirements that are as stringent as OpenEars’ own. You can search these forums for the keywords “audio coexistence” or “video” to read much more about it.


    @Halle: Thanks for the feedback. I didn’t find anything that fixed the issue directly, but while experimenting with different values I got it working with:

    self.pocketsphinxController.audioMode = @"VideoRecording";
    self.pocketsphinxController.audioSessionMixing = YES;
    self.pocketsphinxController.outputAudio = NO;

    set before calling startListeningWithLanguageModelAtPath.

    It still happens, though very seldom, so I restart the session automatically from the pocketSphinxContinuousSetupDidFail callback.

    Halle Winkler

    OK, good to know. I am working on some new coexistence code right now, so if you’d like to send me a sample app that shows the unwanted behavior for me to test against, future versions of OpenEars may handle this without any issues. The sample app has to be as simple as possible while still demonstrating the issue: everything in a single view controller, with only a few methods. If you’d like to send it, send me a note via the contact form.


    Thanks for your feedback. I tested the same code with iOS 8 and hit another issue when switching the video recording to another file: this time the video recording apparently can’t start anymore (or the delegate doesn’t fire correctly).

    In general, OpenEars somehow collides with AVCaptureSession (or the other way around).

    When I get some time I will try to add voice commands to Apple’s AVCam sample (https://developer.apple.com/library/ios/samplecode/AVCam/Introduction/Intro.html) and hopefully reproduce these conflicts.


    I looked into the debug output and found that after the message “Stopping audio unit.” appears, the AVCapture session no longer works reliably.

    Apparently, if I speak while stopping/restarting the video recording, “Stopping audio unit.” appears erratically and the problem occurs.

    If I turn off OpenEars completely, the recording works fine, so it is definitely a conflict.

    Forgot to say: on iOS 8, if I apply the “patch”, the video session is even more unstable.

    Do you know what “Stopping audio unit.” actually means and what could cause it?

    Here’s an excerpt of the debug log:
    2014-09-24 00:07:25.365 Streetcorder[4816:624437] Stopping audio unit.
    2014-09-24 00:07:25.365 Streetcorder[4816:624371] Pocketsphinx has detected a period of silence, concluding an utterance.
    2014-09-24 00:07:25.408 Streetcorder[4816:624437] Audio Output Unit stopped, cleaning up variable states.
    2014-09-24 00:07:25.409 Streetcorder[4816:624437] Processing speech, please wait…

    Halle Winkler

    Yes, this is due to two conflicting audio sessions with different sample rates and bitrates. You are welcome to send me a very simple sample app demonstrating the issue if you would like to see this as a possible future feature.

    Halle Winkler

    This kind of video object coexistence ought to work by default with OpenEars 2.0, but I’m waiting to hear some feedback about it from the developers who have these features in their apps.


    Hi Halle, thanks for the new release!
    I will upgrade to 2.0 in the next few days and come back with some feedback about this issue.

    Halle Winkler

    Super, no rush at all but I will be interested in your results. Remember to first remove any workarounds you put in to get this working, so they don’t interfere with the default behavior.

    This feature is a work in progress and probably needs a fair amount of feedback to catch lots of cases, so don’t be discouraged if it doesn’t work ideally for your case yet, just let me know what is happening when it doesn’t do the right thing and if possible give me a replication case so I can look into it.


    I found some time to upgrade to 2.0, removed the hacks, and tested the app in the field. The good news is that the video capture session doesn’t crash anymore on iOS 8, so the video records fine, but now the old problem is back: in about 95% of cases the audio recording doesn’t start, so the video gets recorded without audio. Maybe I should put the hack back, try again on iOS 8, and see if it’s better?

    Regarding an example that reproduces the issue:

    You just need to add OpenEars/Pocketsphinx to this example from Apple and enable voice commands for starting/stopping the video recording:

    Halle Winkler

    Super, thank you for giving me a replication case. I’m going to check this out as there is time. Glad to hear there’s a small step forward and we’ll see what more can be done.

    Halle Winkler

    Do you get different results (again, without any hacks or workarounds) depending on whether you start video acquisition first or start speech recognition first?
