Can't recognize speech after playing video with AVPlayer

This topic has 4 voices, contains 15 replies, and was last updated by HackNFly 114 days ago.

Viewing 16 posts - 1 through 16 (of 16 total)
Author Posts
July 5, 2011 at 4:27 pm #7053

Aleksey

If I instantiate an AVPlayer object, OpenEars fails to hear or recognize any speech (on the device). If I do not instantiate an AVPlayer object, everything works fine.

Interestingly, it works as expected on the Simulator; the problem only occurs on the device (an iPhone 4 in this case). My own code doesn’t touch the audio session anywhere.

I’ve tried calling [AudioSessionManager startAudioSession] a second time, thinking this problem was related to a similar UIWebView problem posted in the forum, but that did not change anything.

I suspect AVPlayer is clobbering something but I have no idea what it could be. Any help much appreciated!

Using OpenEars 0.911:

2011-07-05 08:09:11.702 FaceOn[1629:6303] OPENEARSLOGGING: Starting dynamic language model generation
2011-07-05 08:09:11.714 FaceOn[1629:1903] OPENEARSLOGGING: Starting dynamic language model generation
2011-07-05 08:09:11.707 FaceOn[1629:6403] OPENEARSLOGGING: Starting dynamic language model generation
2011-07-05 08:09:11.711 FaceOn[1629:1603] OPENEARSLOGGING: Starting dynamic language model generation
2011-07-05 08:09:11.717 FaceOn[1629:6203] OPENEARSLOGGING: Starting dynamic language model generation
2011-07-05 08:09:11.723 FaceOn[1629:6303] OPENEARSLOGGING: Running MITLM
2011-07-05 08:09:11.724 FaceOn[1629:6403] OPENEARSLOGGING: Running MITLM
2011-07-05 08:09:11.724 FaceOn[1629:1903] OPENEARSLOGGING: Running MITLM
2011-07-05 08:09:11.725 FaceOn[1629:1603] OPENEARSLOGGING: Running MITLM
2011-07-05 08:09:11.726 FaceOn[1629:6203] OPENEARSLOGGING: Running MITLM
2011-07-05 08:09:13.947 FaceOn[1629:1903] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 2.224121 seconds
2011-07-05 08:09:14.043 FaceOn[1629:6303] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 2.320925 seconds
2011-07-05 08:09:14.133 FaceOn[1629:6403] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 2.409022 seconds
2011-07-05 08:09:14.157 FaceOn[1629:1603] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 2.431679 seconds
2011-07-05 08:09:14.236 FaceOn[1629:6203] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 2.509775 seconds
2011-07-05 08:09:20.556 FaceOn[1629:6207] OPENEARSLOGGING: Recognition loop has started
2011-07-05 08:09:20.716 FaceOn[1629:707] Pocketsphinx is starting up.

2011-07-05 08:09:21.185 FaceOn[1629:707] Ready to play video!
2011-07-05 08:09:21.366 FaceOn[1629:6207] OPENEARSLOGGING: Starting openAudioDevice on the device.
2011-07-05 08:09:21.369 FaceOn[1629:6207] OPENEARSLOGGING: Audio unit wrapper successfully created.
2011-07-05 08:09:21.389 FaceOn[1629:6207] OPENEARSLOGGING: Set audio route to SpeakerAndMicrophone
2011-07-05 08:09:21.394 FaceOn[1629:6207] OPENEARSLOGGING: Setting the variables for the device and starting it.
2011-07-05 08:09:21.396 FaceOn[1629:6207] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
2011-07-05 08:09:21.883 FaceOn[1629:6207] OPENEARSLOGGING: Started audio output unit.
2011-07-05 08:09:21.888 FaceOn[1629:6207] OPENEARSLOGGING: Calibration has started
2011-07-05 08:09:21.888 FaceOn[1629:707] Pocketsphinx calibration has started.
2011-07-05 08:09:26.097 FaceOn[1629:6207] OPENEARSLOGGING: Calibration has completed
2011-07-05 08:09:26.103 FaceOn[1629:6207] OPENEARSLOGGING: Project has these words in its dictionary:
FRIDAY
MONDAY
QUIDNUNC
SATURDAY
SUNDAY
THURSDAY
TUESDAY
WEDNESDAY
2011-07-05 08:09:26.104 FaceOn[1629:6207] OPENEARSLOGGING: Listening.
2011-07-05 08:09:26.099 FaceOn[1629:707] Pocketsphinx calibration is complete.
2011-07-05 08:09:26.109 FaceOn[1629:707] Pocketsphinx is now listening.
2011-07-05 08:09:26.130 FaceOn[1629:707] Pocketsphinx has suspended recognition.
2011-07-05 08:09:31.763 FaceOn[1629:707] Pocketsphinx has resumed recognition.

July 5, 2011 at 6:08 pm #7056

Halle

Hi Aleksey,

I sent you a note. I’ve heard about this issue before, but I’ve never seen any code or logs that would help me replicate it, so maybe you could show me some of your AVPlayer code, via email or here if you don’t mind. Aside from that, what jumps out at me in the logs are the five instances of LanguageModelGenerator. Is that intentional? And is the statement “Ready to play video!” the point in the process where the AVPlayer is being used?

If it works on the Simulator (no audio session there, and a totally different audio driver for Pocketsphinx) but doesn’t work on the device (where the Pocketsphinx audio driver is totally dependent on the audio session), it’s extremely likely that it has something to do with the audio session. But it is weird that running AudioSessionManager a second time doesn’t help. Is the logging you showed above what you get when you run AudioSessionManager startAudioSession a second time after the AVPlayer is finished?

July 6, 2011 at 3:45 am #7061

Aleksey

The five instances of LanguageModelGenerator are just the result of five different threads, each created to generate a different language model. There is nothing magical about the number five. I’m using GCD.

“Ready to play video!” indicates that the video is ready to play, not that it has actually started playing.

The log above is not from running AudioSessionManager startAudioSession a second time. It didn’t do anything when I tried that, so I stopped calling startAudioSession a second time.

I emailed you a copy of your own sample project that uses AVPlayer. Please let me know if there is anything else I can do to help you (or myself!) to pinpoint the problem. Thanks!

July 6, 2011 at 11:19 am #7062

Halle

Hiya,

Thanks for sending me the example and for the helpful logging output. Having looked at the project, I misunderstood your question a bit from the title, since the issue is with running AVPlayer and recognition simultaneously, not consecutively.

There are a few AV objects that override the audio session no matter what, such as a UIWebView with a video in it, AVPlayer, and I believe MPMediaPlayer as well. Presumably they need to plug in to a system resource in a particular way. My understanding is that they change the audio session category from one that can record to one that can only play back, which always hoses subsequent uses of PocketsphinxController until it is fixed, because you can’t open a mic input unless you are using a recording audio session category. Unfortunately, you can’t run these few kinds of objects simultaneously with speech recognition, because they do not release their audio session settings while they are instantiated. But it shouldn’t be a problem to fix the audio session settings afterwards: run AudioSessionManager startAudioSession a second time once you have completely put them away, and then start recognition.

July 6, 2011 at 3:38 pm #7064

Aleksey

Thanks for getting back so quickly.

A colleague of mine was able to get video playing (with no sound) simultaneously with an alternative ASR engine. To do that, he had to execute the following code:

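NSError *setCategoryError = nil;
// Request a category that allows simultaneous playback and recording; setCategory:error: returns NO on failure.
if (![[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord
                                            error:&setCategoryError]) {
    NSLog(@"Could not set the audio session category: %@", setCategoryError);
}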

That does not work with OpenEars, and you even recommend against it.

What are your thoughts about digging in and restructuring the audio input portion of OpenEars? Would you recommend it? Too many pitfalls? Any chance of success?

July 6, 2011 at 4:07 pm #7065

Halle

Yup, nothing changes when you set the audio session category to play and record, because that is already the category OpenEars uses.

The issue is that it is being changed to something else while your video is being played and it is the video player’s settings which are winning out. It is probably working with the other ASR because it isn’t using an audio driver that makes use of the audio session settings, and it probably doesn’t need very low-level audio access. Is using that ASR instead of OpenEars not an option?

I think the hardest thing to change in OpenEars would be its audio driver, or the relationship between its audio driver and the continuous loop, and that is the kind of thing I’d be unable to help you out with. But I also don’t want to be discouraging. Something entirely different you could try is the AudioQueue fallback driver that the Simulator uses. It isn’t as good as the audio unit driver in a couple of ways, and it isn’t well tested, but if it works on the Simulator despite the AVPlayer, it will probably work on the device with the same driver. You would basically need to find all of the preprocessor conditionals that check #if defined TARGET_IPHONE_SIMULATOR && TARGET_IPHONE_SIMULATOR and change them to check some other constant that you define, so that the Simulator driver is always used.
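A minimal sketch of that idea, in plain C (which compiles unchanged in the Objective-C files). The flag name OPENEARS_FORCE_FALLBACK_DRIVER, the enum, and the helper selected_audio_driver() are hypothetical; the real OpenEars conditionals guard driver code rather than return a value, so this only illustrates how such a guard could be widened to a constant you define:

```c
/* Hypothetical flag: build with -D OPENEARS_FORCE_FALLBACK_DRIVER to select the
   AudioQueue fallback driver on a device as well as on the Simulator.
   TARGET_IPHONE_SIMULATOR comes from TargetConditionals.h on Apple platforms;
   if it is not defined at all, the first half of the test evaluates to false. */
#if (defined TARGET_IPHONE_SIMULATOR && TARGET_IPHONE_SIMULATOR) || \
    defined OPENEARS_FORCE_FALLBACK_DRIVER
#define USE_AUDIOQUEUE_FALLBACK 1
#else
#define USE_AUDIOQUEUE_FALLBACK 0
#endif

/* Hypothetical labels for the two driver code paths. */
typedef enum {
    AUDIO_DRIVER_AUDIO_UNIT,
    AUDIO_DRIVER_AUDIOQUEUE_FALLBACK
} AudioDriverKind;

/* Reports which path the guard selected at compile time. */
static AudioDriverKind selected_audio_driver(void) {
#if USE_AUDIOQUEUE_FALLBACK
    return AUDIO_DRIVER_AUDIOQUEUE_FALLBACK;
#else
    return AUDIO_DRIVER_AUDIO_UNIT;
#endif
}
```

Under this approach, each existing #if defined TARGET_IPHONE_SIMULATOR && TARGET_IPHONE_SIMULATOR would become #if USE_AUDIOQUEUE_FALLBACK, with the definition in a header those files share. Using a separate constant, rather than redefining TARGET_IPHONE_SIMULATOR itself, leaves any other Simulator checks in those files alone.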

July 7, 2011 at 12:35 am #7066

Aleksey

Thanks for the creative suggestion. Unfortunately there were compiler errors I could not resolve.

I noticed that Apple’s demo app, SpeakHere, works fine on a device while simultaneously playing video and recording audio. I know very little about audio drivers, but do you think its driver code could be adapted? If you need more info, please let me know and I will email a modified sample project to you directly.

July 7, 2011 at 8:00 am #7067

Halle

Nope, there isn’t anything in the SpeakHere code that is applicable to .911 other than the fact that it uses AudioQueue Services.

If adapting sample code would be worth the work for you, I would definitely recommend sticking with getting the fallback driver working instead, since it’s the cleanest possible approach to getting a drop-in driver that uses AudioQueue Services (like SpeakHere) but is designed for use with the OpenEars continuous listening loop.

Another option is to use the previous OpenEars version 0.902 (it can be downloaded from the top of the Getting Started page) which uses a simpler AudioQueue driver. That doesn’t necessarily mean it will work simultaneously with an AVPlayer (there’s a difference between what the SpeakHere audio queue has to be able to do, which is basically start an audio queue and stop it on button press, and what the OpenEars driver is responsible for), but it should be a bit less sensitive to the AVPlayer overriding the audio session.

But I would also just suggest using the other ASR you found, since it is compatible with your requirement and already working. Those are the only three suggestions I have; I hope one of them is an option for you.

July 7, 2011 at 2:33 pm #7068

Aleksey

Okay. Thanks for your suggestions and responsiveness. Much appreciated!

July 10, 2011 at 1:49 pm #7083

Halle

Hi Aleksey,

Just wanted to draw your attention to the fact that in the post above this one, there is a question from a developer who seems to be using AVPlayer simultaneously with PocketsphinxController, albeit with iPod song playback instead of silent video playback:

http://www.politepix.com/forums/topic/speech-recognition-during-play-ipod-music

Maybe this is a good opportunity to ask him any questions, since both threads are active.

July 11, 2011 at 7:21 am #7087

Aleksey

Halle,

I read the link and didn’t find anything new or different that I could try to make AVPlayer work with PocketsphinxController. But I will ask anyway.

I did finally get PocketsphinxController working with AVPlayer on the device with the AudioQueue fallback driver. It seems to work, but I need to do more testing of the overall speech recognition quality.

For anyone else who may need to do the same, add the following code to each of the files listed below it:

#ifdef AUDIO_DRIVER_FIX
#undef TARGET_IPHONE_SIMULATOR
#define TARGET_IPHONE_SIMULATOR 1
#endif

AudioQueueFallback.h
ContinuousAudioUnit.h
AudioQueueFallback.mm
ContinuousAudioUnit.mm
PocketsphinxController.mm

And then in the Target Info Settings, for “Other C Flags” add the following:

-D AUDIO_DRIVER_FIX
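To see why this works, here is a small self-contained C sketch (equally valid in an Objective-C file) that applies the same override ahead of a guard written in the style Halle described. uses_audioqueue_fallback() is a hypothetical stand-in for the driver code OpenEars actually guards, and the #define AUDIO_DRIVER_FIX line only stands in for the -D flag so the sketch builds on its own:

```c
/* Stand-in for "-D AUDIO_DRIVER_FIX" in Other C Flags, so the sketch compiles alone. */
#ifndef AUDIO_DRIVER_FIX
#define AUDIO_DRIVER_FIX
#endif

/* The override from the post above: make the guard believe it is the Simulator. */
#ifdef AUDIO_DRIVER_FIX
#undef TARGET_IPHONE_SIMULATOR
#define TARGET_IPHONE_SIMULATOR 1
#endif

/* Hypothetical stand-in for an OpenEars driver-selection guard. */
static int uses_audioqueue_fallback(void) {
#if defined TARGET_IPHONE_SIMULATOR && TARGET_IPHONE_SIMULATOR
    return 1; /* AudioQueue fallback (Simulator) path */
#else
    return 0; /* audio unit (device) path */
#endif
}
```

Because the macro is redefined for the rest of each translation unit, every later TARGET_IPHONE_SIMULATOR check in those five files also takes its Simulator branch, not only the driver selection.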

July 17, 2011 at 5:05 pm #7275

Halle

Hi Aleksey,

There is a new OpenEars version, .912, out now, which resets the audio session successfully when [audioSessionManager startAudioSession] is run a second time after an AVPlayer or media player has completed playback. In some cases (specifically, when the audio session changes caused an interruption message to be sent to PocketsphinxController so that it exited its loop) it is then necessary to restart the listening loop; experiment to find out whether that is needed in your case.

I have tried this with the sample app you sent me and I saw it resume listening on the device, with the audio unit driver, without needing a restart of the loop. I will email my modified version to you so you can check it out (but make sure to install a new .912 version in the library folder so there’s no altered .911 code in there). Inline resumption of listening as the video ends isn’t intended to be a feature of the library so there won’t be a lot of support for it (the intended and supported feature is to be able to restart the loop after stopping a playback object that has overridden the audio session), but it seems to be working in this case.

July 24, 2011 at 3:29 am #7370

Aleksey

Hi Halle,

I confirmed it does in fact work, but there is a negative side effect. Calling startAudioSession seems either to delay speech recognition or, if I call it earlier (well before resumeRecognition), to silence the video’s audio tracks.

It’s definitely better, but still problematic for my application.

July 24, 2011 at 8:15 am #7373

Halle

Hi Aleksey,

Yup, you may need to restart the speech recognition loop after the AVPlayer instead of just resetting the audio session inline. Having them both function simultaneously despite the audio session mismatch would be nice if it worked, but it isn’t a feature of the library.

August 3, 2011 at 9:26 am #7433

mill562

Please see my post titled: “Playing video & using speech detection workaround”

I believe you can use the regular audio unit driver by calling performRouteChange on AudioSessionManager after pocketsphinxDidStartListening is called.

January 25, 2012 at 4:34 am #8544

HackNFly

Thank you very much, that’s exactly what I needed. I noticed that the fallback approach was working in the Simulator, so I wanted to try that too; I just didn’t know a quick way to do it. It’s working on my 3rd-gen iPod now, so thank you.

~Santiago

