After suspend, speech in buffer from before suspend call is recognized

This topic has 2 voices, contains 17 replies, and was last updated by  Halle 299 days ago.

Viewing 18 posts - 1 through 18 (of 18 total)
Author Posts
July 23, 2011 at 2:31 pm #7354

jimmyno

Hello, let’s see if I can explain my problem :)
In my app I have an animation that starts with a button. When the animation starts, I call “suspendRecognition”. But if I was speaking shortly before, the data remains in memory and the animation lags. Is there a way to throw away what Pocketsphinx captured, without parsing it, before calling suspendRecognition?
Thanks

July 23, 2011 at 2:56 pm #7355

Halle

But if I was speaking shortly before

I’m not understanding why you can’t call suspendRecognition early enough so it is already off at the time when you are speaking shortly before — can you clarify that part?

July 23, 2011 at 3:08 pm #7356

jimmyno

Because the user may press the button that starts the animation, and so suspend recognition, while still speaking. If this happens, the audio that was heard stays dormant until the next resumeRecognition, but this causes the animation to lag.

July 23, 2011 at 3:43 pm #7357

Halle

OK, so let me read back the question and see if I am understanding it 100%:

You can’t run recognition during the animation playback because it causes a lag. So, when the user presses the “start animation” button, you suspend recognition. But speech that was spoken before the suspend call is in the buffer when you resume (which is the correct behavior for OpenEars as currently designed) and you want to throw it out before resuming recognition.

Is this right, or is part of your question about why you get a lag?

July 23, 2011 at 4:00 pm #7358

jimmyno

That is absolutely right :)
I want to throw it out before resuming recognition.

July 23, 2011 at 4:15 pm #7359

Halle

There isn’t a built-in way to do this. You could try this hack but I have no idea if it will work:

1. Open ContinuousModel.h and ContinuousModel.mm and add the following two methods to the header and implementation respectively:

Header:

- (void) stopDevice;
- (void) startDevice;

Implementation:

- (void) stopDevice {
    stopRecording(audioDevice);
}

- (void) startDevice {
    startRecording(audioDevice);
}

Having done that, you may be able to reset the buffer inline by running:

[self.pocketsphinxController.continuousModel stopDevice];
[self.pocketsphinxController.continuousModel startDevice];

while listening is suspended, although I’m not sure this will work to clear the buffer without any other side-effects. If there are side-effects, I’m afraid I won’t be able to spare more time right now to talk through them, but hopefully this will work.
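
Put together, the sequence described above might look like the following sketch. It assumes stopDevice and startDevice have been added to ContinuousModel as shown, and that the code lives in a class with a pocketsphinxController property. The two handler names and playAnimation are hypothetical, and it remains untested whether restarting the device really clears the buffer.

```objc
// Sketch only, not a confirmed fix.
// Assumes the stopDevice/startDevice methods from this post exist on ContinuousModel.

- (void) startAnimationButtonPressed { // hypothetical button handler
    // Suspend first, so the device restart happens while listening is suspended.
    [self.pocketsphinxController suspendRecognition];

    // Restart the audio device in the hope of discarding speech already buffered.
    [self.pocketsphinxController.continuousModel stopDevice];
    [self.pocketsphinxController.continuousModel startDevice];

    [self playAnimation]; // hypothetical method that runs the animation
}

- (void) animationDidFinish { // hypothetical completion callback
    // Resume listening once the animation is over.
    [self.pocketsphinxController resumeRecognition];
}
```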

July 23, 2011 at 4:28 pm #7360

jimmyno

Thank you so much, I’ll try it and let you know :)

July 23, 2011 at 4:47 pm #7361

jimmyno

I’m sorry to ask another question: is it just me, or is the latest version slightly slower at recognition?

July 23, 2011 at 4:57 pm #7362

Halle

Latest meaning .912?

July 23, 2011 at 4:59 pm #7363

jimmyno

Yes, .911.
I can’t tell whether I have done something wrong or whether it is a bit slower. When I compare the two versions in my app, they have slightly different response times. Nothing alarming.

July 23, 2011 at 5:00 pm #7364

Halle

No, there are no changes that could affect recognition time.

July 23, 2011 at 5:05 pm #7365

Halle

Shouldn’t you be using .912 since you are using AVPlayer?

July 23, 2011 at 5:23 pm #7366

jimmyno

Perfect, thanks. Then I’ll look for the mistake elsewhere.
The last question of the day and then I’ll stop bothering you :) Is it possible to make the system process only the very last part of a speech? My problem is that if the speech is very long, the response time becomes enormous. Is there a method to analyze only the very last part of a speech and throw the rest away? For example, if the speech is longer than a given time?
Is it somehow possible?

July 23, 2011 at 5:32 pm #7367

Halle

Nope, no way to do that since “last part” is totally subjective and we don’t have any parts at all until the speech has been analyzed.

That kind of task management is something that you need to do with logical controls in your app design and user cues in your app UI. For instance, if you are transcribing a lot of speech and you don’t want to end up with a very long utterance, give them a visual hint that they should say _one_ sentence, then convert it and give them feedback, then prompt them to say the next sentence, etc. Or, if the recognition is very slow because the language model is huge, use logical switching between smaller language models based on where the user is in the app’s logical flow, so recognition occurs with the smallest-possible language model.
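
As a rough illustration of the language model switching idea, a sketch might look like the following. It assumes PocketsphinxController exposes changeLanguageModelToFile:withDictionary: (check the header of the OpenEars version in use). The hook name, screen name, and model file names are all hypothetical.

```objc
// Sketch only. The selector changeLanguageModelToFile:withDictionary: is an
// assumption to verify against your OpenEars headers; the screen and file
// names below are made up for illustration.

- (void) userDidEnterScreen:(NSString *)screenName { // hypothetical hook
    // Pick the smallest model that covers what the user can say on this screen.
    NSString *modelName = [screenName isEqualToString:@"Playback"]
        ? @"PlaybackCommands"   // hypothetical small model for playback controls
        : @"MenuCommands";      // hypothetical small model for menu navigation

    NSString *lmPath = [[NSBundle mainBundle] pathForResource:modelName ofType:@"languagemodel"];
    NSString *dicPath = [[NSBundle mainBundle] pathForResource:modelName ofType:@"dic"];

    // Only switch when both files were actually found in the bundle.
    if (lmPath != nil && dicPath != nil) {
        [self.pocketsphinxController changeLanguageModelToFile:lmPath withDictionary:dicPath];
    }
}
```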

July 23, 2011 at 5:39 pm #7368

jimmyno

I had imagined as much. Thank you for the help, as always :)

July 24, 2011 at 12:22 am #7369

jimmyno

Hi, sorry to bother you again on this matter, but this thing is driving me crazy.
I have two devices running the same app. One of them uses the version I was using until today, .902. The other one has been upgraded to .912 since I needed some features (like the language model switch). Both devices have the exact same language model.
The app features continuous voice recognition and responds as soon as it hears silence after some speech of ANY length (of course). Now, with version .902, recognition and response time look to be absolutely unaffected by speech length: it is always blazing fast. On the other hand, version .912 is quick enough with very short speech (like 3 or 4 seconds), but response time increases in proportion to how long the speech lasted. And it is often too slow for the job.
It’s been hours and I still don’t understand why.

I know my English is not very good, I hope I explained myself.

July 24, 2011 at 8:11 am #7372

Halle

Don’t know why you’re seeing that (you’ve mentioned a lot about animation, AVPlayer, MPMusicPlayerController, etc in your app being used alongside speech recognition, so there might just be a conflict between the amount of other things you are doing with audio session or sheer CPU usage and good continuous speech recognition), but use .902 if it matches the needs of your application better.

July 24, 2011 at 8:51 am #7374

Halle

Apologies, I just remembered one thing that changed from .902 to .91 that could slow down recognition of very long utterances.

I had a discussion on the Pocketsphinx forum about the OpenEars Pocketsphinx arguments and they suggested that one of the settings was too much of a trade-off of accuracy given the new faster driver. Changing that setting improved the recognition quality and in my test cases had no effect on speed when offset with the new driver, but I can imagine it showing up in very long utterances.

You can run .912 (please upgrade to the current version) with the same Pocketsphinx run settings as .902 by replacing the .912 PocketsphinxRunConfig.h with the .902 one. You will have less-accurate recognition but it will return faster.
