| Author | Posts |
|---|---|
| July 23, 2011 at 2:31 pm #7354 | |
|
jimmyno |
Hello, let’s see if I can explain my problem:) |
| July 23, 2011 at 2:56 pm #7355 | |
|
Halle |
I’m not understanding why you can’t call suspendRecognition early enough that recognition is already off by the time the speech happens. Can you clarify that part? |
| July 23, 2011 at 3:08 pm #7356 | |
|
jimmyno |
Because the user may press the button that starts the animation, and so suspend recognition, while still speaking. If this happens, the audio already captured stays buffered until the next resumeRecognition, and that causes the animation to lag. |
| July 23, 2011 at 3:43 pm #7357 | |
|
Halle |
OK, so let me read back the question and see if I am understanding it 100%: You can’t run recognition during the animation playback because it causes a lag. So, when the user presses the “start animation” button, you suspend recognition. But speech that was spoken before the suspend call is in the buffer when you resume (which is the correct behavior for OpenEars as currently designed) and you want to throw it out before resuming recognition. Is this right, or is part of your question about why you get a lag? |
| July 23, 2011 at 4:00 pm #7358 | |
|
jimmyno |
That is absolutely right :) |
| July 23, 2011 at 4:15 pm #7359 | |
|
Halle |
There isn’t a built-in way to do this. You could try this hack, but I have no idea if it will work:
1. Open ContinuousModel.mm and add the following two methods to the header and implementation.
Header:
- (void) stopDevice;
- (void) startDevice;
Implementation:
- (void) stopDevice {
    stopRecording(audioDevice);
}
- (void) startDevice {
    startRecording(audioDevice);
}
Having done that, you may be able to reset the buffer inline by running: [self.pocketsphinxController.continuousModel stopDevice]; while listening is suspended, although I’m not sure this will work to clear the buffer without any other side-effects. If there are side-effects, I’m afraid I won’t be able to spare more time right now to talk through them, but hopefully this will work. |
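Editorial sketch of the behavior being asked for, in plain C. This is not OpenEars or Pocketsphinx code, and every name in it (`SampleBuffer`, `bufferAppend`, `bufferSuspend`, `bufferResumeDiscardingPending`) is hypothetical. It only illustrates the idea of throwing away audio that was captured before suspension so that nothing stale is processed on resume; whether the `stopDevice` hack above achieves the same effect inside the library is untested.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_CAPACITY 16000 /* hypothetical size: one second at 16 kHz */

/* Hypothetical capture buffer; not the structure OpenEars uses internally. */
typedef struct {
    short samples[BUFFER_CAPACITY];
    size_t count;   /* number of valid samples currently held */
    int suspended;  /* nonzero while capture is suspended */
} SampleBuffer;

void bufferInit(SampleBuffer *b) {
    b->count = 0;
    b->suspended = 0;
}

/* Store incoming samples unless suspended; returns how many were stored. */
size_t bufferAppend(SampleBuffer *b, const short *in, size_t n) {
    if (b->suspended) {
        return 0;
    }
    size_t room = BUFFER_CAPACITY - b->count;
    if (n > room) {
        n = room; /* drop whatever does not fit */
    }
    memcpy(b->samples + b->count, in, n * sizeof(short));
    b->count += n;
    return n;
}

/* Stop accepting samples; anything already captured stays in the buffer. */
void bufferSuspend(SampleBuffer *b) {
    b->suspended = 1;
}

/* Resume capture and discard whatever was captured before the suspend. */
void bufferResumeDiscardingPending(SampleBuffer *b) {
    b->count = 0;
    b->suspended = 0;
}
```

With a buffer like this, the "start animation" handler would call `bufferSuspend`, and the end of the animation would call `bufferResumeDiscardingPending` instead of a plain resume, so speech cut off by the button press is never analyzed.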
| July 23, 2011 at 4:28 pm #7360 | |
|
jimmyno |
Thank you so much, I’ll try it and let you know :) |
| July 23, 2011 at 4:47 pm #7361 | |
|
jimmyno |
Sorry to ask another question: is it just me, or is the latest version slightly slower at recognition? |
| July 23, 2011 at 4:57 pm #7362 | |
|
Halle |
Latest meaning .912? |
| July 23, 2011 at 4:59 pm #7363 | |
|
jimmyno |
Yes, .911. |
| July 23, 2011 at 5:00 pm #7364 | |
|
Halle |
No, there are no changes that could affect recognition time. |
| July 23, 2011 at 5:05 pm #7365 | |
|
Halle |
Shouldn’t you be using .912 since you are using AVPlayer? |
| July 23, 2011 at 5:23 pm #7366 | |
|
jimmyno |
Perfect, thanks. Then I’ll look for the mistake elsewhere. |
| July 23, 2011 at 5:32 pm #7367 | |
|
Halle |
Nope, no way to do that, since “last part” is totally subjective and we don’t have any parts at all until the speech has been analyzed. That kind of task management is something you need to handle with logical controls in your app design and user cues in your app UI. For instance, if you are transcribing a lot of speech and you don’t want to end up with a very long utterance, give the user a visual hint to say _one_ sentence, then convert it and give them feedback, then prompt them to say the next sentence, etc. Or, if the recognition is very slow because the language model is huge, use logical switching between smaller language models based on where the user is in the app’s logical flow, so recognition occurs with the smallest possible language model. |
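Editorial sketch of the "logical switching between smaller language models" idea, in plain C. The screen names, file names, and the `modelForScreen` helper are all hypothetical; the sketch covers only the lookup from app state to a small model and dictionary pair. Handing the chosen files to the recognizer is left out, since that depends on what the installed OpenEars version offers.

```c
#include <stddef.h>

/* Hypothetical app states; a real app would define its own. */
typedef enum {
    SCREEN_MAIN_MENU = 0,
    SCREEN_PLAYBACK,
    SCREEN_SETTINGS,
    SCREEN_COUNT
} AppScreen;

/* A small language model and its matching dictionary (hypothetical file names). */
typedef struct {
    const char *languageModelPath;
    const char *dictionaryPath;
} ModelFiles;

/* One entry per screen, in the same order as the AppScreen enum. */
static const ModelFiles kModelsByScreen[SCREEN_COUNT] = {
    { "MainMenu.languagemodel", "MainMenu.dic" }, /* SCREEN_MAIN_MENU */
    { "Playback.languagemodel", "Playback.dic" }, /* SCREEN_PLAYBACK */
    { "Settings.languagemodel", "Settings.dic" }  /* SCREEN_SETTINGS */
};

/* Returns the model files for a screen, or NULL if the screen is unknown. */
const ModelFiles *modelForScreen(AppScreen screen) {
    int index = (int)screen;
    if (index < 0 || index >= (int)SCREEN_COUNT) {
        return NULL;
    }
    return &kModelsByScreen[index];
}
```

The app would call `modelForScreen` whenever the user moves to a new screen and switch the recognizer to the returned files, so each recognition pass searches only the vocabulary that makes sense at that point in the flow.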
| July 23, 2011 at 5:39 pm #7368 | |
|
jimmyno |
I had imagined as much. Thank you for the help, as always :) |
| July 24, 2011 at 12:22 am #7369 | |
|
jimmyno |
Hi, sorry to bother you again about this, but it is driving me crazy. I know my English is not very good; I hope I explained myself clearly. |
| July 24, 2011 at 8:11 am #7372 | |
|
Halle |
Don’t know why you’re seeing that. You’ve mentioned animation, AVPlayer, MPMusicPlayerController, etc. being used alongside speech recognition in your app, so there might just be a conflict between good continuous speech recognition and either the other things you are doing with the audio session or sheer CPU usage. Use .902 if it matches the needs of your application better. |
| July 24, 2011 at 8:51 am #7374 | |
|
Halle |
Apologies, I just remembered one thing that changed from .902 to .91 that could slow down recognition of very long utterances. I had a discussion on the Pocketsphinx forum about the OpenEars Pocketsphinx arguments, and they suggested that one of the settings traded away too much accuracy given the new faster driver. Changing that setting improved the recognition quality and, in my test cases, had no effect on speed when offset by the new driver, but I can imagine it showing up in very long utterances. You can run .912 (please upgrade to the current version) with the same Pocketsphinx run settings as .902 by replacing the .912 PocketsphinxRunConfig.h with the .902 one. You will have less-accurate recognition, but it will return faster. |