| Author | Posts |
|---|---|
| November 20, 2011 at 12:50 am #8146 | |
|
culov |
Hi, I’ve done some testing to quantify the gap between the call to -pocketsphinxDidDetectFinishedSpeech when an utterance concludes and the call to -pocketsphinxDidStartListening when listening starts again. Typically the gap is between 500ms and 800ms. I would like to close this gap as much as possible in my project and I’m trying to come up with the best way to do it. The naive solution seems to be to have 2 instances of the same class implementing OpenEarsEventsObserverDelegate, and to alternate which one is actively listening. For example, let’s call the 2 instances listener_one and listener_two. When listener_one finishes detecting an utterance, listening is suspended on that instance and started on listener_two, while listener_one processes the utterance. When the next utterance is detected, listener_two has its listening suspended, and listening is resumed on listener_one. I saw this thread but it’s not exactly what I’m looking for. The method described above is the only potential solution I’ve been able to imagine, but it seems like a hacky workaround. Does anyone have any ideas on how it might be better implemented? Thanks |
| November 20, 2011 at 8:45 pm #8147 | |
|
Halle |
Hi there, I don’t think that’s going to work for a number of reasons. The first is that the gap is just due to processing the heard speech, so the CPU usage is probably too high for other processes to be underway. The second is that what would be heard by a second recognizer there would only be a fragment of speech, that won’t be resolvable into an utterance without the other speech that precedes and comes after, which the second recognizer wouldn’t have access to. The last reason is that it isn’t OpenEarsEventsObserver that is responsible for that, it is PocketsphinxController, and I don’t think you can run two instances of PocketsphinxController at all (although I’d never say never since I guess it’s something that might start working as hardware resources increase). I’m sort of curious where this specification comes from — if the speech is being processed, it’s because the speaker halted, correct? |
| November 20, 2011 at 9:37 pm #8148 | |
|
culov |
I spent quite a bit of time trying to get 2 PocketsphinxControllers to run in one project yesterday and didn’t have any luck. I couldn’t figure out exactly why it failed because the stack traces I was seeing appeared rather esoteric to me. Being able to do something like this is going to make or break the project that I’m working on, I believe. I’m making an app where I can assume that the user will have their phone in their pocket and will be communicating with the app 100% via the voice interface with a headset/microphone. The user’s environment will not be the quietest, so there will be occasional ambient noise that OpenEars will pick up. Since the user won’t be looking at the phone, they will not know when the app is processing useless noise and will assume that they are free to speak into the microphone at any time. With these 500ms+ gaps occurring all the time, the user will speak a quick phrase into the microphone and part or all of it will go unheard because the phone was busy processing the previously detected utterance. So as per your second point, I think I miscommunicated in the OP: when the second listener starts listening, the previous bit of detected speech would be irrelevant because there was already a second of silence, and it can be safely processed independently. The second listener would continue listening until it detects another utterance, at which point we would start listening with listener_one while listener_two begins processing the speech. The other two points you bring up are a bit disheartening for me since I was counting on being able to get this completed. Since I’ve already spent so much time on this project, I’d like to spend some more time trying to get 2 PocketsphinxControllers running before I give up. Do you have any further advice that I may find helpful? Thanks a lot, Halle. Ivan |
| November 20, 2011 at 10:18 pm #8149 | |
|
Halle |
Hi Ivan, Sorry to hear you’re having trouble there. FWIW it says in the docs that there can only ever be a single PocketsphinxController, but I know that doesn’t help you much after the fact. I do not think it’s a good line of inquiry, since one of the most common reasons that users report crashes in Pocketsphinx turns out to be accidentally having instantiated two PocketsphinxControllers. Let me recommend a different direction for you: try to significantly reduce the processing time so it’s less of an issue for you. This is more of a question for the CMU Sphinx board, but you can basically feed the OpenEars implementation of Pocketsphinx nearly all of the runtime arguments that work with the Linux version of Pocketsphinx. Several of them can be adjusted to reduce processing time at the expense of accuracy. Reducing your language model size will also speed things up. What I do for hands-free audio-only UI is to give the user an audio cue when recognition has commenced (it could be a sound that they learn means “recognition is in progress, stop speaking” or it could be TTS speech or recorded speech saying “one moment, processing”). |
| November 22, 2011 at 4:44 am #8154 | |
|
culov |
Halle, I’m going to try and switch from ARPA to JSGF and I’m also going to try and reduce the size of the language model. I’m afraid that I can’t spare any reduced accuracy as is, but perhaps when I transition over to JSGF, there may be some wiggle room. Thanks |
| November 22, 2011 at 10:55 am #8158 | |
|
Halle |
Okeydoke. I’d again encourage using audio UI as a solution here, since even Apple considers it necessary to associate “processing now” with a specific tone in Siri, which is also intended to be used with no visual UI. It’s a pretty standard problem. |
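
The audio-cue approach recommended in this thread can be sketched independently of OpenEars. The following is a minimal, language-agnostic illustration in Python, not OpenEars code: the class `ListeningCueCoordinator`, the `play_cue` hook, and the cue names `"processing"` and `"ready"` are all hypothetical. The two methods stand in for the delegate callbacks -pocketsphinxDidDetectFinishedSpeech and -pocketsphinxDidStartListening discussed above; in a real app those callbacks would forward to logic like this. As a side effect, the sketch also records the length of each gap, which is the measurement described in the first post.

```python
import time
from typing import Callable, List, Optional


class ListeningCueCoordinator:
    """Maps recognizer events to audio cues and records each gap.

    Hypothetical helper: nothing here calls OpenEars. The method names
    only mirror the two delegate callbacks discussed in the thread.
    """

    def __init__(self, play_cue: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._play_cue = play_cue  # assumed hook that plays a named sound
        self._clock = clock        # injectable so the logic can be tested
        self._busy_since: Optional[float] = None
        self.gaps_ms: List[float] = []

    @property
    def is_listening(self) -> bool:
        return self._busy_since is None

    def did_detect_finished_speech(self) -> None:
        # Processing begins: signal the user to stop speaking.
        if self._busy_since is None:
            self._busy_since = self._clock()
            self._play_cue("processing")

    def did_start_listening(self) -> None:
        # Listening resumed: record the gap, then signal "go ahead".
        if self._busy_since is not None:
            elapsed = self._clock() - self._busy_since
            self.gaps_ms.append(elapsed * 1000.0)
            self._busy_since = None
        self._play_cue("ready")


# Usage with a fake clock and a list standing in for the audio player.
cues: List[str] = []
ticks = iter([10.0, 10.5])  # seconds; simulates a half-second gap
coordinator = ListeningCueCoordinator(cues.append, clock=lambda: next(ticks))
coordinator.did_detect_finished_speech()
coordinator.did_start_listening()
print(cues, coordinator.gaps_ms)
```

Injecting the clock and the cue player keeps the timing logic testable without a device or microphone. The recorded gaps could also be logged to check whether smaller language models or different runtime arguments actually shorten the wait.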