Forcing recognition to start right away

This topic has 3 voices, contains 12 replies, and was last updated by  Halle 134 days ago.

Viewing 13 posts - 1 through 13 (of 13 total)
Author Posts
March 5, 2011 at 11:24 pm #3899

teepusink

Hi,

Is there a method I can use to force OpenEars to start recognition right away?

E.g., OpenEars is listening, and when I click a button, it runs recognition on the audio captured up to the point when I clicked that button.

I tried suspendRecognition, but the problem is that it also stops the recognition. So in a way I want it to behave like suspendRecognition but still return the recognized word to me before going into suspended mode.

Thanks,
Tee

March 5, 2011 at 11:44 pm #3900

Halle

Hello teepusink,

I don’t see a way to do that with OpenEars since it is really designed around continuous recognition (i.e., the important decision factor in its flow is perceived silence between statements, not a user-initiated UI interaction, and basically everything in the big loop in the continuous model is built around detecting that). But I think VocalKit would do this with no problem:

http://github.com/KingOfBrian/VocalKit

Maybe my suggestions in this thread will have some applicability to your question, not sure:

http://www.politepix.com/forums/topic/how-might-i-go-about-triggering-recognition-on-demand

Good luck,

Halle

March 6, 2011 at 12:06 am #3901

teepusink

Thx Halle.
Yes, the biggest thing I’m experiencing now is that sometimes the detection can get pretty sensitive and “sticky” and won’t stop detecting speech even though technically I’ve stopped talking. So I’m just trying to find a way to control the continuous loop, be it timer-driven (after 2 seconds, for example) or through a button click.

A common scenario when someone is giving the app a try is that they start talking, then stop talking, but the app is still detecting speech. The user thinks their speech wasn’t detected and talks again, and the result is totally off.

I’ll give your suggestion from the other thread a spin and will post back if I find anything.

Thanks,
Tee

March 6, 2011 at 7:48 am #3902

Halle

Hi teepusink,

I can’t quite tell from your description whether the issue is that the user is getting confused due to not getting a UI cue that recognition of their statement is still in progress, or whether you are experiencing a bug in OpenEars where it isn’t detecting silences of more than a second, or whether the environmental noise level is too loud for the end of speech to be detected. Do you have the same issue if you use the sample app?

Thanks,

Halle

March 6, 2011 at 7:20 pm #3903

teepusink

Hi Halle,

The app actually works well. It sometimes recognizes a different word from what the user said, but that is a separate thing I want to post on the forum.

It’s not really a bug with the sample app / my app; it’s more a case of confusing app behavior versus what the user expects. Especially when you have a roommate who likes to talk (I like my roommate tho).
I realized that making a bolder UI cue might help clarify what’s going on in my app.
The other one is a little more tricky, and I’m not sure if you have a best practice guide. What usually happens is that I’ll say a word (e.g. LEFT), then my roommate will either talk to me or laugh etc. Often the “has detected speech input” state can last for quite a while, and then the recognized words come out as “RIGHT LEFT TURN LEFT RIGHT…”. What is the best way to tackle that? Just match the hypothesis to the first word?
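As a rough illustration of the "match the first word" idea, here is a minimal C sketch. The helper name first_word is hypothetical and is not part of OpenEars or Pocketsphinx; it only assumes the hypothesis arrives as a space-separated string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not an OpenEars API.
 * Copies the first whitespace-delimited word of `hypothesis` into `out`
 * (truncated to out_size - 1 characters) and returns the copied length.
 * Returns 0 and an empty string if the hypothesis is NULL or blank. */
static size_t first_word(const char *hypothesis, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (hypothesis == NULL) {
        return 0;
    }
    size_t start = strspn(hypothesis, " \t\n");         /* skip leading blanks */
    size_t len = strcspn(hypothesis + start, " \t\n");  /* length of first word */
    if (len >= out_size) {
        len = out_size - 1;                              /* truncate to fit */
    }
    memcpy(out, hypothesis + start, len);
    out[len] = '\0';
    return len;
}
```

Note that for the hypothesis quoted above, this yields RIGHT even though the spoken word was LEFT, so taking the first word only helps when the background noise arrives after the intended word has been decoded correctly.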

So the initial question I asked was to see if there is a way I can have more control on when to stop audio input and start recognition.

I kinda get instantRecording + 1 second working now though.

Here is what I did:
1. I added a new method in the controller: - (void)instantRecognition;
2. I added a new variable in the continuous model: instantRecognition
3. when instantRecognition is TRUE, I stop incrementing timestamp so it will recognize after 1 second. My goal is to get that closer to instant, but if I quit that loop right away, the word it recognizes will be totally off. So I’m keeping that 1 second for now and will get back to it later.

Thanks,
Tee

March 6, 2011 at 9:20 pm #3904

Halle

Hi Tee,

I think for your first question, the new JSGF feature might be a good solution: you can create a JSGF grammar that has an entry “LEFT” without any variations, which means that nothing other than the single word “left” will be recognized, even if it is said multiple times or mixed in with other sounds. Then it just remains to be seen whether the single word “left” will be picked out of the noise by Pocketsphinx. There’s an example of a simple JSGF grammar in the sample app.

What kind of input are you using to initiate your instantRecording + 1 approach? Does the user tap a button?

March 6, 2011 at 10:39 pm #3905

teepusink

Thanks Halle. I’ll take a look at the JSGF feature.

Yes, I have a button that initiates the instantRecording + 1 approach.
I like to call it the “walkie talkie” button. The app is always in suspended mode. When I hold down the button, it changes to resume mode, and when I release, it calls instantRecording, which basically just sets continousModel.instantRecording = TRUE;

I also have an instantTimestamp within the loop, so if you hold the button and release it without saying anything, it will reset continousModel.instantRecording to FALSE.

Thanks,
Tee

March 6, 2011 at 11:20 pm #3906

Halle

OK, sounds like you have it working for your requirements — any other questions?

March 7, 2011 at 12:15 am #3907

teepusink

I think I’m all set for now. You’ll surely hear back from me if I have other questions.
Thanks Halle. Good stuff!!!

January 4, 2012 at 2:22 pm #8362

itay

Hi guys,
It’s been a long time since you talked about this.
I have the same need in my app now: I want the user to be able to stop the listening when he decides.
For example:
The user is on a train, with people occasionally talking beside him.
He starts speaking to the app, and 10-20% of the time the system will remain in the listening state because it hears the other people.

I thought about doing your trick and just adding a flag (shouldStartRecognition) like so:

if ((continuousListener->read_ts - timestamp) > (kSamplesPerSecond * kSecondsOfSilenceToDetect) || shouldStartRecognition) {
.
.
.
shouldStartRecognition = false;
break;
}

When the user clicks the button to start recognition, I will set:
shouldStartRecognition = true;

Do you think it can cause problems on recognition ?

Thanks,

Itay

January 4, 2012 at 2:39 pm #8364

Halle

Do you think it can cause problems on recognition ?

Do you mean problems with multithreading/the driver/other code, or is there a particular recognition issue you’re concerned about? Technically, it should cause a problem with recognition because you’re trying to shut off recognition :) . Regarding whether it will create structural weirdness with the behavior of the rest of the continuous model and its threads, I would expect it to not cause problems because the whole thing is written to accept arbitrary exits from the loop at any time so that incoming calls or audio route changes behave acceptably. The only way to know is to verify with some testing, but this is a hack that I’d expect to have a good chance of working well.

January 5, 2012 at 11:58 am #8367

itay

Well, I did a kind of mix to try to fulfill both conditions:
1. Start recognition on demand.
2. Give it 1 extra second so as not to “mess up” the Pocketsphinx recognition logic (sometimes it messed up the last word of the speech).

So I did set shouldStartRecognition = true in my controller (see earlier post), but I call the setter with a delay of 1 second from the moment the user clicks the button.

e.g.:
[pocketSphinxInitiator performSelector:@selector(setShouldStartRecognition) withObject:nil afterDelay:1.0];

After this I didn’t experience any issues with the last words.

Thanks Halle

January 5, 2012 at 12:14 pm #8368

Halle

Sounds good!
