Question about letters

This topic has 5 replies, 2 voices, and was last updated 11 years ago by Halle Winkler.

Viewing 6 posts - 1 through 6 (of 6 total)


Author

Posts
March 27, 2013 at 4:45 pm #1016780

andrewjrangel
Participant

I am trying to create an app that uses single letters rather than words. Has anyone ever worked with just letters or single digits? I am having a really hard time getting it to recognize single letters, and it also attempts to string them together into a word. Is there any way to limit what it tries to suggest to just one char or one string instance? I have tried putting words in the dictionary like “que” for Q, but some letters are still unwieldy. I am new to OpenEars, so I apologize if this question is redundant. Thanks for your help!

March 27, 2013 at 4:50 pm #1016781

Halle Winkler
Politepix

Welcome Andrew,

Yes, this is unfortunately a known limitation of speech recognition in general. Here are a few discussions about it:

http://www.google.com/search?hl=en&as_q=site%3Apolitepix.com%2Fforums+letters

March 27, 2013 at 5:26 pm #1016782

andrewjrangel
Participant

Thanks for that, I guess my searching didn’t go back far enough to see that entry. I’d like to know whether you can literally limit the result to one entry from the dictionary, so that it doesn’t try to string letters together for the words it hears. I feel like if I could force it to do that, I could get more success on the single letters. I am essentially trying to match users’ spoken answers against the questions that come up and then check whether they are correct, using OpenEars speech rather than the usual approach of selecting an answer via touch.

March 27, 2013 at 5:32 pm #1016784

Halle Winkler
Politepix

Not quite sure I’m following — can you elaborate on how only having a single letter in the dictionary would let you recognize spelling?

March 27, 2013 at 5:37 pm #1016785

andrewjrangel
Participant

So I want someone to read a question and answer “A” or “B” or “C” etc. So I was hoping to set it up so they press a button and say the letter and then press it again to stop OpenEars from processing their voice.

The issue is that if I say B and then there is a noise in the background, or I say something after that, it will try to list the output as “B A B A C B…”

So I want to force the output to be limited to only one entry. I say entry rather than letter because I may have “Que” for Q. Right now OpenEars tries to keep going as the person talks. Is there any way to stop it from processing more than one “word” or “letter” or “voice entry”?

March 27, 2013 at 5:49 pm #1016786

Halle Winkler
Politepix

Well, the biggest issue you’re going to encounter is the one I linked to earlier — A and B and C all rhyme or nearly rhyme and have only one or two phonemes, and they have no surrounding context, so most speech recognition engines will get the utterance wrong much of the time.

OpenEars does continuous recognition, so it isn’t really designed for an approach like pressing a button to start recognition and pressing a button to stop it. This is kind of a tricky user experience in general because in order for the user to use touch to select the answer they would be tapping once, and in order for them to answer with the interaction you’ve described, they are tapping twice and speaking. I think this is a workaround that results from the problem recognizing letters, so I’d recommend not trying to recognize alphabet letters with speech recognition since it leads to generally difficult UX.

To answer your question, with stock OpenEars you can’t limit recognition in that way because it has to listen for a half-second of silence before it processes the speech, meaning that it will try to recognize any speech it perceives before that half-second of silence has been heard. The half-second is how it knows it isn’t interrupting the user in the middle of speech.

You can try Rejecto to reject speech that isn’t in your vocabulary if the primary issue is extraneous sounds provoking recognition, or you could try RapidEars, which can return a recognition hypothesis immediately instead of waiting a half-second (you would then stop listening as soon as an answer has been perceived). Alternatively, you can just isolate the first entry out of the whole string and only respond to that (so in your example “B A B A C B” you’d just ignore everything after the first B).
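
As a rough illustration of that last option, here is a minimal sketch of the string handling only. It is written in Python purely to show the logic; in an app the same few lines would live wherever the recognition hypothesis string is received. The function name first_answer and the ENTRY_TO_LETTER table are hypothetical, and the only assumption about the recognizer is that the hypothesis arrives as a space-separated string like “B A B A C B”.

```python
from typing import Optional

# Hypothetical table mapping dictionary entries to answer letters.
# "QUE" standing in for Q comes from this thread; the other rows are
# illustrative and would mirror whatever is in your own dictionary.
ENTRY_TO_LETTER = {
    "A": "A",
    "B": "B",
    "C": "C",
    "D": "D",
    "QUE": "Q",
}


def first_answer(hypothesis: str) -> Optional[str]:
    """Return the answer letter for the first entry of a hypothesis.

    Everything after the first space-separated entry is ignored.
    Returns None if the hypothesis is empty or its first entry is
    not in ENTRY_TO_LETTER.
    """
    entries = hypothesis.upper().split()
    if not entries:
        return None
    return ENTRY_TO_LETTER.get(entries[0])


if __name__ == "__main__":
    print(first_answer("B A B A C B"))
    print(first_answer("que a"))
```

This only trims the output. It does nothing about the first entry itself being misrecognized, which is the bigger problem with letters.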

But it’s all going to work a lot better if you pick something as the target speech besides alphabet letters.