Recognizer must be restarted after long utterances

This topic has 16 replies, 2 voices, and was last updated 9 years, 1 month ago by Halle Winkler.

February 26, 2015 at 8:16 pm #1024998

jugg1es
Participant

I’m stress-testing my app to see if it can handle a noisy environment on an iPad. I’m finding that when I have more than one long period of ‘listening’ in a row (listening defined as the time from when the recognizer detects speech until it detects a period of silence), the recognizer seems to stop working altogether. I have to stop and restart it to get it working again. It also results in memory warnings. Is there anything I can do to prevent this?

February 26, 2015 at 8:17 pm #1024999

jugg1es
Participant

And by long, I’m talking about over 20 seconds of just noise during which it thinks someone is speaking.

February 26, 2015 at 8:20 pm #1025000

Halle Winkler
Politepix

Hello,

This is the first report I’ve received of this so I would need a bit more info to investigate it further. For further troubleshooting, the next step would be for you to create a minimal replication case to share with me, so I can see the exact thing you are seeing in your local setup:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

Would that be possible?

February 26, 2015 at 8:27 pm #1025001

jugg1es
Participant

Dang, I was hoping for an easy answer. I’ll do my best to see if I can get that to you.

February 26, 2015 at 8:37 pm #1025002

Halle Winkler
Politepix

Thanks!

February 26, 2015 at 8:40 pm #1025003

jugg1es
Participant

Meh, I can’t get it to fail using the sample app. If/when I do, I’ll send it. Thanks for responding.

February 26, 2015 at 8:45 pm #1025004

Halle Winkler
Politepix

OK, that’s part of the purpose of the replication case – it can also indicate when it is an interaction with a different part of the app, so you can find out whether the troubleshooting should be directed at an interaction within the app rather than something related to the library. Do you have other audio (or video) objects operating at the same time?

February 26, 2015 at 8:54 pm #1025005

jugg1es
Participant

Yea I have all kinds of stuff going on, but not when the recognizer is active. I definitely narrowed the problem down to the recognizer itself but I also can’t get it to fail every time, even in my app. Overall it works great, but occasionally it will totally crash out on me. I’m going to add some timeouts and features to detect excessive noise and leave it at that until I can pinpoint the situation where it happens.
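The timeout idea mentioned above could look roughly like the following. This is only a sketch of the logic, written in Python for brevity rather than against the OpenEars API: the class name `UtteranceWatchdog`, its method names, and the 20-second limit (taken from the figure quoted earlier in the thread) are all assumptions for illustration. In a real app the two notification methods would be driven by whatever callbacks report the start and end of detected speech.

```python
import time


class UtteranceWatchdog:
    """Flags an utterance that has stayed 'in speech' for too long (hypothetical helper)."""

    def __init__(self, max_seconds=20.0, clock=time.monotonic):
        self.max_seconds = max_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.started_at = None      # None means no utterance is in progress

    def speech_started(self):
        # Call when the recognizer reports that it has detected speech.
        self.started_at = self.clock()

    def speech_ended(self):
        # Call when the recognizer reports the closing period of silence.
        self.started_at = None

    def should_restart(self):
        # True once the current utterance has run past the time limit.
        if self.started_at is None:
            return False
        return self.clock() - self.started_at > self.max_seconds
```

Polling `should_restart()` from a periodic timer, and stopping and restarting listening when it returns true, would approximate the manual restart described in the first post.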

February 26, 2015 at 8:58 pm #1025006

Halle Winkler
Politepix

OK, I’ll take a replication case whenever you have one for me.

February 26, 2015 at 9:07 pm #1025007

jugg1es
Participant

In case you were looking for some unsolicited advice, a paid plugin that is able to detect when the recognizer is listening but user isn’t trying to speak to the device would be very useful.

February 26, 2015 at 9:10 pm #1025008

Halle Winkler
Politepix

Sure, for what UI/UX purpose?

February 26, 2015 at 9:28 pm #1025009

jugg1es
Participant

My app involves using speech recognition to have a conversation with a simulated person to be used for training purposes. Like training a person how to interview for a job. The user has possibly dozens of prompts to choose from and I break each one into separate words and put them each into the language model. Then when the recognizer returns, I analyze the results and figure out which prompt they were actually trying to say. I did it this way (as opposed to entering each prompt in its whole form into the model) for lots of reasons, the main one being that people often don’t read exactly what’s on the screen. It works great.
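The post does not say how the hypothesis is mapped back to a prompt, so the following is only one plausible sketch of that step, in Python rather than the app's own language. It scores each prompt by the fraction of its words that appear in the hypothesis, ignoring word order. The function names `words` and `best_prompt` and the `min_score` cutoff of 0.5 are assumptions made for illustration.

```python
import re


def words(text):
    # Lowercase and keep only letters and apostrophes, so punctuation is ignored.
    return re.findall(r"[a-z']+", text.lower())


def best_prompt(hypothesis, prompts, min_score=0.5):
    """Return the prompt whose words are best covered by the hypothesis, or None."""
    heard = set(words(hypothesis))
    best, best_score = None, 0.0
    for prompt in prompts:
        expected = set(words(prompt))
        if not expected:
            continue
        # Fraction of the prompt's distinct words that were recognized.
        score = len(expected & heard) / len(expected)
        if score > best_score:
            best, best_score = prompt, score
    return best if best_score >= min_score else None
```

Because the comparison uses sets, a reordered or slightly shortened reading such as "Molly hello how you doing today" still matches the prompt "Hello Molly, how are you doing today?" (six of its seven words are present), while a hypothesis that shares no words with any prompt returns `None`.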

This means that there are a lot of possible things the recognizer can return as a hypothesis. Since it’s a training tool, there is often more than one person using it at once. It’s also common for it to be used in a room with lots of other people talking. It would be nice if there was, for example, an input level threshold for when the recognizer thinks it’s being spoken to so it can tell if the user is speaking right at the device or whether it’s trying to listen to someone across the room.

Or if there are a lot of words spoken that aren’t in the language model, interspersed between words that ARE in the model, it would be able to detect that.

I tried Rejecto, but it was too strict and it wouldn’t return recognition events when it should have.
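The input level threshold idea from this post can be illustrated with a simple loudness gate. This is a generic sketch in Python, not an OpenEars feature: it assumes the app has access to buffers of 16-bit PCM samples, and the function names and the -30 dBFS threshold are hypothetical values that would need tuning for a real room and microphone.

```python
import math


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def is_close_talker(samples, threshold_dbfs=-30.0):
    # Treat the buffer as addressed to the device only if it is loud enough.
    return rms_dbfs(samples) >= threshold_dbfs
```

A speaker close to the microphone generally produces a higher RMS level than someone across the room, so hypotheses that arrive while the level stays under the threshold could be discarded by the app. Loudness alone is a crude signal, which is why this is offered only as a starting point.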

February 26, 2015 at 9:36 pm #1025010

Halle Winkler
Politepix

Did you know that you can turn down Rejecto’s weighting so it’s a bit less aggressive? Other options which don’t involve having to use a paid plugin necessarily are raising the vadThreshold value (this will exclude more incidental sounds) or using a grammar instead of a language model, which will only allow recognition on utterances that fit the grammar rules:

https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/

This can be done with either stock OpenEars, or RuleORama for much faster grammars. Maybe one of these options can help you with getting a better user experience for your current app.

February 26, 2015 at 9:50 pm #1025011

jugg1es
Participant

Oh, I did not know about the vadThreshold. I will definitely play around with that.

I did play around with Rejecto’s weighting but I could never get it quite right. Since I’m doing a lot of processing on my end to determine whether a user is actually speaking to the software, I felt more comfortable having control than not receiving the event at all.

I’d love to use grammar rules, but, like I said, people rarely speak exactly what’s on the screen. So if the prompt is this:

Hello Molly, how are you doing today?

Users might actually say this (this happens way more often than you might think):

Molly Hello, how you doing today?

If I were to use grammar rules, this wouldn’t be recognized.

Thanks for the tip on the vadThreshold, that might be just what I need.

February 26, 2015 at 9:55 pm #1025012

Halle Winkler
Politepix

You’re welcome, I hope it’s a helpful addition to your toolkit.

February 26, 2015 at 10:01 pm #1025013

jugg1es
Participant

Yea, that worked great. I’m really surprised you don’t charge at all for OpenEars and the support you give on these forums. Do you have a ‘donate’ button anywhere?

February 27, 2015 at 9:47 am #1025053

Halle Winkler
Politepix

I’m glad that helped! No worries – other people with similar questions will find this discussion or I can point them to it, so it provides a useful support resource, and it also might be the way they find out about a plugin that solves a problem for them, so it’s fine. If you want to do something nice, it’s always very helpful to get a shoutout here or there (Twitter, blog posts, whatever) so folks know that you’re having a good experience with the SDK.