Tagged: single word
December 14, 2012 at 10:19 pm #14907
I am writing an app for kids that presents them with a single word, and then uses OpenEars to see if they repeat it correctly.
Currently, if I use a headset and mic I get a successful recognition rate of about 91%. Results using the built-in mic are closer to 75%. I’m wondering if anyone has tips on optimizing OpenEars for this type of application. For example, should I have the app create a single-word dictionary prior to showing each word? Would that produce better results, or is that an impractical approach?
Also – on a side note, does anyone know if the dynamically created dictionary files can be redirected to a folder other than the “Documents” folder (where they are visible via iTunes)?
December 14, 2012 at 10:28 pm #14908
This is generally due to speaking too far away from the built-in device mic. Its optimal distance is telephone distance, so if the device is far away you won’t get results as good as with the headset mic.
What exactly is happening when the recognition is wrong? Is it something like the kid said “cat” but it recognized “hat”, where both “cat” and “hat” are words in the language model, or more like the kid said something unrelated but it was recognized as either “cat” or “hat”?
The best advice I can give is to optimize the language model so it doesn’t contain a lot of very similar-sounding short words, because that is the most challenging circumstance to get right. In that case I might use smaller language models, and maybe try Rejecto to see if it handles rejecting out-of-vocabulary speech (that’s only helpful if you are getting recognitions for words which aren’t in the language model at all).
I will take the request about having an option for putting the language models elsewhere under advisement for the next version of OpenEars; you make a good point.
January 23, 2013 at 5:48 am #1015452
I finally had a chance to do more testing today and found something unexpected. I used the code from the sample app to dump the mic levels to the console while Pocketsphinx is working.
The typical pattern was this: silence (approx. -100 dB) followed by a peak of anywhere from -80 dB to -65 dB, and then either a “speech detected” message from Pocketsphinx followed by the hypothesis, etc., OR no “speech detected” message and no further messages from Pocketsphinx.
At approximately -76 dB and above the results were good; below -76 dB is where consistency started to drop off.
So, to summarize: the problem I’m seeing doesn’t seem to be the accuracy of the word identification, but rather that the threshold that causes Pocketsphinx to report “speech detected” is higher than I would like.
I looked through the API documentation again but did not see an option to adjust that. Any suggestions appreciated.
January 23, 2013 at 12:14 pm #1015454
You can’t adjust the sensitivity of the voice activity detection, sorry. It sounds like it is too sensitive; is that correct? If the issue is that the users are speaking too quietly or from too far away for recognition to be accurate when it is triggered, this is generally best addressed with user education: “MyApp works best if you speak clearly from no more than $DISTANCE away”. If the issue is that utterances which are not user speech relating to the app are triggering recognition, Rejecto was designed to give a null result under that circumstance rather than a wrong hypothesis. If I have it backward and the issue is insufficient mic sensitivity, you could experiment with the suggestions in this thread:
January 23, 2013 at 4:55 pm #1015456
It’s the latter – insufficient mic sensitivity. I looked at that thread, and it looks like I should follow the steps described by hartsteins in his May 25th post to nvrtd. Is that correct? The other parts of the thread seem to be about output routing. He mentions recompiling the framework after making the addition – does the framework get rebuilt when you do a clean->build in Xcode, or are there special steps required to get the framework to recompile?
January 23, 2013 at 5:05 pm #1015457
OK, yes, the steps described in that post should help you with insufficient sensitivity, although please keep the downside in mind: if you increase the sensitivity, you will also have more incidental noises triggering recognition (chirping birds, etc.). Sometimes sensitivity issues are related to doing testing in a single environment; if you optimize for increased or decreased sensitivity based on one environment, you will see a decline in performance at the other end of the spectrum (developers usually like to work in quiet environments, but users like to do speech recognition in noisy ones). Just wanted to mention that issue in advance.
When you make a change to the framework, you need to clean and build the framework project (OpenEars.xcodeproj). Then the new framework should be picked up in your app the next time you clean and build it. You can verify this by selecting the framework file in your app project, choosing “Show in Finder”, and then doing “Get Info” on the file in the Finder to see whether its last-modified date matches the time you built the framework project.
January 24, 2013 at 7:26 pm #1015466
In this code snippet:
// kAudioSessionMode_Default = 'dflt',
// kAudioSessionMode_VoiceChat = 'vcct',
// kAudioSessionMode_VideoRecording = 'vrcd',
// kAudioSessionMode_Measurement = 'msmt'
UInt32 audioModeClassification = kAudioSessionMode_VideoRecording; // set what you want it to be
UInt32 audioModeCheckSize = sizeof (UInt32);
UInt32 audioModeCheck = 999;
Is it necessary to remove the // in front of // kAudioSessionMode_VideoRecording = 'vrcd',?
January 24, 2013 at 7:28 pm #1015467
Looks that way to me. I haven’t tested that code so you’re a bit on your own with it, but those look like constants so I expect that you’d want to uncomment the one corresponding to the recording type you want to use.