Reply To: Subpar success rate of speech recognition


Halle Winkler

OK, for future reference it would be helpful to lead with this information, or give it in response to the questions asked about it. It’s been some work for me to find out what accents you are really trying to recognize and what accuracy levels you are seeing. I would not expect great accuracy for Irish accent recognition with the default language model, which is entirely made up of US accents.

This sounds like a subjective report: “An Irish female tester reported that accuracy was as low as 3% when she was holding the iPhone as one normally would. When she increased the distance between herself and the device, the accuracy got better.”

The reason it sounds like a subjective report is that I doubt she tested 100 times (and if she did, was it 100 repetitions of the same phrase or 100 different phrases?), so 3% is more likely a qualitative stand-in for “recognition was bad in my testing round”. The reasons for this could be diverse: it could have something to do with your UI, with the environment she is testing in, with the non-English (MEATH) and misspelled (VETINARY) words in your vocabulary, which have pronunciation entries in the dictionary that will never be spoken, or with her expectations of what can be recognized (most end users don’t realize the vocabulary is limited and/or that saying a lot of extra things outside the vocabulary will affect recognition quality).
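One way to catch those problematic vocabulary entries systematically is to check which of your app’s words are missing from the master dictionary, since those are the ones that received generated pronunciations and deserve a manual review. A rough Python sketch (the helper names are hypothetical; it assumes the standard CMU dictionary format, where each line starts with the word and alternate pronunciations are written as WORD(2)):

```python
def load_cmu_words(dic_path):
    # Collect the headwords from a CMU-format dictionary such as cmu07a.dic.
    # Lines look like "HELLO  HH AH L OW"; alternates look like "HELLO(2) ...".
    words = set()
    with open(dic_path) as f:
        for line in f:
            parts = line.split()
            if parts:
                words.add(parts[0].split("(")[0])
    return words

def flag_unknown(vocab, cmu_words):
    # Words absent from the master dictionary got auto-generated
    # pronunciations -- review these by hand for typos like VETINARY.
    return [w for w in vocab if w.upper() not in cmu_words]
```

Running `flag_unknown` over your app vocabulary gives you a short list of entries to hand-check against how your testers would actually say them.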

The symptom that recognition is worse when she is close to the phone is unlikely to be strictly true, since closeness to the phone improves recognition under normal testing conditions; what is more likely is that the other variables mentioned above changed at the time she moved farther from the phone. I’m sure there is something to it, but it is an unexpected, isolated data point, so it needs replication on your side.

I can’t really remote-bugfix an issue that you are only receiving as a remote report; at some point, someone needs to observe the issue first-hand, test it in an organized way, and replicate it. If something was wrong with her test session (she was saying words that aren’t in the vocabulary, it was really noisy, or the app’s UI gave her the impression it was listening at a time when it wasn’t), it’s harmful to try to adapt your approach to that limited data.

My recommendation is to obtain some WAV recordings of your speakers saying phrases that should be recognizable with your vocabulary, and run them through PocketsphinxController to find out what accuracy levels they actually achieve. It is also important to check the words in your OpenEarsDynamicGrammar.dic file that were not found in cmu07a.dic: make sure the phonetic transcriptions there genuinely describe how someone would say those words, and remove any typos. If your dictionary has “VETINARY” and someone says “VETERINARY”, not only will “VETERINARY” not be recognized, it will also hurt recognition of the other words in the statement, since you now have an out-of-vocabulary utterance in the middle of it.
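To turn those WAV test runs into a number you can compare across testers (rather than a subjective “3%”), the standard metric is word error rate: the word-level edit distance between what was said and what was recognized, divided by the length of the reference. A minimal sketch in Python (this is not part of OpenEars; reference and hypothesis are plain transcription strings):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

Scoring each recognition hypothesis against the phrase the recording actually contains gives you per-phrase and averaged error rates, which makes it possible to tell a real accent-related accuracy problem apart from a bad testing session.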