OK, here is what jumps out at me right off the bat:
1. I looked at the ARPA model and it is being calculated correctly given its input, and the phonetic dictionary looks normal to me and doesn’t seem to be missing alternate pronunciations, so I don’t expect anything bad due to that. The files not existing or being malformed seems like something we can rule out, since they seem to be mathematically and syntactically correct and, as you’ve shown, they are found by the app and some of your utterances are recognized.
2. The commas that are being submitted as separators between the words have no role in your language model and can only be confusing things, so I would remove them. The ARPA model shows many bigrams and trigrams (sequences of two or three words) with an extraneous comma separating the words in the utterance. They maaaybe have no negative effect, but they could be making things weird and they are definitely not doing anything useful such as providing contextual information to the engine.
3. Unless you are expecting a user to make the utterance “NO YES”, there shouldn’t be a phrase like “NO, YES” being submitted to LanguageModelGenerator, because it is a signal that “NO YES” is an expected utterance and the rest of the model math will include this assumption versus other possibilities. i.e. if it isn’t a real thing users will say, it is going to make your model less accurate overall. I think maybe this is happening unintentionally due to the text-munging that precedes creating these models with LanguageModelGenerator – it is probably supposed to separate your words into separate NSStrings but instead is making one big string with comma separators and giving that string to LanguageModelGenerator, which then treats the comma separators as important and treats all the words as a single sequential expected utterance.
Examples of this phenomenon from your ARPA model, which show that individual strings consisting of multiple words separated by a comma are getting submitted:
\2-grams:
-0.6021 <s> NO, 0.0000
-0.6021 <s> PAUSE, 0.0000
-0.3010 INVENTORY, UNPAUSE 0.0000 <-- this means that this is highly-expected as a user utterance because it was submitted together
-0.3010 NO, YES 0.0000 <-- this means that this is highly-expected as a user utterance because it was submitted together
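If you want to check the whole model rather than eyeballing it, here is a rough sketch of a script you could run on your development machine against the generated ARPA file. It is not part of OpenEars; the function name `comma_ngrams` is made up for this example, and it assumes the usual ARPA layout (a `\N-grams:` header, then lines of log probability, N words, and an optional backoff weight, with log probabilities in base 10). The sample text is just the excerpt quoted above.

```python
import re


def comma_ngrams(arpa_text):
    """Return (order, n-gram, probability) for each n-gram containing a comma.

    Hypothetical helper for illustration; assumes standard ARPA text layout.
    """
    hits = []
    order = 0  # 0 means we are not inside an \N-grams: section
    for raw in arpa_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            order = int(header.group(1))
            continue
        if line.startswith("\\"):
            # Other markers such as \data\ or \end\ close the current section.
            order = 0
            continue
        fields = line.split()
        if order == 0 or len(fields) < order + 1:
            continue
        log10_prob = float(fields[0])
        words = fields[1:1 + order]  # anything after the words is backoff or notes
        if any("," in word for word in words):
            # Convert the base-10 log probability back to a plain probability.
            hits.append((order, " ".join(words), round(10 ** log10_prob, 2)))
    return hits


# The 2-gram excerpt from the post, used as sample input.
sample = r"""
\2-grams:
-0.6021 <s> NO, 0.0000
-0.6021 <s> PAUSE, 0.0000
-0.3010 INVENTORY, UNPAUSE 0.0000
-0.3010 NO, YES 0.0000
\end\
"""

for order, ngram, prob in comma_ngrams(sample):
    print(f"{order}-gram {ngram!r} has probability {prob}")
```

Any line this reports is a sign that a comma made it into the submitted text. Once the preprocessing is fixed and the model is regenerated, the script should report nothing.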
4. Never test recognition using the Simulator – it can lead you into troubleshooting phantoms. Only test recognition and accuracy issues on a real device.
Suggested plan of action: fix #2 and #3 by making sure that your text preprocessing before submitting your NSArray to LanguageModelGenerator includes getting rid of comma separators and splitting words on either side of a comma into separate NSStrings. Then test on a physical device and see if the situation is improved, and let me know your results.
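To make the preprocessing step concrete, here is a minimal sketch of the splitting logic, written in Python only so the idea is easy to try out. The function name `split_into_phrases` and the sample input are hypothetical (my guess at what the comma-joined strings look like), and this is not the OpenEars API. In the app itself the equivalent would be done on your NSStrings, for example with `componentsSeparatedByString:` plus whitespace trimming, before building the NSArray.

```python
def split_into_phrases(raw_entries):
    """Split comma-joined entries into separate, comma-free phrases.

    Hypothetical helper: each returned item corresponds to one string
    that would be submitted on its own.
    """
    phrases = []
    for entry in raw_entries:
        for piece in entry.split(","):
            # Collapse stray whitespace left behind around the commas.
            phrase = " ".join(piece.split())
            if phrase:  # skip empty pieces from trailing or doubled commas
                phrases.append(phrase)
    return phrases


# Assumed example of what is currently being submitted.
raw = ["NO, YES", "INVENTORY, UNPAUSE", "PAUSE,"]
print(split_into_phrases(raw))
```

The point of the design is that each word or genuine phrase ends up as its own entry, so the model no longer treats something like “NO YES” as one expected utterance.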