oldprogrammer

Forum Replies Created

    in reply to: How to get the best results with JSGF #1030316
    oldprogrammer
    Participant

    I started off using an ARPA model, but it seemed to require the speech to match too precisely. I tried to tweak the model by adding similar and dissimilar words (selected by both Metaphone 2 and Levenshtein distance) in various quantities, but my attempts never produced consistently better results: some scenarios would be good, but others would have either too many false positives or too many false negatives. JSGF, because of the very limited number of words, *seems* to work better in general. Of course, it could simply be that I wasn’t optimizing the ARPA model in the right way.
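    For anyone trying something similar, here is a minimal Python sketch of the word-selection idea I mean. The keyword and candidate list are made-up examples, and the Levenshtein function is hand-rolled rather than taken from any particular library:

        def levenshtein(a: str, b: str) -> int:
            """Classic dynamic-programming edit distance between two words."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, start=1):
                curr = [i]
                for j, cb in enumerate(b, start=1):
                    curr.append(min(prev[j] + 1,                # deletion
                                    curr[j - 1] + 1,            # insertion
                                    prev[j - 1] + (ca != cb)))  # substitution
                prev = curr
            return prev[-1]

        # Hypothetical target word and candidate filler vocabulary.
        keyword = "ALERT"
        candidates = ["ALARM", "ALTER", "BANANA", "WINDOW", "ALOUD", "TRUMPET"]

        # Sort candidates so the most dissimilar words come first; padding the
        # model with dissimilar words was one of the variations I tried.
        dissimilar_first = sorted(candidates, key=lambda w: -levenshtein(keyword, w))
        print(dissimilar_first)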

    The small grammar that performed the worst was “(AN | ALERT | ANT | ALWAYS | AMAZES)”. “ALERT” would be falsely detected from background noise alone within 5-10 seconds of listening. However, adding another 200+ words starting with different letters of the alphabet improved accuracy significantly.
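    To make that concrete, here is roughly what that rule looks like as a standalone JSGF file, shown as a small Python sketch that writes it out. The grammar name (commands) and rule name (<word>) are placeholders I chose, not anything OpenEars generates:

        # A five-word JSGF grammar like the one described above.
        JSGF_SOURCE = """\
        #JSGF V1.0;
        grammar commands;
        public <word> = AN | ALERT | ANT | ALWAYS | AMAZES;
        """

        with open("commands.gram", "w") as f:
            f.write(JSGF_SOURCE)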

    Yes, good clarification on which part of the software generates the grammars and dictionaries. As for performance, the biggest increase came from pre-generating a 300-500 word model with lmtool rather than letting OpenEars build it at runtime. I don’t recall the specifics, but it *felt* like it was 2-3 seconds faster on a mobile device (load time and corpus size being positively correlated).
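    In raw pocketsphinx terms (outside of OpenEars), loading pre-built files looks something like the sketch below. This assumes the older SWIG-based pocketsphinx Python bindings (method names differ in newer versions), and all file paths are hypothetical:

        from pocketsphinx import Decoder

        # Point the decoder at files generated ahead of time (e.g. by lmtool)
        # instead of building a model at runtime.
        config = Decoder.default_config()
        config.set_string('-hmm', 'model/en-us')          # acoustic model directory
        config.set_string('-lm', 'prebuilt/words.lm')     # pre-generated ARPA model
        config.set_string('-dict', 'prebuilt/words.dic')  # matching dictionary

        decoder = Decoder(config)  # the model load happens once, up front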

    in reply to: How to get the best results with JSGF #1030314
    oldprogrammer
    Participant

    Your expertise is always appreciated, so thank you. The detail about dictionary usage is very interesting, and it’s affirming to know that reducing the dictionary to the words in the grammar is not a bad thing to do.

    Last night I did some testing and found that, in my specific use case, one grammar with 250 words produced better accuracy (significantly fewer false positives) than 10-word grammars. In fact, accuracy ranked 250 > 20 > 10 > 5: five-word grammars performed the worst, by far.

    A specific point I forgot to mention in the original post: at any given moment I’m only expecting a single, specific word (out of the 10 or the 250) to be matched. When I receive a hypothesis, I ignore everything except that specific word. That is certainly a consideration for anyone who comes across this post.
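    As an illustration of that filtering, a minimal sketch (the expected word and the callback structure are my own invention, not part of any OpenEars API):

        EXPECTED_WORD = "ALERT"  # the one word we care about right now (hypothetical)

        def on_hypothesis(hypothesis: str) -> None:
            """Act only when the expected word appears; ignore everything else."""
            if hypothesis and EXPECTED_WORD in hypothesis.split():
                print(f"matched {EXPECTED_WORD}")
            # any other hypothesis from the 250-word grammar is deliberately dropped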

    Additionally, although OpenEars (and pocketsphinx) is quite fast at generating grammar or language model data on the fly, I found that generating the grammar or language model files up front (at build time) and switching between them as needed (at run time) was enough faster to make a perceptible difference to the end user.
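    With the pocketsphinx bindings, that kind of runtime switch between pre-built searches looks roughly like this; the search names and paths are placeholders, and again this assumes the older SWIG-based bindings:

        from pocketsphinx import Decoder

        config = Decoder.default_config()
        config.set_string('-hmm', 'model/en-us')  # acoustic model (hypothetical path)
        decoder = Decoder(config)

        # Register two pre-built searches once, then flip between them cheaply.
        decoder.set_jsgf_file('menu', 'prebuilt/menu.gram')    # JSGF grammar search
        decoder.set_lm_file('dictation', 'prebuilt/words.lm')  # ARPA model search

        decoder.set_search('menu')       # listen with the small grammar first...
        # ...later, when the app context changes:
        decoder.set_search('dictation')  # ...swap to the larger language model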

    Of course, as you often point out, every situation is different and everyone is doing something just different enough that it’s hard to find rules that apply across the board.
