How to get the best results with JSGF

#1030310
    oldprogrammer
    Participant

I’m working on getting the best results with JSGF, and I have a couple of questions:

    1. Does the dictionary size matter? Should it be constrained to the words in the gram file? I read in another post that it would be reduced at runtime, but I wasn’t sure if that applied when supplying my own dictionary.

    2. If I know that at a given point in time, only 10 words, for example, should be recognized, but there are 250 words total, should I have 25 different gram files and switch between them? Or create one large gram file? It seems, in my case, that smaller gram files produce more false positives.

    3. Does it help to add similar or dissimilar words to either the dictionary or the gram file to improve accuracy?

    Thanks!

    #1030311
    Halle Winkler
    Politepix

    Welcome,

    1. Does the dictionary size matter?

I’m told by the Sphinx project (whose JSGF implementation it is) that it doesn’t. When switching between grammars, the dictionary will grow regardless, because the current version of Sphinx has no mechanism for switching to an entirely new dictionary; the new words are simply added to the existing one. I believe this shouldn’t matter, since the search is constrained to the items in the grammar, so dictionary words outside of it shouldn’t be up for consideration in the search even though they appear in the dictionary.

    2. If I know that at a given point in time, only 10 words, for example, should be recognized, but there are 250 words total, should I have 25 different gram files and switch between them? Or create one large gram file? It seems, in my case, that smaller gram files produce more false positives.

My expectation would be that it’s better to switch between smaller grammars, but your own testing is the last word: if you are getting less accuracy with smaller grammars, do whatever gives you more accuracy.
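
To illustrate, here is a minimal sketch of what switching might look like in app code, assuming OpenEars 2.x class and method names (OEPocketsphinxController’s changeLanguageModelToFile:withDictionary: and OELanguageModelGenerator’s path accessors) as I understand them from the headers; treat the exact Swift signatures as assumptions:

import OpenEars

func switchToGrammar(named name: String, using generator: OELanguageModelGenerator) {
    guard
        let grammarPath = generator.pathToSuccessfullyGeneratedGrammar(withRequestedName: name),
        let dictionaryPath = generator.pathToSuccessfullyGeneratedDictionary(withRequestedName: name)
    else { return }
    // Swap the active grammar, passing along the dictionary that OpenEars
    // generated to match it.
    OEPocketsphinxController.sharedInstance().changeLanguageModel(toFile: grammarPath,
                                                                  withDictionary: dictionaryPath)
}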

    3. Does it help to add similar or dissimilar words to either the dictionary or the gram file to improve accuracy?

Nothing should be in the dictionary that isn’t in the grammar (in the growing-dictionary case above it’s unavoidable, but it doesn’t provide any particular benefit). In my experience the grammar should contain only the items that are intended to be recognized.

OpenEars supports self-written JSGF, but it isn’t really a topic I give a lot of in-depth support for, because the usual method for creating grammars in OpenEars is its own grammar specification language (which OpenEars can output to multiple lower-level formats such as JSGF or the RuleORama model type). The advantage of using it is that it supports all of the features Sphinx JSGF supports, but it can be dynamically generated from Cocoa types at runtime and it’s easily human-readable. Take a look if you have a moment: https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/
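
As a rough illustration of that specification language, a single-rule grammar could be generated dynamically along these lines. This is a sketch: the rule constants (ThisWillBeSaidOnce, OneOfTheseWillBeSaidOnce) and the Swift bridging of generateGrammarFromDictionary:withFilesNamed:forAcousticModelAtPath: are assumed from the OpenEars 2.x headers, and the word list is illustrative:

import OpenEars

let generator = OELanguageModelGenerator()
// One top-level rule: exactly one of these words will be said, once.
let grammar: [AnyHashable: Any] = [
    ThisWillBeSaidOnce: [
        [OneOfTheseWillBeSaidOnce: ["CANCEL", "REPEAT", "CONFIRM"]]
    ]
]
let err = generator.generateGrammar(from: grammar,
                                    withFilesNamed: "MyGrammar",
                                    forAcousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"))
if err == nil {
    // On success, OpenEars writes the grammar (JSGF by default, or a
    // RuleORama model if that plugin is installed) plus a matching
    // pronunciation dictionary to disk.
    print("Grammar and dictionary generated.")
}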

    #1030314
    oldprogrammer
    Participant

Your expertise is always appreciated, so thank you. Very interesting about the dictionary usage, and reassuring to know that reducing the dictionary to the words in the grammar is not a bad thing to do.

Last night I did some testing and found that in my specific use case, one grammar with 250 words produced better accuracy (significantly fewer false positives) than smaller 10-word grammars. Further, accuracy ordered by grammar size as 250 > 20 > 10 > 5 words. Five-word grammars performed the worst, by far.

    A specific point I forgot to mention in the original post is that at a given point in time I’m only expecting a single, specific word (out of the 10 or the 250) to be matched. When I receive a hypothesis I ignore everything except that specific word. So that is certainly a consideration for anyone who comes across this post.
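
For anyone curious, a minimal sketch of that filtering, assuming the OEEventsObserver delegate callback from OpenEars (the expectedWord property is just illustrative):

import OpenEars

class SingleWordListener: NSObject, OEEventsObserverDelegate {
    let observer = OEEventsObserver() // must be retained for callbacks to arrive
    var expectedWord = "ALERT"        // whatever single word the app expects right now

    override init() {
        super.init()
        observer.delegate = self
    }

    // OpenEars delivers every hypothesis here; everything except the one
    // expected word is ignored.
    func pocketsphinxDidReceiveHypothesis(_ hypothesis: String!,
                                          recognitionScore: String!,
                                          utteranceID: String!) {
        guard let words = hypothesis?.components(separatedBy: " "),
              words.contains(expectedWord) else { return }
        print("Matched \(expectedWord) (score \(recognitionScore ?? "?"))")
    }
}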

Additionally, although performance of OpenEars (and pocketsphinx) is really good at generating grammar or language model data on the fly, I found that generating the grammar or language model files up front (at build time) and switching between them as needed (at run time) was enough faster to make a perceptible difference to the end user.
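
A sketch of that build-time approach, assuming pre-generated files named Commands250.languagemodel/.dic were bundled with the app (file names illustrative; languageModelIsJSGF would be true for a .gram file instead of an ARPA model):

import OpenEars

func startListeningWithBundledModel() {
    guard
        let modelPath = Bundle.main.path(forResource: "Commands250", ofType: "languagemodel"),
        let dictPath = Bundle.main.path(forResource: "Commands250", ofType: "dic")
    else { return }
    try? OEPocketsphinxController.sharedInstance().setActive(true)
    // Start recognition directly from the bundled files, skipping runtime generation.
    OEPocketsphinxController.sharedInstance().startListeningWithLanguageModel(
        atPath: modelPath,
        dictionaryAtPath: dictPath,
        acousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"),
        languageModelIsJSGF: false)
}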

    Of course, as you often point out, every situation is different and everyone is doing something just different enough that it’s hard to have universal rules to apply across the board.

    #1030315
    Halle Winkler
    Politepix

Good to know. I wonder whether it’s actually a bug that the smaller grammars are less accurate. What was the thought process behind opting for a grammar rather than a language model in a case where you are looking for a single word from a set?

    Additionally, although performance of OpenEars (and pocketsphinx) is really good at generating grammar or language model data on the fly

Thanks. Just to clarify: Pocketsphinx doesn’t generate models or grammars. OpenEars generates grammars and dictionaries, and ARPA files are mostly produced by CMUCLMTK, with some modifications.

    #1030316
    oldprogrammer
    Participant

I started off using an ARPA model, but it seemed to require too much perfection to match. I tried to tweak the model by adding similar and dissimilar words (chosen using both Metaphone 2 and Levenshtein distance) in various quantities, but my attempts never produced consistently better results: some scenarios would be good, but others would have either too many false positives or too many false negatives. JSGF, because of the very limited number of words, *seems* to work better in general. Of course, it could simply be that I wasn’t optimizing the ARPA model in the right way.
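
For reference, a minimal sketch of the Levenshtein scoring for picking similar or dissimilar words (the standard dynamic-programming edit distance; the candidate list and threshold are illustrative):

func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    var prev = Array(0...t.count) // distances from the empty prefix of s
    for (i, cs) in s.enumerated() {
        var cur = [i + 1]
        for (j, ct) in t.enumerated() {
            let substitution = prev[j] + (cs == ct ? 0 : 1)
            cur.append(min(prev[j + 1] + 1, // deletion
                           cur[j] + 1,      // insertion
                           substitution))
        }
        prev = cur
    }
    return prev[t.count]
}

// Example: keep only candidate words at edit distance >= 3 from "ALERT".
let candidates = ["ALTER", "BANANA", "ORCHESTRA", "ALERTS"]
let dissimilar = candidates.filter { levenshtein($0, "ALERT") >= 3 }
// dissimilar == ["BANANA", "ORCHESTRA"]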

    The small grammar that performed the worst was “(AN | ALERT | ANT | ALWAYS | AMAZES)”. “ALERT” would be falsely detected with just background noise given 5-10 seconds of listening. However, adding another 200+ words that start with different letters of the alphabet resulted in significantly better results.
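
Expressed in the OpenEars grammar language discussed above rather than raw JSGF, the padding workaround looks roughly like this (a sketch; the rule constants are assumed from the OpenEars headers, and the filler list is illustrative):

let realChoices = ["AN", "ALERT", "ANT", "ALWAYS", "AMAZES"]
let fillerWords = ["BALLOON", "CACTUS", "DOLPHIN", "ECHO"] // ...plus 200+ more in practice
// Same single-rule shape as before, just with the padded word list; the app
// then ignores any hypothesis that isn't the expected word.
let paddedGrammar: [AnyHashable: Any] = [
    ThisWillBeSaidOnce: [
        [OneOfTheseWillBeSaidOnce: realChoices + fillerWords]
    ]
]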

Yes, good clarification on which part of the software generates the grammars and dictionaries. And when I mention performance, the biggest increase came from pre-generating a model with 300-500 words with lmtool versus letting OpenEars do it at runtime. I don’t recall specifically, but it *felt* like it was 2-3 seconds faster on a mobile device (load time and corpus size being positively correlated).

    #1030317
    Halle Winkler
    Politepix

    The small grammar that performed the worst was “(AN | ALERT | ANT | ALWAYS | AMAZES)”. “ALERT” would be falsely detected with just background noise given 5-10 seconds of listening. However, adding another 200+ words that start with different letters of the alphabet resulted in significantly better results.

OK, so the issue is non-speech being detected as a word. I would expect that could be an issue for a very small grammar of short words (this is the issue that Rejecto is designed to help with for language models).

    I don’t recall specifically, but it *felt* like it was 2-3 seconds faster on a mobile device (load time and corpus size being positively correlated).

Hmm, creating a model of this size in the format that lmtool creates should actually take less than a second. Looking at the last log submitted for an issue here, I see a similarly-sized model generation taking about 0.2 seconds on a current device, including the one-time caching of the acoustic model data. Do you have a log of the 2-3 second behavior?
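
If it helps, a quick way to measure the actual generation time on-device (a sketch under the same naming assumptions as the earlier examples; the 300-entry word list is a stand-in, not a realistic vocabulary):

import Foundation
import OpenEars

let start = CFAbsoluteTimeGetCurrent()
let generator = OELanguageModelGenerator()
let words = (1...300).map { "WORD\($0)" } // stand-in vocabulary of 300 entries
_ = generator.generateLanguageModel(from: words,
                                    withFilesNamed: "TimingTest",
                                    forAcousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"))
print("Model generation took \(CFAbsoluteTimeGetCurrent() - start) seconds")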
