Your expertise is always appreciated, so thank you. Very interesting about the dictionary usage, and affirming to know that reducing the dictionary to the words in the grammar is not a bad thing to do.
Last night I did some testing and found that in my specific use case, one grammar with 250 words produced significantly better accuracy (far fewer false positives) than a 10-word grammar. More generally, I noticed that 250 > 20 > 10 > 5: five-word grammars performed the worst, by far.
A specific point I forgot to mention in the original post is that at any given moment I'm only expecting a single, specific word (out of the 10 or the 250) to be matched. When I receive a hypothesis, I ignore everything except that specific word. So that is certainly a consideration for anyone who comes across this post.
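The filtering step above is simple, but for anyone replicating this, here is a minimal sketch of the idea in Python (the function name and example words are hypothetical, not part of the OpenEars API):

```python
# Hypothetical sketch: given the raw hypothesis string from the
# recognizer and the one word we currently expect, accept only that
# word and discard everything else in the hypothesis.
def expected_word_heard(hypothesis, expected_word):
    """Return True only if the expected word appears in the hypothesis."""
    words = hypothesis.upper().split()
    return expected_word.upper() in words

# Expecting "FORWARD" while the grammar contains many other words:
print(expected_word_heard("GO FORWARD NOW", "FORWARD"))  # True
print(expected_word_heard("GO BACK NOW", "FORWARD"))     # False
```

Because only one word matters at a time, a larger grammar mostly serves as a set of decoys that absorb near-misses, which may be part of why the bigger grammars produced fewer false positives here.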
Additionally, although OpenEars (and pocketsphinx) is really fast at generating grammar or language model data on the fly, I found that generating the grammar or language model files up front (at build time) and switching between them as needed (at run time) was enough faster to make a perceptible difference to the end user.
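The build-time/run-time split can be sketched as a simple lookup of pregenerated files. Everything below is illustrative: the context names and file names are made up, and in a real OpenEars app the pregenerated model/dictionary pairs would be bundled as resources and handed to the recognizer when the context changes.

```python
# Hypothetical mapping from app context to pregenerated
# (language model, dictionary) file pairs created at build time.
PREGENERATED = {
    "navigation": ("NavWords.languagemodel", "NavWords.dic"),
    "numbers":    ("NumberWords.languagemodel", "NumberWords.dic"),
}

def files_for_context(context):
    """Look up the pregenerated pair for a context (no generation cost)."""
    return PREGENERATED[context]

# At run time, switching contexts is just a dictionary lookup:
model, dic = files_for_context("numbers")
```

The point is that the only run-time cost is the lookup and the model swap itself; all the generation work was paid once at build time.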
Of course, as you often point out, every situation is different and everyone is doing something just different enough that it’s hard to have universal rules to apply across the board.