The small grammar that performed the worst was “(AN | ALERT | ANT | ALWAYS | AMAZES)”. “ALERT” would be falsely detected from background noise alone after 5-10 seconds of listening. However, adding another 200+ words starting with different letters of the alphabet significantly reduced the false detections.
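For reference, that alternation written out as a complete grammar file (sketched here in JSGF syntax, which matches the pipe-delimited form quoted above; the grammar and rule names are assumptions, since the post only shows the alternation itself):

```
#JSGF V1.0;
grammar smallwords;
public <word> = AN | ALERT | ANT | ALWAYS | AMAZES;
```

With only five short words to choose from, the decoder is forced to map any detected speech-like audio onto one of them, which is one plausible reason a tiny grammar like this triggers on noise more readily than a larger vocabulary.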
OK, so the issue is about non-speech being detected as a word. I would expect that could be a problem for a very small grammar of short words (this is the issue that Rejecto is designed to help with for language models).
I don’t recall specifically, but it *felt* like it was 2-3 seconds faster on a mobile device (load time and corpus size being positively correlated).
Hmm, creating a model of this size in the format that lmtool produces should actually take less than a second (in the last log submitted for an issue here, I see a similarly-sized model generation taking about 0.2 seconds on a current device, including the one-time caching of the acoustic model data). Do you have a log of the 2-3 second behavior?