Yes, recognizing numbers in isolation seems to be a difficult task for speech recognition engines.
1) Trying to create a better language model by using a different toolkit such as SRILM, MITLM, or IRSTLM
3) Using LanguageModelGenerator
Most language modeling software implements the same handful of existing algorithms, so I don’t think you need to do a lot of experimentation there. LanguageModelGenerator wraps another good package, so you could probably just try it out, see whether its output is preferable, and call it a day.
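Whichever toolkit you pick, it will want a text corpus to train on. For a 1-100 number task the corpus is trivial to generate; here is a minimal sketch that spells out each number, one utterance per line, in a form you could feed to, e.g., SRILM's ngram-count. The word forms ("twenty one" as two words, etc.) are an assumption; match them to your pronunciation dictionary.

```python
# Sketch: generate a training corpus for a 1-100 number language model,
# one utterance per line. Word forms are an assumption -- align them
# with the entries in your pronunciation dictionary.

ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 1-100 as dictionary words."""
    if 1 <= n <= 19:
        return ONES[n - 1]
    if n == 100:
        return "one hundred"
    tens, ones = divmod(n, 10)
    word = TENS[tens - 2]
    return word + " " + ONES[ones - 1] if ones else word

if __name__ == "__main__":
    # One sentence per line; most toolkits add <s>/</s> markers themselves.
    for n in range(1, 101):
        print(number_to_words(n))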
Build an acoustic model with just the numbers 1-10
Don’t you need 1-100? In any case, you might want to investigate this approach, and/or adapting the existing model with your new data: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
It seems like the task of creating an acoustic model that just recognizes 1-100 with a number of different voice contributors and accents is constrained enough to be feasible for an app project.
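If you go the adaptation route, the tutorial linked above expects two listing files alongside your recordings: a .fileids file naming each audio file and a .transcription file giving its text in the form "<s> words </s> (file_id)". A small sketch of generating those for number recordings; the prompt list and the num_NNN.wav naming scheme are assumptions for illustration:

```python
# Sketch: write the .fileids and .transcription files used by the
# CMUSphinx adaptation tutorial (tutorialadapt). The prompt list and
# file naming scheme here are assumptions -- substitute your own.

prompts = ["one", "two", "three", "twenty one", "one hundred"]  # etc.

with open("numbers.fileids", "w") as fileids, \
     open("numbers.transcription", "w") as trans:
    for i, words in enumerate(prompts, start=1):
        file_id = "num_%03d" % i          # matches num_001.wav, ...
        fileids.write(file_id + "\n")
        trans.write("<s> %s </s> (%s)\n" % (words, file_id))
```

Each line of numbers.fileids then corresponds to one recorded wav file with the matching transcription line.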
Using JSGF instead of ARPA
In my opinion, after some recent experimentation, JSGF is too slow for a good UX; other developers do use it, though, so as I said this is a matter of opinion. You can use the garbage-loop approach for out-of-vocabulary rejection with ARPA as well as with JSGF: http://sourceforge.net/p/cmusphinx/discussion/help/thread/cefe4df3. That could improve your results if your issue is too many false positives rather than too many false negatives or transposed recognitions.
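For reference, if you do try the JSGF route, a grammar covering 1-100 is short. A sketch (word forms are an assumption; match them to your dictionary):

```jsgf
#JSGF V1.0;

grammar numbers;

<digit> = one | two | three | four | five | six | seven | eight | nine;
<teen>  = ten | eleven | twelve | thirteen | fourteen | fifteen
        | sixteen | seventeen | eighteen | nineteen;
<tens>  = twenty | thirty | forty | fifty | sixty | seventy
        | eighty | ninety;

public <number> = <digit> | <teen> | <tens> [ <digit> ] | one hundred;
```

The optional [ <digit> ] after <tens> covers compounds like "twenty one" through "ninety nine".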