Sorry, but in my opinion Pocketsphinx can’t really be used to judge the quality of children’s pronunciation in the way you’re describing. Other factors will affect the results, including but not limited to environmental conditions, distance from the device, and the large differences between speech at different developmental stages. Together they produce results that aren’t clear enough to evaluate in a constructive way. Any software project that aims to give automated feedback on developing speech needs to be done in close collaboration with an academic or professional expert in childhood speech development.
I can answer your questions in general terms that would apply to any app:
1. Is the default probabilistic model or JSGF model better for detecting a single word?
It doesn’t matter much for one word, but you can only use Rejecto with a probabilistic language model, not with a JSGF grammar.
2. Should I be limiting the language model to a single word, or does the language model need additional words to compare against? I’m assuming that this is kind of what Rejecto is doing for me, though at the phoneme level.
That’s right. If you want to detect one word, use a language model that contains only that word. If you also want to reject other words and sounds, Rejecto will handle that.
3. Are there any benefits to reducing the size of the dictionary? If I’m only looking to match a single word, should the dictionary (with the phonemes) also be limited to the single word?
Which dictionary? OpenEars generates the phonetic dictionary for your vocabulary, so you don’t need to do anything with that one. If you mean the master lookup dictionary, you can’t reduce its contents, and there is no need to.
4. Does it help to add similar sounding words to the language model, and then reject them when the hypothesis is received?
It can help in cases where you, as the app designer, know with a high degree of confidence which specific other words will be detected, although this is not usually the case.
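To illustrate the approach in question 4 in general terms, here is a minimal sketch in Python that is independent of OpenEars itself. The vocabulary contains the target word plus a few similar-sounding decoy words, and any hypothesis that isn’t exactly the target word is discarded. The word lists and the `accept_hypothesis` function are hypothetical. In a real app the hypothesis string would come from the recognizer’s callback, and the decoys would be words you have good reason to expect.

```python
# Hypothetical target word and similar-sounding decoys chosen by the app designer.
TARGET_WORD = "RABBIT"
DECOY_WORDS = ["RABID", "HABIT", "RAPID"]

# Hypothetical vocabulary you would build the language model from: target plus decoys.
VOCABULARY = [TARGET_WORD] + DECOY_WORDS


def accept_hypothesis(hypothesis: str) -> bool:
    """Return True only if the hypothesis is exactly the target word.

    Comparison is case-insensitive and ignores surrounding whitespace.
    A hypothesis that matches a decoy, contains extra words, or is empty
    is rejected.
    """
    words = hypothesis.strip().upper().split()
    return words == [TARGET_WORD]


if __name__ == "__main__":
    for hyp in ["RABBIT", "HABIT", "rabbit", "RABBIT RAPID", ""]:
        print(f"{hyp!r}: {'accepted' if accept_hypothesis(hyp) else 'rejected'}")
```

Requiring an exact single-word match means a multi-word hypothesis that merely contains the target is also rejected; whether that is appropriate depends on your app.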
5. The Rejecto “withWeight” parameter: does a lower number mean less rejection? The wording in the documentation seemed precise, but I didn’t quite follow.
That’s right: a higher weight means more will be rejected, and a lower weight means less will be rejected.
6. I’ve read in other areas about training the speech detector. Is that something that can help? Any advice or pointers for going down that road?
In very general terms, acoustic model adaptation can be helpful when you know in advance that your userbase isn’t well represented in the acoustic model data. In this case, however, it is something you would need to do together with an expert in childhood speech development in order to get it right.