Sorry about the tag removal; I haven’t yet figured out a way to let people paste their whole JSGF grammar without also allowing arbitrary HTML (which is a security issue).
Now my question is whether it’s better to supply the generateLanguageModelFromArray method with a list of the individual words that make up the sentences and let OpenEars/RapidEars figure out the sentences, or to give it a list of all the possible sentences?
I would give it the entire sentences; this will cause LanguageModelGenerator to assign increased probability to the word sequences of those sentences.
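To see why, here is a small illustration (plain Python, not the OpenEars implementation; the sentences are made-up examples) of the difference: when you submit full sentences, the n-gram counts that a statistical language model tool derives include the within-sentence word pairs, whereas a list of loose single words carries no adjacency information at all.

```python
# Illustration only: counting bigrams the way statistical language
# model tools do, to show why full sentences matter. This is not
# OpenEars code; the sentences below are invented examples.
from collections import Counter

def bigram_counts(phrases):
    """Count adjacent word pairs across a list of phrases."""
    counts = Counter()
    for phrase in phrases:
        words = phrase.split()
        for first, second in zip(words, words[1:]):
            counts[(first, second)] += 1
    return counts

# Submitting whole sentences: "OPEN THE DOOR" is seen as a sequence,
# so the bigrams OPEN->THE and THE->DOOR get real counts.
sentences = ["OPEN THE DOOR", "CLOSE THE DOOR"]
print(bigram_counts(sentences)[("OPEN", "THE")])  # 1

# Submitting loose single words: each entry is one word, so no
# bigram is ever observed and every bigram count is zero.
words = ["OPEN", "CLOSE", "THE", "DOOR"]
print(bigram_counts(words)[("OPEN", "THE")])      # 0
```

The same reasoning extends to trigrams, which is why sentence-level input biases recognition toward the target phrases.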
Is there a way of obtaining something similar to what I’m doing with JSGF using the LanguageModelGenerator?
Not exactly, since JSGF and ARPA models address two very different design issues. Unfortunately, RapidEars doesn’t currently support JSGF. Lately there has been a lot more use of JSGF in OpenEars (I think this is because performance on the device has improved to the point that the performance hit for using JSGF isn’t as severe as it used to be), so I will give some thought to adding it to RapidEars, even though JSGF will definitely be less rapid.
One thing that occurs to me that you can try, and this is probably how I would attempt to solve it, is to change the probabilities in your language model by hand: raise them for the bigrams and trigrams that represent the target sentences, and lower them for the unigrams that represent “loose” words. You’ll need to use the LanguageModelGenerator-produced .arpa file as your language model rather than the .DMP, and open it up in a text editor. I would start by only changing the probability of the trigrams (the probability is the value at the start of each line). It’s a log10 value ranging from a negative number up to zero, where values closer to zero mean higher probability.
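As a rough sketch of that hand edit in script form, the following could work (the function name, the boost and penalty amounts, and the sample trigrams are my own assumptions for illustration, not recommended values):

```python
# Sketch: nudge trigram log10 probabilities in ARPA-format text.
# Assumption: trigrams to favor are given as exact word strings;
# the boost/penalty amounts here are arbitrary illustrative values.

def adjust_trigrams(arpa_text, target_trigrams, boost=0.5, penalty=0.5):
    """Raise the log10 probability of the target trigrams (capped at
    0.0, the maximum) and lower the probability of all others, inside
    the \\3-grams: section of an ARPA-format language model."""
    out = []
    in_trigrams = False
    for line in arpa_text.splitlines():
        stripped = line.strip()
        if stripped == "\\3-grams:":
            in_trigrams = True           # entering the trigram section
        elif stripped.startswith("\\"):
            in_trigrams = False          # next section header or \end\
        elif in_trigrams and stripped:
            # Trigram lines are: <log10 prob> <word1> <word2> <word3>
            prob, *words = stripped.split()
            trigram = " ".join(words)
            if trigram in target_trigrams:
                line = f"{min(0.0, float(prob) + boost):.4f} {trigram}"
            else:
                line = f"{float(prob) - penalty:.4f} {trigram}"
        out.append(line)
    return "\n".join(out)
```

For example, with a target set of {"OPEN THE DOOR"}, a line reading "-0.9031 OPEN THE DOOR" would become "-0.4031 OPEN THE DOOR", while non-target trigram lines would move further from zero. After editing, keep the unigram and bigram sections untouched to start with, as described above, and test recognition before adjusting further.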