First off, this is truly an amazing framework!
I have a question about the continuous speech loop. My app deals with bursts of single words separated by bursts of silence, e.g.: one word spoken, 2 seconds of silence, one word, 1 second of silence, two words spoken, 3 seconds of silence, one word spoken, 1 second of silence, and so on. This works great with the default engine. Occasionally, though, there is a long delay (6+ seconds) and the words come back in one big chunk. My assumption is that the engine lost track of the boundary between silence and speech.
So I am curious what these two values do: secondsOfSpeechToDetect and secondsOfSilenceToDetect. Are they the maximum durations for those respective phases, or the expected time frames between words/phrases? I see the default in the commandArray is 0.1, but 100 ms seems very 'small' for a spoken word.
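For context, here is roughly how I imagined tuning them. This is only a sketch based on my assumptions: secondsOfSilenceToDetect is a documented OEPocketsphinxController property, but I'm only guessing that secondsOfSpeechToDetect is settable the same way (I'm going off what I see in the commandArray), and the values here are made up for illustration:

// Sketch only: secondsOfSilenceToDetect is documented on OEPocketsphinxController;
// secondsOfSpeechToDetect is my assumption from the commandArray entry.
[OEPocketsphinxController sharedInstance].secondsOfSilenceToDetect = 0.7; // how much silence ends an utterance?
[OEPocketsphinxController sharedInstance].secondsOfSpeechToDetect = 0.1; // how much audio counts as speech starting?

Is that the right mental model, i.e. would raising secondsOfSilenceToDetect make the engine wait longer before closing an utterance?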
Secondly, I haven't noticed any adverse effects, but I was wondering whether there is a difference between merely initializing the dictionary and actually starting the 'listening' phase, i.e. calling startListeningWithLanguageModelAtPath and then immediately calling suspendRecognition. Think push-to-talk: I start listening immediately on view load, but then directly call suspend.
i.e.:
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:pathLanguageModel dictionaryAtPath:pathDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:NO];
[[OEPocketsphinxController sharedInstance] suspendRecognition];
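Then for the actual push-to-talk, I do roughly this. Again just a sketch: suspendRecognition and resumeRecognition are the documented OEPocketsphinxController calls, but the button handler names are my own:

// Hypothetical push-to-talk button handlers (names are mine):
- (IBAction)talkButtonPressed:(id)sender {
    // Engine was already started at view load, so just resume recognition.
    [[OEPocketsphinxController sharedInstance] resumeRecognition];
}

- (IBAction)talkButtonReleased:(id)sender {
    // Back to suspended until the next press.
    [[OEPocketsphinxController sharedInstance] suspendRecognition];
}

Is starting on view load and suspending right away equivalent (resource-wise and behavior-wise) to deferring startListeningWithLanguageModelAtPath until the first button press?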