Reply To: When is the best time to call "changeLanguageModelToFile"?


#1027456
evilliam
Participant

"Is the bigger issue the memory overhead or the time to return a hypothesis?"

The bigger issue was the time to return a hypothesis.

I figured I could get around the memory issue, since it only spikes while the grammar is being created: I can create the grammars on the simulator and ship them pre-built so they are already on the device. Once a grammar was created, memory use dropped right back down from 300MB+ to ~30MB.
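For what it's worth, the runtime side of that is just a bundle lookup; here is a rough Swift sketch of it, with made-up file names and extensions (this is not the actual OpenEars API, just the "is it already pre-built?" check):

```
import Foundation

// Look for a grammar/dictionary pair generated ahead of time (on the simulator)
// and shipped in the app bundle, so the expensive generation step never runs
// on the device. File names and extensions here are assumptions for illustration.
func bundledGrammarPaths(named name: String) -> (grammar: String, dictionary: String)? {
    guard let grammar = Bundle.main.path(forResource: name, ofType: "gram"),
          let dictionary = Bundle.main.path(forResource: name, ofType: "dic") else {
        return nil // Not bundled; would have to fall back to generating on-device (slow, memory-hungry).
    }
    return (grammar, dictionary)
}
```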

But speed is a huge part of the experience: in my app the voice is the *only* UI, although I provide visual feedback all the time, so even a slightly sluggish response is undesirable.

"Have you tried this using a statistical model rather than a grammar?"

Using a statistical model is certainly faster, but it comes with its own drawbacks: lots of false positives on low-phoneme words, and utterances with internal rhymes or heavy alliteration can trip it up.
For example, 'Air', 'Here', 'There' and 'Her' are all pretty close together in the Fresh Prince rap, and I ended up having to filter out very low-phoneme words: not just 'a', 'the', etc., but also words that become low-phoneme when said in a range of accents (dropped 'h', 't' and so on).
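The filtering itself was nothing fancy; something along these lines, where the pronunciation table is a hypothetical word-to-phonemes lookup (e.g. parsed from a CMU-style .dic file) and the threshold is purely illustrative:

```
// Drop words whose pronunciations are too short to be distinguished reliably.
// `pronunciations` is a hypothetical [word: [phoneme]] table; 3 is an arbitrary cutoff.
func filterLowPhonemeWords(_ words: [String],
                           pronunciations: [String: [String]],
                           minimumPhonemes: Int = 3) -> [String] {
    return words.filter { word in
        guard let phones = pronunciations[word.uppercased()] else { return true } // keep unknown words
        return phones.count >= minimumPhonemes
    }
}

// Example: "HER" (HH ER) and "A" (AH) would be dropped, "THERE" (DH EH R) would be kept.
```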

The grammar is constructed the way you advised at the bottom of this question: https://www.politepix.com/forums/topic/ruleorama-will-not-work-with-rapidears/

I construct a grammar per sentence, with an array containing the phrase expanding sequentially, roughly as sketched below.
Then, at a natural punctuation point or the end of the sentence, I switch to the next pre-created grammar for the next sentence.
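To make "expanding sequentially" concrete, the array for one sentence is built roughly like this (plain Swift sketch; how it gets fed into the RuleORama grammar dictionary is left out):

```
// Build the sequentially expanding phrase list for one sentence: each entry adds
// the next word, so the grammar can match partial progress through the sentence.
func expandingPhrases(for sentence: String) -> [String] {
    let words = sentence.split(separator: " ").map(String.init)
    return words.indices.map { index in
        words[0...index].joined(separator: " ")
    }
}

// expandingPhrases(for: "NOW THIS IS A STORY")
// -> ["NOW", "NOW THIS", "NOW THIS IS", "NOW THIS IS A", "NOW THIS IS A STORY"]
```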

This works really nicely, until it doesn't. The UI needs to display the paragraph, or group of sentences, on screen for the user to read, and so my two failure points are as follows:

  • When somebody speaks so fast that they are a word or two into the next sentence before the language model switch completes. At that point the new grammar is useless, because you cannot (I think) start a language model with a hypothesis that you set yourself. I would love it if you could.
  • When somebody speaks so slowly that RapidEars detects a period of silence and then restarts the hypothesis midway through the expanding string array.

I have a degree of control over, and understanding of, the context in which I receive the hypothesis, so my proposed workaround is to switch to a statistical model while in one of these fail states, and then, at a natural boundary (the end of the sentence or another punctuation point), switch back to using a grammar.

The statistical model should be small enough that I can get good enough results from it, before switching back to a grammar.
It should feel okay to the user, and not like I’ve missed anything, I hope.
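Roughly, the switching logic I have in mind looks like the Swift sketch below. I'm assuming the full selector is changeLanguageModelToFile:withDictionary: (only changeLanguageModelToFile is confirmed in this thread), and the paths, state tracking and boundary detection are all placeholders:

```
// Assumes OpenEars is available via the usual bridging header / module import.
enum ListeningMode { case grammar, statistical }

final class ModelSwitcher {
    private(set) var mode: ListeningMode = .grammar

    // Hypothetical pre-generated model/dictionary paths: the per-sentence grammar
    // and the small fallback statistical model.
    var grammarPath = "", grammarDictionaryPath = ""
    var statisticalPath = "", statisticalDictionaryPath = ""

    // Called from either fail state (runaway fast speaker, or a silence-triggered
    // restart mid-sentence): drop back to the statistical model.
    func enterFailState() {
        guard mode == .grammar else { return }
        mode = .statistical
        OEPocketsphinxController.sharedInstance().changeLanguageModel(
            toFile: statisticalPath, withDictionary: statisticalDictionaryPath)
    }

    // Called once the hypothesis reaches a natural boundary again
    // (end of sentence or other punctuation point).
    func returnToGrammar() {
        guard mode == .statistical else { return }
        mode = .grammar
        OEPocketsphinxController.sharedInstance().changeLanguageModel(
            toFile: grammarPath, withDictionary: grammarDictionaryPath)
    }
}
```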

"A language model switch has to occur after some kind of final hypothesis return since it's like a micro-restart."

Thanks! That's what I thought. What I will do, then, is play around with the `secondsOfSilenceToDetect` property until it feels about right to me, and use the `pocketsphinxDidDetectFinishedSpeech` callback to figure out whether I'm in an 'error state' or not.
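In Swift that plan looks something like the sketch below, assuming the usual OEEventsObserver delegate setup; expectedPhraseFinished() and switchToStatisticalModel() are placeholder helpers, and 0.7 is just a starting value to tune by feel:

```
// Assumes OpenEars is available via the usual bridging header / module import.
final class SpeechController: NSObject, OEEventsObserverDelegate {
    private let observer = OEEventsObserver()

    func configure() {
        observer.delegate = self
        // Tune until the pause length "feels about right".
        OEPocketsphinxController.sharedInstance().secondsOfSilenceToDetect = 0.7
    }

    // Fired when a period of silence ends an utterance. If the expanding phrase
    // for the current sentence hasn't completed yet, treat it as the error state
    // and fall back to the statistical model until the next sentence boundary.
    func pocketsphinxDidDetectFinishedSpeech() {
        if !expectedPhraseFinished() {
            switchToStatisticalModel()
        }
    }

    // Placeholder helpers, just for illustration.
    private func expectedPhraseFinished() -> Bool {
        // Compare the latest hypothesis against the current expanding phrase array.
        return true
    }

    private func switchToStatisticalModel() {
        // changeLanguageModelToFile with the small statistical model goes here.
    }
}
```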

If you're still reading this, Halle: thank you! I got my office to buy licenses for RuleORama and RapidEars today.

Cheers
Liam