Reply To: Don't Receive Hypothesis on iPhone

January 15, 2014 at 12:17 pm #1019477

Politepix

Looking at the logging in your example above, there is something unusual about it. The lowest value that can be used for the silence detection period is 0.1 seconds, and it’s generally inadvisable to reduce it from the default of 0.7 seconds, which was chosen as the best value for most use cases. In most cases where a change is needed, it is a change to a longer silence detection period. Your log shows a silence detection period of 0.0, so that can’t work, and the longer detection periods you are testing with may still be too short for your use case.
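As a rough illustration of the limits described above, here is a minimal Python sketch of a sanity check for a silence detection value. This is not OpenEars code: the function name `check_silence_period` and the constant names are hypothetical, and only the 0.1 and 0.7 second figures come from this post.

```python
# Hypothetical sanity check; not part of OpenEars.
MINIMUM_SILENCE_SECONDS = 0.1  # lowest usable value mentioned in this post
DEFAULT_SILENCE_SECONDS = 0.7  # default mentioned in this post


def check_silence_period(seconds: float) -> str:
    """Classify a silence detection period against the limits above."""
    if seconds < MINIMUM_SILENCE_SECONDS:
        # Covers the 0.0 value seen in the log: below the usable minimum.
        return "invalid"
    if seconds < DEFAULT_SILENCE_SECONDS:
        # Allowed, but shorter than the tuned default.
        return "shorter than default"
    return "ok"
```

Under these assumptions, a value of 0.0 is classified as "invalid", 0.3 as "shorter than default", and 0.7 as "ok".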

Sometimes when developers are first implementing OpenEars, they set the silence detection to a very low value on the theory that it will make OpenEars work like RapidEars. But RapidEars is a different library, and its realtime detection has a (very) different design from stock OpenEars, one that has nothing to do with pause-detection silence lengths. Giving OpenEars a silence detection period that is significantly shorter than the normal pauses people make while speaking will break it, since it has to wait for a pause at the end of a word or phrase in order to judge the end of speech and submit the speech audio for recognition. If the silence detection period is shorter than the length of a speech pause (which is around half a second or more), speech fragments will be submitted to the recognizer prematurely, and this should be expected to result in null hypotheses.
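To make the fragmenting effect concrete, here is a small simulation in Python. It is only a sketch of generic silence-based end-of-speech detection, not how OpenEars is implemented: the 0.1 second frame length, the function name `split_utterances`, and the example audio pattern are all assumptions made for illustration.

```python
from typing import List, Tuple

FRAME_SECONDS = 0.1  # assumed frame length, for illustration only


def split_utterances(frames: List[bool], silence_to_detect: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges that would be submitted for recognition.

    frames[i] is True when frame i contains speech. An utterance is closed
    once the run of silent frames after it reaches silence_to_detect seconds.
    """
    needed = max(1, round(silence_to_detect / FRAME_SECONDS))
    utterances = []
    start = None        # first frame of the utterance in progress
    last_speech = None  # most recent frame that contained speech
    silent_run = 0      # consecutive silent frames since last_speech
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= needed:
                utterances.append((start, last_speech + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended before enough silence was seen; flush what we have.
        utterances.append((start, last_speech + 1))
    return utterances


# Three words separated by 0.3 s pauses, followed by 1.0 s of silence.
phrase = [True] * 4 + [False] * 3 + [True] * 3 + [False] * 3 + [True] * 4 + [False] * 10

whole = split_utterances(phrase, 0.7)      # one utterance covering all three words
fragments = split_utterances(phrase, 0.1)  # three separate one-word fragments
```

With the 0.7 second period the three words are submitted together as one utterance, while the 0.1 second period cuts at every between-word pause and submits three fragments, which is the situation described above.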

If you actually have 0.0 seconds of silence to detect as shown in your logs, or close to 0.0 seconds, that will break speech detection. So one possible reason it works on the simulator is that the simulator is incorrectly estimating speech/silence levels because of its less accurate audio driver. That would mean the ultra-short pauses aren’t being noticed by the simulator, so that when the simulator submits speech it is coincidentally not in fragments. In effect, the simulator would be coincidentally operating the same way the device would if the device had a silence detection period of standard length.

I would probably leave the default silence detection period alone, since it is pretty well tuned to the driver and it is already on the short side (the early versions of OpenEars used a full second, since that is the suggestion from the CMU Sphinx project). If this doesn’t resolve your issue, let me know some more specifics about the implementation and I will try to help.