This topic has 5 replies, 2 voices, and was last updated 9 years, 2 months ago by Halle Winkler.
January 15, 2014 at 5:43 am · #1019471 · weavercs (Participant)
When I run my app on the simulator I detect speech and receive a hypothesis:
Pocketsphinx is now listening.
2014-01-14 23:34:37.256 VoiceFacts1[7419:70b] Pocketsphinx has detected speech.
2014-01-14 23:34:39.191 VoiceFacts1[7419:70b] Pocketsphinx has detected a period of silence, concluding an utterance. 0.000000 seconds of silence to detect
2014-01-14 23:34:47.049 VoiceFacts1[7419:70b] The received hypothesis is THIRTEEN with a score of -1816 and an ID of 000000000
2014-01-14 23:34:49.536 VoiceFacts1[7419:70b] Pocketsphinx has stopped listening.
But when I run the exact same code on my iPhone I get this log sequence and NO hypothesis:
Pocketsphinx is now listening.
2014-01-14 23:30:46.065 VoiceFacts1[404:60b] Pocketsphinx has detected speech.
2014-01-14 23:30:47.748 VoiceFacts1[404:60b] Pocketsphinx has detected a period of silence, concluding an utterance. 0.000000 seconds of silence to detect
2014-01-14 23:30:47.905 VoiceFacts1[404:60b] Pocketsphinx has stopped listening.
I have tried this with various values for secondsOfSilenceToDetect.
What can be the difference?

January 15, 2014 at 9:46 am · #1019472
Does that happen when you run the sample app and use words from the sample app’s vocabulary? Normally, if there is any difference, it is the simulator that is worse, since it uses a different and simpler audio driver. If there is a problem with recognition on an actual device, that will come down to an implementation issue, so it might be helpful to share some background on how the app is set up. My main recommendation would be to go through the tutorial and documentation so you can be sure your expectations are in line with the capabilities of the SDK.

January 15, 2014 at 12:17 pm · #1019477
Looking at the logging in your example above, there is something unusual about it. The lowest value that can be used for silence to detect is .1 seconds, and it’s generally inadvisable to reduce it from the default of .7, which was chosen as the best value for most usage cases. In most cases where a change is needed, it is a change to a longer silence detection period. Your log shows a silence detection period of 0.0, which can’t work, and the longer detection periods you are testing with may still be too short for your usage case.
Sometimes when developers are first implementing OpenEars, they set the silence detection to a very low value on the theory that it will make OpenEars work like RapidEars. But RapidEars is a different library, and its realtime detection has a (very) different design from stock OpenEars, to which pause-detection silence lengths are unrelated. Giving OpenEars a silence detection period significantly shorter than a normal pause that people make while speaking will break it, since it has to wait for a pause at the end of a word or phrase in order to judge the end of speech and submit the speech audio for recognition. If the detection period is shorter than the length of a natural speech pause (which is around half a second or more), speech fragments will be submitted to the recognizer prematurely, and this will result in null hypotheses.
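The fragmenting effect described here can be illustrated with a toy endpointer. This is a language-agnostic sketch, not OpenEars code; the function name, 100 ms frame size, and thresholds are all invented for illustration:

```python
def segment_utterances(frames, silence_frames_needed):
    """Toy endpointer: group speech(True)/silence(False) frames into
    utterances, ending an utterance once silence_frames_needed
    consecutive silent frames are seen. Returns the speech-frame
    count of each submitted utterance."""
    utterances = []
    current_speech_frames = 0
    silent_run = 0
    for is_speech in frames:
        if is_speech:
            current_speech_frames += 1
            silent_run = 0  # speech resumed, the pause wasn't an endpoint
        elif current_speech_frames:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                # Pause long enough: submit what we have for recognition.
                utterances.append(current_speech_frames)
                current_speech_frames = 0
                silent_run = 0
    if current_speech_frames:
        utterances.append(current_speech_frames)
    return utterances

# "THIR <200 ms pause> TEEN", followed by real end-of-utterance silence
# (100 ms frames, so 2 silent frames is the mid-word pause).
frames = [True] * 3 + [False] * 2 + [True] * 3 + [False] * 10

# 0.2 s threshold: the natural mid-word pause ends the utterance early,
# so the recognizer gets two fragments instead of one word.
print(segment_utterances(frames, 2))  # [3, 3]

# 0.7 s threshold: the mid-word pause is ignored, one whole utterance.
print(segment_utterances(frames, 7))  # [6]
```

With the too-short threshold, each fragment reaching the recognizer is an incomplete word, which is the situation that produces null hypotheses.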
If you actually have 0.0 seconds of silence to detect, as shown in your logs, or anything close to it, that will break speech detection. So it is possible that it works on the simulator only because the simulator, with its worse driver, is incorrectly estimating speech/silence levels: the ultra-short pauses aren’t being correctly noticed, so when the simulator submits speech it is coincidentally not in fragments. In effect, the simulator is coincidentally operating the way the device would with a silence detection period of standard length.
I would probably leave the default silence detection period alone, since it is well tuned to the driver and already on the short side (early versions of OpenEars used a full second, which is the suggestion from the CMU Sphinx project). If this doesn’t improve your issue, let me know some more specifics about the implementation and I will try to help.

January 15, 2014 at 9:27 pm · #1019497 · weavercs (Participant)
Thank you for looking at this. The value of 0 for secondsOfSilenceToDetect means that I didn’t set it at all. It is evidently initialized to 0, and 0 must be treated as the default value of .7. It is now working fine that way.
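The fallback the poster is guessing at would amount to something like the following. This is a sketch of the presumed behavior only, not actual OpenEars source; the constant and function names are invented:

```python
DEFAULT_SECONDS_OF_SILENCE = 0.7  # the documented OpenEars default

def effective_silence_period(configured_value):
    """If secondsOfSilenceToDetect was never set (left at 0.0), fall
    back to the default. Note the log line prints the raw configured
    value, which is why it can show 0.000000 while the library still
    behaves as if it were 0.7."""
    if configured_value > 0.0:
        return configured_value
    return DEFAULT_SECONDS_OF_SILENCE

print(effective_silence_period(0.0))  # 0.7
print(effective_silence_period(0.3))  # 0.3
```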
The reason it wasn’t getting any hypothesis for speech was that I was calling stopListening in the pocketsphinxDidDetectSpeech callback. For some reason this didn’t bother the simulator. I should have paid more attention to your caveat not to rely on the simulated version of the package.
I was finally able to get the demo app running, and it helped me a lot to improve my program. I still have much to learn about OpenEars, but now I have a good start.

January 15, 2014 at 10:24 pm · #1019507
That’s great! Yeah, unfortunately, making the simulator perform exactly like the device would be a major undertaking (there are so many potential audio devices on a Mac), and it would take time away from work that actually ends up in front of an end user, so I have to stick with a very minimal simulator driver, and it has its differences. I’ll look into the bug with the seconds of silence being misreported in the logging.

January 16, 2014 at 2:27 pm · #1019758
So, I’m looking into the funny value for seconds of silence to detect, and I can’t find any logging statements in OpenEars in the format “0.000000 seconds of silence to detect”. Do you know where that logged value originates?