This topic has 3 voices, contains 6 replies, and was last updated by sparkyreich 140 days ago.
| Author | Posts |
|---|---|
| October 25, 2011 at 9:56 pm #7793 | |
|
Joseph S. Wisniewski |
I’ve been tinkering with getting Sphinx to perform recognition during speech instead of after detecting an end of speech from the VAD. I notice you defer the search in the listening loop in continuousModel by calling ps_process_raw with the no_search parameter set to true, which means that Sphinx won’t do its three-pass search until you hit your timeout interval of cont_ad_read without VAD and call ps_end_utt. I switched that to false and added a method to poll for a partial hypothesis, and now I get to watch the recognition in real time ;) |
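For readers who want a starting point, here is a minimal sketch of the polling idea described in the post above. It is not Joseph's code. To keep it self-contained, the decoder sits behind a hypothetical `decoder_iface` struct, and `feed_and_poll` and the fake decoder are made-up names for illustration. With the real library, the assumption is that `process` would wrap `ps_process_raw` with `no_search` set to FALSE and `partial` would wrap `ps_get_hyp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface standing in for Pocketsphinx (an assumption).
 * process(): would wrap ps_process_raw(..., no_search = FALSE, ...).
 * partial(): would wrap ps_get_hyp() to fetch the current best guess. */
typedef struct {
    void *state;
    int (*process)(void *state, const int16_t *data, size_t n_samples);
    const char *(*partial)(void *state);
} decoder_iface;

typedef void (*partial_cb)(const char *hyp, void *user);

/* Feed one audio buffer, then report the partial hypothesis only if it
 * differs from the last one reported (kept in `last`).
 * Returns 1 if a new hypothesis was reported, 0 if nothing changed,
 * -1 if the decoder reported an error. */
static int feed_and_poll(const decoder_iface *dec,
                         const int16_t *buf, size_t n_samples,
                         char *last, size_t last_size,
                         partial_cb cb, void *user)
{
    assert(dec != NULL && last != NULL && last_size > 0);
    if (dec->process(dec->state, buf, n_samples) < 0)
        return -1;
    const char *hyp = dec->partial(dec->state);
    if (hyp == NULL || strcmp(hyp, last) == 0)
        return 0;                       /* no hypothesis yet, or unchanged */
    strncpy(last, hyp, last_size - 1);  /* remember what was reported */
    last[last_size - 1] = '\0';
    if (cb != NULL)
        cb(hyp, user);                  /* e.g. update the UI */
    return 1;
}

/* Fake decoder for demonstration only: its guess grows as buffers arrive. */
typedef struct { int buffers_seen; } fake_state;

static int fake_process(void *s, const int16_t *data, size_t n_samples)
{
    (void)data; (void)n_samples;
    ((fake_state *)s)->buffers_seen++;
    return 0;
}

static const char *fake_partial(void *s)
{
    static const char *steps[] = { NULL, "go", "go", "go forward" };
    int i = ((fake_state *)s)->buffers_seen;
    return steps[i > 3 ? 3 : i];
}
```

In a real listening loop, `feed_and_poll` would be called once per buffer read while the VAD reports speech, and the callback would push the text to the UI. Given the crash report discussed in the next reply, searching during speech should be treated as experimental.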
| October 25, 2011 at 10:07 pm #7794 | |
|
Halle |
Sounds very cool. The only reason I have no_search = TRUE is that, after a great deal of testing, it turned out that setting it to false caused crashes: http://sourceforge.net/tracker/?func=detail&atid=101904&aid=3117704&group_id=1904 Another developer confirms the behavior in the discussion thread. But you will probably only see the crash in fewer than one session in ten IIRC (the 1 in 200 statement regards recognitions, not app sessions). I haven’t tested this against 0.7 to see if the bug was fixed. This has certainly been the day of discussing really obscure crashes :) . |
| October 27, 2011 at 4:06 pm #7806 | |
|
Joseph S. Wisniewski |
I’m seriously thinking of adding something that looks for a “stalled” Pocketsphinx and starts a new session. There’s more than one subtle way to freeze Pocketsphinx, so it really needs a watchdog. I see the synchronous recognition as being among the most desirable of features. Sphinx can’t match the performance, reliability, or language coverage of the “big” commercial product, Nuance. So, one has to concentrate on what makes it “different”. Right now, Nuance is pushing the client-server approach. They’ve got the dictation and the assistant market sewn up. But client-server is “laggy”, and there are still UI needs for near-instantaneous response, whether in informatics, medical, gaming, or home automation. If one wants to carve a niche, that’s where to carve it, I think. |
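The watchdog idea from the start of this post could look roughly like the sketch below. It is a heartbeat check under stated assumptions, not anything from OpenEars or Pocketsphinx. The `watchdog` type and its functions are hypothetical names, and the caller supplies the time in seconds from whatever monotonic clock it has. The recognition loop would call `watchdog_beat` each time it makes progress, and a supervising thread would call `watchdog_check` periodically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heartbeat watchdog; times are caller-supplied seconds. */
typedef struct {
    double last_beat_s;  /* when the recognition loop last reported progress */
    double timeout_s;    /* longer silence than this counts as a stall */
    unsigned restarts;   /* how many times a restart was triggered */
} watchdog;

static void watchdog_init(watchdog *wd, double timeout_s, double now_s)
{
    assert(wd != NULL && timeout_s > 0.0);
    wd->last_beat_s = now_s;
    wd->timeout_s = timeout_s;
    wd->restarts = 0;
}

/* Called by the recognition loop whenever it processes a buffer. */
static void watchdog_beat(watchdog *wd, double now_s)
{
    wd->last_beat_s = now_s;
}

static bool watchdog_stalled(const watchdog *wd, double now_s)
{
    return (now_s - wd->last_beat_s) > wd->timeout_s;
}

/* Called by the supervisor. If the loop has stalled, invoke the restart
 * callback (assumed to tear down and start a new recognition session),
 * count it, and reset the heartbeat. Returns true if a restart fired. */
static bool watchdog_check(watchdog *wd, double now_s,
                           void (*restart)(void *user), void *user)
{
    if (!watchdog_stalled(wd, now_s))
        return false;
    if (restart != NULL)
        restart(user);
    wd->restarts++;
    wd->last_beat_s = now_s;
    return true;
}
```

This version has no locking. If the heartbeat and the check run on different threads, as they would around a Pocketsphinx thread, access to the struct needs a mutex or atomics.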
| October 27, 2011 at 4:14 pm #7807 | |
|
Joseph S. Wisniewski |
I wonder what the stability implications are of letting Sphinx search when you’ve definitely got speech detected (the first ps_process_raw, when you’re getting VAD) but not in the “pump the tank empty” one you do after losing VAD, before the final ps_end_utt. Definitely an experiment to try. Especially if I put a “watchdog” around the Pocketsphinx thread. It would be nice to have a log that said: |
| October 27, 2011 at 5:36 pm #7815 | |
|
Halle |
Have you seen any freezes with the stock OpenEars distribution? |
| December 30, 2011 at 5:18 am #8336 | |
|
sparkyreich |
Joseph, I am very interested in partial hypothesis. I was able to change that TRUE to false like you said, but have no idea where to start to make a partial hypothesis method. Would you be able to share that chunk of code and where you put it? I would be very grateful. Thanks! |
| December 30, 2011 at 7:32 am #8337 | |
|
sparkyreich |
I actually was able to figure out a rough version on my own. If you are willing to share, though, I would be interested to see. Thanks either way! |