ps_process_raw without no_search?

This topic has 3 voices, contains 6 replies, and was last updated by  sparkyreich 140 days ago.

October 25, 2011 at 9:56 pm #7793

Joseph S. Wisniewski

I’ve been tinkering with getting Sphinx to perform recognition during speech instead of after detecting an end of speech from the VAD.

I notice that you defer the search in the listening loop in continuousModel by calling ps_process_raw with the no_search parameter set to TRUE. That means Sphinx won’t run its three-pass search until cont_ad_read has gone your full timeout interval without VAD and you call ps_end_utt.

I switched over to:
ps_process_raw(pocketSphinxDecoder, audioDeviceBuffer, speechData, FALSE, FALSE);
It seems to be stable, but I haven’t done your level of regression testing. The apps feel much snappier, though. Any thoughts? It’s a nice companion change to the multitasking flite controller.

That, and an added method to poll for partial hypothesis, and I get to watch the recognition in realtime ;)
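
For anyone wondering what a polling method like that might look like, here is a minimal sketch, not the code from this post. It assumes the decoder is reached through a hypothetical callback (get_hyp_fn) so the sketch compiles without Pocketsphinx; in a real build the callback would presumably wrap ps_get_hyp, whose exact signature depends on the Pocketsphinx version. Every name below (partial_poller, poller_init, poller_poll, fake_get_hyp) is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_HYP_LEN 256

/* Hypothetical callback: returns the decoder's current best hypothesis,
 * or NULL if there is none yet. A real app would wrap the decoder's
 * hypothesis call here. */
typedef const char *(*get_hyp_fn)(void *decoder);

typedef struct {
    get_hyp_fn get_hyp;
    void      *decoder;
    char       last[MAX_HYP_LEN];  /* last hypothesis that was reported */
} partial_poller;

void poller_init(partial_poller *p, get_hyp_fn fn, void *decoder)
{
    p->get_hyp = fn;
    p->decoder = decoder;
    p->last[0] = '\0';
}

/* Call after each block of audio has been processed with searching enabled.
 * Returns 1 and copies the text into out when the hypothesis has changed
 * since the previous call, 0 when there is nothing new to show. */
int poller_poll(partial_poller *p, char *out, size_t out_len)
{
    const char *hyp = p->get_hyp(p->decoder);

    if (hyp == NULL || hyp[0] == '\0' || out_len == 0)
        return 0;
    if (strncmp(hyp, p->last, MAX_HYP_LEN - 1) == 0)
        return 0;                       /* same text as last time */

    snprintf(p->last, sizeof p->last, "%s", hyp);
    snprintf(out, out_len, "%s", hyp);
    return 1;
}

/* Stand-in decoder for trying the poller out: the "decoder" is just a
 * pointer to a string that the caller updates. */
static const char *fake_get_hyp(void *decoder)
{
    return *(const char **)decoder;
}
```

Reporting only on change keeps the UI from being redrawn for every audio buffer when the hypothesis has not moved.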

October 25, 2011 at 10:07 pm #7794

Halle

Sounds very cool. The only reason I have no_search = TRUE is that, after a great deal of testing, setting it to FALSE turned out to cause crashes:

http://sourceforge.net/tracker/?func=detail&atid=101904&aid=3117704&group_id=1904

Another developer confirms the behavior in the discussion thread. IIRC you will probably see the crash in fewer than one session in ten (the 1-in-200 figure refers to recognitions, not app sessions). I haven’t tested this against .7 to see whether the bug was fixed. This has certainly been the day of discussing really obscure crashes :)

October 27, 2011 at 4:06 pm #7806

Joseph S. Wisniewski

I’m seriously thinking of adding something that looks for a “stalled” Pocketsphinx and starts a new session. There’s more than one subtle way to freeze Pocketsphinx; it really needs a watchdog.

I see synchronous recognition as being among the most desirable of features. Sphinx can’t match the performance, reliability, or language coverage of the “big” commercial product, Nuance. So, one has to concentrate on what makes it “different”. Right now, Nuance is pushing the client-server approach. They’ve got the dictation and the assistant market sewn up. But client-server is “laggy”, and there are still UI needs for near-instantaneous response, whether in informatics, medical, gaming, or home automation.

If one wants to carve a niche, that’s where to carve it, I think.

October 27, 2011 at 4:14 pm #7807

Joseph S. Wisniewski

I wonder what the stability implications are of letting Sphinx search when you’ve definitely got speech detected (the first ps_process_raw, when you’re getting VAD) but not in the “pump the tank empty” one you do after losing VAD, before the final ps_end_utt.

Definitely an experiment to try.
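
A rough sketch of how that experiment could be wired up, assuming the decoder call is hidden behind a hypothetical function pointer (process_fn) whose argument order mirrors the ps_process_raw call quoted earlier in the thread. The phase enum, feed_block, and the fake decoder are invented for illustration and say nothing about whether this is actually stable.

```c
#include <stddef.h>

/* Where we are in the utterance: still hearing speech, or emptying the
 * remaining audio after VAD has dropped and before the utterance ends. */
typedef enum { PHASE_SPEECH, PHASE_DRAIN } utt_phase;

/* Hypothetical wrapper type with the same argument order as the
 * ps_process_raw call shown above (data, count, no_search, full_utt). */
typedef int (*process_fn)(void *decoder, const short *data, size_t n_samples,
                          int no_search, int full_utt);

/* Search immediately while speech is live; defer the search while draining. */
int no_search_for(utt_phase phase)
{
    return phase == PHASE_DRAIN ? 1 : 0;
}

/* Feed one block of audio with the flag chosen for the current phase. */
int feed_block(process_fn process, void *decoder,
               const short *data, size_t n_samples, utt_phase phase)
{
    return process(decoder, data, n_samples, no_search_for(phase), 0);
}

/* Stand-in decoder that only records how it was called. */
typedef struct {
    int calls;
    int last_no_search;
} fake_decoder;

static int fake_process(void *decoder, const short *data, size_t n_samples,
                        int no_search, int full_utt)
{
    fake_decoder *d = (fake_decoder *)decoder;
    (void)data; (void)n_samples; (void)full_utt;
    d->calls++;
    d->last_no_search = no_search;
    return 0;
}
```

Keeping the flag decision in one small function makes it easy to flip the policy back if the drain-phase search turns out to matter.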

Especially if I put a “watchdog” around the Pocketsphinx thread. It would be nice to have a log that said:
12:04:14 session 14 started
12:09:18 sphinx froze after 8 recognition events, restarting
12:45:13 sphinx froze after 13 recognition events, restarting
13:27:16 session ended
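
A minimal heartbeat-style sketch of such a watchdog, under stated assumptions: the recognition loop is assumed to call watchdog_beat on every pass (even during silence) and watchdog_count_event on every recognition, while a separate supervisor calls watchdog_stalled periodically. All names are hypothetical, the timeout is arbitrary, and a real version would need locking or atomics because two threads touch the struct.

```c
#include <stdio.h>
#include <time.h>

typedef struct {
    time_t last_beat;   /* last time the recognition loop checked in */
    double timeout_s;   /* silence from the loop longer than this = stalled */
    int    events;      /* recognition events since the session started */
} watchdog;

void watchdog_start(watchdog *w, time_t now, double timeout_s)
{
    w->last_beat = now;
    w->timeout_s = timeout_s;
    w->events = 0;
}

/* Called by the recognition loop on every pass. */
void watchdog_beat(watchdog *w, time_t now)
{
    w->last_beat = now;
}

/* Called by the recognition loop whenever a recognition is delivered. */
void watchdog_count_event(watchdog *w)
{
    w->events++;
}

/* Called by the supervisor: nonzero if the loop has not checked in
 * within the timeout. */
int watchdog_stalled(const watchdog *w, time_t now)
{
    return difftime(now, w->last_beat) > w->timeout_s;
}

/* Formats a restart line in the style of the log above. The caller supplies
 * the broken-down time (e.g. from localtime). Returns the number of
 * characters snprintf wanted to write, or -1 if the timestamp failed. */
int watchdog_format_restart(const watchdog *w, const struct tm *when,
                            char *out, size_t out_len)
{
    char stamp[16];

    if (strftime(stamp, sizeof stamp, "%H:%M:%S", when) == 0)
        return -1;
    return snprintf(out, out_len,
                    "%s sphinx froze after %d recognition events, restarting",
                    stamp, w->events);
}
```

The beat is kept separate from the event count on purpose: a long stretch of silence produces no recognitions but should not look like a freeze.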

October 27, 2011 at 5:36 pm #7815

Halle

Have you seen any freezes with the stock OpenEars distribution?

December 30, 2011 at 5:18 am #8336

sparkyreich

Joseph,

I am very interested in partial hypothesis. I was able to change that TRUE to false like you said, but have no idea where to start to make a partial hypothesis method. Would you be able to share that chunk of code and where you put it? I would be very grateful.

Thanks!

December 30, 2011 at 7:32 am #8337

sparkyreich

I actually was able to figure out a rough version on my own. If you are willing to share, though, I would be interested to see. Thanks either way!
