| Author | Posts |
|---|---|
| December 23, 2011 at 11:13 am #8255 | |
|
sparkyreich |
Hi. I would like to be able to have a method called for essentially each word or phrase detected or, let’s say, every second. |
| December 23, 2011 at 12:59 pm #8258 | |
|
Halle |
Hello. Sorry, this isn’t currently a feature of OpenEars. Generally, for long dictation tasks you are probably going to see the best results with a server-based service. Even with live recognition, the end result would probably be the same, since it would be resolving the overall utterance using the same engine, acoustic model and language model. Based on my experimentation, the advantage for long dictation tasks would be better UI feedback (although doing speech-based correction on a single error in a long dictation is not fun for the user at all), not notably different recognition results. The advantages of this kind of recognition are much bigger for shorter command-and-control models, because reaction time will be faster for whatever you are controlling, in addition to the UI advantage. |
| December 23, 2011 at 9:55 pm #8259 | |
|
sparkyreich |
Well, you see, I am attempting to have the audio processed as it is input, because I need an action taken while the rest of the audio is still coming in. For example, with a long sentence that has a keyword at the beginning, nothing reacts until the sentence is fully finished. I need the reaction to happen within a reasonable time after the word is said. I just want to make sure you understood me. Is this at all possible with this framework? |
| December 24, 2011 at 12:09 am #8260 | |
|
Halle |
OK, I don’t think I quite understand yet why there is recognition needed both for the full sentence and just for a keyword in it while it is underway. Can you give an example? The framework does recognition after the user has paused in speech for the duration in kSecondsOfSilenceToDetect. That’s the only trigger method for beginning recognition that is supported in its API. |
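To make the silence-based trigger described above concrete, here is a minimal Python sketch of that kind of endpointing. It is not OpenEars code: the function name `find_utterance_ends`, the per-frame speech/silence flags, and the millisecond parameters are all assumptions made for illustration. Only the idea of waiting for a pause of a set duration (kSecondsOfSilenceToDetect in the framework) comes from the post.

```python
def find_utterance_ends(frame_is_speech, frame_ms, silence_ms):
    """Return the frame indices at which recognition would be triggered.

    frame_is_speech: iterable of booleans, one per audio frame
                     (hypothetical output of some speech/silence detector).
    frame_ms:        duration of one frame in milliseconds.
    silence_ms:      pause length that ends an utterance, in milliseconds.
    """
    triggers = []
    silent_run_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frame_is_speech):
        if is_speech:
            # Any speech frame resets the pause timer.
            heard_speech = True
            silent_run_ms = 0
        elif heard_speech:
            # Count silence only after some speech has been heard.
            silent_run_ms += frame_ms
            if silent_run_ms >= silence_ms:
                triggers.append(index)
                heard_speech = False
                silent_run_ms = 0
    return triggers


if __name__ == "__main__":
    frames = [True, True, False, False, False, True, False, False, False, False]
    print(find_utterance_ends(frames, frame_ms=100, silence_ms=300))
```

In this sketch the trigger fires only once enough consecutive silent frames follow some speech. If every frame is classified as speech, for instance because of steady background noise, it never fires at all, no matter how small the threshold is.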
| December 24, 2011 at 7:38 am #8261 | |
|
sparkyreich |
A good example, I guess, would be reading a book. If you were to read a book out loud, I would want it to show each word on the screen as it is read (or soon after), so that I could detect when a keyword is read and perhaps play a sound effect or something. But I still want each word processed for dynamic reasons. Like I said, I tried lowering kSecondsOfSilenceToDetect to as little as 0.1, and it still seems to wait until a paragraph is entirely read before processing, due to some other background noise (which can’t be avoided and should also be processed for keywords). I know it’s hard to understand. Sorry if my explanation is poor. Thanks for helping! |
| December 24, 2011 at 10:33 am #8262 | |
|
Halle |
OK, that makes sense. This just isn’t a feature of OpenEars. |
| December 24, 2011 at 10:37 am #8263 | |
|
sparkyreich |
Is there any way to perhaps modify the code to process the input after so many seconds instead of after a specified period of silence? Thanks again! |
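A time-based trigger of the kind asked about here can be pictured with a short Python sketch. This is only an illustration of the idea, not a patch for OpenEars: `fixed_interval_chunks`, its parameters, and the plain list of samples are hypothetical, and a real change would have to happen inside the framework's own audio and recognition code.

```python
def fixed_interval_chunks(samples, sample_rate, interval_seconds):
    """Yield consecutive slices of `samples`, each covering interval_seconds.

    samples:          sequence of audio samples (any sliceable sequence).
    sample_rate:      samples per second.
    interval_seconds: how much audio to hand over per slice.
    The final slice may be shorter than the others.
    """
    chunk_len = int(sample_rate * interval_seconds)
    if chunk_len <= 0:
        raise ValueError("interval must cover at least one sample")
    for start in range(0, len(samples), chunk_len):
        # Each slice is handed over on a timer, whether or not the speaker paused.
        yield samples[start:start + chunk_len]


if __name__ == "__main__":
    fake_audio = list(range(10))  # stand-in for real samples
    for chunk in fixed_interval_chunks(fake_audio, sample_rate=4, interval_seconds=1):
        print(chunk)
```

One thing the sketch makes visible is that a cut at a fixed interval can land in the middle of a word, which a pause-based trigger avoids.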
| December 24, 2011 at 11:06 am #8265 | |
|
Halle |
Only insofar as any code can be modified to do new things. What you are talking about isn’t a quick drop-in of a few code snippets, though; it’s a nontrivial restructuring in more than one class, using more than one language. Search for “partial hypothesis” on the CMU Sphinx forum to get some good ideas about where to get started. You might find that it simplifies things to use the old 0.902 version of OpenEars for experiments with this behavior, so that you don’t need to worry about testing the ring buffer in the audio driver as well. |
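As a rough illustration of what working with partial hypotheses involves, the Python sketch below takes a sequence of successive partial results and looks only at the words that are new in each one, calling a callback when a keyword appears. The names `new_words` and `watch_for_keywords` and the hard-coded list of partials are assumptions for this example. Obtaining real partial hypotheses from the decoder is the nontrivial part and is not shown.

```python
def new_words(previous, current):
    """Return the words of `current` that follow its common prefix with `previous`."""
    prev_words = previous.split()
    cur_words = current.split()
    common = 0
    for old, new in zip(prev_words, cur_words):
        if old != new:
            break
        common += 1
    return cur_words[common:]


def watch_for_keywords(partials, keywords, on_keyword):
    """Call on_keyword(word) for each newly appearing word found in `keywords`.

    partials: iterable of partial hypothesis strings, oldest first.
    keywords: set of lowercase words to react to.
    """
    previous = ""
    for hypothesis in partials:
        for word in new_words(previous, hypothesis):
            if word.lower() in keywords:
                on_keyword(word)
        previous = hypothesis


if __name__ == "__main__":
    partials = ["once", "once upon", "once upon a dragon", "once upon a dragon roared"]
    watch_for_keywords(partials, {"dragon"}, lambda w: print("keyword:", w))
```

Because a decoder may revise earlier words as more audio arrives, the sketch compares word by word from the start rather than assuming each partial simply extends the last. A revised word is treated as new, so a keyword can fire again after a revision.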