January 23, 2014 at 10:57 pm #1019882
Thank you for your awesome work on OpenEars. It is turning out to be quite a powerful library.
For our app, we need to allow the user to enable and disable recording by tapping a button. Is there an easy way to achieve this, without having to re-initialize pocketsphinx each time they tap record?
We want the recognition to start immediately after they tap “Record”, and to stop immediately when they tap “Stop”.
Cheers!
January 24, 2014 at 8:36 am #1019888
There is no built-in support for this. However, I think a very easy way to simulate it would be to handle your own start/stop recording with the audio recording API of your choice (that has certainly gotten easier lately), and then simply submit the resulting WAV file to
- (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF
January 24, 2014 at 9:03 am #1019890
Thank you for the fast response!
That would certainly work. We would like to be able to immediately start live recognition as soon as the user taps the record button. In other words, we would like to replace the voice activity detector with a button.
Is this possible?
Any suggestions?
January 24, 2014 at 9:15 am #1019891
You’re welcome! That seems possible. See my suggestion above. I think that by using AVAudioRecorder and its prepareToRecord method you could handle all of your own start/stop and WAV-packaging requirements in around 20 lines of code; give its docs a look.
January 24, 2014 at 5:50 pm #1019900
The problem with that is the speed. Our user wants to tap and immediately speak and get text immediately. If we record to a file, we can’t start transcription until the file is closed.
We want to do live (like RapidEars) recognition while the user speaks, but we want to have a button to start and stop the microphone.
January 24, 2014 at 5:51 pm #1019901
I guess we can short-circuit the render callback to return if the user hasn’t tapped Record. Does that seem like a good option?
January 24, 2014 at 6:06 pm #1019904
Hmm. If you want to do this with RapidEars, I guess the following is possible:
1. Suspend immediately once listening has started. The user “start” interaction will cause recognition to resume. This has the same effect as your short-circuiting idea but uses the API.
2. The user “stop” interaction causes buffers of prerecorded non-speech to be written over the render callback buffer for a number of callbacks equal to 0.7 seconds (that’s about 6 callbacks). It has to be a real, low-noise-level recording of quiet non-speech and not just zeroes, which the VAD will correctly ignore. This should result in a natural exit from the silence detection loop as the silence makes its way into the VAD. This delay shouldn’t be a big deal for you, since live recognition will already have been running and displaying results from the moment the mic stream began, so this is just a formality to get the final hypothesis.
This should work to start and stop in both PocketsphinxController and PocketsphinxController+RapidEars.
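For illustration only, the "stop" half of the two steps above might be sketched in plain C, which can be called from an Objective-C render callback. Nothing here is OpenEars or RapidEars API: `StopGate`, `stop_gate_init`, `stop_gate_arm`, `stop_gate_process` and `callbacks_for_ms` are hypothetical names, and the sketch assumes 16-bit mono samples. The callback count depends on your sample rate and buffer size; 16000 Hz with 2048 frames per callback is only an assumed combination that happens to give the "about 6 callbacks" figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gate state; none of these names come from OpenEars. */
typedef struct {
    const int16_t *quiet;   /* prerecorded low-noise room tone, not zeroes */
    size_t quiet_len;       /* number of samples in `quiet` */
    size_t quiet_pos;       /* read position, wraps around the recording */
    int callbacks_left;     /* callbacks that still get overwritten */
} StopGate;

/* Number of callbacks needed to cover `ms` milliseconds, rounded up. */
int callbacks_for_ms(long ms, long sample_rate, long frames_per_callback) {
    long frames = ms * sample_rate / 1000;
    return (int)((frames + frames_per_callback - 1) / frames_per_callback);
}

void stop_gate_init(StopGate *g, const int16_t *quiet, size_t quiet_len) {
    assert(quiet != NULL && quiet_len > 0);
    g->quiet = quiet;
    g->quiet_len = quiet_len;
    g->quiet_pos = 0;
    g->callbacks_left = 0;
}

/* Call when the user taps "Stop". */
void stop_gate_arm(StopGate *g, int callbacks) {
    g->callbacks_left = callbacks;
}

/* Call from the render callback once the mic samples are in `buf`.
   While the gate is armed, the mic samples are replaced with the quiet
   recording. Returns 1 if `buf` was overwritten, 0 if left untouched. */
int stop_gate_process(StopGate *g, int16_t *buf, size_t frames) {
    if (g->callbacks_left <= 0) {
        return 0;
    }
    for (size_t i = 0; i < frames; i++) {
        buf[i] = g->quiet[g->quiet_pos];
        g->quiet_pos = (g->quiet_pos + 1) % g->quiet_len;
    }
    g->callbacks_left--;
    return 1;
}
```

On "Stop" you would call something like `stop_gate_arm(&gate, callbacks_for_ms(700, 16000, 2048))`, then call `stop_gate_process` on every callback until it returns 0. The "start" half needs no custom audio code if you use the suspend-and-resume approach from step 1.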