Button to enable/disable recording

This topic has 6 replies, 2 voices, and was last updated 10 years, 2 months ago by Halle Winkler.

Viewing 7 posts - 1 through 7 (of 7 total)


Author

Posts
January 23, 2014 at 10:57 pm #1019882

andrew
Participant

Hi Halle,

Thank you for your awesome work on OpenEars. It is turning out to be quite a powerful library.

For our app, we need to allow the user to enable and disable recording by tapping a button. Is there an easy way to achieve this, without having to re-initialize pocketsphinx each time they tap record?

We want the recognition to start immediately after they tap “Record”, and to stop immediately when they tap “Stop”.

Cheers!

January 24, 2014 at 8:36 am #1019888
Halle Winkler
Politepix
Welcome Andrew,

There is no built-in support for this. However, I think a very easy way to simulate it would be to handle your own start/stop recording with the audio recording API of your choice (that has certainly gotten easier lately), and then simply submit the resulting WAV file to the following method:
```
- (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF
```
January 24, 2014 at 9:03 am #1019890

andrew
Participant

Thank you for the fast response!

That would certainly work. We would like to be able to immediately start live recognition as soon as the user taps the record button. In other words, we would like to replace the voice activity detector with a button.

Is this possible?
Any suggestions?

January 24, 2014 at 9:15 am #1019891

Halle Winkler
Politepix

You’re welcome! That seems possible; see my suggestion above. I think that by using AVAudioRecorder and its prepareToRecord method you could handle all of your own start/stop and WAV-packaging requirements in around 20 lines of code; give its docs a look.

January 24, 2014 at 5:50 pm #1019900

andrew
Participant

The problem with that is the speed. Our user wants to tap and immediately speak and get text immediately. If we record to a file, we can’t start transcription until the file is closed.

We want to do live (like RapidEars) recognition while the user speaks, but we want to have a button to start and stop the microphone.

January 24, 2014 at 5:51 pm #1019901

andrew
Participant

I guess we can short circuit the render callback to return if the user hasn’t tapped Record. Does that seem like a good option?

January 24, 2014 at 6:06 pm #1019904

Halle Winkler
Politepix

Hmm. If you want to do this with RapidEars, I guess the following is possible:

1. Suspend immediately once listening has started. The user “start” interaction will cause recognition to resume. This has the same effect as your short-circuiting idea but uses the API.

2. The user “stop” interaction causes buffers of prerecorded non-speech to be written over the render callback buffer for a number of callbacks equal to 0.7 seconds (that’s about 6 callbacks). The prerecorded audio has to be a real, low-noise-level, quiet non-speech recording and not just zeroes, which the VAD will correctly ignore. This should result in a natural exit from the silence detection loop as the silence makes its way into the VAD. The delay shouldn’t be a big deal for you, since live recognition will already have been running and displaying results from the moment the mic stream began, so this is just a formality to get the final hypothesis.

This should work to start and stop in both PocketsphinxController and PocketsphinxController+RapidEars.
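
As a rough illustration of step 2, here is a minimal sketch in plain C of the buffer-overwrite logic. None of this is OpenEars API: `StopGate`, `callbacks_for_ms`, `stop_gate_trigger` and `stop_gate_process` are hypothetical names, and the 16 kHz sample rate and 2048 frames per callback used in the example are assumptions chosen only because they work out to the "about 6 callbacks" mentioned above. Check the real values in your own audio setup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper state; not part of OpenEars. */
typedef struct {
    const int16_t *quiet;     /* prerecorded low-noise non-speech samples (not zeroes) */
    size_t quiet_len;         /* number of samples in quiet */
    size_t quiet_pos;         /* current read position, wraps around */
    unsigned callbacks_left;  /* callbacks that still need to be overwritten */
} StopGate;

/* Number of callbacks needed to cover at least `ms` milliseconds of audio (rounds up). */
unsigned callbacks_for_ms(unsigned ms, unsigned sample_rate, unsigned frames_per_callback) {
    unsigned long samples = (unsigned long)ms * sample_rate / 1000;
    return (unsigned)((samples + frames_per_callback - 1) / frames_per_callback);
}

/* Call this from the "Stop" button handler. */
void stop_gate_trigger(StopGate *g, unsigned callbacks) {
    g->callbacks_left = callbacks;
}

/* Call this in the render callback once the mic samples are in buf.
   Returns 1 if buf was overwritten with quiet audio, 0 if it was left alone. */
int stop_gate_process(StopGate *g, int16_t *buf, size_t n) {
    if (g->callbacks_left == 0 || g->quiet_len == 0) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = g->quiet[g->quiet_pos];
        g->quiet_pos = (g->quiet_pos + 1) % g->quiet_len;
    }
    g->callbacks_left--;
    return 1;
}
```

Under those assumed values, `callbacks_for_ms(700, 16000, 2048)` gives 6. The quiet recording is looped rather than copied once, so a short clip is enough to fill every callback.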