November 13, 2015 at 4:43 am #1027302
I am looking to build an application that takes a recording created by the user (presumably with SaveThatWave) and then compares it in real time with live speech (presumably with RapidEars) to find a match. When a match is found, I’m looking to increment a score.
In looking through the forum I saw that I can use runRecognitionOnWavFileAtPath:usingLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF: – but how do I do the comparison, or is that runRecognition running the realtime comparison of the wav file to the live speech? Just looking for some more context or directional help.
Thanks!
November 13, 2015 at 11:21 am #1027304
Can you describe the intended implementation and results in a bit more detail? OpenEars isn’t designed to do comparisons between audio files so I think I’m not quite understanding the goal yet.
The aspect that I’m least clear about is the time or ordering relationship between 1) using SaveThatWave to make a recording (by definition not occurring in real time since the recording can’t exist until the utterance is fully complete, and presumably not the recording from the realtime session because that one is intended to run a comparison with WAV produced from this phase) and 2) the realtime recognition (happening while a mic utterance is in progress, so not using the WAV recording) and 3) running WAV-based recognition on the original WAV (happening in a third time period altogether since it requires the existence of a fully-recorded WAV and requires that realtime recognition isn’t in progress).
I’ll be happy to give some advice if I can, once I get the goal a little better. Or maybe there is a simpler way to accomplish the underlying speech interface task, if you want to elaborate on that.
November 13, 2015 at 8:22 pm #1027306
The intent is to use SaveThatWave to pre-record a specific word. Then, the app would “listen” to a live conversation and increment a score when it hears a match.
That is, the app prompts me to record myself saying ‘openears’, and then during a subsequent conversation (which begins when the user selects a Start button), the app gives me a point every time I say ‘openears’.
November 13, 2015 at 8:28 pm #1027307
Why not detect whether the user has said “OpenEars” by creating a language model and dictionary containing the word “OpenEars”?
November 15, 2015 at 3:07 am #1027315
I didn’t realize there was an option that was that simple. So you’re saying that I can create a custom dictionary for whatever words I want to match against, and the user doesn’t have to provide an audio sample of their own to match against? What are the accuracy rates across different accents?
After it finds a match, how do I programmatically track when the words are said?
November 15, 2015 at 11:12 am #1027316
Correct, easily making statistical speech models and grammars for you is one of the most important functions of the framework. Take a look at the sample app and run through the tutorial here, which should answer your questions as well as provide you with a starting custom implementation: https://politepix.com/openears/tutorial
If you have the same (or new) questions afterwards I’ll be happy to answer them.
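As a rough illustration of the tracking part of the question: once recognition returns a hypothesis string, counting matches is plain string handling. The sketch below is language-neutral logic written in Python, not OpenEars code. The names `count_matches`, `ScoreKeeper`, and `on_hypothesis` are hypothetical, and it assumes each hypothesis is delivered once per completed utterance (in OpenEars that would presumably be the OEEventsObserver hypothesis callback covered in the tutorial). Live hypotheses that are repeatedly updated for the same utterance would need de-duplication first.

```python
def count_matches(hypothesis: str, target: str) -> int:
    """Count whole-word, case-insensitive occurrences of target in hypothesis."""
    wanted = target.casefold()
    return sum(1 for word in hypothesis.split() if word.casefold() == wanted)


class ScoreKeeper:
    """Hypothetical helper that keeps a running score for one target word."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.score = 0

    def on_hypothesis(self, hypothesis: str) -> int:
        # Assumption: called once per finished utterance with the recognized text.
        self.score += count_matches(hypothesis, self.target)
        return self.score


keeper = ScoreKeeper("openears")
keeper.on_hypothesis("I LIKE OPENEARS")        # one match in this utterance
keeper.on_hypothesis("OPENEARS AND OPENEARS")  # two more matches
print(keeper.score)
```

Comparing case-insensitively is a deliberate choice here, since the recognizer may return words in a different case than the one used when defining the vocabulary.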