Forum Replies Created
Okay, thanks for the feedback.
At the moment, I am switching between the internal phone microphone input and Bluetooth speaker output. That seems to work reasonably well, but it is not ideal.
Basically, I stay on category AVAudioSessionCategoryPlayAndRecord and switch the input port back and forth between HFP and the internal microphone.
This is not ideal, but it seems to be the best way to get it to work.
Otherwise, OE struggles to recognize commands using the BT microphone.
Hey Halle –
So, it seems that none of these settings will help my case, as far as I can tell. These settings all affect how OEContinuousAudioUnit::setAllAudioSessionSettings sets the AVAudioSession.
From what I can tell, disablePreferredSampleRate just stops OE from attempting to override the sample rate using this AVAudioSession command :
The issue that I have observed with automotive Bluetooth HFP systems is that the audio input all seems to be 8kHz. I have already tried to set preferredSampleRate to 16000, but it always results in the sample rate staying at 8kHz.
So… I really believe that the answer here is to up-sample the input audio from 8kHz to 16kHz whenever it arrives at 8kHz, so that OpenEars can take it cleanly. Yes, the captured audio will still only have 8kHz worth of resolution, but the sample rate will be 16kHz, which seems more compatible with OpenEars.
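To make the up-sampling idea concrete, here is a minimal sketch in Python (not iOS code) that doubles the sample rate of a buffer of 16-bit PCM samples by linear interpolation. The function name upsample_2x_linear is made up for illustration and is not part of OpenEars or AVAudioSession. A production resampler would normally also low-pass filter, and on iOS a system converter such as AVAudioConverter would probably be the more sensible tool; this only shows the arithmetic.

```python
def upsample_2x_linear(samples):
    """Double the sample rate of a list of integer PCM samples.

    Hypothetical helper for illustration only. Each input sample is kept,
    and one interpolated sample (the midpoint between it and the next
    sample) is inserted after it, so 8 kHz input becomes 16 kHz output.
    No new frequency content is created; the audio is still band-limited
    to what the 8 kHz capture contained.
    """
    if not samples:
        return []
    out = []
    for i, current in enumerate(samples):
        # For the last sample there is no successor, so repeat it.
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)  # integer midpoint
    return out


eight_khz = [0, 100, 200]
sixteen_khz = upsample_2x_linear(eight_khz)
print(len(eight_khz), "->", len(sixteen_khz))
```

The output buffer always has exactly twice as many samples as the input, which is what a recognizer expecting 16kHz audio would need for the timing to line up.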
Hi Halle –
Okay, I will be looking into this today. Thank you for the advice.
I am guessing that disablePreferredSampleRate might do the trick, as it removes the requirement for 16kHz audio input. I will see how it goes.
Another phrase that triggers ‘Hello Focus’ is ‘Hero Focus’.
It’s very close, but not the same. Ideally, we would want the former to trigger and the latter to not.
I would also like to add that we are also using RuleORama to listen for “command phrases”. i.e. :
For the same reason as above, I have been using RuleORama for this task, because it nearly always hears the phrase and the whole phrase.
Where we are running into problems is with false positives, just as in key phrase spotting. For example, if I say ‘Text Spaghetti Monster’ with the above grammar of only ‘Text Halle’ and ‘Call Halle’, it will match it to ‘Text Halle’.
This is really the missing piece of the puzzle for us: figuring out how to make this listening technology more exclusionary. Right now, it is very forgiving in matching incoming audio to one of the entries in the grammar. I am looking to mimic the ability of Rejecto to throw away unwanted speech.
Again – any help that you can offer would be appreciated.
Hey Halle –
Thanks for the response. There must be quite a few ‘false positives’ for hello focus, because it does go off “quite a bit” (ambiguous, I know) during conversation. That being said, to the extent that I’ve hunted for false positives, I had the app listening while I had some YouTube videos running and here are some of the phrases that triggered it :
“Go for it, Mr. Robbins”
“Shell of this Person”
Those are the only two phrases that I know for sure, but I can run some more tests and get back to you.
One thought that I had was maybe to identify as many of these ‘false positives’ that I can and add them to the grammar to be later ignored if triggered. It just doesn’t seem very scientific.
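Roughly, the idea above would look like the following sketch (plain Python, not OpenEars code). The names COMMANDS, DECOYS, recognizer_vocabulary and accept_hypothesis are all invented for illustration; the decoy phrases are just the false positives mentioned in this thread. The decoys are handed to the recognizer alongside the real phrase so that similar-sounding audio has somewhere else to land, and anything that comes back as a decoy is then dropped.

```python
# Real phrase(s) the app should react to.
COMMANDS = {"HELLO FOCUS"}

# Known false positives, listed so the recognizer can match them instead.
DECOYS = {
    "HERO FOCUS",
    "GO FOR IT MR ROBBINS",
    "SHELL OF THIS PERSON",
}


def recognizer_vocabulary():
    """Phrases to give the recognizer: real commands plus decoys."""
    return sorted(COMMANDS | DECOYS)


def accept_hypothesis(hypothesis):
    """Return the normalized command if it is a real one, else None.

    Decoy matches (and anything else unexpected) are ignored.
    """
    normalized = " ".join(hypothesis.upper().split())
    return normalized if normalized in COMMANDS else None


print(accept_hypothesis("Hello Focus"))
print(accept_hypothesis("Hero Focus"))
```

This only helps for false positives that have already been collected, which is why it does not feel very scientific.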
I know that this has been said before, but I am really looking to keep the spot-on hair-trigger recognition accuracy that I am achieving w/ RuleORama+RapidEars, yet get the exclusivity that I see happening w/ Rejecto.
Right now, I can only guarantee one or the other, depending on technology. I am trying to find a way to ensure both. Is that too much to ask? (wink, wink). I kid.
Thanks so much for your help and advice.
No worries at all. These plugins work great.
I have just been figuring out the best way to use them for our purposes.
I think that we’ve gotten things pretty good.
Thanks for your support and responses.
Yes, I was indeed fooling myself.
In this forum :
I realized that secondsOfSilenceToDetect only takes effect when starting the listener, anyway. So, I was probably experiencing the placebo effect.
My current thought is to cache the RapidEars result and if the regular OpenEars hypothesis does not kick in within a given threshold, just use it.
That should accomplish the same objective.
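As a rough sketch of that caching idea (plain Python, not the actual delegate code), the class below remembers the latest live hypothesis and only releases it if no final hypothesis has arrived within a timeout. The class and method names (HypothesisFallback, on_live_hypothesis, on_final_hypothesis, poll) are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow; in the app these would be driven by the RapidEars and OpenEars callbacks and a timer.

```python
class HypothesisFallback:
    """Hold a live hypothesis and fall back to it after a timeout."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._cached = None      # latest live hypothesis text
        self._cached_at = None   # time it was received, in seconds

    def on_live_hypothesis(self, text, now):
        # Called when the live (RapidEars-style) recognizer reports text.
        self._cached = text
        self._cached_at = now

    def on_final_hypothesis(self, text):
        # The regular final hypothesis arrived, so the cache is not needed.
        self._clear()
        return text

    def poll(self, now):
        # Called periodically; returns the cached text once the timeout
        # has passed without a final hypothesis, otherwise None.
        if self._cached is None:
            return None
        if now - self._cached_at < self.timeout_seconds:
            return None
        text = self._cached
        self._clear()
        return text

    def _clear(self):
        self._cached = None
        self._cached_at = None


fallback = HypothesisFallback(timeout_seconds=1.5)
fallback.on_live_hypothesis("CALL HALLE", now=10.0)
print(fallback.poll(now=10.5))
print(fallback.poll(now=11.6))
```

The cache is cleared as soon as it is used or superseded, so the same command is never delivered twice.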
Thanks again for all the help and the outstanding plugins.
I am just dialing in some details on an MVP that I hope takes us places.
Hey Halle –
This ‘trick’ seems to be working for us, for the most part. Just hoping to hear some confirmation from you on whether or not this is the way to go.
Hey Halle –
Just a quick followup. So, I tried my suggestion and I set secondsOfSilenceToDetect to 0.f when RapidEars has a valid hypothesis for a command. I released a build and so far it’s working. We haven’t seen any hangs. I imagine that it does indeed force silence detection.
I will keep you updated, thanks.
October 14, 2015 at 2:27 am in reply to: Recommend method to play sound with active speech recognition? #1027030
Posted in wrong forum – edited to delete.