January 20, 2016 at 5:29 pm #1027736
Thanks again for making this framework and library, Halle, and for taking the time to read and answer these forum posts.
I have an app where reading out loud is the only interface. I want to be able to know when each and every word is said (this is the dream).
I am getting consistently good results using RuleORama with a wired mic. Each grammar is an expanding array of the words in a passage ("one of these can be said once": a, a b, a b c, a b c d), so I hit every word, and I switch to the next pre-computed grammar during periods of silence (transitions). Sometimes I also switch to a probabilistic model mid-sentence, depending on errors or strangeness.
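A minimal sketch of the expanding-prefix grammars described above, in Python for illustration only. Assumptions: the passage is split into chunks at punctuation (where a reader is likely to pause), and the key `OneOfTheseCanBeSaidOnce` mirrors the "one of these can be said once" rule named in the post; check the RuleORama grammar documentation for the exact format. Because the recognizer can only return one of the prefixes, the number of words in the hypothesis tells you how far the reader has got.

```python
import re


def split_into_chunks(text: str) -> list[str]:
    """Split a passage at punctuation, assumed to be where the reader pauses."""
    parts = re.split(r"[.,;:!?]+", text.lower())
    # Normalize whitespace and drop empty pieces left by trailing punctuation.
    return [" ".join(p.split()) for p in parts if p.strip()]


def prefix_grammar(chunk: str) -> dict:
    """One grammar per chunk: exactly one of the chunk's word prefixes may be said once."""
    words = chunk.split()
    prefixes = [" ".join(words[:k]) for k in range(1, len(words) + 1)]
    return {"OneOfTheseCanBeSaidOnce": prefixes}


def words_read(chunk: str, hypothesis: str) -> int:
    """How many words of the chunk the hypothesis covers; 0 if it is not a prefix."""
    words, heard = chunk.lower().split(), hypothesis.lower().split()
    return len(heard) if heard == words[:len(heard)] else 0
```

Precomputing one grammar per chunk keeps each grammar small, and moving to the next chunk's grammar at a pause matches the silence-triggered switching described in the post.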
This works great with a wired microphone (I have not tested Bluetooth), but it is not good at all without an external microphone, at arm's length.
At this point, what I do is switch to a probabilistic model (RapidEars) with Rejecto, but I don’t get results that are as good.
Is this to be expected? Should I adjust the user experience as well as the listening method when not using an external microphone?
Liam, January 20, 2016 at 5:36 pm #1027737
You’re welcome, and thanks for your kind words. Quick question: are you aware that RuleORama models work with RapidEars? Or is there a different reason that the switch is between RuleORama and RapidEars?
Yes, it is expected that recognition at a distance will lose accuracy. This may take the form of insertions (words heard that weren't said) or omissions (words said that weren't heard). Have you experimented at all with changing the vadThreshold value for more distant speech?
January 20, 2016 at 6:11 pm #1027738
I’m using RuleORama with RapidEars; sorry for using the wrong term. I switch to a probabilistic model when not using an external microphone, but I am always using RapidEars.
I haven’t tried adjusting the VAD threshold yet. I’ve seen it mentioned here in the forums, but I assumed this value was controlled by OpenEars and changed based on the environment.
Would it be sensible to change the VAD threshold for an arm's-length mode? If so, can you explain a little about how this works?
For instance, in a quiet environment at arm's length, would it be better to have a lower VAD threshold?
Thanks again for your time. If I can get this prototype working, I will get my office to buy some email support; I know I ask a lot of questions!
Liam, January 22, 2016 at 12:47 pm #1027752
It’s fine to ask questions here; they are useful for other potential askers, who can then read the discussion. The vadThreshold property is for situations like this, where you want a globally more- or less-sensitive noise threshold that you can switch between at runtime according to your app's logic. It's correct that the Sphinx VAD will also make some adjustments to noise levels over the life of the session. What to do with vadThreshold depends on the nature of the “badness” with arm's-length listening.
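Switching a threshold "logically at runtime" can be as simple as keeping one settings profile per input route and applying a profile only when the route changes it. A minimal Python sketch; the profile fields, route names and the `apply` hook are hypothetical, and in an app `apply` would be where you assign OpenEars' vadThreshold and pick the grammar or probabilistic model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ListeningProfile:
    vad_threshold: float
    use_grammar: bool   # True: RuleORama grammar; False: probabilistic model
    use_rejecto: bool


class ProfileSwitcher:
    """Holds the active profile and pushes a change only when it actually differs."""

    def __init__(self, profiles: Dict[str, ListeningProfile],
                 apply: Callable[[ListeningProfile], None]) -> None:
        self.profiles = profiles
        self.apply = apply  # hypothetical hook into the listening session
        self.current: Optional[ListeningProfile] = None

    def on_input_changed(self, input_kind: str) -> ListeningProfile:
        profile = self.profiles[input_kind]
        if profile != self.current:
            self.apply(profile)
            self.current = profile
        return profile


# Placeholder numbers, not recommendations: replace them with values tuned on your own tests.
PROFILES = {
    "wired_mic": ListeningProfile(vad_threshold=2.0, use_grammar=True, use_rejecto=False),
    "built_in_mic": ListeningProfile(vad_threshold=2.5, use_grammar=False, use_rejecto=True),
}
```

Comparing against the current profile avoids re-applying identical settings when the route notification fires more than once.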
I always recommend starting by turning off Rejecto (you’ll turn it back on later), then setting up a test and getting the most accurate result you can by tuning vadThreshold. This way you don’t misattribute results caused by Rejecto (i.e. thinking the threshold is too high and omitting too much speech when it actually isn’t, and Rejecto is rejecting the speech farther down the line). It also means you don’t need to take my advice; you can see firsthand what works best.
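One way to score that test, sketched in Python: log what was read and what was heard at each candidate vadThreshold with Rejecto off, then rank the thresholds by word errors. The `results` layout is an assumption; how you capture hypotheses from RapidEars is up to your app. Errors are split into omissions and insertions with a word-level edit-distance alignment, which helps with attribution: a threshold that rejects too much speech tends to show up as omissions, while one that lets noise through tends to show up as insertions.

```python
def error_counts(expected: list[str], heard: list[str]) -> dict[str, int]:
    """Classify word errors via a minimum edit-distance alignment."""
    n, m = len(expected), len(heard)
    # cost[i][j]: fewest edits turning expected[:i] into heard[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + (expected[i - 1] != heard[j - 1]),  # match / substitution
                cost[i - 1][j] + 1,  # omission: expected word not heard
                cost[i][j - 1] + 1,  # insertion: extra word heard
            )
    # Walk back from the corner, preferring the diagonal, to label each edit.
    counts = {"substitutions": 0, "insertions": 0, "omissions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (expected[i - 1] != heard[j - 1]):
            if expected[i - 1] != heard[j - 1]:
                counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["omissions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


def rank_thresholds(results: dict[float, list[tuple[str, str]]]) -> list[tuple[float, dict[str, int]]]:
    """results maps each threshold tried to (expected, heard) text pairs; fewest errors first."""
    ranked = []
    for threshold, pairs in results.items():
        totals = {"substitutions": 0, "insertions": 0, "omissions": 0}
        for expected, heard in pairs:
            for kind, count in error_counts(expected.split(), heard.split()).items():
                totals[kind] += count
        ranked.append((threshold, totals))
    ranked.sort(key=lambda item: (sum(item[1].values()), item[0]))
    return ranked
```

When several alignments have the same cost, the backtrace prefers substitutions, so the split between categories is approximate; the total error count is exact.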
Once you have the vadThreshold that gives the best results, it’s time to add Rejecto back in, paying special attention to its tweakable settings, such as its weight, ignored phonemes, and vowels-only mode.
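A rough Python illustration of what the vowels-only and ignored-phoneme settings narrow down, plus a grid of configurations to test with vadThreshold held fixed. Assumptions: the rejection model draws on the English acoustic model's ARPAbet phone set (as in the CMU dictionary), and the option names here are hypothetical stand-ins, not Rejecto's real method signature. The weight is only carried through; this sketch does not model what it does.

```python
import itertools

# ARPAbet phones of CMU-style English dictionaries, without stress digits.
VOWELS = frozenset({"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
                    "IH", "IY", "OW", "OY", "UH", "UW"})
CONSONANTS = frozenset({"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
                        "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"})


def rejection_phones(vowels_only: bool, excluded=frozenset()) -> list[str]:
    """Phones left to absorb out-of-vocabulary speech under these (hypothetical) settings."""
    pool = VOWELS if vowels_only else VOWELS | CONSONANTS
    return sorted(pool - set(excluded))


def rejecto_grid(weights, excluded=frozenset()):
    """Every weight / vowels-only combination to try, one at a time, with vadThreshold fixed."""
    for weight, vowels_only in itertools.product(weights, (False, True)):
        yield {"weight": weight, "vowels_only": vowels_only,
               "phones": rejection_phones(vowels_only, excluded)}
```

Each configuration can then be scored with the same read-aloud test used for vadThreshold, so any new omissions can be attributed to Rejecto rather than to the threshold.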