Yes, thank you for clarifying. It's quite important to get good testing data before you start trying to fix issues by altering settings, because changing them on the basis of bad data will produce worse results for the average user. It also creates a situation where it isn't possible to get help (for instance from me), due to the problems inherent in subjective data collection: too few reports, non-replicable reports, or reports affected by factors you don't know about, such as noise or distance from the device. So let's talk a little about how to set up tests for languages that aren't being tested firsthand in the office.
The first thing to keep in mind is that you can't use synthesized speech (like Yandex's) for testing, because it doesn't have enough data behind it and will only confuse your troubleshooting process.
The second thing is that when you test with humans, don't rely on subjective reports of interactions when troubleshooting. This applies even in your own language, but especially in a language you aren't testing natively in-house, at least until you have a high level of confidence in what is happening, because you have no way of seeing the environmental situation or of replicating the results. And in that case I am a third party removed from the original subjective report, so I can't help effectively (assuming the problem isn't just a limitation of the acoustic model but something in the framework that I can help with).
It's possible for you to obtain complete recordings of the user's speech and then feed them into OpenEars in test mode, so you can replicate the user's experience. The following post is about giving me replicable cases, but it also explains how to use the SaveThatWave demo to obtain audio and then use pathToTestFile so you can observe the results yourself: https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
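As a rough sketch of what the test-mode setup looks like in code (the Swift names here are my guesses at the bridged forms of the Objective-C API, and the paths and model name are placeholders, so check them against the OpenEars headers for your version):

```swift
import OpenEars

// Replay a WAV captured with SaveThatWave instead of using live microphone
// input, so a user's session can be reproduced exactly on your own device.
func replaySavedRecording(wavPath: String, lmPath: String, dicPath: String) {
    let pocketsphinx = OEPocketsphinxController.sharedInstance()
    try? pocketsphinx?.setActive(true) // required before use in OpenEars 2.x
    pocketsphinx?.pathToTestFile = wavPath // switches OpenEars into test mode
    pocketsphinx?.startListeningWithLanguageModel(
        atPath: lmPath,
        dictionaryAtPath: dicPath,
        acousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"), // substitute your language's model
        languageModelIsJSGF: false)
}
```

The point of this pattern is that the same WAV can be decoded repeatedly while you vary one setting at a time, which is what makes the results comparable.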
For your own app, of course, it isn’t necessary to put it all inside of the sample app’s code (that’s just if you want to show it to me for help), but it should get you started with setting up replicable testing for your own app.
It’s important not to turn on Rejecto until you are very confident that vadThreshold is correct for the acoustic model (this would usually mean that a sigh is not processed as speech). You may need to test this yourself; it isn’t really necessary to be a native speaker in order to make sure that the vadThreshold is rejecting as much non-speech as possible. It does sound like vadThreshold should be higher in your case.
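For reference, adjusting vadThreshold is a one-line change. The sketch below assumes the Swift-bridged OpenEars API, and 3.2 is purely an illustrative value; verify against your own recordings rather than taking a number from here:

```swift
import OpenEars

// Raise vadThreshold so that quiet non-speech (a sigh, room noise) is no
// longer treated as an utterance. The right value depends on the acoustic
// model in use, so re-test whenever you switch models.
try? OEPocketsphinxController.sharedInstance().setActive(true)
OEPocketsphinxController.sharedInstance().vadThreshold = 3.2 // placeholder; tune with your own tests
```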
Once you have confidence in vadThreshold, you can obtain recorded speech as described in the linked post, and start to tune your Rejecto settings (starting from the default settings). If you continue to get unexpected results, you can give me a full replication case as described in the linked post so I can look into whether it’s a settings issue.
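When you do turn Rejecto on, a sketch of generating a model with the default settings might look like the following (again, these are my guesses at the Swift bridging of Rejecto's language model generator category; passing nil for the weight should keep the plugin's default, and the vocabulary and file name are placeholders):

```swift
import OpenEars
// plus the Rejecto plugin, linked however your project integrates it

// Generate a Rejecto language model from your vocabulary, starting from the
// default weight before experimenting with other values.
let generator = OELanguageModelGenerator()
let name = "MyRejectoModel"             // arbitrary name for the generated files
let words = ["WORD", "OTHERWORD"]       // your app's actual vocabulary
let error = generator.generateRejectingLanguageModel(
    from: words,
    withFilesNamed: name,
    withOptionalExclusions: nil,
    usingVowelsOnly: false,
    withWeight: nil,                    // nil = default weight; change later only if testing supports it
    forAcousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"))
if error == nil {
    let lmPath = generator.pathToSuccessfullyGeneratedLanguageModel(withRequestedName: name)
    let dicPath = generator.pathToSuccessfullyGeneratedDictionary(withRequestedName: name)
    // pass lmPath and dicPath to startListening as usual
}
```

Changing only the Rejecto weight between runs against the same saved WAVs will tell you clearly whether a given weight helps or hurts.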