Reply To: Changing the recognized pause time


#1024376
Halle Winkler
Politepix

Hi,

You wrote: "when I switch to offline listener"

To clarify, both RapidEars and default OpenEars perform offline speech recognition. Neither uses the network, which is what "online" and "offline" refer to here.

RapidEars does realtime listening when you use one of the live delegate methods of OEEventsObserver+RapidEars, while OpenEars does pause-based listening: it performs recognition once an utterance is complete and the user's silence period has elapsed.

To address the specifics of your post:

secondsOfSilenceToDetect refers to a period in which no sound crosses the silence/speech threshold for more than a certain number of frames. When you set it to a value like 7, you are saying that no notable sound above the speech/silence threshold may occur for 7 uninterrupted seconds. That is never going to happen reliably: it is functionally equivalent to saying "except in cases of particular luck, never stop listening", especially if you are testing in a noisy environment. It also serves no purpose in a UI, since a meaningful user pause (which is what secondsOfSilenceToDetect should correspond to) is probably one second at most.
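To make the frame logic above concrete, here is a toy Python model of pause-based endpointing. It is a hypothetical sketch, not OpenEars' actual implementation: the function name, the 0.1-second frame length, the 0.5 threshold, and the rule that sound must stay above the threshold for more than two frames before it counts are all illustrative assumptions. It shows why a long setting such as 7 seconds may never be satisfied when short bursts of noise keep recurring, while a setting near the default ends the utterance shortly after speech stops.

```python
# Hypothetical model of pause-based endpointing, NOT OpenEars source code.
# Assumptions (illustrative only): audio arrives as fixed-length frames, each
# reduced to a single level value; a frame is "loud" if level > threshold.


def end_of_utterance_index(levels, seconds_of_silence,
                           frame_seconds=0.1, threshold=0.5,
                           min_loud_frames=2):
    """Return the frame index at which the utterance would be finalized,
    or None if the required uninterrupted silence never occurs."""
    required = round(seconds_of_silence / frame_seconds)
    heard_speech = False
    quiet_run = 0  # consecutive quiet frames since the last sustained sound
    loud_run = 0   # consecutive frames above the threshold
    for i, level in enumerate(levels):
        if level > threshold:
            loud_run += 1
            # Only sound lasting more than min_loud_frames counts; shorter
            # blips neither add to nor reset the silence count.
            if loud_run > min_loud_frames:
                heard_speech = True
                quiet_run = 0
        else:
            loud_run = 0
            if heard_speech:
                quiet_run += 1
                if quiet_run >= required:
                    return i
    return None


# 1 s of speech, then a room with a 0.4 s noise burst every 2 s.
levels = [0.9] * 10 + ([0.1] * 16 + [0.8] * 4) * 10
# 1 s of speech, a 0.2 s blip, then quiet.
blip_levels = [0.9] * 10 + [0.1] * 3 + [0.8] * 2 + [0.1] * 10

print(end_of_utterance_index(levels, 0.7))  # 16
print(end_of_utterance_index(levels, 7.0))  # None
```

In this model, the recurring noise bursts reset the 7-second count long before it can complete, which matches the "never stop listening except by luck" behavior described above.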

I asked you to set up a replication case and to test only reasonable values, because examining all the random outcomes possible with unrealistically high values is not a good use of limited support time. The following information from your post shows that secondsOfSilenceToDetect is working for you as expected:

10.0f -> ~1 second
100.0f -> ~1 second
0.5f -> less than 1 second
0.9f -> 1-2 seconds
1.1f -> 1-2 seconds

At ridiculous values of 10 or over, OpenEars resets secondsOfSilenceToDetect to its default of 0.7 (by the way, this default is probably the only value you need). If you had logging turned on, you would have received a message saying that those values were being reset to defaults. Take a look at the post entitled "Please read before you post – how to troubleshoot and provide logging info here" to understand why logging needs to be turned on during your own troubleshooting: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
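The reset rule can be sketched as a simple validation step. This is a hypothetical Python model that assumes only what is stated here (values of 10 or over fall back to the 0.7 default); the function name, the constant names, and the log wording are invented for illustration and are not OpenEars' API or its actual log output. Any other range checks OpenEars may perform are not modeled.

```python
# Hypothetical sketch of the reset-to-default behavior, NOT OpenEars source code.
# Names and log text are illustrative assumptions.

DEFAULT_SECONDS_OF_SILENCE = 0.7  # default value stated in the post
RESET_AT_OR_ABOVE = 10.0          # "values of 10 or over" are reset


def effective_seconds_of_silence(requested, log=print):
    """Return the silence period that would actually be used."""
    if requested >= RESET_AT_OR_ABOVE:
        # With logging on, a message like this explains the substitution.
        log(f"secondsOfSilenceToDetect of {requested} is out of range, "
            f"resetting to default of {DEFAULT_SECONDS_OF_SILENCE}")
        return DEFAULT_SECONDS_OF_SILENCE
    return requested


# The values from the list above.
for value in (10.0, 100.0, 0.5, 0.9, 1.1):
    print(value, "->", effective_seconds_of_silence(value))
```

Under this model, 10.0 and 100.0 both end up as 0.7, which is consistent with both of them producing roughly one-second results in the list above.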

So the excerpt from your list that I've quoted consists only of values at or near the default, or values that OpenEars resets to the default, and by your description they behave exactly as the default is supposed to. The high values I asked you not to test against are behaving as I would expect (the results are long and fairly random). They don't correspond to achievable periods of silence in an occupied environment, which you have described as particularly noisy, and they don't correspond to the length of user pauses; they only represent a way to catch lots of intermittent noise.

Because both of your posts since your initial question have focused on gathering behavior with huge secondsOfSilenceToDetect values that shouldn't be used with user speech, it's very difficult to tell whether you have an actual bug or any strange behavior with normal values near the default. It also leaves me less time to help you, since I've now spent time responding to those results a couple of times.

Normal values are ones that correspond to pauses in human speech, such as 0.5, 0.7, 1.0, or 1.1. 7.0 is not a normal value unless you never want speech to finalize, and 0.1 or 0.2 are not normal values unless you want to constantly interrupt the user's speech. So, if you are seeing a reproducible issue with normal values, you can create a replication case exactly as described in this post:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

Any follow-ups on this should take the form described by that replication post so the next step in the discussion is first-hand replication of one clear issue. This is the issue you reported before the digression about test results with huge secondsOfSilenceToDetect values:

"The problem is, the hypotheses are divided as separate words as if I paused between speaking."

If upgrading to the current 2.03 version of both OpenEars and RapidEars didn’t fix this symptom (I would expect that it did, but maybe it didn’t), you can give me a case which replicates it so I can see it. It’s your choice whether you want to show me an issue with stock OpenEars or with RapidEars, just make sure it replicates this reported issue (hypotheses are separate words where you would expect a single continuous utterance as a hypothesis) and is in the form explained by the replication case post so I can see it directly, thanks.