Changing the recognized pause time

Tagged: OEPocketSphinxController, OpenEars, pause, silence duration

This topic has 8 replies, 2 voices, and was last updated 9 years, 3 months ago by Halle Winkler.

January 16, 2015 at 4:06 pm #1024358

batilc
Participant

Hello,

I am posting this here because the question is about OEPocketSphinxController.
I am using RapidEars and trying to recognize a phrase such as “Page Number One/Two/Three”. The problem below also occurs when I am not using RapidEars.

The problem is, the hypotheses are divided as separate words as if I paused between speaking. I tried setting the secondsOfSilenceToDetect property to an absurd 4 seconds, but it didn’t change anything. I have also set latency tuning to 1. What should I do?

For this phrase only, I am checking on the rapidEarsDidReceiveFinishedSpeechHypothesis event.

Thank you

January 16, 2015 at 5:39 pm #1024359

batilc
Participant

An update on the situation:

While doing offline recognition, secondsOfSilenceToDetect actually partially works. A value of 4.0f results in 20 seconds of waiting after a pause before a hypothesis comes up. Is the value multiplied by something else internally?

On the other hand, RapidEars’ finished-speech method is not affected by this property. How can I modify that?

I think this post belongs in the “plugins” section of the forums.

January 16, 2015 at 8:00 pm #1024367

Halle Winkler
Politepix

This sounds like an audio problem and unfortunately that points to issues with your plugin. The reported issues don’t correspond to any known bugs in 2.03 or appear in my tests.

To troubleshoot this further, step one is to make sure that you have absolutely, positively done all of the upgrade steps in the upgrade guide since your app was previously using 1.7:

https://politepix.com/upgradeguide/

Step two is to make absolutely sure that you are using the most recent version of every framework, but especially OpenEars which had an update a few days ago (where to find and subscribe to changelogs: https://www.politepix.com/openears/changelog)

If that doesn’t fix the issues you’re seeing, you can create a replication case for me using the sample app:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

and I’ll take a look at it.

January 16, 2015 at 8:22 pm #1024369

Halle Winkler
Politepix

BTW, go ahead and set up your replication case using a secondsOfSilence value that you actually want to use (the default of .7 is probably the best value to use, but if you want it to be a bit longer, there’s little reason to go higher than 1.0). If there’s an issue in the standard framework install and it isn’t confined to the plugin, we want to look at it happening with moderate values.

January 16, 2015 at 11:20 pm #1024372

batilc
Participant

Before posting here, I re-downloaded all the frameworks, but that did not solve it.

I’m having the problem with the RealTime Listener in my app, and also when I switch to offline listener I experience weird behaviour. I can duplicate the same weird behaviour on offline listener while using the sample app as well.

The only thing I added to the viewcontroller is the following:

[OEPocketsphinxController sharedInstance].secondsOfSilenceToDetect = 7.0f;

[OEPocketsphinxController sharedInstance].vadThreshold = 3.0f; //It’s noisy here :)

OK, so I counted the seconds between the moment I stop speaking and the moment the app logs “Pocketsphinx detected a second of silence, concluding an utterance”. In some cases, no hypothesis was returned at all. Here are the timings I got:

7.0f -> 21 seconds (no hypothesis)
10.0f -> ~1 second
1.0f -> ~1 second
3.0f -> 3 seconds, 5 seconds, 7 seconds, 12 seconds changes a lot.
2.0f -> 2 seconds, 4 seconds, 7 seconds, 11 seconds changes a lot.
4.0f -> 9 seconds, 14 seconds, 22 seconds (no hypo. when +20)
0.5f -> less than 1 second
0.1f -> so quick that only the first word is recognized
0.9f -> 1-2 seconds
1.1f -> 1-2 seconds
1.6f -> changes between 2 seconds to 5 seconds
100.0f-> ~1 second
9.0f -> 10 seconds to 20+ seconds with no output when 20+.

It appears to me that the secondsOfSilenceToDetect property works for values less than 1 second, and somehow increases the wait much more than needed when it’s greater than 1. The behaviour is not consistent.

Would you like me to add RapidEarsDemo to the sample app and try editing the secondsOfSilenceToDetect property? Should the RapidEarsDetectedFinishedSpeech event be affected by this property? I don’t think it’s about the plugin, because at this point I’m not tinkering with it. The plugin just calls the generateLanguageModel function on the Objective-C side, after which Pocketsphinx starts listening if language model creation produces no error. The timings I obtained come directly from the Xcode console, so there’s no plugin interference yet.

January 17, 2015 at 10:47 am #1024376

Halle Winkler
Politepix

Hi,

“when I switch to offline listener”

To clarify, both RapidEars and default OpenEars are offline speech recognition. Neither uses the network, which is what online/offline refers to here.

RapidEars does realtime listening when you use one of its live delegate methods of OEEventsObserver+RapidEars, and OpenEars does pause-based listening, meaning that it performs recognition after an utterance is complete and the user silence pause period has occurred.

To address the specifics of your post:

secondsOfSilenceToDetect refers to a period in which no sound crosses the silence/speech threshold for more than a certain number of frames. When you set it to a value like 7, you are saying that no notable sounds above the speech/silence threshold may occur for 7 uninterrupted seconds. That is never going to happen reliably; it is functionally equivalent to saying “except in cases of particular luck, never stop listening”, especially if you are testing in a noisy environment. It also would serve no purpose in a UI, since a meaningful user pause (what secondsOfSilenceToDetect should correspond to) is probably one second at most.
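
To make that concrete, here is a minimal sketch of a pause countdown that restarts on every loud frame. It is only an illustration of the behavior described above, not OpenEars’ actual implementation: the 10 ms frame length, the function name and the per-frame flag array are all assumptions made up for this example. It is plain C, so it also compiles as Objective-C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame length for this illustration only; not an OpenEars value. */
#define FRAME_MS 10

/*
 * Scans per-frame flags (nonzero = the frame crossed the speech/silence
 * threshold) and returns the index of the frame at which `required_ms` of
 * uninterrupted quiet has accumulated, or -1 if that never happens within
 * `n_frames`.
 */
long first_frame_with_silence(const unsigned char *is_loud,
                              size_t n_frames, long required_ms)
{
    assert(is_loud != NULL && required_ms > 0);
    long quiet_ms = 0;
    for (size_t i = 0; i < n_frames; i++) {
        if (is_loud[i]) {
            quiet_ms = 0;              /* any loud frame restarts the countdown */
        } else {
            quiet_ms += FRAME_MS;
            if (quiet_ms >= required_ms)
                return (long)i;        /* the required pause has been observed */
        }
    }
    return -1;                         /* the pause never completed */
}
```

For example, given 30 seconds of frames (3000 entries) in which a single loud frame occurs every 3 seconds, a requirement of 700 ms is met at frame index 69, while a requirement of 7000 ms is never met and the function returns -1. With irregular real-world noise, the moment at which a long requirement happens to complete would vary from run to run.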

I requested setting up a replication case and only checking reasonable values because examining all the random outcomes possible with unrealistically-high values is not a good use of limited support time. This information from your post demonstrates that secondsOfSilenceToDetect is working for you as expected:

10.0f -> ~1 second
100.0f-> ~1 second
0.5f -> less than 1 second
0.9f -> 1-2 seconds
1.1f -> 1-2 seconds

At ridiculous values of 10 or over, secondsOfSilenceToDetect is reset by OpenEars to the default of .7 (by the way, this default value is probably the only value you need). If you had logging on, you would have received a message about those values being reset to defaults. Take a look at this post entitled “Please read before you post – how to troubleshoot and provide logging info here” to understand the requirement for turning logging on during your own troubleshooting: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
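
As a rough sketch of the reset rule just described (the function and macro names are hypothetical and this is not the framework’s source; only the 10 and .7 figures come from the description above), the value that ends up in effect could be modeled in plain C like this:

```c
/* Figures taken from the description above; the names are hypothetical. */
#define DEFAULT_SECONDS_OF_SILENCE 0.7f
#define RESET_AT_OR_ABOVE          10.0f

/* Returns the pause length that would actually be in effect for a request. */
float effective_seconds_of_silence(float requested)
{
    if (requested >= RESET_AT_OR_ABOVE)
        return DEFAULT_SECONDS_OF_SILENCE;  /* too large: fall back to the default */
    return requested;                       /* otherwise used as given */
}
```

Under this model, requests of 10.0f and 100.0f both behave like the default, whereas 7.0f is passed through unchanged and is left to depend on ambient noise.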

So the excerpt from your list I’ve shown you consists only of values close to or at the default value, acting exactly as the default value is supposed to, by your description. The high values that I asked you not to test against are acting as I would expect them to (results are long and pretty random) since they don’t correspond to achievable periods of silence in an occupied environment which you have described as being particularly noisy, and they don’t correspond to the length of user pauses – they only represent a way to catch lots of intermittent noise.

Because both of your posts after your initial question focused on gathering behavior with huge values of secondsOfSilenceToDetect that shouldn’t be used on user speech, it’s very difficult to see whether you have an actual bug or any strange behavior with normal values near the default. I also have less time to help you now, since I’ve had to spend time responding to those results.

So, if you are seeing a replicating issue with normal values (0.5, 0.7, 1.0, 1.1 are normal values that have a relationship to pauses in human speech, 7.0 is not a normal value unless you never want speech to finalize, 0.1 and 0.2 are not normal values unless you want to constantly interrupt the user’s speech) you can create a replication case exactly as described in this post:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

Any follow-ups on this should take the form described by that replication post so the next step in the discussion is first-hand replication of one clear issue. This is the issue you reported before the digression about test results with huge secondsOfSilenceToDetect values:

“The problem is, the hypotheses are divided as separate words as if I paused between speaking.”

If upgrading to the current 2.03 version of both OpenEars and RapidEars didn’t fix this symptom (I would expect that it did, but maybe it didn’t), you can give me a case which replicates it so I can see it. It’s your choice whether you want to show me an issue with stock OpenEars or with RapidEars, just make sure it replicates this reported issue (hypotheses are separate words where you would expect a single continuous utterance as a hypothesis) and is in the form explained by the replication case post so I can see it directly, thanks.

January 17, 2015 at 2:34 pm #1024377

batilc
Participant

Bah, you are right. This thread has gone too far into testing high values for secondsOfSilenceToDetect, whereas my issue is different. I guess I lost focus on my original problem while tinkering with that property.

You gave the explanation I was expecting: the seemingly random wait times are caused by noise that cancels the “silence period”. That’s why I had set the vadThreshold pretty high, but I guess it still treats some noise as speech.

OK, I’m gonna add RapidEarsDemo to the sample app with minimal changes, and replicate my issue there to see if it persists. In any case, I am going to share the results I obtained in my next post here.

January 17, 2015 at 4:17 pm #1024379

batilc
Participant

OK, it seems the issue was resolved while I was configuring Pocketsphinx. You can now close this thread.

I got reasonable behaviour from the sample app, and copying and pasting its configuration into my app solved the problem.

It might be that at the time I opened this thread, there was quite a lot of environmental noise and vadThreshold probably couldn’t make up for it. Now everything seems fine and functional. Thank you.

January 17, 2015 at 4:24 pm #1024380

Halle Winkler
Politepix

Super! You’re welcome.