[Resolved] Seeing an issue with long-term voice recognition

    #1020382
    wfilleman
    Participant

    Hi Halle,

    Got another issue to pick your brain on.

    I’ve got several users who are using the voice recognition feature long-term in the background of their iOS devices, although it happens when the app is in the foreground as well. What they report, and I’ve confirmed, is that after a long period of time (I was able to see it at about 3 hours) OpenEars no longer responds to voice commands.

    When I ran the debugger on my iPad, what’s happening after a long period of time is the following:

    OpenEars appears to be listening OK and actually does pick up the audio right before it becomes unresponsive, but in the debug prints I see this:

    2014-02-27 22:08:49.110 MobiLincHD[491:d46f] Processing speech, please wait…
    INFO: file_omitted(0): cmn_prior_update: from < 38.99 -7.17 -4.31 -3.34 -2.30 -1.41 -0.54 -1.32 -0.06 -0.11 -0.10 0.34 0.39 >
    INFO: file_omitted(0): cmn_prior_update: to < 38.43 -7.16 -4.34 -3.22 -2.40 -1.42 -0.56 -1.35 -0.10 -0.10 -0.13 0.34 0.40 >
    INFO: file_omitted(0): 20536 words recognized (42/fr)
    INFO: file_omitted(0): 172257 senones evaluated (352/fr)
    INFO: file_omitted(0): 80936 channels searched (165/fr), 7776 1st, 56095 last
    INFO: file_omitted(0): 25358 words for which last channels evaluated (51/fr)
    INFO: file_omitted(0): 5056 candidate words for entering last phone (10/fr)
    INFO: file_omitted(0): fwdtree 0.72 CPU 0.147 xRT
    INFO: file_omitted(0): fwdtree 5.33 wall 1.088 xRT
    INFO: file_omitted(0): Utterance vocabulary contains 49 words
    INFO: file_omitted(0): 10036 words recognized (20/fr)
    INFO: file_omitted(0): 154390 senones evaluated (315/fr)
    INFO: file_omitted(0): 94457 channels searched (192/fr)
    INFO: file_omitted(0): 20010 words searched (40/fr)
    INFO: file_omitted(0): 14609 word transitions (29/fr)
    INFO: file_omitted(0): fwdflat 0.24 CPU 0.049 xRT
    INFO: file_omitted(0): fwdflat 0.24 wall 0.049 xRT
    INFO: file_omitted(0): </s> not found in last frame, using ___REJ_K.488 instead
    INFO: file_omitted(0): lattice start node <s>.0 end node ___REJ_K.274
    INFO: file_omitted(0): Eliminated 1108 nodes before end node
    INFO: file_omitted(0): Lattice has 2561 nodes, 24194 links

    and then nothing else gets printed. In a normal scenario I would usually see the following, but this never appears (I let it run for another 12 hours before killing it):

    INFO: file_omitted(0): Normalizer P(O) = alpha(<sil>:94:97) = -780488
    INFO: file_omitted(0): Joint P(O,S) = -788376 P(S|O) = -7888
    INFO: file_omitted(0): bestpath 0.22 CPU 0.229 xRT
    INFO: file_omitted(0): bestpath 0.23 wall 0.234 xRT
    2014-02-27 22:08:35.843 MobiLincHD[491:d46f] Pocketsphinx heard " " with a score of (-7888) and an utterance ID of 000000039.

    Any ideas why the framework isn’t able to progress past the ngram_search_lattice() call? I’m assuming that this is returning a NULL which is why I’m not seeing the call into ngram_search_bestpath(). But why would that cause a complete breakdown of mic sound recognition?

    I haven’t seen it over short durations. This is definitely something that pops up under long-term OpenEars use (3+ hours).

    I tried this using both the VoiceChat and Default modes. Both exhibited the exact same behavior.

    Any ideas would be greatly appreciated!

    Thanks Halle!
    Wes

    #1020383
    Halle Winkler
    Politepix

    Ha, OK, first of all I will take it as a sign of generally good behavior on the part of both OpenEars and Pocketsphinx that it is even possible to discuss what it does after more than three hours have elapsed! :)

    But, yes, I am tracking a bug regarding very long searches. It isn’t technically that they don’t return, but that the return is so delayed that it feels like a hang (from a UX perspective the same problem IMO).

    It is happening because the search space on these searches is too big for some reason (my early impression is that the reason is a very long utterance due to some persistent noises being taken for an extended speech utterance, combined with something about language model weight values). It is my current top priority to figure out and fix, but it is also a challenging issue to pin down. Here are the current correlations for this bug; I’m very interested in any new info you can give me about yours:

    1. It appears consistently when a weight above 1.2 is applied to Rejecto and there is a particularly long utterance. This is verified.
    2. It has been reported without Rejecto when background volume increases suddenly, although my own tests with the most recent version of the OpenEars beta do not replicate this.

    Here is the most recent beta link:

    https://politepix.com/wp-content/uploads/OpenEarsDistributionBeta.tar.bz2

    I can’t take any test cases that occur over periods of hours, but if you can provide me with a test case that occurs in fewer than 10 minutes, based on the sample app plus an audio recording added to pathToTestFile, I will be very happy to add it to the data on this bug and it will help to get a faster fix.
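    For example (a minimal sketch based on the sample app; the WAV name is a placeholder for your own recording, and I’m assuming the sample app’s existing PocketsphinxController instance):

    // Minimal sketch: drive recognition from a recorded WAV instead of the mic.
    // "test_session.wav" is a placeholder name for your own recording.
    NSString *testFilePath = [[NSBundle mainBundle] pathForResource:@"test_session" ofType:@"wav"];
    self.pocketsphinxController.pathToTestFile = testFilePath; // set this before calling startListening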

    #1020384
    wfilleman
    Participant

    Hehe, I’ll expect the same from OpenEars as my customers do from me. Total 100% reliable operation. :)

    Actually, from the debug logs it looks quite clean and is working well, but yes, there is a weird bug at play here. Sounds like you might be on to it.

    To answer your points: I’m using Rejecto, but only in the default mode, meaning I’m not changing the weighting. I left it at the default (presumably 1.0).
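    For reference, my generation call is essentially the stock Rejecto one, along these lines (a sketch from memory; the exact method signature may vary by Rejecto version, and the word list here is illustrative):

    // Sketch of default-weight Rejecto model generation; passing nil for
    // withWeight: should leave the default weighting (1.0) in place.
    LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];
    NSArray *words = @[@"LIGHT ON", @"LIGHT OFF", @"SCENE ONE"]; // illustrative word set
    NSError *error = [generator generateRejectingLanguageModelFromArray:words
                                                         withFilesNamed:@"MyLanguageModel"
                                                 withOptionalExclusions:nil
                                                        usingVowelsOnly:FALSE
                                                             withWeight:nil];
    if ([error code] != noErr) NSLog(@"Model generation error: %@", error);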

    I have no data points without Rejecto, as Rejecto was critical in my application.

    It is happening because the search space on these searches is too big for some reason (my early impression is that the reason is a very long utterance due to some persistent noises being taken for an extended speech utterance

    This *sounds* right to me. I could see this happening over long periods of listening where OpenEars is exposed to the environment sound for too long and extended speech is triggered.

    I’ll try out the beta release, let it run for hours to see if it behaves any differently, and let you know later today.

    Thanks!
    Wes

    #1020385
    Halle Winkler
    Politepix

    OK, let me know what happens with the beta (it has a correction for a case in which a probability for a vocabulary word or Rejecto phoneme can be calculated wrongly, which I think may be a co-traveler with this bug even though I no longer think it’s the only cause for it). It’s helpful to know that you aren’t using Rejecto weighting, because that takes some suspicion away from Rejecto and focuses it more onto the main OpenEars/Pocketsphinx implementation, which at least narrows the field.

    #1020386
    wfilleman
    Participant

    Hi Halle,

    Early report: Been running for about an hour and observing sound in the room and how OpenEars is handling it.

    I’m seeing cases where long, sustained background noise does cause OpenEars to take a long time to process, during which the CPU shoots to 100%. I’ve seen it take as long as 60-90 seconds before it eventually returns to listening and all is normal again (0% CPU and listening normally).

    This is slightly different from what I was seeing on 1.65, where OpenEars wasn’t returning in my test case but the CPU was at 0% (I never saw it peak at 100%).

    Is this consistent with what you see on your end?
    Is this considered “normal”?
    Obviously having the CPU shoot to 100% for long periods of time isn’t desirable, but I wanted your comments on this.

    Leaving my test case running. Will check back in later.

    Wes

    #1020387
    Halle Winkler
    Politepix

    No, the two reports I got were about 1.65, and they were about the symptom of taking 20+ seconds to return with high CPU use. I haven’t gotten reports of never returning with 0% CPU*. I believe that isn’t technically possible if you’re in ngram_search_lattice(), because that is a search (i.e. the one thing that takes a lot of CPU in OpenEars). If you were simultaneously seeing 0% CPU and stuckness in ngram_search_lattice(), I would expect that something about that was a mismatch between what Xcode was telling you and what was actually happening.

    * I have received one report of non-returning that turned out to be very delayed returning.

    I would be very surprised if the change in the beta could have the effect of increasing a bug symptom. It only changes occasional inaccurate probabilities to be normal ones so it can’t really be implicated in a negative behavioral change to the best of my knowledge.

    What I have seen of this bug, and the reason it is very challenging, is that it is very intermittent and nondeterministic, so what might seem like new/different behavior might be behavior that was there previously but hadn’t yet manifested in front of you; a new thing is not necessarily related to the beta.

    #1020388
    wfilleman
    Participant

    Thanks Halle,

    I can see about capturing the background noise that’s causing the high CPU if you are interested. I seem to have a way of reproducing it.

    I agree, this is a different behavior than what I saw on 1.65. I’m not sure it was “stuck” in ngram_search_lattice(), but it definitely didn’t proceed to ngram_search_bestpath().

    Xcode could have been misleading on the CPU usage, but the iPad wasn’t hot the way it would have been if it had been running at 100% for hours on end.

    So far, so good, minus the high CPU on occasion. Let me know if you’d like me to capture the reproducible noise that causes the CPU to peg.

    Wes

    #1020389
    Halle Winkler
    Politepix

    OK, so my understanding is that the beta represents an improvement for you because it means you aren’t permanently losing the reactivity of the UI, or at least you are not seeing any new manifestations of that behavior, but it is also (obviously) not optimal yet because of the remaining issue with the long searches, is that correct? So it is possible that the language model fix in the beta is at least addressing the issue with the 0% CPU stuck search, which sounds like a real thing based on your description of the cool (not hot) device.

    Thank you, it would be very helpful to have the test case for the long searches in the beta. BTW, I don’t know if you saw this, but SaveThatWave 1.65 now has a feature to capture an entire recognition session from startListening: to stopListening, and the demo will run for 3 minutes, so you could use the demo to do a direct capture of a session that gets weird, if you can get it to happen in fewer than 3 minutes, by using the new SaveThatWaveController method startSessionDebugRecord.

    Then you can drop that WAV right into pathToTestFile in your sample app and I should (more or less – none of this is perfectly deterministic) be able to see what you saw.
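    In the sample app that could look something like this (a sketch; startSessionDebugRecord is the new method mentioned above, and the startListening call shown is approximate for your OpenEars version):

    // Sketch: capture an entire recognition session to a WAV with SaveThatWave.
    self.saveThatWaveController = [[SaveThatWaveController alloc] init];
    [self.saveThatWaveController startSessionDebugRecord]; // begin session capture
    [self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath
                                                      dictionaryAtPath:dicPath
                                                   languageModelIsJSGF:FALSE];
    // ...reproduce the noise until the symptom appears, then stop listening; the
    // captured WAV can be dropped into pathToTestFile for replay.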

    #1020390
    wfilleman
    Participant

    OK, so my understanding is that the beta represents an improvement for you because it means you aren’t permanently losing the reactivity of the UI, or at least you are not seeing any new manifestations of that behavior, but it is also (obviously) not optimal yet because of the remaining issue with the long searches, is that correct?

    Yes, so far. I’ve been running for 3+ hours and OpenEars is still going strong. I couldn’t get this far before, so, yes, at least in my single test case the beta appears to have addressed what I saw earlier in 1.65 as noted at the top of this post.

    Ah, great! I didn’t know about SaveThatWave; I was thinking about how I’d accomplish that. Ok, let me experiment with it and see if I can capture the session that pegs the CPU at 100%.

    What’s the best way to send you the sample project with the wav file?

    Wes

    #1020393
    Halle Winkler
    Politepix

    Yes, so far. I’ve been running for 3+ hours and OpenEars is still going strong. I couldn’t get this far before, so, yes, at least in my single test case the beta appears to have addressed what I saw earlier in 1.65 as noted at the top of this post.

    OK, that’s good news – this is the first feedback I’ve gotten from reporters of this issue about the effect of the improvements in the beta, so we’ll keep fingers crossed that we’ll continue to only see the current symptom with the increasing background noise. A little more background: a delay due to suddenly-increasing background noise is expected behavior, because that means that the voice activity detection doesn’t have a way of distinguishing the speech/silence transition anymore, since the calibration values became irrelevant inside of a single utterance. Under these conditions, it should notice that this happened and sort itself out in about 14 seconds (this can be made a bit shorter but there are other tradeoffs to doing so, so if it is an uncommon occurrence this timeframe is probably about right).

    So we’re only seeing the high CPU and inaccurate speech/silence threshold as dysfunctional if it takes notably longer than 14 seconds to self-correct, or if this long CPU usage occurs in the absence of a swift increase in background noise. Sometimes completely normal searches can take 1-2 seconds and use 99% CPU, so just seeing a strenuous search isn’t a bug on its own.
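    If you want to quantify that rather than eyeballing the CPU gauge, something like this sketch in your OpenEarsEventsObserver delegate would log the slow searches (the 5-second threshold and the speechEndedTime property are mine for illustration, not anything canonical):

    // Sketch: time the gap between end-of-speech and the hypothesis callback
    // to separate normal 1-2 second searches from pathological ones.
    // Assumes an NSTimeInterval property named speechEndedTime on the delegate.
    - (void) pocketsphinxDidDetectFinishedSpeech {
        self.speechEndedTime = [NSDate timeIntervalSinceReferenceDate]; // the search begins around here
    }

    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
        NSTimeInterval searchSeconds = [NSDate timeIntervalSinceReferenceDate] - self.speechEndedTime;
        if (searchSeconds > 5.0) { // well beyond a normal 1-2 second search
            NSLog(@"Slow search: %.1f seconds before hypothesis \"%@\"", searchSeconds, hypothesis);
        }
    }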

    What’s the best way to send you the sample project with the wav file?

    Ideally, put it up somewhere I can download it and send me the link via the email we’ve talked over previously.

    #1020394
    wfilleman
    Participant

    Hi Halle,

    A little more background: a delay due to suddenly-increasing background noise is expected behavior, because that means that the voice activity detection doesn’t have a way of distinguishing the speech/silence transition anymore, since the calibration values became irrelevant inside of a single utterance. Under these conditions, it should notice that this happened and sort itself out in about 14 seconds (this can be made a bit shorter but there are other tradeoffs to doing so, so if it is an uncommon occurrence this timeframe is probably about right).

    Ok, thanks. I can’t say if I’ve seen this or not. Possibly; there were times it wasn’t responding to my commands and I didn’t know why, but then it sorted itself out. I’ll pay more attention here.

    Sometimes completely normal searches can take 1-2 seconds and use 99% CPU, so just seeing a strenuous search isn’t a bug on its own.

    Agreed. 1-2 seconds is fine.

    Ok, I’ve got good news for you. I’ve got it 100% reproducible in the sample app, and it’s exposing itself with Rejecto and the word set I’m using.

    Test Cases:
    – I tried just the OpenEars beta and the stock words. Can’t get it to fail with my background noise.
    – I loaded up my set of words in my app in the sample app (“CHANGE MODEL”). Can’t get it to fail with my background noise.
    – Loaded the Rejecto demo and used the stock words. Can’t get it to fail with my background noise.
    – Loaded the Rejecto demo and switched over to my set of words from my app (“CHANGE MODEL”). I can easily get the CPU to peg 100% for 10-20 seconds. Sometimes it’ll go for a minute or never exit until the Rejecto demo times out.

    So, there’s some combination of Rejecto and my set of words where my background noise causes the CPU peg to occur.

    The background noise I’m making is my spoon stirring the coffee + sugar in my coffee mug in the morning. :) Total fluke that I happened to spot the correlation.

    I’ve got the sample project with my test wav files zipped up and I’ll be sending you the link here soon.

    One thing I had a hard time with: even though I could record the WAV files, playing them back through the path directive didn’t show the bug, because I had to “CHANGE MODEL” over to my set of words to expose the problem, and that can’t be done with the path directive. Hopefully this is enough info that you can use to debug.

    #1020401
    Halle Winkler
    Politepix

    I can’t yet get this sample app to replicate the issue – is changing the model necessary to see it? Does it not happen if you start the app running from your Rejecto model in the first place?

    #1020402
    Halle Winkler
    Politepix

    Another question: there is a line in PocketsphinxRunConfig.h as follows:

    //#define kMAXHMMPF @"3000" // -maxhmmpf	Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to -1
    

    If you uncomment it (taking care to make sure the test case is using the newly-compiled framework with the new value), does the slow search issue improve?
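    That is, in PocketsphinxRunConfig.h it would become:

    #define kMAXHMMPF @"3000" // -maxhmmpf: prune to at most 3000 active HMMs per frame instead of no pruning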

    #1020409
    wfilleman
    Participant

    Hi Halle,

    Yes, changing the model was necessary to load in my set of words. With my set of words and the background noise, it would consistently peg the CPU at 100% for extended periods of time.

    I’ll take a look at the #define today, give it a try, and let you know what happens. Thanks!

    Wes

    #1020410
    Halle Winkler
    Politepix

    Yes, changing the model was necessary to load in my set of words.

    Why is this? Is it not possible to replicate the symptom if your set of words is the set that is initially loaded by the sample app when listening is started?

    #1020411
    wfilleman
    Participant

    Hi Halle,

    I don’t think I tried that. I’ve always done dynamic generation of the language files, and that’s what I stuck with in the sample app to show the issue.

    I can try adding my words to the built in set and retry.

    Wes

    #1020412
    Halle Winkler
    Politepix

    I just meant to start with your dynamic model instead of the built-in one. That way you can remove model switching from the test so it’s more straightforward.

    #1020413
    wfilleman
    Participant

    Thanks Halle, I will do that.

    Ok, I did as you suggested with the #define and there is a different behavior now.

    I can still reliably get the CPU to peg at 100% with my background-noise reproduction, but my initial impression is that soon after the repetitive background noise stops (maybe 10 seconds later), the CPU does drop back to 0%. Before, it was pretty easy to get it to sit at 100% indefinitely.

    To further clarify: with the #define uncommented, the CPU will sit at 100% while the repetitive background noise is in effect, and it will sometimes stay at 100% for up to 10-15 seconds after the background noise has stopped.

    So I’m not sure whether you’d consider this fixed, but it does appear to be somewhat better.

    Ok, next up is to get you a test build that can reproduce it…

    Wes

    #1020416
    Halle Winkler
    Politepix

    Interesting, so it is definitely at least implicated. These could be two separate symptoms – the one which resolves quickly could be a completely-fixed version of the old symptom, while the one that resolves in 10-15 seconds could be a VAD correction working as designed, since that’s about the timeframe in which that happens. You can tell if it’s a VAD correction if OpenEarsLogging makes a statement about the VAD being recalibrated after one of those 10-15 second long search returns.
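    For reference, logging gets switched on before startListening:, along these lines (a sketch; these are the standard logging switches as I remember them):

    // Turn on OpenEars logging before startListening so that VAD recalibration
    // statements appear in the console output.
    [OpenEarsLogging startOpenEarsLogging];
    self.pocketsphinxController.verbosePocketSphinx = TRUE; // verbose Pocketsphinx output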

    #1020417
    wfilleman
    Participant

    while the one that resolves in 10-15 seconds could be a VAD correction working as designed, since that’s about the timeframe in which that happens.

    Ah, ok. I will pay attention to that then.

    Is the CPU expected to hit 100% continuously during repetitive background noise, or is this not what is supposed to happen?

    Still working on getting a proper demo setup to send over to you.

    #1020420
    wfilleman
    Participant

    Hi Halle,

    Ok, I’ve sent over the demo with a WAV file that when played into the sample app will show you the 100% CPU issue.

    FYI: due to the nature of getting the 100% CPU issue to show itself, play the embedded 30-second WAV file through a set of speakers so the sample app can hear it while running on an iOS device. After a couple of cycles between recognizing speech and listening, you should see the CPU ramp up to 100%. In my tests it only takes about 10 seconds to show itself.

    Wes

    #1020692
    Halle Winkler
    Politepix

    All issues related to this should be fixed with OpenEars 1.66 out now.

    #1020859
    wfilleman
    Participant

    Hi Halle,

    I rebuilt my app and ran a few test cases (including an overnight test) and I can confirm that 1.66 is looking great!

    Just for documentation completeness: I see the CPU bounce between 10 and 20 percent in the presence of loud, repetitive background noise. As soon as the background noise stops, I see the CPU drop back to 0%.

    Thank you for staying on top of this issue and, more importantly, finding a solution!

    Wes

    #1020867
    Halle Winkler
    Politepix

    That’s great to hear; thank you for checking in with your results.

    #1020873
    attique2010
    Participant

    Hi, I have integrated OpenEars 1.7 and am using a 5000-word language model and dictionary without Rejecto. When I run the app in the background, after 10 to 20 seconds the CPU goes to 100% due to noise, and after it drops back to 0% it is stuck and doesn’t start listening.

    #1020874
    attique2010
    Participant

    This also happens when I remain in the app. Due to this delay, OpenEars stops listening and remains stuck unless we go into the app and perform some action.

    #1020877
    Halle Winkler
    Politepix

    This is a different issue – there haven’t been any reported circumstances where there is 0% CPU use but a stuck process, or a process which is in some way “stuck” but also responds to app interaction as you described. I’d guess it is due to something misconfigured, changed or unsupported in the audio session, or it is due to something else about the app that is happening in the same timeframe such as use of a different media object which is changing (and breaking) the audio session. There is probably information for you in the logging.

    One probable issue here is that your language model is approximately 5x-20x larger (depending on word selection) than the recommended maximum vocabulary size, which will affect search times.
