Reply To: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks
I’ve removed a couple of our side discussions in this thread so that later readers who need an overview of related issues can get through it quickly – I hope you don’t mind, since the extra discussion was my fault. Today there is a new OpenEars and RapidEars version, 2.502 (more info at https://changelogs.politepix.com; downloadable from https://www.politepix.com/openears and from your registered framework customer account), which should fix the issue you’ve reported. Before we discuss it further, I wanted to clarify what this update fixes. We’ve talked about four different things in this discussion:
1. Actual leaks in Sphinx, which we’ve established are very tiny,
2. Normally-increasing memory usage in OpenEars due to the buffer growing during longer utterances,
3. Much larger amounts of memory being used and then normally released once there is a hypothesis,
4. Large amounts of memory that can’t be reclaimed after stopping when a big search is in progress at the time of the attempted stop.
I think we covered 1 & 2 pretty well earlier in the discussion, so let’s agree to just discuss 3 & 4 now that the updates are out, if that’s OK.
The OpenEars and RapidEars 2.502 updates should fix #4, which is a serious bug; I’m really happy you told me about it and showed me an example, thank you. In general the updates should also allow faster stops when a big search is in progress at stopping time, even setting aside the memory usage. Please check this out thoroughly and let me know whether it stops the bad memory events at stopping time, and also whether you see anything bad due to the changes. The one scenario from your replication cases that isn’t necessary to test or report is what happens to memory when the app hangs or exits because the demo framework times out – that is expected to not be graceful, so the memory usage under those circumstances at the very end of the app session isn’t an issue. In your case you should be able to test against your registered 2.502 framework instead of the demo.
#3 is a more complicated subject and, as far as I can see, not really a bug, so I wanted to explain it a little. I couldn’t actually replicate the situation you were seeing, with extremely large allocations during an utterance, although I worked very hard to do so: since it didn’t replicate with test-file playback, I set up an external speaker system so I could play your office audio out loud into various device microphones. I never got the big memory usage to replicate, but I could see some smaller allocations which were nonetheless bigger than I would have preferred. I believe this is more of an implementation issue than a bug, with a couple of root causes:
• There are some strange noises with unusual echo and Doppler in the recordings. I don’t know whether there is some kind of industrial noise in the background where you work, whether this is an artifact of the device mic (it could be a strange result of echo cancellation past a certain physical distance from the mic, or similar), or even whether it is an artifact of SaveThatWave – but I’ve never heard it before on SaveThatWave recordings, so I think it was either really present in the environment or it is a peculiarity of the mic, the hardware, and the usage distance. In any case, this type of audio artifact causes unexpected results with speech recognition, and in my experience it adds confusion to word searches.
• In the code you shared, the jobs of vadThreshold and the Rejecto weight are reversed. Normally you want the highest possible vadThreshold that still allows intentional speech to be perceived by OpenEars; then you add Rejecto to work against real speech that isn’t part of your model; and then, in relatively uncommon cases, you can increase the Rejecto weight a little. In this code, the vadThreshold is left at the default even though that results in all environmental sounds being treated as speech (leading to the null hyps in every recognition round), while the Rejecto weight is set to the maximum, so that nearly all of that “speech” (which is really incidental noise) is first completely processed and then rejected. In RapidEars this results in very large search spaces: every noise is a potential word, and every word has to be analyzed using the smallest possible speech units, which can occur in any combination, because your actual vocabulary is weighted very low in probability while reject-able sounds are rated very high due to the weighting. In combination with the odd noises, this leads to the big, slow hypothesis searches over non-speech that can be seen in the logs and the profile. Although I couldn’t replicate the memory usage, I believe it is happening, and I think this circumstance is the cause.
My expectation is that if you turn off Rejecto, first find the right vadThreshold (probably at least 2.5), and only afterwards add in a normally-weighted Rejecto model, you should see more normal memory usage and probably better accuracy. I’ve decided not to make code changes for #3, because doing so would have big side-effects and I think the cause is a circumstance that is better addressed in the implementation. I’m still open to seeing an example which replicates consistently from a test file and giving it more consideration, but so far I haven’t been able to witness it directly, so my sense is that it is bound to the environment and to the vadThreshold/weight issue.
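To make the recommended order concrete, here is a minimal Objective-C sketch of that tuning sequence. It assumes the OpenEars 2.x `vadThreshold` property and the Rejecto category method as described in the Politepix documentation; the vocabulary words, file name, and the 2.5 starting value are placeholders to adjust for your app, and the exact method signature and import paths should be checked against your framework version.

```objectivec
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEAcousticModel.h>
#import <OpenEars/OELanguageModelGenerator.h>
#import <Rejecto/OELanguageModelGenerator+Rejecto.h> // import path may differ per your install

// Step 1: with Rejecto off (or at default weight), raise vadThreshold until
// incidental environmental noise stops being perceived as speech – the
// null hyps in every recognition round should disappear.
[OEPocketsphinxController sharedInstance].vadThreshold = 2.5; // starting point; tune upward as needed

// Step 2: only afterwards, generate a Rejecto model at the default weight
// (nil) rather than the maximum, increasing it slightly only if still needed.
OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSError *error = [generator generateRejectingLanguageModelFromArray:@[@"HELLO", @"COMPUTER"] // your real vocabulary here
                                                     withFilesNamed:@"MyRejectoModel"
                                             withOptionalExclusions:nil
                                                    usingVowelsOnly:@NO
                                                         withWeight:nil // nil = default weight
                                             forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
if (error) {
    NSLog(@"Rejecto model generation error: %@", error);
}
```

The point of this ordering is that vadThreshold keeps non-speech out of the recognizer entirely, while the Rejecto weight only influences how fully-processed speech is scored, so the threshold should always be tuned first.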
Let me know how the new stopping behavior works for you, and thanks again for providing so much info about this bug so I could fix it.