Forum Replies Created
Logging output was presumably added to your project or the sample app at some point, but if your issue is that you just want to see less of it, you can comment out the line [OELogging startOpenEarsLogging] and/or any references to verbosePocketsphinx. Make sure to turn it back on if you have any issues you want to troubleshoot.
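As a minimal sketch of those two logging switches (standard OpenEars 2.x usage; verify the names against the headers in your own copy):

```objectivec
// Verbose OpenEars logging – comment this out to silence it:
[OELogging startOpenEarsLogging];

// Verbose Pocketsphinx logging – set to FALSE or remove to silence it:
[OEPocketsphinxController sharedInstance].verbosePocketsphinx = TRUE;
```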
Those aren’t errors, they are just notifications that the word wasn’t found in the default lookup dictionary. Jargon like “workplan” or words from other languages such as “Ankur” will be processed using the fallback method because they won’t be found in the default phonetic lookup dictionary – that is expected and it is referenced in the warning. If you are experiencing this with words like “left” or “company” please check out the post Please read before you post – how to troubleshoot and provide logging info here so you can see how to turn on and share the logging that provides troubleshooting information for this kind of issue.
June 7, 2016 at 6:22 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030554
I’ve removed a couple of our side discussions in this thread so it’s easier for later readers who need to get an overview on related issues to get through it quickly – hope you don’t mind since the extra discussion was my fault. Today there is a new OpenEars and RapidEars version 2.502 (more info at http://changelogs.politepix.com and downloadable from https://www.politepix.com/openears and your registered framework customer account) which should fix this issue you’ve reported. Before we talk about it more I wanted to clarify what this update fixes. We’ve talked about four different things in this discussion:
1. Actual leaks in Sphinx which we’ve established are very tiny,
2. Normally-increasing memory usage from OpenEars due to a growing buffer size from longer utterances,
3. The usage of much larger amounts of memory which is then normally released after there is a hypothesis,
4. Large amounts of memory not reclaimable after stopping when there is a big search in progress during the attempted stop.
I think we covered 1 & 2 pretty well earlier in the discussion, so let’s agree to just discuss 3 & 4 now that the updates are out, if that’s OK.
The OpenEars and RapidEars 2.502 updates should fix #4, which is a serious bug and which I’m really happy you told me about and showed me an example of, thank you. In general the updates should also allow faster stops when there is a big search in progress at stopping time, even setting aside the memory usage. Please be so kind as to check this out thoroughly and let me know if it stops the bad memory events at stopping time, and also let me know if you see anything bad due to the changes. The one case which was in your replication cases but which isn’t necessary to test or report is what happens to memory when the app hangs or exits due to the demo framework timing out – this is expected to not be graceful, so the memory usage under those circumstances at the very end of the app session isn’t an issue. In your case you should be able to test against your registered 2.502 framework instead of the demo.
#3 is a more complicated subject and not really a bug as far as I can see, so I wanted to explain it a little bit. I couldn’t actually replicate the situation you were seeing with extremely large allocations during an utterance, although I worked very hard to do so, setting up an external speaker system so I could play your office audio out loud into various device microphones since it didn’t replicate with test file playback. I couldn’t ever get the big memory usage to replicate, but I could see some smaller allocations which were nonetheless bigger than I would have preferred. I believe this is a bit more of an implementation issue than a bug, with a couple of root causes:
• There are some strange noises with unusual echo and doppler in the recordings. I don’t know whether there is some kind of industrial noise in the background where you work, whether this is an artifact of the device mic (it could be a strange result of echo cancelling past a certain physical distance from the mic or similar) or even if it is an artifact of SaveThatWave, but I’ve never heard it before on SaveThatWave recordings so I think it was either really there in the environment or it is a peculiarity of the mic and hardware and usage distance. In any case, this type of audio artifact causes unexpected results with speech recognition and I’ve had the experience that it adds confusion to word searches.
• In the code you shared, the jobs of vadThreshold and Rejecto weight are reversed. Normally you want the highest possible vadThreshold which still allows intentional speech to be perceived by OpenEars, then you add Rejecto to work against real speech that isn’t part of your model, and then after adding Rejecto, in relatively uncommon cases, you can increase the weight a little. In this code, the vadThreshold is left at the default although it is resulting in all environmental sounds being treated as speech (leading to all the null hyps in every recognition round), and then there is the maximum possible Rejecto weight so that nearly all of the speech (which is really incidental noise) is first completely processed and then rejected. In RapidEars, this results in very large search spaces, because every noise is a potential word, but every word has to be analyzed using the smallest possible speech units which can occur in any combination, because your actual vocabulary is weighted very low in probability, and reject-able sounds are rated very high due to the weighting. In combination with the odd noises, this leads to the big, slow hypothesis searches as a result of non-speech, which can be seen in the logs and the profile. Although I couldn’t replicate the memory usage, I believe it is happening, and I think it is due to this circumstance.
It is my expectation that if you turn off Rejecto and first find the right vadThreshold (probably at least 2.5) and then afterwards add in a normally-weighted Rejecto model, you should see more normal memory usage and probably more accuracy. I have made a decision not to make code changes for #3 because it would have big side-effects, and I think it is due to a circumstance which would be better to address via implementation. I am still open to seeing an example which replicates consistently from a test file and giving it more consideration, but so far I haven’t been able to witness it directly so my sense is that it is bound to the environment and the vadThreshold/weight issue.
Let me know how the new stopping behavior works for you, and thanks again for providing so much info about this bug so I could fix it.
OK, the new version of OpenEars 2.502 (http://changelogs.politepix.com) should fix this issue.
Yesterday’s OpenEars 2.502 update (http://changelogs.politepix.com) should fix this.
The 2.502 version of OpenEars that came out yesterday (http://changelogs.politepix.com) has some optimizations to model generation time and will be followed up shortly by a version of RuleORama with further model generation speed optimizations (will be announced at the same site).
Yesterday’s 2.502 update (http://changelogs.politepix.com) has the ability to disable any or all of the three preferred audio session settings in the API (sample rate, buffer size and number of channels), which may help with bluetooth compatibility for devices which don’t conform exactly to the standard.
May 28, 2016 at 5:48 pm in reply to: rapidEarsDidDetectLiveSpeechAsWordArray not being called #1030410
Glad to hear it!
May 28, 2016 at 5:27 pm in reply to: rapidEarsDidDetectLiveSpeechAsWordArray not being called #1030408
This needs to precede any calls to OEPocketsphinxController:
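The snippet originally shown here isn’t present in this archive; based on the rest of this thread, it is presumably the activation call, which must run before any other OEPocketsphinxController usage:

```objectivec
// Activate the shared controller before any other OEPocketsphinxController
// calls or property assignments:
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
```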
There are a few audio-using APIs from Apple that you can’t run simultaneously with OpenEars (or other 3rd-party audio frameworks) because they need to override the required audio session settings for the other framework. That’s an expected result since the design of iOS is a singular audio session with overridable settings. In this case the Apple framework is changing the session settings that OpenEars needs, and in one case it prevents recording (but it sounds like it is doing well under most circumstances).
(Since both frameworks are not open source, I can’t solve it at a low level either.)
You can make changes to your copy of the OpenEars source if you want to change its audio session behavior.
Sorry, this is an audio coexistence issue that isn’t really possible to offer support for – it’s good news that it works as well as it does due to OpenEars’ support for mixing sessions, but there isn’t anything API-based that can accommodate the conflict between two frameworks which each need to own the audio session if one at some point takes the session over.
May 24, 2016 at 3:56 pm in reply to: pocketsphinxDidReceiveHypothesis not called on device – no problem in simulator #1030388
Sure thing! OpenEars sets its own audio session settings so most of the time if you find yourself setting audio session settings in your app or letting another part of your app set them, it is likely to cause an issue.
May 24, 2016 at 3:42 pm in reply to: pocketsphinxDidReceiveHypothesis not called on device – no problem in simulator #1030386
Thanks, it would be helpful to post the logs directly here so I can correlate them to other similar issues and easily link to them in my tickets. That logging output means that some other app or other part of your app is changing the audio category to a playback-only category while the OpenEars code the logging shows execution of was in progress. You can diagnose the origin of the other calls to change the audio session category by searching for non-OpenEars invocations to the audio session. They don’t originate with OpenEars.
May 24, 2016 at 3:03 pm in reply to: pocketsphinxDidReceiveHypothesis not called on device – no problem in simulator #1030383
Please check out the post Please read before you post – how to troubleshoot and provide logging info here so you can see how to turn on and share the logging that provides troubleshooting information for this kind of issue.
That duration sounds a little long, are there logs showing it?
OK, this issue is fixed for the upcoming version (it is a race condition between the shutdown process and a new utterance starting that only manifests on very fast devices), no more info needed here. No estimated delivery date for the next version, but it is only dependent on one more bug fix and testing time, so it shouldn’t be too long.
May 17, 2016 at 10:28 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030342
OK, although I wasn’t able to reproduce this exactly from the sample, I was able to get a similar reproduction on an older device with a different setup, so I am currently investigating this issue, thank you for all of the info.
Sorry, there is no expectation that this would work. Audio coexistence is covered a bit more in the FAQ: https://www.politepix.com/openears/support
OK, thanks for the additional report of the issue.
Unfortunately audio session coexistence isn’t supported, so even if the error being returned is a new thing due to the OS version, I can’t necessarily help with it. But the hang might be fixed incidentally by testing against the first case now that I can replicate it (let’s hope so).
This actually looks like a different issue due to a conflicting audio session being used at the same time (there’s a session error during the shutdown process if you look up above). But the initial issue is replicated so there should be a fix for it shortly.
Just an update that I am now seeing this replicate on a 6S with 9.3.1 and I’m investigating the cause. Thank you for the test case.
The small grammar that performed the worst was “(AN | ALERT | ANT | ALWAYS | AMAZES)”. “ALERT” would be falsely detected with just background noise given 5-10 seconds of listening. However, adding another 200+ words that start with different letters of the alphabet resulted in significantly better results.
OK, so the issue is about non-speech being detected as a word. I would expect that could be an issue for a very small grammar of short words (this is the issue that Rejecto is designed to help with for models).
I don’t recall specifically, but it *felt* like it was 2-3 seconds faster on a mobile device (load time and corpus size being positively correlated).
Hmm, to create a model of this size in the format that lmtool creates should actually take less than a second (when I look at the last log that was submitted for an issue here I see a similarly-sized model generation taking about 0.2 seconds on a current device including the onetime caching of the acoustic model data). Do you have a log of the 2-3 second behavior?
Good to know, I wonder if that is actually a bug that the smaller grammars are less accurate. What was the thought process behind opting for a grammar versus a language model in a case where you are looking for a single word from a set?
Additionally, although performance of OpenEars (and pocketsphinx) is really good at generating grammar or language model data on the fly
Thanks, just to clarify, Pocketsphinx doesn’t generate models or grammars. OpenEars generates grammars and dictionaries, and ARPA files are mostly done by CMUCLMTK with some modifications.
1. Does the dictionary size matter?
I’m told by the Sphinx project (whose JSGF implementation it is) that it doesn’t. When switching between grammars the dictionary will grow regardless because Sphinx doesn’t have a mechanism for switching between entirely new dictionaries in the current version, meaning that the new words are added to the existing dictionary. I believe that it shouldn’t matter, since the search is constrained to the items in the grammar and dictionary words outside of it shouldn’t be up for consideration in the search even if they appear in the dictionary.
2. If I know that at a given point in time, only 10 words, for example, should be recognized, but there are 250 words total, should I have 25 different gram files and switch between them? Or create one large gram file? It seems, in my case, that smaller gram files produce more false positives.
My expectation would be that it’s better to switch between smaller grammars, but your own testing is the last word. If you are getting less accuracy with smaller grammars, do what gives you more accuracy.
3. Does it help to add similar or dissimilar words to either the dictionary or the gram file to improve accuracy?
Nothing should be in the dictionary that isn’t in the grammar (in the case above with the growing dictionary, it’s unavoidable but it doesn’t provide any particular benefit). In my experience the grammar should just contain the items that are intended to be recognized.
OpenEars supports self-written JSGF, but it isn’t really a topic I give a lot of in-depth support for, because the method for creating grammars in OpenEars is usually its grammar specification language (which can be output by OpenEars to multiple lower-level formats such as JSGF or the RuleORama model type). The advantage of using it is that it supports all of the features Sphinx JSGF supports, but it can be dynamically generated from Cocoa types at runtime and it’s easily human-readable, take a look if you have a moment: https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/
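As a sketch of that grammar specification language (the rule-name constants follow the linked OpenEars 1.7 documentation; the vocabulary and file name here are made up for illustration):

```objectivec
// A grammar built dynamically from Cocoa types at runtime. OpenEars can output
// this to lower-level formats such as JSGF or a RuleORama model.
NSDictionary *grammar = @{
    ThisWillBeSaidOnce : @[
        @{OneOfTheseWillBeSaidOnce : @[@"HELLO COMPUTER", @"GREETINGS ROBOT"]},
        @{OneOfTheseCanBeSaidOnce : @[@"PLEASE", @"KINDLY"]},
        @{ThisWillBeSaidOnce : @[@"EXECUTE"]}
    ]
};

OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSError *error = [generator generateGrammarFromDictionary:grammar
                                           withFilesNamed:@"MyGrammar"
                                   forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
```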
You probably need a higher vadThreshold for German, and then you can add Rejecto if needed once you find the ideal vadThreshold level for it.
Super. Usually the ideal process is 1) check out recognition with your vocabulary (trying to avoid high-confusion word sets that rhyme or are otherwise very similar – ideally they aren’t all one-syllable words), 2) raise the vadThreshold as high as possible to the point that when you speak a vocabulary word under normal environmental conditions, it is recognized, but as little incidental noise and non-speech as possible is heard, and then 3) add Rejecto if needed, starting with a low weight and increasing weight until you have the best out-of-vocabulary word rejection while still having the words in your vocabulary detectable when they are spoken.
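A hedged sketch of steps 2 and 3 in code – the threshold and weight values are illustrative starting points only, and the Rejecto method signature should be double-checked against the plugin headers in your install:

```objectivec
// Step 2: raise vadThreshold until incidental noise stops triggering
// recognition while deliberate speech is still heard (tune per app,
// microphone and acoustic model).
[OEPocketsphinxController sharedInstance].vadThreshold = 2.5;

// Step 3 (only if needed): generate a Rejecto model, starting with a low
// weight and increasing it gradually only if out-of-vocabulary speech
// still gets through.
OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSError *error = [generator generateRejectingLanguageModelFromArray:@[@"LEFT", @"RIGHT", @"TOP", @"BOTTOM"]
                                                     withFilesNamed:@"MyModel"
                                             withOptionalExclusions:nil
                                                    usingVowelsOnly:FALSE
                                                         withWeight:@0.1
                                             forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
```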
I would turn rejecto off while you work with the vadThreshold.
This does fix the clicking/breathing problem, but now ‘LEFT’ isn’t often recognized anymore, ‘top’/’bottom’/’right’ still seem ok.
Are you adjusting the vadThreshold for English or German?
Sure, check out the header or the documentation to learn about setting vadThreshold. It’s the most important adjustment to make when using a non-English model. After you find the ideal level for vadThreshold the next step is potentially to add Rejecto in order to exclude out of vocabulary speech.
I raised vadThreshold to 3.6, and it can recognize my commands now.
It may help with the speed to raise it to the highest possible while it can still recognize your speech.
Q: Does it matter? If I add this command to the LanguageModelGeneratorLookupList.text, will it help to speed up the recognition process?
It doesn’t matter in terms of speed, unless you are seeing that it is this word only that is causing slow results (that is very unlikely).
Q: How can I set the vad_postspeech and vad_prespeech params ?
It isn’t necessary to do anything with these parameters.
It took about 10 s to finish one recognition; how can I reduce the time?
It only took two seconds to do the recognition. This is where the end of speech happens in your log (OpenEars first waits for the speaker to complete their utterance and then it starts recognizing it):
2016-05-10 12:31:51.253 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
This is where the hypothesis from the completed recognition is given:
2016-05-10 12:31:53.221 OpenEarsSampleApp[779:3807] Pocketsphinx heard “小宝去充电” with a score of (-21356) and an utterance ID of 14.
That is 12:31:51.253 – 12:31:53.221 or almost exactly two seconds.
So, if something seems like it is 10 seconds, it is something else besides recognition time. This could be because Flite is speaking in between, or it could be because the end of user speech is not being recognized at the right time because vadThreshold is too low.
When it detects speech, the cpu usage rises to about 100% and the peak lasts for about 6 seconds.
That seems a little doubtful to me, since there are only two seconds in which the speech is being analyzed and the CPU doesn’t need to work much before the speech is being analyzed. Are you sure that isn’t Flite speech being generated that is using the CPU?
May 10, 2016 at 11:35 am in reply to: Enhancement of VR when a limited set of commands is used #1030289
Look into the vadThreshold setting, followed by Rejecto’s weight setting.
Scoring has fairly limited applications – I recommend searching these forums for the word “score” and reading all of the discussions which come up in order to get an overview on what is possible with scoring and what is not possible.
OK, thanks for the clarification. I actually wonder if the issue is related more to a mic difference more than a CPU difference. Do you get better results if you increase vadThreshold up to a point that rejects most noise on the mini? The vadThreshold settings have to be evaluated and set for each acoustic model besides the English one, I think there is more info about that at the end of the other languages acoustic model download page: https://www.politepix.com/otherlanguages. Let me know if this helps or if you’d like to troubleshoot it more (in this case I’ll ask for some logging output as seen here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/).
I added a test of these words to the German acoustic model testbed and they work (you can also verify that these phonemes are in the acoustic model if you’re very interested, by opening the acoustic model definition file AcousticModelGerman.bundle/mdef in a text editor since the supported phonemes are right near the top of the definition), so we should troubleshoot your implementation a little more and find out why it isn’t working for you.
Normally a complaint from OEPocketsphinxController about missing phonemes would suggest that either the language model is being generated using the German acoustic model but actual listening is instead being started using the English acoustic model, or that the files within the model have been changed. You can let me know which is more likely and we can start looking into it from there.
By the way, in case it is helpful to you, it is not necessary to capitalize the words – OpenEars can now handle lowercase or mixed-case as well as uppercase.
Can you give me a little bit of information about what you are doing with it? How large is the vocabulary, what device is the slow device, is there noise that is leading to ongoing recognition attempts, is this about one of the plugins and if so which ones, etc?
But I found that the CPU usage is 100% on an A5 CPU
What is the duration of this CPU peak?
Thanks for your report, I will check this out next week.
Thanks very much for taking the time. I ran your file and view controller code (on an iPhone 5S with 9.3.1) and the issue didn’t replicate for me yet – with which device do you reliably experience this result? Is there anything interesting about the device that could lead to it being more prone towards this result (hard to imagine, but maybe something like having lots of apps loaded, conceivably an audio app backgrounded, very little disk space available, jailbroken, anything of note)?
The logs are intriguing because it actually doesn’t look like there’s an utterance in progress at the time that it complains that there is. In the other report of this issue there is a slow search happening while stopping is being attempted, but in this one the hypothesis is received and it doesn’t even look like listening is resumed before the shutdown starts. I’m suspicious of that route change, which looks kind of extraneous and which also pops up in the other report right before the bad shutdown.
Thanks for letting me know you are also experiencing this – it is a high priority to fix but I haven’t had the same success reproducing it. I am currently trying to reproduce this from a submitted recording (I haven’t been able to reliably reproduce it with a local recording which is a prerequisite for adding a test to prevent this generally). Is there any chance you could take a look at this post about replication cases with audio recordings and consider making me a recording to go with your sample app that reliably causes this when run with pathToTestFile?
Note that this requires a) installing the SaveThatWave demo, b) getting a complete audio dump of the session in which this happens, c) adding the audio file to your sample app code above by using pathToTestFile, and d) making sure that the issue replicates when you run the app, so we’ll know it will replicate for me as well. If possible, this would be extremely helpful towards fixing this issue faster.
I’ve added some more explanation of how this works to the FAQ I linked to above, so you can read the two questions following the question I linked to get a more extensive answer.
The size of the framework file is effectively unrelated to the size of the app.
April 29, 2016 at 7:10 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030228
Thank you, I will look at the new version whenever you have it for me. You can erase the other upload since I have a copy.
OELogging isn’t on in that log, it only has verbosePocketsphinx output.
But a bigger issue is that the last iOS version which was supported by the iPhone 3G was iOS 4.2.1. OpenEars 2.5 supports iOS 7.1.1 through current versions and it hasn’t supported the 3G since version 1.5 if I recall correctly. You probably need to start your troubleshooting by looking at the issue that it seems like you are developing with the iOS 6 SDK on a device which doesn’t support iOS 6 or iOS 5. OpenEars specifically now requires the latest version of Xcode (at least 7.2) so make sure to upgrade when developing for a compatible device.
That doesn’t really look entirely like an OpenEars issue (it looks a bit more like an issue with the development environment), but if you turn on OELogging and show the complete log from beginning to end without anything being removed, there may be some debugging information if it has a connection to OpenEars. Here is info about how to turn on and show the entire logging: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
1. Will pocketsphinx only recognize full names if my corpus contains only full names?
I think we already talked this one through in your previous question, but if we haven’t, clarify it a little more with reference to your previous questions so I can understand what differentiates it, thanks.
2. Does the fallback method utilize the lookup list in AcousticModelEnglish? i.e. Would an error in the English lookup list cause problems with the fallback method?
Sorry, the question is a bit outside of the scope of support here – make sure that you only add entries to the lookup list which are valid and in the alphabetically-correct position so there is no need to discuss acoustic model failure states. If your changes to the lookup list lead to functionality issues you should remove them.
3. If I duplicate (in the Finder) the AcousticModelEnglish.bundle, rename it to AcousticModelCustom.bundle, add it to my project and point the pathToModel method to the Custom bundle, would you expect that to work? Or, should I just modify the lookup list in the English bundle?
That should work fine.
The only modification I support is adding entries to an existing acoustic model lookup list in the alphabetically-correct location, but not altering the bundle contents or removing entries from the lookup list, sorry.
April 22, 2016 at 4:46 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030174
OK, is the saved Instruments output file made from running the app you sent when using the WAV test file pocketsphinx_sample_log_201604221141402.wav?
April 22, 2016 at 4:29 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030173
OK, the next thing I noticed is that you have
[OEPocketsphinxController sharedInstance].legacy3rdPassMode = TRUE;
set before you call this message:
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
But setActive has to come before any property setting.
April 22, 2016 at 4:27 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030172
I believe that it is possible that there is an extraordinary bug of some kind which causes the symptoms of just the RapidEars framework not logging just some of its logging output, so I am not ruling that out, but it seems like a lower probability than linking to an old framework right now.
It is an extraordinary bug of some kind which causes the symptoms of just the RapidEars framework not logging just some of its logging output :/ . When both OELogging and verbosePocketsphinx are on, verbosePocketsphinx suppresses the earliest part of the RapidEars output. Sorry it was difficult to pin this down and thank you for bringing it to my attention. I will try to fix that in the next version but for now you can easily verify the version by running OELogging with verbosePocketsphinx turned off.
April 22, 2016 at 11:23 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030167
No, sorry, I don’t see that behavior when stopping, that is why I have asked for a replication case. I see the behavior I described at the beginning of the discussion, that both the buffer and the hypothesis search can grow to the size needed (which can get pretty large when there is a long utterance that has noise and the presence of words is unclear) and then is eventually released (the hypothesis search memory releases sometime after the search finishes, and the buffer memory releases sometime after stopping listening). It can take a fair amount of time before Instruments shows it as being released, and of course there are also other things happening in the sample app that can make their own use of memory (such as TTS and language model generation file caching).
I also don’t see leaks in Instruments (other than the very tiny leaks we discussed at the beginning of the discussion, adding up to less than 2k).
When I look at your Instruments example you sent, it doesn’t have leaks other than the tiny <2k leaks (leaks are orphaned memory). But it has live memory that continues to be live, at a time in which it can be expected to be possible to release, and it is a lot of memory. The memory usage is from a 3rd-pass search that doesn’t complete successfully when you stop listening (this is shown in the log when you look at the log timing and compare it to the memory usage).
What is happening in that session is that the 3rd-pass search gets very large (probably due to intermittent background noise that builds up in a long utterance without any easily-found words) and it keeps searching even while the stopListening message is in progress and trying to shut down, and after too much time passes, the stopListening method has to stop attempting to release the search because it will lead to an exception. The reason that I’ve wanted to debug your RapidEars version install is that this behavior used to be possible with RapidEars during a big unclear 3rd-pass search, but this issue was fixed, so it is unexpected that it is happening in your install, and it is also not possible for me to replicate with the current versions of the frameworks using my own audio files or audio input.
It also seems technically impossible that this happens with 3rd-pass searches turned off, so it’s a confusing issue. So these are the reasons I’m now hoping for a replication case from you where it replicates using an audio file from your environment. I would _really_ like to fix it if it is a current problem since I have already worked on this and thought it was fixed, but I can’t cause it to happen in my own system. My local install demonstrates the fixes that were made, and also doesn’t run 3rd-pass searches like the one shown in the Instruments file when I turn 3rd-pass searches off.
If this is an issue:
before it is 13 MB there is an increase up to 70 MB, which then falls back down to 19 MB
It is a different issue – the one I’ve been trying to replicate is the one that was shown in your Instruments file you sent, where there is a far larger allocation with none of it ever being released when stopping listening fails (this is the old issue that I expected to be fixed). An intermittent usage of a large amount of memory is something that can happen briefly in many kinds of apps that have a temporary need for it without it being a problem, and the search data structures can temporarily get pretty big on a 64-bit platform, although this is something that occurs for a matter of seconds.
I’m happy to investigate a new issue like “after a large search is successfully released, there is still 6MB more memory used than expected at the time of restarting listening and it isn’t clear whether it is due to a new memory need or a bug”, but first we need to wrap up the reported issue in the file you sent, which is a huge allocation where none of it is released because stopListening is not successful, which is a different scale of problem.
Is it possible that the reported issue doesn’t replicate for you now because you have done some troubleshooting on your RapidEars install version and you are now linking to RapidEars 2.5, or do you still not see RapidEars 2.5 logging when you try to set up your replication case?
April 21, 2016 at 6:42 pm in reply to: Adding custom words to the LanguageModelGeneratorList.txt #1030153
G L AE D T UW HH EH L PApril 21, 2016 at 2:59 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030150
Thank you, I look forward to checking it out.April 21, 2016 at 2:27 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030147
“Creating shared instance of OEPocketsphinxController” is the second line. But I really don’t think those logs are from a project successfully linked with a current version of RapidEarsDemo.framework. Given that the Rejecto 2.5 and OpenEars 2.5 logging works as expected, that all three frameworks work as expected when I run the project locally, and that the behavior when stopping listening is different on your setup, I think the logging problem is most likely connected to linking against an old copy of RapidEars locally.
I believe it is possible that there is an extraordinary bug of some kind which causes only the RapidEars framework to drop only some of its logging output, so I am not ruling that out, but it seems like a lower probability than linking to an old framework right now. Some RapidEars logging does appear (“Starting listening”), so the issue isn’t that there is no logging output from RapidEars at all, but that the new logging output added in RapidEars 2.5 is not apparent.April 21, 2016 at 1:55 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030144
I don’t have also the Creating shared instance of OEPocketsphinxController line in my logs
I’ve gone back and checked again, and the “Creating shared instance of OEPocketsphinxController” line is in the logging in the Instruments output that you sent me (and was in my run of your latest sample app code), it’s only the new RapidEars 2.5 logging which is not visible in your Instruments log.
I think it’s probably somewhere in your local logging as well if OELogging is turned on before doing anything else since that line has been in OpenEars for a couple of years, it’s probably just an oversight due to the large amount of logging output, or maybe a case-sensitive search or similar. I’m not aware of any conditions which suppress the standard OpenEars logging output.April 21, 2016 at 1:39 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030143
Double check that you are testing this with OELogging started, and started at the time that the view first loads (like in your code above).April 21, 2016 at 1:38 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030142
I downloaded a new copy of the Rejecto and RapidEars demos from the same link in your demo download email from this morning to make sure we were linking to the identical binaries, and the RapidEars demo definitely prints both of these lines when I copy and paste your code above into the sample app using the shipped version of OpenEars 2.051. Old versions of RapidEars don’t print either of those lines, so the issue is going to be related to that somehow.April 21, 2016 at 1:20 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030140
It’s in there after the logging line “Attempting to start listening session from startRealtimeListeningWithLanguageModelAtPath”.April 21, 2016 at 1:11 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030139
OK, I’ll check it out and get back to you.April 21, 2016 at 1:07 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030137
The code you just posted won’t work with the RapidEars demo at all – it links to a licensed version.April 21, 2016 at 12:56 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030135
What you’re looking for in a replication case is that when you use your recorded audio from the SaveThatWave demo as a test audio file via pathToTestFile, the call to stop listening gets stuck, unable to stop gracefully, and afterwards (even a minute afterwards) the full amount of memory in use at the time of the attempted stop is still allocated.April 21, 2016 at 12:45 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030134
It is definitely not in the logging for the Instruments case you sent, so I’m not sure how to proceed when I can’t replicate it. I’ve run it with every example of challenging audio that I have and I’m out of ideas for how to get it to happen – this was an issue with much older OpenEars versions, but a fix was added for it, so this is a bit mysterious, especially in combination with it being due to a 3rd-pass search in your previously-sent logs and then apparently still happening when you turn off 3rd-pass searches. Can you create a full replication case according to this post, one that you can run and see behave in the same way:
And send me a link via the contact form? Thank you.April 21, 2016 at 10:51 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030131
Some hints for the troubleshooting process: Xcode links to the framework both via the file navigator and via the Framework Search Paths entry of Build Settings. It is possible that they don’t both point to the same thing. I would probably start by searching my system for RapidEars.framework and removing all copies of it found, making sure that the app project breaks, removing the RapidEars.framework entry from Framework Search Paths, and then downloading your RapidEars 2.5 framework from the licensee site and installing it to the app project fresh. At that point you should see the line of logging stating the version number of RapidEars.April 21, 2016 at 10:46 am in reply to: Adding custom words to the LanguageModelGeneratorList.txt #1030130
Are the dashes ignored when generating?
Ignored in what respect?
Do they change inflections or timing anything like that?
I do not want to add the names separately as only specific full names are used. For instance “John Doe” and “John Henry Doe” are used but “John Henry” is not. My thinking is that the recognition will be more accurate with phrases than it will be with simpler words that are used to make the phrase. True?
Okay, to clarify the case answer… If my Array contains “John-Henry-Doe”, and the entry in the LookupList is “john-henry-doe” will that match?
And what is returned in the string of recognized text?
What is returned is what you put in your array submitted to OELanguageModelGenerator. If that is “John-Henry-Doe”, “John-Henry-Doe” gets returned.
I’ve added a note about specific bluetooth device compatibility issues to the FAQ here: https://www.politepix.com/openears/supportApril 21, 2016 at 9:08 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030121
Unfortunately I haven’t been able to replicate the issue locally, although I’ve been trying. Now I’m a bit confused that you say [OEPocketsphinxController sharedInstance].legacy3rdPassMode = TRUE doesn’t affect it: the issue in the logging shouldn’t be possible with that setting, because it appears to be caused by an overly-long 3rd-pass search, and using legacy3rdPassMode with RapidEars turns 3rd-pass searches off. That suggests it could be an issue local to your install.
When I examine your logs more closely, the Rejecto version linked is 2.5, but it looks like the RapidEars version linked is older than 2.5. I think that somehow your project is not really linking to the current version of RapidEars, though I believe you downloaded it – perhaps the linked version is in a different location from the downloaded version.
Can you do some troubleshooting of why the current version of RapidEars doesn’t seem to be linked to the project and then let me know whether you still have this issue? I’m currently at a dead end in demonstrating it in my own testbed and it could be due to linking to an old RapidEars version.
When you’ve successfully linked to RapidEars 2.5, there will be a line of logging in your OELogging output giving the version number of your RapidEars framework, right at the beginning. Thanks!
Glad that helped!April 20, 2016 at 7:29 pm in reply to: Adding custom words to the LanguageModelGeneratorList.txt #1030109
Can I create a phrase in the LookupList?
Yes, I recommend it for words or phrases you know you’ll be generating dictionaries for, but you can’t have any spaces in the word entry before the tab. You could also add JOHN and HENRY and DOE separately and a request for the phrase “John-Henry Doe” should then find all of your added words. Take care to put your additions into the alphabetically-correct location in the file, since the English-language lookup uses the order to optimize lookups.
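To illustrate the format described above, here is a small Python sketch of inserting an entry into a copy of the lookup list while keeping the tab-separated layout and the alphabetical ordering that the lookup relies on. The helper name and the sample entries are hypothetical; only the tab-separated, alphabetically-ordered format comes from this discussion.

```python
# Hypothetical sketch: insert a "word<TAB>phonemes" entry into lookup-list
# lines at the alphabetically correct position (case-insensitive).
# The function name and sample data are illustrations, not OpenEars APIs.
import bisect

def insert_entry(lines, word, phonemes):
    """Return the lookup lines with the new entry inserted in order."""
    entry = f"{word}\t{phonemes}"
    keys = [line.split("\t", 1)[0].lower() for line in lines]
    index = bisect.bisect_left(keys, word.lower())
    return lines[:index] + [entry] + lines[index:]

lines = ["DOE\tD OW", "HENRY\tHH EH N R IY", "JOHN\tJH AA N"]
updated = insert_entry(lines, "GLAD", "G L AE D")
print(updated)
```

The point of sorting on the part before the tab is that the word field may not contain spaces, so the tab reliably separates the word from its pronunciation.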
Can the phrases contain punctuation like ‘ (apostrophe) and – (dash)?
Yes, these are the two allowed forms of punctuation.
I recall reading on the forum somewhere that the words do not need to be in all caps any more. Is that correct?
Yes, but this regards the generation of the language model or grammar, not the LanguageModelGeneratorLookupList entries, which should just be added in whatever case the rest of the list is in (uppercase, lowercase, or mixed case). The framework will make sure to normalize requests and the entries to the same case during lookups so you don’t have to worry about case.April 20, 2016 at 5:09 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030106
Does this still happen if you set
[OEPocketsphinxController sharedInstance].legacy3rdPassMode = TRUE;
at the time that you are otherwise configuring OEPocketsphinxController?April 20, 2016 at 3:41 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1030105
You can read and subscribe to update information for all Politepix frameworks/plugins here so you can get that information automatically: http://changelogs.politepix.com
OK, I think I have a fix for this and subject to more testing it will be in the next update. I don’t have an ETA for that update but there is only one other high-priority bug on the list so it shouldn’t be too long.
OK, I have replicated this and I have a pretty good sense of what the issue is. Thank you for the good test case and the report. It isn’t going to be an easy fix, so I can’t guarantee it will be in the next update, although that is a goal. In the meantime, I believe you can work around this by passing the same instance of your OELanguageModelGenerator to your subviews rather than re-instantiating it.
There is a lot of discussion about this in other forum topics and also in the FAQ here: https://www.politepix.com/openears/support. It will be helpful to do some forum searches using related keywords like noise and background.
Just turn on the logging to see versions, check out the post Please read before you post – how to troubleshoot and provide logging info here so you can see how to turn on and share the logging that provides troubleshooting information for this kind of issue.
That sounds like maybe you’re linking to an old Rejecto. Is everything (OpenEars, RapidEars and Rejecto) all version 2.5 or higher?
my old Jabra Bluetooth headset works with WhatsApp. Also a simple Swift 2 program including
let string = "Hello World!"
let utterance = AVSpeechUtterance(string: string)
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
let synthesizer = AVSpeechSynthesizer()
synthesizer.speakUtterance(utterance)
OK, but I think these are playback examples (maybe the Whatsapp example is recording?) while the logging shows an issue with recording only, so that’s really all we want to look into.
Generally, if you have a headset that you’d like to be able to support in your app and you can see it doing low-latency recording with another 3rd-party app, the option is to send me an example of the headset so I can investigate a bit more what is going on. Unfortunately, given the range and expense of BT devices and their spotty compatibility with 3rd-party apps, it isn’t something where I can attempt to maintain a testbed or commit to support (or even very heavy troubleshooting) of any one device. But if you wanted to send it over, I’d be willing to look into it and see what’s going on – let me know.
Yup, YouTube may not be a reliable test since it probably uses a fully-wrapped video API to do playback only (as far as I know), and we’re more concerned with the ability to do low-latency recording.
Much better than testing another headset (which could easily have the same issue with 3rd-party apps) would be to check out your current headset with some 3rd-party apps that do low-latency recording (VOIP or other form of real-time audio chat is a safe bet) and see if it works.
Have you seen the headset record successfully with any other 3rd-party apps that could be expected to use a low-level recording API (for instance a VOIP app)? In my experience, not every headset is compatible with Apple’s Bluetooth audio APIs. That is (unfortunately) the reason that Bluetooth support is still marked as experimental in OpenEars.
Got it. This is getting a little bit beyond the scope of an ASR tool and more into the region of text analysis. As you dive into a goal like this, many questions start to come up about what is “good handling” for advanced cases and then it gets multiplied by the different requirements for language models versus grammars versus RuleORama grammars, and then things that may or may not be equally likely across languages. It is also something that can usually be judged by visual observation of your own grammar.
In an individual app (versus a framework like OpenEars) you can probably restrict the range of what is likely much more, which makes this simpler to implement for your own specific case. You can check out the file LanguageModelGeneratorLookupList.text in the acoustic model for the language you’re using and, for instance, load it into a data structure like an NSDictionary so that you can evaluate closeness according to the needs of your application, if there is no opportunity to simply look at the grammar at creation time and consider whether it contains similar words.
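The approach described above – loading the tab-separated lookup into a dictionary and then judging closeness per your app's needs – could be sketched as follows. This is a hedged illustration: the tab-separated word/pronunciation format comes from this discussion, but the function names, the sample entries, and the similarity measure (a simple phoneme-sequence ratio) are my own, not an OpenEars API.

```python
# Hypothetical sketch: parse lookup-list text ("word<TAB>phonemes" per
# line) into a dict, then score a rough phonetic closeness between two
# listed words. The similarity metric is an illustration only.
from difflib import SequenceMatcher

def load_lookup(text):
    """Parse lookup-list text into {word: [phoneme, ...]}."""
    table = {}
    for line in text.splitlines():
        if "\t" in line:
            word, phones = line.split("\t", 1)
            table[word.lower()] = phones.split()
    return table

def closeness(table, a, b):
    """Ratio in [0, 1] of how similar two words' phoneme sequences are."""
    return SequenceMatcher(None, table[a.lower()], table[b.lower()]).ratio()

sample = "BAT\tB AE T\nBAD\tB AE D\nCOMPANY\tK AH M P AH N IY"
table = load_lookup(sample)
print(closeness(table, "BAT", "BAD"), closeness(table, "BAT", "COMPANY"))
```

In a real app you would tune the metric (and what counts as "too close") to your own grammar and use case, which is exactly why this belongs in the app rather than in the framework.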
Their closeness to each other?
Scoring is returned with most APIs, although its applicability is inherently limited. You can search these forums for more info about that.
a quantified value of phonetic closeness in words
This is a score.
To clarify, when you refer to the app or application, are you referring to someone’s particular app implementation of the OpenEars framework?
Are you asking for scoring?
Please check out the post Please read before you post – how to troubleshoot and provide logging info here so you can see how to turn on and share the logging that provides troubleshooting information for this kind of issue.
Super, I’ll add a test case and see if I can fix it for the next update (or let you know what code needs to be changed if it’s a code issue).
I’m happy to hear you found the issue!
The best bet is to make as brief an example as you can, which replicates the issue and show me the smallest amount of code possible that causes it. Swift isn’t supported here by me yet, but I’m up for taking a look at a non-enormous code example and seeing if anything jumps out at me.
I’m pretty sure I’ve heard of other projects taking this approach (starting listening with a probabilistic language model plus Rejecto and then switching to a grammar when a wake-up word has been invoked). I haven’t personally tried it yet but there shouldn’t be a technical or speed barrier to this working. It is also possible that RuleORama could handle both parts, maybe worth giving it a try as well.
Super, I just wanted to check to make sure those weren’t potential issues. OK, since you mentioned two issues that are sort of general issues rather than a single acute issue, I can give some general advice about the order of applying the different tools.
1. Turn off Rejecto temporarily. Create a few test cases which demonstrate environments that users will experience, and find the vadThreshold setting that gets you closest to a good balance between rejection and recognition across these cases. The highest priority is that it doesn’t reject speech that you want to recognize; a lower priority is that it can reject some of the incidental noise or speech that you do not want to recognize, without rejecting speech you want.
2. Next, add Rejecto back in. If it rejects speech you want to keep, turn the weight down. If it doesn’t reject incidental speech or noise you think it should reject, turn the weight up. Change the weight in small intervals and retest so you can see your results.
3. RapidEars CPU settings don’t affect accuracy, they just affect polling rate, so you can set them to the lowest setting that feels “fast enough” for your UX and save some energy, or set it to the highest if nothing is more important than hypothesis speed. None of them are energy hogs, but of course it’s better to use the least CPU overhead you can get away with in your interface.
Let me know if this helps.
It looks like the acoustic model isn’t really added to the app target.
Sure, it couldn’t hurt to see the complete logging with all logging options on. Please leave everything in the log as-is – it’s OK to change the app name only, as long as you do a search-and-replace to change it everywhere in the log so it isn’t confusing.
Behavior will always be the same, but time to return a hypothesis will decrease as CPU power increases.
The log looks a little odd, are you sure nothing has been accidentally removed from it?
If you can create a code-only replacement for the main viewcontroller in the sample app that demonstrates it, it can be recreated and fixed faster.
testing using real devices only
Please verify that this isn’t a Simulator-only issue with stopping listening. Generally, no Simulator-only bugs are taken as reports here, from the post Please read before you post – how to troubleshoot and provide logging info here:
The Simulator can’t be used for testing or evaluation with OpenEars (there is more about this in the documentation and the source) so please do not submit any questions or bug reports relating to Simulator results.
If this can either be replicated on a real device or disproven that it relates to Simulator environment differences from the device, I can help further.
Sorry, I don’t know the cause of that. You can troubleshoot it more by testing with the default English acoustic model that ships with OpenEars 2.5 rather than a custom one, by testing using real devices only, and by testing against other unknown words.
What OpenEars initialization are you referring to specifically? The issue you have seems to relate only to OELanguageModelGenerator, and OEPocketsphinxController is no longer really initialized per se, since it just has a shared object. As another troubleshooting step, I would take a look at the way things are set up in the sample app and compare it to your app to make sure there isn’t any unnecessary or out-of-date code.
it’s just one dismissing and then creating again when I needed
It might be a good idea to look at the logging for this behavior when you stop the engine before the view controller is dismissed to see if it is able to shut down cleanly.
I’ve never heard of it, but I’d be looking for situations where you have two view controllers up simultaneously that both have access to OELanguageModelGenerator objects for English-language unknown-pronunciation generation, and for some kind of reference cycle where an unused view controller can’t release its objects, specifically its OELanguageModelGenerator. It sounds like a race condition over the Flite voice used for g2p, which probably needs to be singular.April 6, 2016 at 12:21 pm in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1029951
OK, with the Instruments file and the full logging I can see what is happening here (and why I missed it) – it looks like, when the session can’t be gracefully stopped due to audio conflicts, the decoder is orphaned in memory as an alternative to causing an exception, and this edge case was missing from my testbed. I’ll add a test and fix this in the next update as a high-priority fix, thank you for your report. With some luck it could be this week or next. For your own info, it’s an issue with OpenEars and not with the plugins.April 6, 2016 at 10:07 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1029949
OK, I will replicate this and figure out what’s happening. You can also remove your downloads that you linked to in this discussion if you want to since I have a copy now.April 6, 2016 at 9:55 am in reply to: RapidEars startRealtimeListeningWithLanguageModelAtPath memory leaks #1029946
Super, thanks. A couple of questions – is it a particularly noisy area? It’s unusual to see continuous empty utterances like this log has.
My second question: I noticed that your firm is one of the customers who received my email on March 8th, 2016 about needing to replace your licensed copy of RapidEars 2.5. Did you replace it? The email was sent to df@ your company name, since that was the address used for the purchase.