Forum Replies Created
Halle WinklerPolitepix
Hi,
OEEventsObserver and OELogging are different things – we need to get OELogging working in order to see the kinds of errors we can use to troubleshoot your issue. Is OELogging working for you in the sample app, and if so, can you show me the OELogging output from the sample app? All you have to do to turn on OELogging in the sample app is to uncomment the line [OELogging startOpenEarsLogging] if it is commented, thanks.
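For reference, the Swift equivalent in your own app is the single call below (a minimal sketch, assuming the usual Swift bridging of the Objective-C class method):
// Turn on OpenEars logging once, early in the app session (before starting
// listening), so the entire session is covered by the log output.
OELogging.startOpenEarsLogging()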
Halle WinklerPolitepixWelcome,
Let’s start by troubleshooting why logging isn’t working for you. Does it work for you when you run the sample app and uncomment the line where OELogging is started?
Halle WinklerPolitepixHello,
Please go ahead and directly test the plugins for your particular use case to make sure they are a fit for your goals. Performance can mean many different things when discussing speech recognition, and answering those questions for you will be accomplished most easily and directly by testing it for the app you are developing. You can read more about the purpose of RapidEars and its capabilities on its page here: https://www.politepix.com/rapidears and in the RapidEars-specific FAQ entries here: https://www.politepix.com/openears/support, thanks!
Halle WinklerPolitepixHello,
That’s correct, in the app target build settings.
Halle WinklerPolitepixSorry, I’ve tested it and it doesn’t seem to be working currently. I will take a look at why when there is time.
Halle WinklerPolitepixHi,
Initialize Slt before setting target_mean.
Halle WinklerPolitepixOK, why don’t you show a specific, very short piece of code and tell me the results. target_mean is the only property you need to change in order to alter the pitch of the voice, probably only by a small degree below 1.0.
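As a sketch of the kind of very short excerpt I mean (Swift; the 0.9 value is just an illustration of “a small degree below 1.0”, not a recommendation, and the say(_:with:) spelling assumes the usual Swift bridging of OEFliteController’s say:withVoice:):
let slt = Slt()                  // initialize the voice first
slt.target_mean = 0.9            // slightly below 1.0 lowers the pitch a little

let fliteController = OEFliteController()
fliteController.say("A short test phrase", with: slt)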
Halle WinklerPolitepixWelcome,
They are properties of the class, so you can set them after instantiating it. If you have done this and managed to change the sound of the voice, but not in the way you intended, please share some specific information about your results and I can try to help.
Halle WinklerPolitepixGreat to hear – for future troubleshooting, make sure to turn on OELogging since it will give you output that will make it easier to discover that kind of issue.
Halle WinklerPolitepixHi,
That is the verbosePocketsphinx logging, but the more important logging output you need is the OELogging output – turning it on is also covered in the post linked earlier.
Halle WinklerPolitepixWelcome,
This is very likely to be a path issue. Make sure you turn on logging to see the errors OpenEars is giving you so you can troubleshoot or share them with me; here is a post about logging and related: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
Halle WinklerPolitepixHi Keith,
The outermost rule needs to be a ThisWillBeSaidOnce enclosing the rest of your grammar; otherwise it ought to work.
Halle WinklerPolitepixHi Keith,
Sorry, there is no accommodation for OOV + RuleORama.
Halle WinklerPolitepixWelcome Keith,
The hypothesis is actually two complete recognitions according to your ruleset (RuleORama can return more than one complete rule-conforming utterance in a single hypothesis if they were uttered). The reason it doesn’t correspond to your real utterance is that, for performance reasons, RuleORama doesn’t support optional repetitions, so it reduces them to a single repetition. This is mentioned in the docs and the verbose logging, but it’s the kind of small detail that’s easy to overlook when you’re first getting started. So, the answer is that there is nothing wrong with your ruleset, but your optionals are silently turning into single repetitions only.
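To illustrate with a hypothetical rule (not your actual ruleset; the grammar keys are written here as literal strings matching the documented key names):
// With a stock OpenEars grammar this can match "GO" followed by one or more
// direction words; with RuleORama the optional repetition is silently reduced
// to a single repetition, so only one direction word is expected per match.
let grammar: [String : Any] = [
    "ThisWillBeSaidOnce" : [
        ["ThisWillBeSaidOnce" : ["GO"]],
        ["ThisWillBeSaidWithOptionalRepetitions" : [
            ["OneOfTheseWillBeSaidOnce" : ["LEFT", "RIGHT", "FORWARD"]]
        ]]
    ]
]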
Halle WinklerPolitepixWelcome gregorym,
I am going to close this up since having implementation instructions for other platforms here tends to create very difficult support problems over time as they age, sorry I can’t be more helpful!
Halle WinklerPolitepixHi Vladimir,
The primary recognition issue is going to be testing with a non-native German speaker. I will close this up since the issue is pretty straightforward, but if you want to follow my advice above later on regarding how to test, it’s fine to open a new topic if you continue to have unexpected results, thanks.
Halle WinklerPolitepixHi Vladimir,
It’s the link I gave you above:
https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
I can’t guarantee it is something I can assist with, but I can take a look as long as you follow the instructions in that post very carefully.
Halle WinklerPolitepixHi Vladimir,
You’re welcome, and good luck with your investigations!
Best regards,
Halle
Halle WinklerPolitepixYes, thank you for clarifying. It’s quite important to get good testing data before you start trying to fix issues by altering settings, because changing them on the basis of bad data will result in worse results for the average user, and in a situation where it isn’t possible to get help (for instance from me) due to issues with subjective data collection – such as having too few reports, non-replicable reports, or reports in which there were other factors affecting recognition that you don’t know about (for instance noise or distance). So let’s talk a little bit about how to set up tests for languages that aren’t being tested firsthand in the office.
The first thing to keep in mind is that you can’t use synthesized speech (like Yandex output) for testing, because it doesn’t contain enough acoustic data, so it will only confuse your troubleshooting process.
The second thing is not to rely on subjective reports of interactions when you test with humans – even in your own language, but especially in a language you aren’t testing natively in-house – at least until you have a high level of confidence in what is happening, because you have no way of seeing the environmental situation or of replicating the results. In that case I am the third party removed from the original subjective report, so I can’t help effectively (assuming it isn’t just a limitation of the acoustic model but something in the framework that I can help with).
It’s possible for you to obtain complete recordings of the user speech and then to feed them into OpenEars in test mode, so you can replicate the user’s experience. This post is about giving me replicable cases, but it also explains how to use the SaveThatWave demo in order to obtain audio and then use pathToTestFile so you can observe the results yourself: https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
For your own app, of course, it isn’t necessary to put it all inside of the sample app’s code (that’s just if you want to show it to me for help), but it should get you started with setting up replicable testing for your own app.
It’s important not to turn on Rejecto until you are very confident that vadThreshold is correct for the acoustic model (this would usually mean that a sigh is not processed as speech). You may need to test this yourself; it isn’t really necessary to be a native speaker in order to make sure that the vadThreshold is rejecting as much non-speech as possible. It does sound like vadThreshold should be higher in your case.
Once you have confidence in vadThreshold, you can obtain recorded speech as described in the linked post, and start to tune your Rejecto settings (starting from the default settings). If you continue to get unexpected results, you can give me a full replication case as described in the linked post so I can look into whether it’s a settings issue.
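As a minimal sketch of the vadThreshold step (Swift; the value shown is only a placeholder – the right value is whatever your own testing shows rejects non-speech without rejecting speech):
do {
    // Make the shared object active before touching any of its properties.
    try OEPocketsphinxController.sharedInstance().setActive(true)
} catch {
    print(error)
}
// Raise this until non-speech such as a sigh is no longer treated as speech.
OEPocketsphinxController.sharedInstance().vadThreshold = 3.2
// ...then start listening (or run your test recording) as usual.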
Halle WinklerPolitepixNo, my question is more about how the testing is happening – I don’t expect that you have a group of native German-speaker testers that you can directly observe in your office as they use the app in the same way you could do so for a local language, so how are you observing their testing and receiving the results?
Halle WinklerPolitepixCan you tell me a little bit about the process of evaluating the results? i.e., how do you hear about the recognition rate and results from the native speakers who are testing for you, and how do you reproduce results you’ve heard about? The reason that I ask is that usually we don’t have an office full of native speakers of other languages we can just observe directly as we could for our own native language, so the method for gathering a large cross-section of native speaker data for another language is the underlying condition that makes it possible to improve recognition by adjusting the speech interface or framework settings.
Halle WinklerPolitepixOK, so when you test with people you are only ever testing using native German speakers, is that an accurate statement?
Halle WinklerPolitepixOK, and what are the regional German accents of the people reading the text you mentioned as the second approach? i.e. what parts of Germany did they grow up in?
Halle WinklerPolitepixCan you clarify more about what translator dictation is?
Halle WinklerPolitepixWelcome,
The different acoustic models have very different accuracy and performance levels, so a lot of variance should be expected (this is discussed on the page about other languages), but we can investigate this a bit to see if there are any underlying causes that can be addressed other than the model itself. How are you obtaining the German speech that you are testing?
Halle WinklerPolitepixHi Xi,
Sorry, no specific ideas, but you can collect audio of these cases for your own QA so you can investigate issues with your app when people outside of your team are using it. OpenEars has two features that can help you undertake your QA. The first is that you can collect audio of other users’ OpenEars sessions using SaveThatWave (you can use the demo version for the purpose of collecting problem audio without buying the plugin, as long as you don’t ship your SaveThatWave-enabled version to the App Store); its method startSessionDebugRecord obtains all speech from an entire app session as a WAV. The second is that once you have this WAV and add it to your bundle, you can use OpenEars’ pathToTestFile to replay the session in your app, which may show you what is happening when there is a problem. Then you can, for instance, try different words in your model, change the vadThreshold, or go through the other standard troubleshooting steps available to you. You will also need to collect info about which devices and OS versions are being used, in case you need to break out separate logic for those cases.
Sorry, it isn’t possible for me to troubleshoot very generalized issues such as a model which works well for some speakers and less well for others (this is a standard issue in speech recognition), but if you do your own QA and see something that could be reported as a specifiable and replicable bug, please feel free to let me know about it.
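A minimal sketch of the replay step once a problem WAV from a user session has been added to your app bundle (Swift; the file name is hypothetical):
// Replays the saved session audio through recognition instead of the mic,
// so you can reproduce the user's experience on your own device.
if let testFilePath = Bundle.main.path(forResource: "ProblemSession", ofType: "wav") {
    OEPocketsphinxController.sharedInstance().pathToTestFile = testFilePath
}
// ...then start listening as usual and watch the hypotheses and logging.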
Halle WinklerPolitepixOK, I will close this up since I think the most-appropriate answer is to do model switching using the built-in API which doesn’t require stopping first and restarting, but if this is due to a workaround of some kind which is specific to this implementation, I’m unfortunately not going to be able to assist because switching models isn’t really what start/stop is designed for. Sorry!
Halle WinklerPolitepixHi Liam,
It isn’t necessary to stop listening in order to change models.
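A minimal sketch of what I mean (Swift; newLanguageModelPath and newDictionaryPath are assumed to be paths you’ve already produced with OELanguageModelGenerator, and the method spelling assumes the usual bridging of changeLanguageModelToFile:withDictionary:):
// Switch vocabularies while the listening session stays active – no stop/start.
OEPocketsphinxController.sharedInstance().changeLanguageModel(
    toFile: newLanguageModelPath,
    withDictionary: newDictionaryPath)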
Halle WinklerPolitepixHi,
AVAudioPlayer in the foreground has very little behaviorally to do with background YouTube music or AVPlayer in general – it is a simpler API and generally pretty compatible with OpenEars (I think this is also in the FAQ). However, OEPocketsphinxController will often (again, depending on device and OS, since this isn’t specified by Apple) change the sample rate or volume of AVAudioPlayers which are running at the time that recognition is started, for similar reasons to those I mentioned previously. A session with the category AVAudioSessionCategoryAmbient is unquestionably being changed, since that category doesn’t allow audio input, even if it isn’t being changed in a way that you notice affecting the audio output (there are a few questions posted here about how to prevent changes to existing AVAudioPlayer sessions at the time of starting OEPocketsphinxController). I’ll go ahead and close this up since the problems of audio framework coexistence are covered in the FAQ and in many questions here, so I can’t shed further light on this – it is a side effect of the design of audio APIs on the device and the limited specification for audio behavior on the platform, rather than something I can help out with. Best of luck, and take a look at the FAQ for a bit more background here, thanks!
Halle WinklerPolitepixWelcome,
I think this would be unlikely to work (or unlikely to always work the same on every device and OS), because OpenEars has to change the session category in order to begin, but not vice versa. That is, while the audio is compatible with both its preferred category and OpenEars’ category (so it wouldn’t need to change the category if it started during a listening session), OpenEars is not compatible with the audio’s preferred category, which requires a session restart that can interrupt playback. Regretfully, this is an OS implementation detail rather than an OpenEars implementation detail (the audio session isn’t a very deeply specified API), meaning that it is unlikely that the same thing happens identically even from OS version to OS version, so it would be challenging to try to control these kinds of results from OpenEars. Issues with audio coexistence are covered in more detail in the support FAQ, but generally, I’m sorry that it isn’t possible to cover every case like this.
January 19, 2017 at 10:06 am in reply to: Issues when playing sound by OpenAl by upgrading Openears to 2.504 #1031529
Halle WinklerPolitepixHi Xi,
So, it’s never been supported to use another audio framework simultaneously with recognition (it’s actually something I’ve actively advertised as being unsupported, here and in the FAQ), but I understand you had something that was nominally working previously. As I explained in the previous question, it isn’t possible to provide or support older versions, so unfortunately that is ruled out.
The audio changes from 2.5 onwards increased compatibility for mixing cases rather than decreasing it, so the most likely cause of your emerging issues is too much intervention – for instance, you explain that you have set the audio session category, which is something you can’t do with OpenEars and its plugins, and I’d guess there is probably more forcing of audio settings that OpenEars needs to set for itself, which was part of making it work in the older version.
This is something that you would need to troubleshoot locally with your team, since I don’t support this scenario or maintain a testbed that could even help me give you input on it; however, I am happy to give advice about how I would approach the problem for the best results if it were an issue I had on my plate.
Step 1: return your integration of the two frameworks to a simple/naive state. That is, whatever you had to manually override in the audio settings/session to get them working together (i.e. any audio session overrides), please un-override it as if you had just tried them together for the first time. A good rule of thumb is that a case-insensitive search for “audiosession” in your app code shouldn’t find anything. A few of your forum questions are about audio session issues from combining OpenEars with unsupported audio frameworks, so you could have a couple of iterations of this kind of code.
Step 2: take a close look at the documentation for OEPocketsphinxController 2.504, which in addition to now having an 8k recognition mode option, also has four properties that could affect audio coexistence – disableMixing, disableSessionResetsWhileStopped, disablePreferredSampleRate, and disablePreferredBufferSize – and try them out to see if they help (making sure you’ve already made the shared OEPocketsphinxController object active before interacting with any of its properties). A sketch of that order of operations follows below.
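As a sketch of that order of operations (Swift; which of these properties actually helps is something only your testing can show):
do {
    // Make the shared object active before touching any of its properties.
    try OEPocketsphinxController.sharedInstance().setActive(true)
} catch {
    print(error)
}
// Try these individually and in combination:
OEPocketsphinxController.sharedInstance().disablePreferredSampleRate = true
OEPocketsphinxController.sharedInstance().disablePreferredBufferSize = true
// OEPocketsphinxController.sharedInstance().disableMixing = true
// OEPocketsphinxController.sharedInstance().disableSessionResetsWhileStopped = true
// ...then start listening as usual.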
A simpler alternative would be to use a primed AVAudioPlayer for playback, since that is supported (again, making sure that nothing, including the AVAudioPlayer, is making calls to the audio session). I’ll leave this open so we can discuss results a bit, with the understanding that I can’t further discuss producing old versions of any of the software, or the question of whether this kind of audio coexistence should be supported/supportable, thanks.
Halle WinklerPolitepixHi Xi,
Sorry, older versions of OpenEars are not supported by me (or the plugins), so I wouldn’t be able to help with the project of using an obsolete version or with the project of configuring an unsupported acoustic model for use with it. I will close this up since I’m not able to assist with it.
Halle WinklerPolitepixHi Xi,
With OpenEars 2.5 and the addition of so many new languages, OpenEars moved to an OpenEars-specific bundle package including a new file format, which means that only acoustic model bundles issued by Politepix are compatible with OpenEars (previously, other language acoustic models were explicitly not supported by this project, but with the older and simpler acoustic model structure I was willing to give some hints for creating an unsupported model as long as it was understood that I didn’t support it). There is no 2.5-compatible Russian model, sorry. I unfortunately don’t discuss future plans (or lack of plans) here, apologies for the inconvenience.
Halle WinklerPolitepixWelcome,
Sorry, I’m honestly not sure – there is definitely an x86_64 slice in the voice files.
December 27, 2016 at 3:41 pm in reply to: rapidEarsDidDetectFinishedSpeech not called in Swift 3 #1031484
Halle WinklerPolitepixWelcome,
There is a typo in your signature:
setReturnSegmen(true)
First, make sure you are initializing the OEPocketsphinxController object as explained in the docs, and then set these to true on the OEPocketsphinxController shared object:
setReturnSegments(returnSegments: Bool)
setReturnSegmentTimes(returnSegmentTimes: Bool)
and after initializing the OEEventsObserver object – making sure that its protocol is made available as described in the Swift tutorial, and that OEEventsObserver+RapidEars is available in your bridging header – use the callback:
rapidEarsDidDetectFinishedSpeech(asWordArray words: [Any]!, andScoreArray scores: [Any]!)
If you’d like more troubleshooting, please show the relevant code from your app, as well as the logging output as described in the post “Please read before you post – how to troubleshoot and provide logging info here”, which explains how to turn on and share the logging that provides troubleshooting information for this kind of issue.
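Putting that together, a minimal Swift 3 sketch (assuming the bridging-header setup from the Swift tutorial is already in place, your class adopts OEEventsObserverDelegate, and the argument labels follow the signatures above):
let openEarsEventsObserver = OEEventsObserver()   // keep a strong reference, e.g. a property

func startRapidEarsSetup() {
    openEarsEventsObserver.delegate = self
    do {
        try OEPocketsphinxController.sharedInstance().setActive(true)
    } catch {
        print(error)
    }
    OEPocketsphinxController.sharedInstance().setReturnSegments(returnSegments: true)
    OEPocketsphinxController.sharedInstance().setReturnSegmentTimes(returnSegmentTimes: true)
    // ...then start realtime listening with RapidEars as shown in the tutorial.
}

// OEEventsObserver+RapidEars delegate callback:
func rapidEarsDidDetectFinishedSpeech(asWordArray words: [Any]!, andScoreArray scores: [Any]!) {
    print("Finished speech: \(words ?? []) with scores \(scores ?? [])")
}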
Halle WinklerPolitepixHi,
Sorry, there have been no recent changes that would lead to these issues. These are situations where minor iOS or device changes can bring out new behavior, so I would check against other iOS versions and file a radar if the behavior is undesirable, or spend more time simplifying the code so that a clear correlation emerges which would be possible to replicate and troubleshoot. I don’t actually support background mode, so I wouldn’t be able to help much with that aspect generally, sorry.
I will close this since it is pretty far afield of the topic, but if you have issues happening in the foreground and can create a reduced and simple example of an issue which is demonstrably related to OpenEars according to this process:
https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
then it’s totally fine to open a new topic just for that report and I’ll be happy to take a look, thanks.
Halle WinklerPolitepixHi,
These can be seen in the downloadable docs (found in the documentation folder of the disk image) and on the respective plugin pages under method documentation: https://www.politepix.com/rapidears/
Each Objective-C method signature also has a Swift 3 method signature at the end of its definition (sometimes this is not in specially-formatted text, so it may resemble the written definition).
Halle WinklerPolitepixGreetings,
It’s for answering questions of the type that would be posted in this forum, but via email. Here are the support terms: https://www.politepix.com/supportterms/
Halle WinklerPolitepixHi Matt,
This sounds a bit like it might be a misdiagnosis of the drop in accuracy rate with your cloud service – it wouldn’t be obvious to me how a 16-bit/16k PCM format is removing that much needed data from 8k speech audio, so it might be a case for trying different approaches.
Be that as it may, Pocketsphinx’s 8k sample rate mode doesn’t make any changes to how OpenEars manages its audio buffers (or it wouldn’t have been possible to add right now) and SaveThatWave is ignorant of devices and Pocketsphinx by design, so trying to make SaveThatWave aware of Pocketsphinx runtime settings or the audio driver and branching its behavior isn’t a probable direction for development, sorry.
Halle WinklerPolitepixHi Matt,
Sorry, it is a bluetooth compatibility mode accommodation only (and probably will not even stick around very long as an OpenEars API at all if I get a sense of how to automate the bluetooth compatibility accommodations without side-effects). Format conversions of SaveThatWave output aren’t an area I’m likely to get into, since the number of potential output variations developers could make use of is large and SaveThatWave isn’t trying to be a general-purpose audio tool. However, this is not difficult for you to take on yourself in the way that meets your needs best, as part of your own developer scope – the best high-level API to investigate starting out with is probably the AVAsset family.
November 19, 2016 at 10:26 am in reply to: Applying changes to OEPocketsphinxController singleton #1031302
Halle WinklerPolitepixWelcome,
Yes, nearly all of the properties of OEPocketsphinxController refer to pocketsphinx startup values or initialization values for the audio driver, so they don’t get recalculated by pocketsphinx or OpenEars’ audio driver on the fly. On the other hand, the public methods of OEPocketsphinxController refer to operations that OpenEars can make on an already-started instance of pocketsphinx, so they are for the most part expected to be called on an already-started listening session (like suspending or changing a model or grammar).
A lot of attention has been paid to speeding up stopping and starting times in OEPocketsphinxController, so you may find on modern devices that there isn’t as much of a speed penalty to stopping and starting with new values as you may be concerned about, give it a look.
It would be helpful to share your technique for objectively evaluating the level of background noise – if it is accurate, an automatic VAD adjustment could be added to the framework.
Halle WinklerPolitepixHi guys,
I’m fairly sure that the reason darasan didn’t have the logging output is that the excerpted log section precedes the initialization of RapidEars’ methods. And evilliam, plugin version logging in OELogging was only added in RapidEars 2.5, so you would not see any version output under any circumstances except with 2.5 or later plugins.
There is a mysterious console bug in which a great deal of other simultaneous logging can “squash” RapidEars version output logging, but the info available in this post doesn’t really point to that bug. In order to avoid it, just make sure that OELogging is on but verbosePocketsphinx isn’t, and other extraneous app logging isn’t, and you are using a 2.5 or later version of RapidEars. If OELogging reports your OpenEars version as being the current 2.504 version, and RapidEars links and builds with it, that can be assumed to be a 2.5 or later version of RapidEars (it should of course be the most recent version, but I understand that the entire reason you want to see the version output is to verify whether you are using the most recent version :) ).
I’m going to ask that any further discussion in this topic start with OELogging output from the complete app session (but without any other logging) since I think not having it and speculating is probably generating some confusion about causes both within this discussion and probably also for searchers finding this thread while troubleshooting, thanks!
Halle WinklerPolitepixOK, today’s version 2.504 should build fine in 8.1.
Halle WinklerPolitepixOK, I’ve exposed API for running in 8k mode in today’s update 2.504, you can set use8kMode in OEPocketsphinxController. In the longer term I’ll think about whether this can be handled automatically.
Halle WinklerPolitepixOK, I’ve added this to today’s update 2.504, should now run fine on iOS 10 devices.
Halle WinklerPolitepixWelcome,
Sorry, that looks like an iOS 10 requirement that I overlooked in the sample app; I’ll add the key to the info.plist and push a new version next week. In the meantime you should be able to add an NSMicrophoneUsageDescription key and the corresponding string value “Microphone access is used for performing offline speech recognition” to the sample app’s info.plist and it ought to work.
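In Info.plist source form, that entry looks like this (the description string is just the example wording from above – use whatever fits your app):
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is used for performing offline speech recognition</string>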
Halle WinklerPolitepixInteresting, I was under the impression that current Pocketsphinx didn’t use that mode at all any more. I’ll add an API hook for it in the meantime and consider longer term approaches to this problem. API support for 8kHz mode will be along in the next couple of days along with the build command fix.
Halle WinklerPolitepixHi Danny,
Thanks very much for the great bug report. I’ve solved this by running your alternate version in the case that the first attempt fails, and also have a fix for the second issue so that the build completes, and just need a day or two to test my solution for edge cases and then I’ll push the new version.
Halle WinklerPolitepixJust to double-check – you built via Archive rather than Build, is that correct?
Halle WinklerPolitepixI’ll check it out, thanks.
Halle WinklerPolitepixWelcome,
Sorry, there isn’t.
Halle WinklerPolitepixThere is now a Swift 3 tutorial for RuleORama: https://politepix.com/openearsswift-tutorial/
Halle WinklerPolitepixNo problem, glad you found the cause and I also appreciate your setting my mind at ease about it.
Halle WinklerPolitepixHi,
Sorry, it isn’t possible to have a single framework work with more than one bundle ID.
Halle WinklerPolitepixHi Matt,
The notification that results in this delegate method isn’t sent out until the fail/success result of the actual writeout is received, meaning that it can’t fire until the writeout is not just complete but its result status is also known. Although it is possible for the notification to be delayed, I can’t see a way for it to precede the existence of the file unless the bug is in NSData.
Based on this design, and on the fact that I haven’t received this issue report before (I have had reports of delays, which are sometimes expected), my expectation in this case is that the issue lies elsewhere. But if you don’t have any leads on peculiarities of the implementation that could result in an issue on the filesystem side, then rather than polling, I would recommend just padding the action that you take after receiving that delegate callback with some type of sleep, by whatever number of milliseconds gives you consistent results. It’s likely to be a small number. If you don’t want to pad it generally, it would probably work fine to make one attempt and then, if that fails, make one time-padded attempt.
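For illustration, one way to do the “one attempt, then one time-padded retry” in Swift (the 50 ms delay and the handleWav/process names are hypothetical placeholders for whatever you actually do with the saved file):
import Foundation

func handleWav(atPath path: String) {
    // First attempt, immediately after the delegate callback fires.
    if let data = FileManager.default.contents(atPath: path) {
        process(data)
        return
    }
    // Single padded retry; tune the delay to whatever gives consistent results.
    DispatchQueue.main.asyncAfter(deadline: .now() + .milliseconds(50)) {
        if let data = FileManager.default.contents(atPath: path) {
            process(data)
        }
    }
}

func process(_ data: Data) {
    // Whatever your app does with the completed WAV data.
}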
Halle WinklerPolitepixHi,
Glad that helped. I am going to take in the data on this for a little while but my current suspicion is that the majority of cases will be fixed by disablePreferredBufferSize and that maybe a minority will also benefit from disablePreferredSampleRate, and so far I haven’t heard of any that needed the channel override to be functional. On the assumption that it will help the most users while making the least changes, my recommendation would be to start by using disablePreferredBufferSize, pay attention for reports of issues with BT devices that might confirm that the other setting is also needed, do your own tests of performance (the interesting performance question is CPU usage when there is absolutely no speech or anything perceived as speech) and let me know about them in the general BT thread, and always start your troubleshooting process by upgrading to all current versions. It isn’t humane UX to give users multiple choices for managing their own device compatibility – I would either have one “Bluetooth compatibility mode” switch that turns on as many of the settings as your testing has demonstrated it’s reasonable to ship with, or simply set them on with no switch and carefully test whether there is any negative impact on your use case when Bluetooth isn’t being used.
Halle WinklerPolitepixHi Matt,
Check out the recent bluetooth discussions here in the forums and the FAQ entries on BT compatibility methods and support in order to get a headstart troubleshooting this, thanks!
Halle WinklerPolitepixHi,
OELogging is the API for printing the version of OpenEars and the plugins. If the RapidEars version isn’t showing with only OELogging on (and not verbose logging), the actual linked framework version is most likely one which precedes the addition of this feature. I think a good approach for troubleshooting is to turn off all of your other app logging and remove your RapidEars framework thoroughly enough that the build decisively breaks (including removing it from Framework Search Paths), then reinstall it and see what OELogging says without any extraneous logging statements. The reason I suggest this is that the excerpt from your logging above (please be so kind as to show whole sessions when sharing logging) stops well before the RapidEars version would be able to appear, and most of its logging output isn’t from OELogging.
Halle WinklerPolitepixHi,
The RapidEars version would normally print when it starts listening, does this log get that far? There has been a reported and elusive issue where having verbosePocketsphinx on sometimes suppresses RapidEars’ version printing (version printing is a function of OELogging rather than verbosePocketsphinx) so you can try setting verbosePocketsphinx to off and see if it helps. Another reason to not see RapidEars version printing would be if it were a pre-2.5.x version of RapidEars, since plugin version printing via OELogging is a recent feature.
Halle WinklerPolitepixNope, it just isn’t a feature of the plugin.
Halle WinklerPolitepixWelcome,
Sorry, it isn’t possible to output an audio file from NeatSpeech.
September 28, 2016 at 11:02 am in reply to: Constructively share and discuss your results with bluetooth devices here #1031029
Halle WinklerPolitepixHi,
It’s fine if you want to repurpose this thread as a place for people to share results, as long as it remains very info-based and constructive. I don’t have bandwidth to compose any reports about BT from my end of things, sorry, but this project has added many improvements for broader BT compatibility (and other coexistence issues which don’t originate with it) over time, so I will continue to pay attention to what developers report and continue to make generalized improvements where possible/non-harmful.
If developers want to discuss their results with particular BT devices here, please include the following info:
BT device and its hardware version
Apple device
Device OS version
OpenEars version and the version of any plugins used (these should always be current – please share them to verify this)
Any OpenEars coexistence APIs used and results
Noted impact on performance and/or accuracy (care should be taken to set up the least-subjective tests possible before sharing data on either of these – one-off anecdotes or unobserved hearsay can lead to a lot of confusion, and I will remove observations that I’m concerned are both low-data and confusing)
In order to keep this thread available as a reference, and as a way to passively inform me of experiences with devices at such time as I do have bandwidth to investigate generalizable improvements, I will ask that posters here operate from the assumption that I’ve given thought to BT support issues and that the time and status they are given is an accurate reflection of the time resources available and of where the issues originate – and hopefully see that despite its experimental status, BT support gets regular attention in the codebase as a result of this decision process.
September 27, 2016 at 8:55 am in reply to: Constructively share and discuss your results with bluetooth devices here #1031021
Halle WinklerPolitepixHi,
No thoughts, since BT support is experimental due to these issues not originating in this library (check out the FAQ for more elaboration on this), but if you look at the previous post you linked to, at the bottom I discussed additions to OEPocketsphinxController in 2.502 that give you much more ability to troubleshoot your own BT integration (experiences/successes with them are also discussed in a few recent forum posts for more granular info), good luck!
Halle WinklerPolitepixHi,
Yes, there’s a technical holdup at the moment which resulted in the bitcode slice being (hopefully temporarily) removed. It is solved but needs testing to make sure the solution is reliable.
Halle WinklerPolitepixThat’s the correct format.
Halle WinklerPolitepixHi Coeur,
Thank you for the suggestion, I’ll give it some consideration for the next version. If you have requests, this is the right place to let me know about them, thanks!
Halle WinklerPolitepixThis is correct behavior. The reason to use a ruleset is so that utterances that don’t conform to the rules are rejected. If your grammar is compared to an utterance that has information that doesn’t match the ruleset, the utterance shouldn’t be recognized. The options for this situation are either to use a language model rather than a grammar (examples of this can be found in the tutorial and sample app) or to use a grammar which contains entries representing the additional speech.
Halle WinklerPolitepixDue to a server migration, part of this discussion was lost, but this is the most recent post from the original poster:
Well, I understand that, so I will ask something more specific.
The first item in the grammar I gave is the phrase “TODAYS DATE IS”. I have a wav file that says “Today’s date is September 16th”.
When I pass this file to runRecognitionOnWavFileAtPath it identifies the phrase at the beginning. If I tack on extra seconds of voice after “September 16th”, but not containing any other parts of the grammar, the phrase at the beginning is not recognized.
I am not sure what I could send you to help understand the issue. I have a project adapted from the OpenEarsSample app I could send along if that would help.
Halle WinklerPolitepixOK, this may be more what you are intending:
NSDictionary *grammarOpening = @{ ThisWillBeSaidOnce : @[ @{ ThisWillBeSaidOnce : @[@"TODAYS"]}, @{ ThisWillBeSaidOnce : @[@"DATE"]}, @{ ThisWillBeSaidOnce : @[@"IS"]} ] };
Wrapping the entire ruleset in optional repetitions doesn’t do anything on the outside of the ruleset (it’s a given that the whole ruleset may be used more than one time) but it can probably lead to some unexpected outcomes, even more so with RuleORama which doesn’t support it as a tag (this can be seen in the logging if you turn it on). Is this grammar for use with stock OpenEars or with RuleORama?
BTW, you can use lowercase/mixed case and apostrophes with OpenEars if you want to.
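For reference, the same opening rule in Swift 3 (a sketch with the grammar keys written as literal strings; the Objective-C constants of the same names can be used instead if they’re visible to Swift in your project):
let grammarOpening: [String : Any] = [
    "ThisWillBeSaidOnce" : [
        ["ThisWillBeSaidOnce" : ["TODAYS"]],
        ["ThisWillBeSaidOnce" : ["DATE"]],
        ["ThisWillBeSaidOnce" : ["IS"]]
    ]
]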
September 13, 2016 at 10:12 am in reply to: pocketsphinx not suspended after pocketsphinxDidSuspendRecognition event fires #1030980
Halle WinklerPolitepixHi Matt,
Please check out the post Please read before you post – how to troubleshoot and provide logging info here so you can see how to turn on and share the logging that provides troubleshooting information for this kind of troubleshooting request.
It would also probably be helpful for you to search for and read some of the existing discussions about why immediately suspending after you start listening isn’t a good idea or necessary in current versions, and is likely to lead to problems.
Halle WinklerPolitepixWelcome,
I think the docs have a good rundown of how the human-readable grammar language works with RuleORama, RapidEars and OpenEars, and for statistical language models there are examples in the tutorial and the sample app to start with. Unfortunately I can’t construct the grammar for you on a case-by-case basis, but I’ll be happy to answer specific questions if you have unexpected results with specific grammar rules after taking a look at the docs.
Halle WinklerPolitepixThat is great news, now I’m even happier I added the override. I will think about whether it should be the default in a future version and/or extending the documentation about it to help with troubleshooting coexistence, and I appreciate your letting me know about your results.
Halle WinklerPolitepixHi,
It’s quite possible that this behavior changes from iOS version to iOS version or device to device, but if session mixing is turned on and the audio object still behaves that unexpectedly, that is a Core Audio bug or undocumented Core Audio behavior (BTW, a reduced sampling rate is not unexpected – OpenEars has to set the sampling rate on the session, and if you play audio with a higher rate during OpenEars’ session it may be downsampled; this isn’t documented in Core Audio but appears to have consistently been the default behavior across versions). As a result, although I’d like a different result as much as you would, it’s unclear what could be done about it by this framework, particularly if the behavior doesn’t manifest in every iOS version or on every device – I’ve documented that audio object coexistence during recognition is going to be very limited and problematic, although AVAudioPlayer is known to work with 16kHz PCM files since it is also a major element of OpenEars. I wouldn’t be surprised if there is different behavior with compressed codecs, but since these topics aren’t documented in Core Audio and appear to change across devices and iOS versions, I’m not in a position to simply improve those results by troubleshooting for one version and one device (changing the default audio mode or using a non-recording mode as a semi-workaround is a non-starter, regrettably).
I can recommend taking a look at the new OEPocketsphinxController overrides added with 2.502 that are intended for improving Bluetooth device behavior when the Bluetooth device doesn’t exactly match up to spec:
disablePreferredSampleRate
disablePreferredBufferSize
disablePreferredChannelNumber
and convert your audio to a PCM format such as WAV, and see if the results are better. Consider trying out disableSessionResetsWhileStopped if this is happening after listening ends. Make sure to set these overrides before starting listening and after activating the OEPocketsphinxController instance.
Halle WinklerPolitepixHello,
OpenEars isn’t iOS Sphinx, but yes, changing vocabularies is one of the most basic functions of the framework. There is an example in the sample app and it is also described in the documentation so give the docs a read.
September 1, 2016 at 12:50 pm in reply to: App quits with Error : Context cue word has a non zero count #1030931
Halle WinklerPolitepixNo problem, thank you for the logs. I don’t know the specifics of your grammar ruleset, but looking at this it is possible that it ends up very large after processing. To troubleshoot, does the same thing happen when you use the same ruleset but with only (for instance) 10 words?
I doubt this is the underlying issue, but I also recommend that you transcribe your numbers before submitting them to grammar generation (that means create a routine that first changes “41” into “forty-one”). The reason for this is that OpenEars doesn’t attempt to do this for you automatically since it doesn’t know whether it will be read out “forty-one” or “four-one”, but letting the fallback method handle it will make your grammar generation slower and it’s also only guessing which one you want, so the safe bet is to spell numbers out in the same way you want them to be perceived in the user speech.
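A sketch of one way to write that routine in Swift, for the case where “forty-one” (rather than “four one”) is the reading you want – NumberFormatter’s spellOut style produces that form; a digit-by-digit reading would need your own mapping:
import Foundation

// "41" -> "forty-one" (locale-dependent); run this over your vocabulary before
// handing it to grammar or language model generation.
func spelledOut(_ word: String) -> String {
    guard let value = Int(word) else { return word }
    let formatter = NumberFormatter()
    formatter.numberStyle = .spellOut
    formatter.locale = Locale(identifier: "en_US")
    return formatter.string(from: NSNumber(value: value)) ?? word
}

let words = ["TODAYS", "DATE", "IS", "41"].map(spelledOut)   // ..., "forty-one"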
September 1, 2016 at 12:33 pm in reply to: App quits with Error : Context cue word has a non zero count #1030928
Halle WinklerPolitepixHi,
As I mentioned in your previous post, this is where the logs have to go. I need to be able to reference them later if there is a correlated issue that someone else has later on, and later questioners with similar errors have a fair expectation of receiving similar hits when they do a search here. I’m regretfully going to remove your account if I get more of these posts or replies, your call.
August 30, 2016 at 8:38 am in reply to: Can I adjust any arguments to get more precise results? #1030892
Halle WinklerPolitepixWelcome,
Thank you for sharing your results – they are pretty good already for such a large task, and it’s good to hear. I would recommend searching these forums for the keyword “accuracy” and looking through the FAQ: https://www.politepix.com/openears/support to see if there are some improvements I’ve already described.
August 19, 2016 at 9:36 pm in reply to: Extra logging on new versions of frameworks (2.502) #1030830
Halle WinklerPolitepixWelcome,
This is just standard RapidEars hypothesis logging as a result of OELogging being set on. I think it’s been there for several versions. You can turn it off by turning off OELogging.
Halle WinklerPolitepixWhat should happen, if everything on the device and Apple side is working according to its documented behavior, is that it will be upsampled automatically in the render callback buffer even if the preferred rate is unsettable or overridden. However, this will not add any information to the audio data (nothing will, and it might also be compressed when it first comes into the callback for a further information reduction), and IIRC it may also result in the overall volume of the buffered data being reduced. So, to a certain extent, I would say that the results may not be the preferred outcome but may be as expected – I’m afraid that ultimately the issue probably doesn’t originate in the engine but is just manifesting there. It is also not necessarily the case that the device and/or the audio API are working according to their documented behavior, which is why bluetooth is only supported experimentally. Sorry I can’t help out more with this.
Edit: however, since there could be a change in volume due to sampling rate change, please make sure that your not-great results aren’t due to a need to adjust the vadThreshold and/or Rejecto weighting (if used, and only after clarifying vadThreshold setting) in one direction or the other. Otherwise, definitely no other thoughts.
Halle WinklerPolitepixHi Tim,
Yes, I added a few audio session override properties in the last version for experimenting with improving bluetooth under these kinds of conditions. In the OEPocketsphinxController docs check out the new properties disablePreferredSampleRate, disablePreferredBufferSize, and disablePreferredChannelNumber to see if any one of them or a combination helps with your situation. I would recommend confirming that everything is working fine in a non-bluetooth implementation, then trying them one at a time, then trying them in groups of two, then all together, all with several repetitions to avoid fluke results, and document your results and choose the least-intrusive combination that helps (if any do). It is better to override these settings as little as possible.
Halle WinklerPolitepixI recommend leaving secondsOfSilenceToDetect alone – having it very short is unlikely to help much in this case. vadThreshold should probably be higher, as high as possible before it starts rejecting speech that you want to perceive. After this, the next thing to try is adding Rejecto.
July 28, 2016 at 7:52 pm in reply to: Is it possible to use an acoustic model outside the main bundle #1030753
Halle WinklerPolitepixWelcome,
This should work fine if you are up for doing the troubleshooting, but there is no API for it – it would be necessary to stop using the convenience path methods in OEAcousticModel and instead pass the real path to your model’s top-level directory. This is an advanced topic and gets a bit beyond the scope of support given, but I would recommend experimenting with passing the path to the model to the various methods that ask for it, and I think you’ll be able to get it working.
Halle WinklerPolitepixWelcome,
This is a general problem with combining very similar-sounding one-syllable words (this is a difficult task out of context) and non-native speaker recognition, sorry. To get better results, differentiate the language that needs to be recognized a bit more.
July 21, 2016 at 10:46 am in reply to: rapidEarsDidDetectLiveSpeechAsWordArray not being called #1030723
Halle WinklerPolitepixWelcome,
This is known to be working fine, so give the documentation (OEPocketsphinxController as well as RapidEars) and this thread a closer look (what you tried isn’t what solved the issue for the other poster) and you’ll get it working.
Halle WinklerPolitepixHello,
No, OpenEars is designed for continuous listening and it isn’t recommended to create workarounds in order to try to use it as a push-to-talk interface. If this is a requirement for your product you can make your own recording interface using AVAudioRecorder and then submit the result to runRecognitionOnWavFileAtPath:.
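As a rough sketch of that approach (Swift; the recorder settings are chosen to produce the 16-bit/16kHz mono WAV the method expects, languageModelPath/dictionaryPath are assumed to come from your OELanguageModelGenerator setup, and the runRecognitionOnWavFile(atPath:...) spelling assumes the usual bridging of the Objective-C method):
import AVFoundation

// 16-bit, 16 kHz, mono linear PCM – the format runRecognitionOnWavFileAtPath: expects.
let settings: [String : Any] = [
    AVFormatIDKey: Int(kAudioFormatLinearPCM),
    AVSampleRateKey: 16000.0,
    AVNumberOfChannelsKey: 1,
    AVLinearPCMBitDepthKey: 16,
    AVLinearPCMIsFloatKey: false,
    AVLinearPCMIsBigEndianKey: false
]
let wavURL = FileManager.default.temporaryDirectory.appendingPathComponent("utterance.wav")

do {
    let recorder = try AVAudioRecorder(url: wavURL, settings: settings)
    recorder.record()                       // start when the user presses the button
    // ...
    recorder.stop()                         // stop when the user releases the button

    OEPocketsphinxController.sharedInstance().runRecognitionOnWavFile(
        atPath: wavURL.path,
        usingLanguageModelAtPath: languageModelPath,
        dictionaryAtPath: dictionaryPath,
        acousticModelAtPath: OEAcousticModel.path(toModel: "AcousticModelEnglish"),
        languageModelIsJSGF: false)
} catch {
    print(error)
}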
Halle WinklerPolitepixHello,
Sorry, I don’t know the reason for that. I recommend taking a look at the FAQ and other forum posts on the topic of accuracy.
Halle WinklerPolitepixWelcome,
Sorry, there is no link or support for using an older version. The older version of OpenEars wasn’t less sensitive to background noise (there are actually many older discussions in these forums about how to reduce its noise sensitivity); it just had a (very slightly) higher voice activity threshold setting, which couldn’t be adjusted.
You can raise vadThreshold in OpenEars 2.x in order to get the same results, but also have the flexibility to set it to your usage scenario’s ideal setting. There is more info in the FAQ and 2.x-related posts in the forums about using vadThreshold and/or possibly Rejecto to ameliorate non-speech and out-of-vocabulary input.
Halle WinklerPolitepixWelcome,
A couple of things – the most important is that when you have a debug question, it’s also necessary to show the entire output from OELogging as explained here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
But the approach of loading a recorded file into an AVAudioRecorder instance and then referencing it from that instance’s URL doesn’t really make sense to me – the method is expecting a string which is the path to an already-completed and closed file which is a 16-bit, 16k mono WAV. The code you’ve shown doesn’t suggest that that is what is being passed to the method. It just looks like it can’t open and read the contents of the path, which is what I’d expect in this case. Just go ahead and pass a path to a completed and closed WAV file with the right format to the method, and skip the indirection or breakage that the AVAudioRecorder instance is causing.
July 11, 2016 at 2:39 pm in reply to: note: Module debugging should be disabled when shipping static libraries. #1030671
Halle WinklerPolitepixOpenEars is compiled with module debugging off (you can confirm this in the xcode project build settings), so I think the framework would have had to have been unintentionally recompiled at some point.
Halle WinklerPolitepixHi,
Yes, I’ve tested the accuracy against test recordings and it was fine; however, I can’t test first-hand since I’m not a native speaker (which also means that it is possible there are underrepresented groups in the model that I’m unaware of, e.g. accents or genders), and keep in mind that it is a non-tonal model. Most of the serious issues I’ve heard about with the model were due to the vadThreshold being too low, so you may want to start by increasing it further even if you already have a level where there has been improvement.
July 6, 2016 at 10:38 am in reply to: note: Module debugging should be disabled when shipping static libraries. #1030658
Halle WinklerPolitepixHi Konrad,
Sorry, I don’t see those warnings so I don’t have suggestions. Is it possible that you’re using a beta Xcode, or that you at some point recompiled the framework?
Halle WinklerPolitepixHi,
I’m asking you to read the class documentation so you can find out how to use OEPocketsphinxController.sharedInstance().setActive(true) correctly, and see and experiment with the other properties documented in the class that might have a connection to your issue before asking me to troubleshoot. Let me know when you’ve had a chance to do that – we can troubleshoot this more when your function is designed according to the docs and you can let me know what happened when you tried the other documented override functions besides the ones I mentioned above, thanks.
Halle WinklerPolitepixHi,
I feel like it would be a generally good idea to set aside a few minutes to give the docs for the OpenEars classes you are using a quick read, because the info I’m writing here about disabling the preferred settings to improve bluetooth results is in there, as well as more related options which might help you troubleshoot on your own once you’ve seen them. A bigger issue it will help with is that this code is out of order and will invisibly affect your results with the disabling properties you are calling:
OEPocketsphinxController.sharedInstance().disablePreferredSampleRate = true
OEPocketsphinxController.sharedInstance().disablePreferredBufferSize = true
do {
    try OEPocketsphinxController.sharedInstance().setActive(true)
} catch {
    print(error)
}
and it would be my preference to give more help after you’ve had a look and had a chance to see why, since that’s also in the docs and it’s the same explanation by the same author as would potentially be re-written in this post :) .
Once you can fix that up and narrow down which of those two properties is really taking effect, and maybe after you’ve seen what your results are with the other available overrides listed in the docs that you’ll learn about, show me your new initialization code and let’s come back to the loudness issue if it is still happening.
Halle WinklerPolitepixThe error still remains like the previous log i copied. If needed please let me know if you need a more detailed log or more information.
It sounds like it may be running the error a few times at the start of the buffer callback but then working correctly, so if there are no recognition issues I wouldn’t worry about it. As I mentioned, the incompatibility errors are silent so there may not be a clear output result in the logs when it starts working either.
Is that part at the end of your post from you, or is it quoting my or someone else’s writing in another post? That is a bit hard to understand when it is added to the end of your own post without comment; maybe you can write the part you wanted to ask me about separately below or maybe reformat the post above with the blockquote button, thank you.
Halle WinklerPolitepixHello,
OK, I’d probably recommend that you troubleshoot basic functionality issues like this without n-best, and turn it on at the end when everything else is working, and only if you discover in testing that it is helpful to the end user in some way. It’s relatively uncommon for it to be used in shipping applications.
If you have future questions, make sure not to edit anything out of the OELogging output like in the excerpt above.
What the errors mean is that there is an incompatibility between the bluetooth device and Apple’s low-level audio API causing a silent failure, which happens sometimes and is the reason that OpenEars’ bluetooth support is experimental – I don’t have any input into Apple’s API or how hardware manufacturers implement their bluetooth devices, so I can’t offer a lot of support with those issues. The one thing I can suggest is that since OpenEars 2.502, it is possible to turn off setting the preferred sample rate, the preferred buffer size, and the preferred number of channels, and it can be the case that the incompatibility is with one of those three settings. So, I would try disabling them in sequence and possibly all together and see if that helps. More information on that is in the OEPocketsphinxController documentation. The FAQ has some info about audio issues so it is probably worth checking out: https://www.politepix.com/openears/support
Halle WinklerPolitepixWelcome,
Why are the hypotheses in an array? Your logging has verbosePocketsphinx (or maybe verboseRapidEars) turned on, but it is necessary to also turn on OELogging and post the entire app session output to troubleshoot an implementation issue; take a look here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/
Halle WinklerPolitepixHello,
You can use any datasource desired in order to create an array or dictionary, but since it happens before submitting the array or dictionary to OELanguageModelGenerator, it is an app implementation detail and not really an OpenEars implementation detail.
Halle WinklerPolitepixOEFliteController uses Flite 1.4 voices, so if you want to build a voice which is compatible with trunk Flite 1.4 it should be compatible with OpenEars, although I only give technical support for the use of OEFliteController with the shipped Slt.framework voice since I am the party who has packaged it for OpenEars and I have a basis for giving assistance with it.
June 28, 2016 at 10:12 am in reply to: Using OpenEars twice for two separate sets of words. #1030630
Halle WinklerPolitepixHello,
No, but the normal way to do this is to switch models (this takes no time). There is an example of model switching in the sample app.
Halle WinklerPolitepixWelcome,
Take a closer look at the example in the docs – it isn’t a single dictionary full of repeating keys (this would not be a dictionary, which must have unique keys), it is a dictionary with one key for an array containing several small individual dictionaries.