Halle Winkler

Forum Replies Created

Viewing 100 posts - 1,901 through 2,000 (of 2,171 total)

    in reply to: Problem switching between OpenEars and RapidEars #13501
    Halle Winkler
    Politepix

    Hello,

    It shouldn’t be necessary to have two instances, and it is probably harmful since they may both be accessing the driver and the VAD in a way that is unexpected due to ARC.

    Both the basic PocketsphinxController API and the PocketsphinxController+RapidEars category work with a PocketsphinxController instance, so you should be able to use a single instance of PocketsphinxController for both. When you want to listen with RapidEars, use RapidEars’ start method startRealtimeListeningWithLanguageModelAtPath:, and when you want to listen without RapidEars, use the basic PocketsphinxController startListeningWithLanguageModelAtPath: method. Just be sure to call stopListening on whichever one is running before you start the other. I’ve personally used both from the same PocketsphinxController instance in the same session, so I would expect it to work.
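    As a rough sketch of the switch (the dictionaryAtPath: and languageModelIsJSGF: arguments shown here are assumptions based on the 1.x-era signatures, so check your PocketsphinxController and PocketsphinxController+RapidEars headers for the exact parameter lists):

    // Stop whichever kind of listening is currently running before starting the other kind:
    [self.pocketsphinxController stopListening];

    // Listen with RapidEars:
    [self.pocketsphinxController startRealtimeListeningWithLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary];

    // ...or, after another stopListening, listen with the standard OpenEars method:
    [self.pocketsphinxController startListeningWithLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary languageModelIsJSGF:FALSE];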

    Let me know if this helps.

    in reply to: Need clarification on reducing binary size. #13473
    Halle Winkler
    Politepix

    Now that I’ve had time to double-check, I can confirm that the “Deployment Postprocessing” build setting hasn’t been renamed in recent versions of Xcode.

    in reply to: Changing Noise Level for Detecting Speech #13241
    Halle Winkler
    Politepix

    Hi Matt,

    There is no built-in way to do this, but you can investigate the approach of using a different audio unit/audio session type in this thread:

    https://www.politepix.com/forums/topic/add-mode-options-next-version/

    in reply to: Need clarification on reducing binary size. #13236
    Halle Winkler
    Politepix

    Hello,

    1) I believe it should still be called Deployment Postprocessing. I would search for that phrase in the search field.

    2) Correct, that is what it says in the FAQ as well.

    in reply to: Problems using AudioServicesPlaySystemSound with openEars #13128
    Halle Winkler
    Politepix

    This is due to the audio session settings used by the framework. If you want to sidestep the entire issue, just play the sound with AVAudioPlayer. Otherwise you can take a look at the approaches from me and others in these threads:

    https://www.politepix.com/forums/topic/keep-system-sounds-while-listening/
    https://www.politepix.com/forums/topic/conflict-with-audiotoolbox/
    https://www.politepix.com/forums/topic/pocketsphinx-disables-vibrate/
    https://www.politepix.com/forums/topic/simultaneous-mpmovieplayercontroller-video-and-speech-recognition/
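
    If you go the AVAudioPlayer route, a minimal sketch looks like this (AVFoundation needs to be imported and linked; the file name is a placeholder, and self.soundPlayer is assumed to be a strong property so the player isn't released mid-playback):

    #import <AVFoundation/AVFoundation.h>

    NSURL *soundURL = [[NSBundle mainBundle] URLForResource:@"beep" withExtension:@"wav"];
    self.soundPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:soundURL error:nil];
    [self.soundPlayer play];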

    in reply to: OpenEars and the main thread #13121
    Halle Winkler
    Politepix

    How about this very simple approach: every time you launch, you generate a complete language model; however, every time you generate a phonetic dictionary, you add any unique entries to cmu07a.dic (keeping it alphabetized). Consequently, the first run will be the big time-consuming generation, but subsequent generations will never use the fallback for a name that has already been generated via the fallback. No Core Data/synchronization/repeated fallbacks.
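    Purely as an illustration of the merge step (masterDicPath and generatedDicPath are hypothetical paths, and you'd want to work on a writable copy of cmu07a.dic in your Documents folder rather than the read-only one in the app bundle):

    // Merge the newly-generated .dic entries into the master dictionary and keep it alphabetized:
    NSMutableSet *entries = [NSMutableSet setWithArray:[[NSString stringWithContentsOfFile:masterDicPath encoding:NSUTF8StringEncoding error:nil] componentsSeparatedByString:@"\n"]];
    [entries addObjectsFromArray:[[NSString stringWithContentsOfFile:generatedDicPath encoding:NSUTF8StringEncoding error:nil] componentsSeparatedByString:@"\n"]];
    NSArray *sortedEntries = [[entries allObjects] sortedArrayUsingSelector:@selector(compare:)];
    [[sortedEntries componentsJoinedByString:@"\n"] writeToFile:masterDicPath atomically:YES encoding:NSUTF8StringEncoding error:nil];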

    in reply to: Cancelling speech synthesis in progress. #13117
    Halle Winkler
    Politepix

    You can look at all of the source for OpenEars and modify it for your app. It’s compiled into a framework for your convenience, but it is all shipped with the distribution as source. Here is another discussion of this question that might be helpful to you: https://www.politepix.com/forums/topic/interrupting-speech/

    in reply to: Recording OpenEars Audio Input to File #13112
    Halle Winkler
    Politepix

    Right, you’ll need at least version 1.2 for plugin compatibility. Can you download current version 1.2.4, diff your customizations and reintegrate them into it? There are a few nice additions and fixes since the version you must be working with, so it’s probably worthwhile.

    Generally the version can be found in the OpenEars.xcodeproj source files in the comments at the top, and in a file called Version.txt in the root of the OpenEars directory. Since 1.2 it is also shown in the manual and in OpenEarsLogging.

    in reply to: OpenEars and the main thread #13109
    Halle Winkler
    Politepix

    Yup, the fallback can’t be sped up particularly. It’s actually vastly faster since 1.1 and just needs a fraction of a second now, but as you’ve observed it is a (very) stripped-down form of speech synthesis so there’s no getting around the CPU load and time.

    If this were my thing I think I would keep my own Core Data model of the songs on the device along with their already-created pronunciation, and on launching the app I’d do a comparison of the user’s songs and my database of the user’s songs and just add pronunciations for new ones.
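    A minimal sketch of that launch-time comparison, assuming hypothetical helper methods that return the current device song titles and the titles already stored with pronunciations:

    NSMutableSet *titlesNeedingPronunciations = [NSMutableSet setWithArray:[self currentDeviceSongTitles]];
    [titlesNeedingPronunciations minusSet:[NSSet setWithArray:[self songTitlesAlreadyInDatabase]]];
    // Only the titles left in titlesNeedingPronunciations need to go through pronunciation generation.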

    in reply to: OpenEars and the main thread #13107
    Halle Winkler
    Politepix

    BTW, it’s a good idea to do this in any case, because the fallback method is quite a shot in the dark for this kind of application (i.e. an application like band-name listing, where spelling something weirdly or cryptically is part of the aesthetic of the field in question). If you rely on the fallback too much you are just going to have holes in the pronunciation matching, and the user might say a name correctly without getting a result.

    in reply to: OpenEars and the main thread #13106
    Halle Winkler
    Politepix

    That is my suggestion. Since 1.2 LanguageModelGenerator has been able to take a custom or customized master phonetic dictionary in place of cmu07a.dic, and for this task I think the best approach is to add to cmu07a.dic and keep using it. In order to get the pronunciations to add I would do the following steps (this involves a little hacking to OpenEars and I’m leaving implementation up to you):

    1. Source some kind of comprehensive list of the most popular tracks with whatever cutoff for popularity and timeframe makes sense for you. Feed the whole thing into LanguageModelGenerator using the simulator and save everything that triggers the fallback method.

    2. Attempt to pick all the low-hanging fruit by trying to identify any compound nouns which consist of words which are actually in the master phonetic dictionary, just not with each other.

    3. Do the tedious step with the remaining words of transcribing their phonemes using the same phoneme system found in cmu07a.dic.

    4. Add them all to cmu07a.dic.

    5. Sort cmu07a.dic alphabetically (very important), use.

    With some good choices about your source data you could make a big impact on the processing time. The rest of it might have to be handled via UX methods such as further restricting the number of songs parsed or only using songs which are popular with the user, and informing the user in cases where this could lead to confusion.

    Remember that for Deichkind and similar, you aren’t just dealing with an unknown word but also a word that will be pronounced with phonemes which aren’t present in the default acoustic model since it is made from North American speech. There’s nothing like that “ch” sound in English, at least as it is pronounced in Hochdeutsch. If your target market is German-speaking and you expect some German-named bands in there it might be worthwhile to think about doing some acoustic model adaptation to German-accented speech. I just did a blog post about it.

    in reply to: OpenEars and the main thread #13104
    Halle Winkler
    Politepix

    Have you taken the advice from OpenEarsLogging to convert everything to uppercase first? You can do this easily by taking your NSString and messaging [myString uppercaseString]

    in reply to: Logging Time of Word Recognition #13089
    Halle Winkler
    Politepix

    Hi Matt,

    You could just give yourself a timestamp using [NSDate date] in whichever RapidEars+OpenEarsEventsObserver delegate method you are taking your recognition results from. If you want it in a particular format you can create an NSDateFormatter as an ivar of your view controller and keep reusing it to get the formatted timestamp:

    http://stackoverflow.com/questions/2035201/iphone-nsdateformatter
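
    A minimal sketch, assuming a dateFormatter property on your view controller and an illustrative format string:

    if(self.dateFormatter == nil) { // Create the formatter once and keep reusing it, since creating one is relatively expensive
        self.dateFormatter = [[NSDateFormatter alloc] init];
        [self.dateFormatter setDateFormat:@"HH:mm:ss.SSS"];
    }
    NSLog(@"Word recognized at %@", [self.dateFormatter stringFromDate:[NSDate date]]);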

    in reply to: Preloading speech #13087
    Halle Winkler
    Politepix

    In NeatSpeech, ongoing speech is always queued for instant- or near-instant playback — just keep on giving text to the sayWithNeatSpeech: method without waiting for the in-progress speech to complete and it will be processed in the background and queued up for fast-as-possible playback once the current in-progress utterance is complete. Alternately, you can just give a very long chunk of text at the start and NeatSpeech will divide it smartly into process-able pieces, queue them up, and load them into the player as they complete. Are you seeing a different result?
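    To illustrate, queueing can be as simple as back-to-back calls; the withVoice: parameter and the voice property name here are assumptions, so check the NeatSpeech documentation for the exact signature:

    [self.fliteController sayWithNeatSpeech:@"Here is the first long statement to speak." withVoice:self.neatSpeechVoice];
    [self.fliteController sayWithNeatSpeech:@"Here is the next statement, queued immediately without waiting." withVoice:self.neatSpeechVoice];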

    in reply to: Getting Output level for NeatSpeech #13086
    Halle Winkler
    Politepix

    OK, good to know, thanks. In a pinch, from outside of the class you can always check whether speech is in progress by checking the boolean [self.fliteController.audioPlayer isPlaying].

    in reply to: Large Number Grammar #13083
    Halle Winkler
    Politepix

    This isn’t a feature of FliteController, but NeatSpeech operates with a queue and it renders the new speech in the background so that it generally starts playing instantly when the previous speech is complete, and it has a male and female UK voice.

    in reply to: Large Number Grammar #13081
    Halle Winkler
    Politepix

    To learn about how an acoustic model is adapted you probably want to check out the CMU Sphinx project. Since acoustic model adaptation isn’t part of OpenEars, I can’t support it from here beyond pointing you to the docs at the CMU project: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    The corpus of speech you would want to use in order to adapt to a UK accent for your particular application would have a number of different speakers with the desired UK accents saying the words for which you want more accuracy (I would have them say all of the words in your language model). Basically you will want to make recordings of your speakers saying the words and then you will use the acoustic model adaptation method linked above to integrate their speech into the acoustic model. The result ought to be that your adapted acoustic model will get better at recognizing/distinguishing between those words in the accents you include. The acoustic model you end up with can be used with OpenEars just like the default acoustic model.

    in reply to: Large Number Grammar #13078
    Halle Winkler
    Politepix

    Looks like a good start. There might be an accent bias hurting accuracy since the default acoustic model consists of US speech. You might want to adapt the model to a variety of UK accents using your number set as the speech corpus. This may get you some improvement with the thirty/fifty/eighty issue.

    in reply to: Large Number Grammar #13075
    Halle Winkler
    Politepix

    I’ve never thought about this task so this is not coming from a position of experience with it, but if the maximum is (for instance) 999,999 this seems to me that it would need [0-9], a set of tens incrementing by ten going up to “90”, a set of hundreds incrementing by 100 going up to “900”, and a set of thousands incrementing by 1000 going up to “9000”, so a model with a base set of 40 unigrams which have equal probability of being found in a particular bigram or trigram. Out of that you can make 999,999 with the available words “nine hundred”, “ninety” “nine thousand” “nine hundred” “ninety” “nine”. It seems that interpreting this back into digits should be possible to construct a ruleset for since there are only a few variations on correct statement of a number in English. I can also see why you would want a grammar, however, to have a rules-based recognition that you can be more confident about processing backwards into digits.
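    As a very rough sketch of assembling that kind of base vocabulary and turning it into a model (the word list is abbreviated, and the LanguageModelGenerator call reflects the 1.x-era signature, so double-check it against your version’s documentation):

    NSMutableArray *numberWords = [NSMutableArray arrayWithArray:@[@"ZERO", @"ONE", @"TWO", @"THREE", @"FOUR", @"FIVE", @"SIX", @"SEVEN", @"EIGHT", @"NINE"]];
    [numberWords addObjectsFromArray:@[@"TEN", @"TWENTY", @"THIRTY", @"FORTY", @"FIFTY", @"SIXTY", @"SEVENTY", @"EIGHTY", @"NINETY"]];
    [numberWords addObjectsFromArray:@[@"HUNDRED", @"THOUSAND"]]; // Plus the teens and anything else your number phrasing needs
    LanguageModelGenerator *languageModelGenerator = [[LanguageModelGenerator alloc] init];
    NSError *error = [languageModelGenerator generateLanguageModelFromArray:numberWords withFilesNamed:@"NumbersModel"];
    if([error code] != noErr) NSLog(@"Language model generation error: %@", error);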

    in reply to: Large Number Grammar #13073
    Halle Winkler
    Politepix

    Hello,

    I’m not aware of a pre-rolled grammar for large numbers, sorry. I generally recommend not using JSGF due to slow performance and what seems like slightly buggy recognition in the engine. Have you tried generating a text corpus of number words and creating your own ARPA language model (like in this blog post: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/)?

    in reply to: Recording OpenEars Audio Input to File #13066
    Halle Winkler
    Politepix

    Hello,

    Maybe SaveThatWave does what you’d like: https://www.politepix.com/shop/savethatwavedemo. It wraps up starting and stopping non-blocking WAV saving from incoming speech, sends a notification when a file has been recorded (with the path to the recording included in the notification), handles the logic for suspending/resuming and other corner cases, and wraps up deleting a single recording or deleting all recordings. You can use the tutorial tool to integrate it very easily: https://www.politepix.com/openears/tutorial

    in reply to: Getting Output level for NeatSpeech #13011
    Halle Winkler
    Politepix

    This is fixed in OpenEars 1.2.4 out today.

    in reply to: Getting Output level for NeatSpeech #12775
    Halle Winkler
    Politepix

    Whoops, this may have been an oversight in version 1.0, I will check it out. If you want to try to make a quick change to OpenEars in the meantime that might help, find this line in the project OpenEars.xcodeproj in the class FliteController:

    - (Float32) fliteOutputLevel {
    if(self.speechInProgress == TRUE) {

    and change it to:

    - (Float32) fliteOutputLevel {
    if([self.audioPlayer isPlaying] == TRUE) {

    Then recompile the OpenEars framework and use it in your app. Let me know if that results in audio output being restored.

    in reply to: OpenEars and the main thread #12660
    Halle Winkler
    Politepix

    Yup, I would expect that to take a bit of time. Is there a requirement that they be dynamic, or would this blog post help: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/

    My other question is about the kind of multithreading. Have you experimented with doing it with a block (or even a timer) if you’ve been using NSThread? I do a lot of multithreaded coding and I used to use NSThread and I’ve entirely switched over to blocks/GCD because it’s so much cleaner.
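    For example, a minimal GCD sketch for backgrounding the generation (the words array, file name and completion handling are placeholders):

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        LanguageModelGenerator *languageModelGenerator = [[LanguageModelGenerator alloc] init];
        NSError *error = [languageModelGenerator generateLanguageModelFromArray:wordsArray withFilesNamed:@"MyLanguageModel"];
        dispatch_async(dispatch_get_main_queue(), ^{
            // Back on the main thread: check [error code], grab the generated paths, and start listening here.
        });
    });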

    in reply to: interrupting speech #12537
    Halle Winkler
    Politepix

    Super, glad it is working.

    in reply to: Error while integrating RapidEarsDemo framework #12513
    Halle Winkler
    Politepix

    Super, glad to hear it!

    in reply to: Error while integrating RapidEarsDemo framework #12508
    Halle Winkler
    Politepix

    You are using an old version of OpenEars, just update to the current OpenEars version.

    in reply to: Playing voice out left channel only #12478
    Halle Winkler
    Politepix

    Hi Josh,

    I honestly have no idea! I’ve never considered channel-splitting on the new iPads. But, if it is possible to do this at all, it is most likely controlled by the audio session setting, so the good news is that it probably isn’t dependent on NeatSpeech but instead something that could (possibly) be achieved by changing OpenEars’ general audio session settings in AudioSessionManager. “Is it possible to route audio output out of a single channel on a stereo iPad?” might be an interesting question for Stack Overflow.

    in reply to: Change pitch without recreating voice obj #12474
    Halle Winkler
    Politepix

    Hello,

    In NeatSpeech, voice characteristics have to be set during initialization.

    in reply to: OpenEars and the main thread #12183
    Halle Winkler
    Politepix

    Nope, it doesn’t make any notable continuous use of the disk. Have you completely ruled out something going on in your app? If you move the operations in the sample app to another thread, do you see the same results? I would try to simplify the test case as much as possible until you can replicate the issue with only one code addition.

    in reply to: interrupting speech #12159
    Halle Winkler
    Politepix

    Side note: Great tool. It was easy to install and worked the first time.

    Thanks! Always nice to hear. Hmm, this is more something that I’ve thought of as being solvable via UI (for instance, however you signal the user that you are listening, don’t signal that when the app is talking) but I can definitely see the advantage to being able to stop playback programmatically.

    This would really not be hard for you to hack in quickly, because you are only trying to call FliteController’s method interruptTalking directly. You’d just need to check whether FliteController exists and if so, call interruptTalking.
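    In code (assuming the usual fliteController property from the tutorial), that quick check is just a couple of lines:

    if(self.fliteController) { // Make sure the FliteController instance exists
        [self.fliteController interruptTalking]; // Stop the speech that is currently playing
    }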

    If you wanted to bring the complete OpenEars architecture to bear on it so everything is nicely decoupled and standard, you could create a new OpenEarsEventsObserver delegate method (fliteSpeechRequestedToInterruptTalking might be a self-describing name) and look at the various examples of how notifications are sent to OpenEarsEventsObserver throughout the framework to see how to signal it when you want to request that playing speech is interrupted (presumably this would happen as a result of a new method for PocketsphinxController which could be named something like “interruptTalking”). Then you can add the new delegate callback to FliteController since it already inherits OpenEarsEventsObserver and when it is called, call interruptTalking. If/when I add this feature it will look something like that. The advantage is that the parts of the framework can remain ignorant of each other’s status.

    in reply to: Audio driver issues in combination with AirPlay #12078
    Halle Winkler
    Politepix

    Interesting, thank you for the follow-up.

    in reply to: OpenEars and the main thread #12046
    Halle Winkler
    Politepix

    OpenEars expects to be launched on mainThread and to handle its own multithreading, but I wouldn’t automatically expect a problem with launching it on a secondary thread. Certain operations will always be returned to mainThread (notifications sent to OpenEarsEventsObserver and delegate callbacks of OpenEarsEventsObserver will always go to mainThread, I think that audio playing in FliteController must be initiated on mainThread IIRC, and maybe there are some other examples of things which expect to initiate from or return to mainThread). But if all you are doing is running LanguageModelGenerator and you are running it on a background thread, my guess is that the issue is in the actual threading, because LanguageModelGenerator works as a series of single-threaded activities so it should be possible to background it. Are you using blocks, NSTimer or NSThread?

    in reply to: Minimum volume threshold to detect speech #11785
    Halle Winkler
    Politepix

    It’s tricky — having noise cancellation and noise suppression is obviously a good thing, but the VAD in OpenEars is partial to non-noise-suppressed sources. I personally don’t use VoiceProcessingIO because it doesn’t seem to get the same degree of QA as RemoteIO (or it just has a lot more options that need QA attention) and I’ve had it stop working in a couple of minor OS updates on a couple of devices, which is a little bit too much of a needle in the haystack situation for maintaining a framework.

    Actually, thinking of the VAD and its issues with noise-suppressed sources, I wonder if any of the command-line options in PocketsphinxRunConfig.h would help you. It might be worth a quick look in there to see if there is anything relevant you can turn on (maybe AGC or dithering or something).

    Something I’ve noticed is that as a developer you will tend to try to find a test space with the least ambient noise possible because it’s unproductive to test speech applications in an uncontrolled environment, but real user environments are almost always noisier, so you might not need to worry too much about the corner case of an extremely quiet environment that has a slightly-less-quiet noise in it. I used to have a very similar issue when testing AllEars (which ended up providing the OpenEars code) that a sparrow would visit my balcony and the relatively quiet cheeping would totally ruin recognition, but I haven’t received user reports of similar scenarios.

    in reply to: Minimum volume threshold to detect speech #11780
    Halle Winkler
    Politepix

    Hiya,

    I meant not overriding the audio session or audio unit settings, since I think the OpenEars defaults are less likely to zero out low noise buffers, meaning that there will be less of a difference between a low noise buffer and a not-that-much noise buffer, which theoretically might help the VAD to not overreact to not-that-much-noise buffers.

    in reply to: Audio driver issues in combination with AirPlay #11778
    Halle Winkler
    Politepix

    Hi Ferm, check out this thread for an example of code which stops listening after an arbitrary interval — I think you asked about how to do this so maybe it will help:

    https://www.politepix.com/forums/topic/kill-voice-recognition-thread

    in reply to: Minimum volume threshold to detect speech #11776
    Halle Winkler
    Politepix

    Hiya,

    Sorry, there is no trivial way to do this. You can only attempt to hack ContinuousADModule.mm, with the proviso that wrong values forced into the VAD usually cause crashes. You might have better results going with the slightly less-sensitive default settings, because the noise reduction might be causing an artificially wide differential between minor noises and noise-suppressed quiet.

    in reply to: Speed of speech #11753
    Halle Winkler
    Politepix

    You can just set the property after initializing the voice and FliteController, and before messaging fliteController with say:withVoice:.

    self.fliteController.duration_stretch = 0.5;
    [self.fliteController say:@"something" withVoice:self.slt];

    in reply to: Audio driver issues in combination with AirPlay #11751
    Halle Winkler
    Politepix

    I don’t really think those things are correlated though. This is the first example of this issue and it is happening when using a technology that has a network buffering requirement, so it is probably related to the network buffering.

    in reply to: Audio driver issues in combination with AirPlay #11750
    Halle Winkler
    Politepix

    Well, in my opinion that isn’t a bug, it’s expected behavior when moving an app to the front that has its own audio session. Mixing with another session is opt-in behavior for OpenEars but it would be a questionable default behavior since all development would have to proceed from the possibility that there might be other unknown incoming audio while running recognition.

    in reply to: TTS say phonemes #11747
    Halle Winkler
    Politepix

    Just wanted to follow up on this issue with the TTS voice quality and mention that there is now a plugin for OpenEars which lets it use better TTS voices which are as fast as the Flite voices but much clearer, and it can process long statements and multiple statements in a row much faster than the Flite voices. It’s called NeatSpeech and you can read more about it here: https://www.politepix.com/neatspeech

    in reply to: Acoustic model issue in app #11739
    Halle Winkler
    Politepix

    Super, glad to hear everything is working for you.

    in reply to: Acoustic model issue in app #11737
    Halle Winkler
    Politepix

    This happens because the Framework folder wasn’t dragged into the app, or because it was dragged into the app with “Create folder references for any added folders” selected.

    in reply to: Acoustic model issue in app #11736
    Halle Winkler
    Politepix

    No, no issue with ARC, you just need to make sure that the contents of the Framework folder are all really added to your app. From the tutorial:

    “Inside your downloaded OpenEars distribution there is a folder called “Frameworks”. Drag that folder into your app project in Xcode. Make absolutely sure that in the add dialog “Create groups for any added folders” is selected and NOT “Create folder references for any added folders” because the wrong setting here will prevent your app from working.”

    in reply to: Acoustic model issue in app #11733
    Halle Winkler
    Politepix

    BTW, please don’t test recognition on the Simulator.

    in reply to: Acoustic model issue in app #11732
    Halle Winkler
    Politepix

    Your acoustic model isn’t in your app:

    ERROR: “acmod.c”, line 91: Folder ‘/Users/c5163090/Library/Application Support/iPhone Simulator/5.1/Applications/45AFBC47-26D1-4BA0-A26A-F9722E4BC34C/SaskPower.app’ does not contain acoustic model definition ‘mdef’

    in reply to: Acoustic model issue in app #11713
    Halle Winkler
    Politepix

    Right here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/

    It is also in the documentation and the FAQ if you need more info.

    in reply to: Acoustic model issue in app #11710
    Halle Winkler
    Politepix

    Thank you for confirming with me that the issue is not with the sample app, that would have started a huge round of re-testing.

    OK, there are a couple of possibilities. The first possibility is that your version of Xcode is too old or you are compiling with GCC rather than LLVM. The only fix for this is to use a recent Xcode and to use LLVM. The other possibility is that there is a problem with your integration into your app (I think this is more likely), so please just do the regular steps of turning on OpenEarsLogging and turning on verbosePocketSphinx and posting the output here. It’s very likely that your acoustic model simply wasn’t moved into your app and you are getting a crash when speech detection is attempted and there is no acoustic model, but this will be shown as an error in OpenEarsLogging so please turn it on.

    in reply to: Acoustic model issue in app #11704
    Halle Winkler
    Politepix

    Hello,

    Which version of Xcode are you using?

    in reply to: Compatibility issue with CMU JSGF example #11690
    Halle Winkler
    Politepix

    OK, I will put this down as a bug since it should also be able to parse CMU’s version, but I’m glad it’s working for you.

    in reply to: Compatibility issue with CMU JSGF example #11687
    Halle Winkler
    Politepix

    OK, let me know if the issue persists with your own app.

    in reply to: Compatibility issue with CMU JSGF example #11684
    Halle Winkler
    Politepix

    Does it work if you use a grammar and dictionary combination that doesn’t result in this error from the logs:

    ERROR: “fsg_search.c”, line 334: The word ‘QUIDNUNC’ is missing in the dictionary

    in reply to: Compatibility issue with CMU JSGF example #11682
    Halle Winkler
    Politepix

    OK, I will check it out on Friday, but can you post the output of OpenEarsLogging and verbosePocketSphinx here so I can get a head start?

    in reply to: Accuracy with Irish accent in speech recognition #11667
    Halle Winkler
    Politepix

    No problem, the best way to replicate for an accent that is not your own accent is to obtain WAV recordings of speech which should work (i.e. it contains the words in your language model) and run it through - (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF;

    This should show very quickly whether the issue is in the language model/dictionary and/or is due to an issue with how the app is being tested.
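    For example, a minimal call might look like this (the three paths are placeholders for your test WAV recording and the language model and dictionary you generated):

    [self.pocketsphinxController runRecognitionOnWavFileAtPath:pathToTestWav usingLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary languageModelIsJSGF:FALSE];
    // The hypothesis should come back through your normal OpenEarsEventsObserver delegate callbacks.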

    in reply to: Accuracy with Irish accent in speech recognition #11665
    Halle Winkler
    Politepix

    Basically, you know you have bad data because there are misspelled and non-English entries in the phonetic dictionary, and a test that really had a 3% accuracy rate has to have been somehow mis-administered or was done on too little data to be meaningful. So if you start to adapt the language model based on this bad data, you will get more bad data out. The first step is getting rid of the issues you already know about and then getting into the accent adaptation once you are seeing tests with believable results (for me this would be better than 40% accuracy rate at least).

    in reply to: Accuracy with Irish accent in speech recognition #11664
    Halle Winkler
    Politepix

    I don’t think you have enough information yet to commit to going down that path since you haven’t replicated the issue in-house with real numbers under known-working conditions, which means you don’t have a way of measuring improvement. But you can adapt the language model to an Irish accent — ask at the CMU board for specifics since it is a question about Pocketsphinx and SphinxTrain. They will also ask for concrete accuracy numbers and directly observed behavior if you ask about accuracy so it’s a good idea to set up your tests locally before getting started with that.

    in reply to: Accuracy with Irish accent in speech recognition #11662
    Halle Winkler
    Politepix

    OK, for future reference it would be helpful to lead with this information, or give it in response to the questions asked about it. It’s been some work for me to find out what accents you are really trying to recognize and what accuracy levels you are seeing. I would not expect great accuracy for Irish accent recognition with the default language model, which is entirely made up of US accents.

    This sounds like a subjective report: “An Irish female tester reported that accuracy was as low as 3% when she was holding iPhone as one normally would. When she increased the distance between herself and device, the accuracy got better.”

    The reason it sounds like a subjective report is that I doubt she tested 100 times (and if she did, is it 100 repetitions of the same phrase or 100 different phrases?), so 3% is more likely to be a qualitative statement for “recognition was bad in my testing round”. Reasons for this could be diverse — it could have something to do with your UI, it could have something to do with the environment she is testing in, it could have to do with the non-English (MEATH ) and misspelled (VETINARY ) words in your vocabulary which will have pronunciation entries in the dictionary which will never be spoken, it could have to do with her expectations of what can be recognized (most end users don’t realize the vocabulary is limited and/or that saying a lot of extra things that are outside of the vocabulary will affect recognition quality).

    The symptom that recognition is worse when she is close to the phone is unlikely to be strictly true since closeness to the phone improves recognition under normal testing, so what is more likely is that the other variables mentioned above changed at the time that she got farther from the phone. I’m sure there is something to it but it is an isolated data point that is unexpected so it needs replication from your side.

    I can’t really remote bugfix an issue you are receiving as a remote report — at some point, someone needs to make a first-hand observation of the issue and test it in an organized way and replicate. If something was wrong with her test session (she was saying words that aren’t in the vocabulary, or it was really noisy, or the UI in the app was giving her the impression it was listening at a time that it wasn’t listening) it’s harmful to try to adapt your approach to that limited data.

    My recommendation is to obtain some WAV recordings of your speakers saying phrases that should be possible to recognize with your vocabulary and put them through PocketsphinxController to find out what the accuracy levels are for them. It is important to check the words in your OpenEarsDynamicGrammar.dic file that were not found in cmu07a.dic and make sure that the phonetic transcription in there is a real description of how someone would say those words, and to remove any typos, because if you have “VETINARY” and someone says “VETRINARY”, not only will you not recognize “VETRINARY” but it will hurt the recognition of any other words in the statement since you now have an out-of-vocabulary utterance in the middle of the statement.

    in reply to: Accuracy with Irish accent in speech recognition #11658
    Halle Winkler
    Politepix

    Can you be very specific with me about what this means: “Language is English, closest to American dialect I believe, although neither of the speakers are native Americans.” Where are the speakers from and what is their native language and why did you choose them for evaluating accuracy levels for English speech recognition?

    The reason I ask is because it’s unusual to say that a non-native speaker has a native dialect or close to one — that’s an exceedingly unusual outcome in language learning. As an example, I speak German as a non-native speaker and the regional accent of German that I speak is probably closest to the Northwestern German pronunciation, but no one from that region would say that I had a Northwestern German dialect because my US accent is at least as strong as a regional German accent in my speech. I would not be a good test subject for evaluating accuracy of German speech recognition.

    in reply to: Accuracy with Irish accent in speech recognition #11657
    Halle Winkler
    Politepix

    My suggestion is to see which words are being put in your OpenEarsDynamicGrammar.dic file which are not present in cmu07a.dic and look at the pronunciations that are listed therein and make sure that they are accurate descriptions of the way those words are pronounced. If they are not, those words will never be successfully recognized.

    >At worst, we would like to have at least 60% accuracy (that is, 6 successes from 10 experiments). 80% and higher accuracy would be good enough.

    Could you please tell me your current accuracy rate? It is improbable that you will get 80% for non-native speakers.

    I would turn on logging (both verboseLanguageModelGenerator and OpenEarsLogging) and see if there are any error or warning messages.

    in reply to: Accuracy with Irish accent in speech recognition #11654
    Halle Winkler
    Politepix

    You can’t use synthesized text for testing recognition, it has to be a real speaker.

    I’m confused about the idea of single word phrases — are they single words, or phrases?

    Unfortunately you are always going to see reduced accuracy when the speaker has an accent, unfair as it is. What is the accuracy rate you are seeing?

    Take a look at the .dic file that is output for the words which aren’t found in the cmu dictionary, because if the fallback method gets the pronunciation wrong, it won’t be recognized correctly.

    in reply to: Accuracy with Irish accent in speech recognition #11652
    Halle Winkler
    Politepix

    Welcome,

    Can you tell me a bit more about your application? Language, OpenEars version, which device, which kind of audio recording input, are you using any other media playback objects in your app like AVPlayer or MPMoviePlayerController, what is the accent, gender and age of the parties you are testing with, what is the accuracy rate you are seeing, have you verified that you are never sending any messages to the audio session or AVAudioSession, are the phrases found in the lookup dictionary or not (it’s the file CMU07a.dic that ships with the framework)?

    Recognition should ideally work better when users are closer to the device, so the result you describe is a sign that there could be an implementation issue.

    in reply to: Selectively enabling or disabling Rapid Ears #11644
    Halle Winkler
    Politepix

    Welcome,

    I would try RapidEars first to see if the accuracy is an issue, but if it is, you can use them both in the same app because they are started with separate start commands. The issue is that you can’t switch between them right in the middle of a recognition session, so you have to stop the recognition of one and then start a new recognition with the other. The only real downside to this is that it is necessary to recalibrate.

    in reply to: Pocketsphinx disables vibrate? #11638
    Halle Winkler
    Politepix

    The audio session that is required for an app that plays back and records audio disables system sounds, unfortunately. You can try enabling OpenEars’ AudioSessionManager’s soundMixing property as described in this thread: https://www.politepix.com/forums/topic/flitecontroller-pauses-music/

    in reply to: Limit the number of words to detect #11627
    Halle Winkler
    Politepix

    Are you seeing that in your profiling?

    in reply to: Limit the number of words to detect #11624
    Halle Winkler
    Politepix

    I don’t think the timer will get good results, people speak at different speeds and words have different lengths. How about ignoring anything after the first space in the hypothesis?

    in reply to: Placing code in app delegate instead of view controller #11622
    Halle Winkler
    Politepix

    No problem, this is what I had in mind with the design so I’m happy when it’s used this way.

    in reply to: Placing code in app delegate instead of view controller #11620
    Halle Winkler
    Politepix

    Sure, if you like, or for a more standard architecture you can also put it in a root-level view controller and send commands back to it via delegate callbacks from the child controllers. The only thing I don’t support is putting it in a singleton because that usually goes badly due to all of the multithreading.

    in reply to: [Resolved] Does the current SDK support iOS6? #11617
    Halle Winkler
    Politepix

    Hi Greg,

    I have the definitive answer to why this is sometimes happening — OpenEars requires that everything added to mainBundle is added at the root level (just because I don’t want to open up the can of worms of either trying to figure out at what level some crucial file is added, or forcing the developer to declare it for every file that the framework uses) and in the “add file” dialog in Xcode, if the option “Create folder references for any added folders” is selected, Xcode will create a folder inside the app’s mainBundle and put the files in there. So the acoustic model files were almost certainly in your app, but they probably couldn’t be found by the framework due to being in their own directory. I have improved the documentation to make sure this is clear and I appreciate your drawing my attention to the fact that it was a pitfall.

    in reply to: ConvertInput error in pocketsphinxDidReceiveHypothesis #11615
    Halle Winkler
    Politepix

    That is a little bit unlikely as the underlying cause, since OpenEars itself has at least two OpenEarsEventsObservers instantiated internally; add your first one and you’re up to a minimum of three out of the gate.

    Not debating that it is helping the issue you are seeing, just that it is probably ultimately due to something else. I haven’t seen the full logs and I imagine there are nuances to the code that can’t be put across via small snippets but it would be great if you could email me a stripped-down test case. Looking at this report of the same issue, it really looks like the audio session category is being overridden somewhere:

    http://stackoverflow.com/questions/5215919/convertinput-using-invalid-anchor-time-error-received-when-recording-on-device

    in reply to: Audio driver issues in combination with AirPlay #11608
    Halle Winkler
    Politepix

    >Where in the code would I look at the buffers?

    All the buffer code is in the ContinuousAudioUnit module. The buffer callback is at the top of the implementation. Unfortunately it isn’t an easy implementation to get to grips with since it’s in C and has a custom ringbuffer, but I think I would just focus on the callback and the initial configuration in OpenAudioDevice.

    You can also try experimenting with the settings in AudioSessionManager and research whether there are any particular pitfalls with certain audio session settings other developers are encountering with AirPlay.

    I forgot to ask — are you sending any messages to the shared audio session anywhere? This is a very common cause of issues like this. You can check by doing a case-insensitive search for “audiosession” in your code.

    in reply to: ConvertInput error in pocketsphinxDidReceiveHypothesis #11607
    Halle Winkler
    Politepix

    Interesting. We know that there isn’t an inherent conflict between a class inheriting the delegate protocols of OpenEarsEventsObserver and AVAudioPlayer because FliteController inherits the delegate protocols of both. But I don’t think your interpretation is offbase because there is a lot going on there — the recognition audio unit, FliteController’s audio and its callbacks, and your audio and its callbacks.

    Do you want to show me some logs featuring OpenEarsLogging and verbosePocketSphinx for the timeframe in which this is occurring? Maybe there are some hints.

    Something to just double-check is that a lot of AVAudioPlayer sample code contains unneeded calls to AVAudioSession and a lot of developers who asked me about similar issues found calls to the audio session in their code that were responsible. It might be worth a quick project-wide search for “audiosession” without case matching to make sure that some audio session overrides didn’t sneak in there.

    in reply to: Audio driver issues in combination with AirPlay #11606
    Halle Winkler
    Politepix

    I can only speculate at the moment and I’m not in front of the code, but IIRC the audio unit has both an input and an output and the audio session category is also input and output, meaning that the audio unit callback buffer is doing double duty. In Core Audio the callback buffer size is just a request, not a contract, so my suspicion is about whether a backlog of unsent output buffers or a changing buffer size is having an effect on the input buffer size or ability to call back. There is headroom in the ringbuffer for changing input buffer sizes but ultimately it’s a fixed amount.

    In this case I would expect more reported errors in the callback, but if I could check this out right now, this is where I would be looking. You can experiment with turning off the output in the audio unit to see if it helps (you basically already know that disabling output helps because using a different output from AirPlay helps, so that is where I’d look).

    in reply to: Audio driver issues in combination with AirPlay #11602
    Halle Winkler
    Politepix

    Probably, but I just did a number of OpenEars updates and this is the only open bug right now and it needs a lot of testing and setup to fix, so I can’t promise I will be able to get to it soon. You’re also welcome to send a patch if you can see the fundamental issue since you are already using the setup on which it is occurring.

    in reply to: Audio driver issues in combination with AirPlay #11600
    Halle Winkler
    Politepix

    That’s what I was suggesting — you should be able to detect programmatically ahead of time when you have a detection loop that isn’t working by checking whether the input level is stuck as it is in your loop above.

    in reply to: Audio driver issues in combination with AirPlay #11598
    Halle Winkler
    Politepix

    This is related to AirPlay; it’s the reason there is a buffer underrun. If you read pocketsphinxInputLevel on mainThread it will block.

    in reply to: Audio driver issues in combination with AirPlay #11596
    Halle Winkler
    Politepix

    > Do you mean that the audio recognizer is not started in the case where (1) happens?

    I think it’s started but there isn’t enough of a stream to keep up with the buffer looping and the calibrator is then reading zeroes and never getting to the point that it can identify a silence state or a speech state.

    in reply to: Audio driver issues in combination with AirPlay #11595
    Halle Winkler
    Politepix

    I would expect that more than a tenth of a second of a stuck input level is probably a bad loop.

    in reply to: Audio driver issues in combination with AirPlay #11594
    Halle Winkler
    Politepix

    I think that if you want to catch the lack of recognition in progress right now, you should look to see if pocketsphinxInputLevel is just returning zeroes or a fixed number. Remember to check it on a thread other than mainThread.
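    A rough sketch of that kind of check, where the background queue, the polling interval and the stuck-level threshold are all just illustrative choices:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{ // Never read this on mainThread
        Float32 lastLevel = self.pocketsphinxController.pocketsphinxInputLevel;
        int unchangedSamples = 0;
        for(int i = 0; i < 50 && unchangedSamples < 5; i++) { // Sample ten times a second for up to five seconds
            [NSThread sleepForTimeInterval:0.1];
            Float32 currentLevel = self.pocketsphinxController.pocketsphinxInputLevel;
            unchangedSamples = (currentLevel == lastLevel) ? unchangedSamples + 1 : 0;
            lastLevel = currentLevel;
        }
        if(unchangedSamples >= 5) { // The level has been frozen for roughly half a second
            dispatch_async(dispatch_get_main_queue(), ^{
                // React to the bad loop here, for instance by stopping and restarting listening.
            });
        }
    });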

    in reply to: Audio driver issues in combination with AirPlay #11593
    Halle Winkler
    Politepix

    Ah, I see your issue, you aren’t receiving the error callback during the loop in which you have no recognition. I think that that must be a buffer underrun. I’ll look into it.

    in reply to: Audio driver issues in combination with AirPlay #11588
    Halle Winkler
    Politepix

    The error isn’t really that it stops detecting speech; it is that the audio unit that feeds audio to the recognizer doesn’t start at the top of a particular recognition loop, which means that when the engine gets to the audio, the engine (most likely the VAD) eventually crashes because it’s expecting information.

    So (1) isn’t the bug or the event, it’s just one result of the audio unit being unexpectedly stopped when the loop restarts. I’m entering it as an AirPlay issue but I can’t give you a timeframe for fixing it.

    pocketSphinxContinuousSetupDidFail is there so that you can react to an error state, have you just tried stopping recognition when you receive it so that you don’t keep running the recognizer until it crashes due to not having a calibrated state?

    in reply to: Audio driver issues in combination with AirPlay #11586
    Halle Winkler
    Politepix

    OK, a quick look at the error you’re getting here:

    http://www.google.com/search?hl=en&as_q=airplay+AUIOClient_StartIO+failed

    Makes me think that this is happening when you don’t have connectivity to the network device at the exact moment that the audio unit needs to start.

    I would recommend going to ContinuousAudioUnit.mm’s function:

    int32 startRecording(PocketsphinxAudioDevice * audioDevice);

    And seeing if you can patch in a test for the airplay connection before this line:

    OSStatus startAudioUnitOutputStatus = AudioOutputUnitStart(audioDriver->audioUnit);

    which is where the attempt is made to start the audio unit. I don’t have an AirPlay device, so for me to fix it, it would have to wait until whatever point in the future I have one.
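    Since I can’t test against AirPlay myself, here is only a sketch of the shape such a patch could take; at minimum you could check the OSStatus from that call and bail out instead of continuing into the listening loop (how you would actually test for the AirPlay route beforehand is left open):

    OSStatus startAudioUnitOutputStatus = AudioOutputUnitStart(audioDriver->audioUnit);
    if(startAudioUnitOutputStatus != noErr) {
        // The audio unit could not be started (for instance because the AirPlay route wasn't reachable at
        // this exact moment), so report the failure back to the caller in whatever error convention
        // startRecording already uses rather than letting the recognition loop run without audio.
        NSLog(@"AudioOutputUnitStart failed with status %d", (int)startAudioUnitOutputStatus);
        return startAudioUnitOutputStatus;
    }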

    in reply to: Audio driver issues in combination with AirPlay #11584
    Halle Winkler
    Politepix

    OK, I don’t see the beginning of the OpenEarsLogging logs there, which version of OpenEars is that?

    in reply to: Audio driver issues in combination with AirPlay #11581
    Halle Winkler
    Politepix

    Welcome,

    Can you please post your complete logging output with verbosePocketSphinx on and OpenEarsLogging on? I don’t have a way of helping you unless you do and I’ve never received a report of this behavior so I’d like to see everything.

    > Secondly if (1) happends I feel I should still be able to programatically stop and then start listening.

    Sorry, it isn’t possible to assure any kind of functionality when there is unknown behavior that might be due to a bug. This is most likely because AirPlay is unknown territory, but it might become clearer when you post the logs.

    in reply to: Using open Ears Framework in my app #11562
    Halle Winkler
    Politepix

    Hi Vignesh,

    Welcome. There is no problem with app acceptance and OpenEars, it has public APIs and only uses public APIs. This is covered in the FAQ but you can also see several well-known apps that use OpenEars to the right of the main OpenEars page at https://www.politepix.com/openears so you don’t have to take my word for it.

    Reducing your application size is also covered in the FAQ so give it a read:

    https://www.politepix.com/openears/support

    OpenEars should only increase your app size by about 7 MB if you are using speech recognition and maybe another 5 MB per voice used for TTS.

    By the way, the App Store doesn’t limit your app size to 20 MB; there are apps that are hundreds of MB in size. But I agree that it is friendly to your users to create the smallest app possible.

    in reply to: Commercial use #11558
    Halle Winkler
    Politepix

    No prob, good luck with your app.

    in reply to: Commercial use #11556
    Halle Winkler
    Politepix

    Here is an example of the CMU license:

    http://www.speech.cs.cmu.edu/flite/doc/flite_2.html#SEC2

    in reply to: Commercial use #11555
    Halle Winkler
    Politepix

    Ah, sorry, I over-edited the FAQ recently. OK, I can’t give you permission or advice for things relating to CMU because I am neither part of that program nor a lawyer. However, as another developer who uses their libraries I have the subjective impression that it is sufficient to credit each library and link to their agreement since the terms of their license are analogous to the MIT license. Something like this in your about appears to me as a good-faith attempt to meet CMU’s crediting request and mine:

    $MYAPP uses CMU Sphinx, CMU Flite, CMU CLMTK [link here to the CMU agreement] and Politepix’s OpenEars [link here to politepix.com/openears].

    in reply to: Commercial use #11553
    Halle Winkler
    Politepix

    Sure, you can use it commercially for the purposes described in the license.

    Powered by Politepix is nice and I would find that fine for my own credit if there was some kind of link so curious parties could find out what that meant. It would still be necessary to credit CMU so take a look at that part of the FAQ: https://www.politepix.com/openears/support

    in reply to: [Resolved] Does the current SDK support iOS6? #11535
    Halle Winkler
    Politepix

    Having tested this again, I wanted to follow up that I can’t replicate an issue with adding the Framework folder itself as shown in the tutorial so I don’t think this issue is precisely that it is necessary to import that folder’s contents individually.

    It’s likely that re-importing the acoustic model files individually fixes something minor that has gone wrong a bit earlier in the process with the addition of that folder (for instance if the target was accidentally unchecked while adding or it was added as a reference and the reference was somehow problematic or became so later) so I strongly endorse gregquinn’s method described above as a troubleshooting step if you see a crash when listening starts.

    in reply to: Recognizing acronyms and numbers #11534
    Halle Winkler
    Politepix

    This is unfortunately something that doesn’t work that wonderfully with speech recognition — accuracy for this application of the functionality is never great. But it shouldn’t be any problem using the dynamic generation to create a language model that recognizes letters and numbers. In what sense is there no success? Do you have low accuracy (that is unfortunately to be expected for the specified requirement), or do the language models lack entries for your letters and numbers?

    One thing you can try if you are dealing with entire acronyms that are known at the time of language model creation (versus arbitrary combinations of letters) is to use the whole acronym as the corpus or array entry and then edit the phonetic dictionary.

    So, instead of having these in the array: @"A", @"B", @"C"

    You would have this in the array: @"ABC"

    And then you would need to edit the phonetic dictionary which is created so that the pronunciation associated with the word “ABC” is the correct phoneme sounds for “A”, “B” and “C” in sequence.
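    For example, the edited entry in the generated .dic file might look like this (the phoneme transcription is my own illustration, so verify it against the entries for the individual letters in cmu07a.dic):

    ABC	EY B IY S IY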

    To work with already-created language model and dictionary files instead of making new ones at runtime you can follow these instructions from the docs:

    If you need to create a fixed language model ahead of time instead of creating it dynamically in your app, just use this method (or generateLanguageModelFromTextFile:withFilesNamed:) to submit your full language model using the Simulator and then use the Simulator documents folder script to get the language model and dictionary file out of the documents folder and add it to your app bundle, referencing it from there.

    in reply to: [Resolved] Does the current SDK support iOS6? #11532
    Halle Winkler
    Politepix

    No problem, glad it’s working for you.

    in reply to: [Resolved] Does the current SDK support iOS6? #11530
    Halle Winkler
    Politepix

    OK, it doesn’t actually have to be added in any particular way, however what’s important is that the entire Framework folder found at the root of the OpenEars distribution folder is dragged from the Finder into your project. That has to be the original Framework folder with the full contents in it that are found at the time of downloading the distribution. You can either add the contents as a reference or add them by copying, but if you add them as a reference, the folder has to stay where it was at the time the reference is added.

    in reply to: [Resolved] Does the current SDK support iOS6? #11528
    Halle Winkler
    Politepix

    To clarify — are you saying that with [OpenEarsLogging startOpenEarsLogging] and pocketsphinxController.verbosePocketSphinx = TRUE set you receive no logging at all up to the time of the crash?

    in reply to: [Resolved] Does the current SDK support iOS6? #11527
    Halle Winkler
    Politepix

    Hmm, yes it definitely does. That kind of sounds like a missing acoustic model or a missing part of the acoustic model (meaning perhaps something going wrong when dragging the “Framework” folder into the app). Can you run the sample app in iOS6?

    in reply to: OpenEars intigartion with camera control #11493
    Halle Winkler
    Politepix

    You’re welcome, good luck with your app.

    in reply to: OpenEars intigartion with camera control #11491
    Halle Winkler
    Politepix

    Solving this would be an advanced undertaking that would require you to thoroughly research the iOS audio session and do a lot of self-guided experimentation in order to learn what is needed. Maybe it’s possible but it isn’t something I can walk you through unfortunately.

    in reply to: OpenEars intigartion with camera control #11489
    Halle Winkler
    Politepix

    OK, I think the issue is simply that the audio stream is not provided to PocketsphinxController since it is used by the video picker.

    in reply to: Start & Stop Listening #11486
    Halle Winkler
    Politepix

    OK, I don’t actually support simulator-only issues, not because I don’t care at all but because the simulator just hosts the local audio devices of the desktop or laptop, meaning that any individual simulator-only issue can potentially be due to just one particular arrangement of hardware that is on the local machine, meaning that any fix might only fix something that is special to one desktop or laptop version or audio conversion device.

    Since the end-user of the app will never encounter the issue, I made the judgement call that it would be better to ignore simulator-only issues than to put heavy debugging time into issues that don’t affect app users. I hope that’s understandable and that you’ll let me know if you see issues with enduser devices.

    in reply to: Start & Stop Listening #11484
    Halle Winkler
    Politepix

    The logging is from the simulator, have you ever seen this on a device? I haven’t seen it in my device testing of iOS6.

    in reply to: Start & Stop Listening #11482
    Halle Winkler
    Politepix

    Hello,

    Is this with the unchanged sample app or a version that has changes?
