Halle Winkler

Forum Replies Created

Viewing 100 posts - 1,901 through 2,000 (of 2,171 total)

    in reply to: Problem switching between OpenEars and RapidEars #13501
    Halle Winkler
    Politepix

    Hello,

    It shouldn’t be necessary to have two instances, and it is probably harmful since they may both be accessing the driver and the VAD in a way that is unexpected due to ARC.

    Both the basic PocketsphinxController API and the PocketsphinxController+RapidEars category work with a PocketsphinxController instance, so you should be able to use a single instance of PocketsphinxController for both. When you want to listen with RapidEars, use RapidEars’ start method startRealtimeListeningWithLanguageModelAtPath:, and when you want to listen without RapidEars, use the basic PocketsphinxController startListeningWithLanguageModelAtPath: method. Just be sure to call stopListening on whichever one is running before you start the other. I’ve personally used both from the same PocketsphinxController instance in the same session, so I would expect it to work.
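    As a rough sketch of the switch (the dictionaryAtPath: and languageModelIsJSGF: arguments shown here are assumptions based on the 1.x-era signatures, so check your PocketsphinxController and PocketsphinxController+RapidEars headers for the exact parameter lists):

    // Stop whichever kind of listening is currently running before starting the other kind:
    [self.pocketsphinxController stopListening];

    // Listen with RapidEars:
    [self.pocketsphinxController startRealtimeListeningWithLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary];

    // ...or, after another stopListening, listen with the standard OpenEars method:
    [self.pocketsphinxController startListeningWithLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary languageModelIsJSGF:FALSE];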

    Let me know if this helps.

    in reply to: Need clarification on reducing binary size. #13473
    Halle Winkler
    Politepix

    Now that I’ve had time to double-check, I can confirm that the “Deployment Postprocessing” build setting hasn’t been renamed in recent versions of Xcode.

    in reply to: Changing Noise Level for Detecting Speech #13241
    Halle Winkler
    Politepix

    Hi Matt,

    There is no built-in way to do this, but you can investigate the approach of using a different audio unit/audio session type in this thread:

    https://www.politepix.com/forums/topic/add-mode-options-next-version/

    in reply to: Need clarification on reducing binary size. #13236
    Halle Winkler
    Politepix

    Hello,

    1) I believe it should still be called Deployment Postprocessing. I would search for that phrase in the search field.

    2) Correct, that is what it says in the FAQ as well.

    in reply to: Problems using AudioServicesPlaySystemSound with openEars #13128
    Halle Winkler
    Politepix

    This is due to the audio session settings used by the framework. If you want to sidestep the entire issue, just play the sound with AVAudioPlayer. Otherwise you can take a look at the approaches from me and others in these threads:

    https://www.politepix.com/forums/topic/keep-system-sounds-while-listening/
    https://www.politepix.com/forums/topic/conflict-with-audiotoolbox/
    https://www.politepix.com/forums/topic/pocketsphinx-disables-vibrate/
    https://www.politepix.com/forums/topic/simultaneous-mpmovieplayercontroller-video-and-speech-recognition/
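
    If you go the AVAudioPlayer route, a minimal sketch looks like this (AVFoundation needs to be imported and linked; the file name is a placeholder, and self.soundPlayer is assumed to be a strong property so the player isn't released mid-playback):

    #import <AVFoundation/AVFoundation.h>

    NSURL *soundURL = [[NSBundle mainBundle] URLForResource:@"beep" withExtension:@"wav"];
    self.soundPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:soundURL error:nil];
    [self.soundPlayer play];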

    in reply to: OpenEars and the main thread #13121
    Halle Winkler
    Politepix

    How about this very simple approach: every time you launch, you generate a complete language model; however, every time you generate a phonetic dictionary, you add any unique entries to cmu07a.dic (keeping it alphabetized). Consequently, the first run will be the big time-consuming generation, but subsequent generations will never use the fallback for a name that has already been generated via the fallback. No Core Data/synchronization/repeated fallbacks.
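    Purely as an illustration of the merge step (masterDicPath and generatedDicPath are hypothetical paths, and you'd want to work on a writable copy of cmu07a.dic in your Documents folder rather than the read-only one in the app bundle):

    // Merge the newly-generated .dic entries into the master dictionary and keep it alphabetized:
    NSMutableSet *entries = [NSMutableSet setWithArray:[[NSString stringWithContentsOfFile:masterDicPath encoding:NSUTF8StringEncoding error:nil] componentsSeparatedByString:@"\n"]];
    [entries addObjectsFromArray:[[NSString stringWithContentsOfFile:generatedDicPath encoding:NSUTF8StringEncoding error:nil] componentsSeparatedByString:@"\n"]];
    NSArray *sortedEntries = [[entries allObjects] sortedArrayUsingSelector:@selector(compare:)];
    [[sortedEntries componentsJoinedByString:@"\n"] writeToFile:masterDicPath atomically:YES encoding:NSUTF8StringEncoding error:nil];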

    in reply to: Cancelling speech synthesis in progress. #13117
    Halle Winkler
    Politepix

    You can look at all of the source for OpenEars and modify it for your app. It’s compiled into a framework for your convenience, but it is all shipped with the distribution as source. Here is another discussion of this question that might be helpful to you: https://www.politepix.com/forums/topic/interrupting-speech/

    in reply to: Recording OpenEars Audio Input to File #13112
    Halle Winkler
    Politepix

    Right, you’ll need at least version 1.2 for plugin compatibility. Can you download current version 1.2.4, diff your customizations and reintegrate them into it? There are a few nice additions and fixes since the version you must be working with, so it’s probably worthwhile.

    Generally the version can be found in the OpenEars.xcodeproj source files in the comments at the top, and in a file called Version.txt in the root of the OpenEars directory. Since 1.2 it is also shown in the manual and in OpenEarsLogging.

    in reply to: OpenEars and the main thread #13109
    Halle Winkler
    Politepix

    Yup, the fallback can’t be sped up particularly. It’s actually vastly faster since 1.1 and just needs a fraction of a second now, but as you’ve observed it is a (very) stripped-down form of speech synthesis so there’s no getting around the CPU load and time.

    If this were my thing I think I would keep my own Core Data model of the songs on the device along with their already-created pronunciation, and on launching the app I’d do a comparison of the user’s songs and my database of the user’s songs and just add pronunciations for new ones.
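    A minimal sketch of that launch-time comparison, assuming hypothetical helper methods that return the current device song titles and the titles already stored with pronunciations:

    NSMutableSet *titlesNeedingPronunciations = [NSMutableSet setWithArray:[self currentDeviceSongTitles]];
    [titlesNeedingPronunciations minusSet:[NSSet setWithArray:[self songTitlesAlreadyInDatabase]]];
    // Only the titles left in titlesNeedingPronunciations need to go through pronunciation generation.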

    in reply to: OpenEars and the main thread #13107
    Halle Winkler
    Politepix

    BTW, it’s a good idea to do this in any case, because the fallback method is quite a shot in the dark for this kind of application (i.e. an application like band-name listing, where spelling something weirdly or cryptically is part of the aesthetic of the field in question). If you rely on the fallback too much you are just going to have holes in the pronunciation matching, and the user might say a name correctly without getting a result.

    in reply to: OpenEars and the main thread #13106
    Halle Winkler
    Politepix

    That is my suggestion. Since 1.2 LanguageModelGenerator has been able to take a custom or customized master phonetic dictionary in place of cmu07a.dic, and for this task I think the best approach is to add to cmu07a.dic and keep using it. In order to get the pronunciations to add I would do the following steps (this involves a little hacking to OpenEars and I’m leaving implementation up to you):

    1. Source some kind of comprehensive list of the most popular tracks with whatever cutoff for popularity and timeframe makes sense for you. Feed the whole thing into LanguageModelGenerator using the simulator and save everything that triggers the fallback method.

    2. Attempt to pick all the low-hanging fruit by trying to identify any compound nouns which consist of words which are actually in the master phonetic dictionary, just not with each other.

    3. Do the tedious step with the remaining words of transcribing their phonemes using the same phoneme system found in cmu07a.dic.

    4. Add them all to cmu07a.dic.

    5. Sort cmu07a.dic alphabetically (very important), use.

    With some good choices about your source data you could make a big impact on the processing time. The rest of it might have to be handled via UX methods such as further restricting the number of songs parsed or only using songs which are popular with the user, and informing the user in cases where this could lead to confusion.

    Remember that for Deichkind and similar, you aren’t just dealing with an unknown word but also a word that will be pronounced with phonemes which aren’t present in the default acoustic model since it is made from North American speech. There’s nothing like that “ch” sound in English, at least as it is pronounced in Hochdeutsch. If your target market is German-speaking and you expect some German-named bands in there it might be worthwhile to think about doing some acoustic model adaptation to German-accented speech. I just did a blog post about it.

    in reply to: OpenEars and the main thread #13104
    Halle Winkler
    Politepix

    Have you taken the advice from OpenEarsLogging to convert everything to uppercase first? You can do this easily by taking your NSString and messaging [myString uppercaseString]

    in reply to: Logging Time of Word Recognition #13089
    Halle Winkler
    Politepix

    Hi Matt,

    You could just give yourself a timestamp using [NSDate date] in whichever RapidEars+OpenEarsEventsObserver delegate method you are taking your recognition results from. If you want it in a particular format you can create an NSDateFormatter as an ivar of your view controller and keep reusing it to get the formatted timestamp:

    http://stackoverflow.com/questions/2035201/iphone-nsdateformatter
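
    A minimal sketch, assuming a dateFormatter property on your view controller and an illustrative format string:

    if(self.dateFormatter == nil) { // Create the formatter once and keep reusing it, since creating one is relatively expensive
        self.dateFormatter = [[NSDateFormatter alloc] init];
        [self.dateFormatter setDateFormat:@"HH:mm:ss.SSS"];
    }
    NSLog(@"Word recognized at %@", [self.dateFormatter stringFromDate:[NSDate date]]);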

    in reply to: Preloading speech #13087
    Halle Winkler
    Politepix

    In NeatSpeech, ongoing speech is always queued for instant- or near-instant playback — just keep on giving text to the sayWithNeatSpeech: method without waiting for the in-progress speech to complete and it will be processed in the background and queued up for fast-as-possible playback once the current in-progress utterance is complete. Alternately, you can just give a very long chunk of text at the start and NeatSpeech will divide it smartly into process-able pieces, queue them up, and load them into the player as they complete. Are you seeing a different result?
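    To illustrate, queueing can be as simple as back-to-back calls; the withVoice: parameter and the voice property name here are assumptions, so check the NeatSpeech documentation for the exact signature:

    [self.fliteController sayWithNeatSpeech:@"Here is the first long statement to speak." withVoice:self.neatSpeechVoice];
    [self.fliteController sayWithNeatSpeech:@"Here is the next statement, queued immediately without waiting." withVoice:self.neatSpeechVoice];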

    in reply to: Getting Output level for NeatSpeech #13086
    Halle Winkler
    Politepix

    OK, good to know, thanks. In a pinch, from outside of the class you can always check whether speech is in progress by checking the boolean [self.fliteController.audioPlayer isPlaying].

    in reply to: Large Number Grammar #13083
    Halle Winkler
    Politepix

    This isn’t a feature of FliteController, but NeatSpeech operates with a queue and it renders the new speech in the background so that it generally starts playing instantly when the previous speech is complete, and it has a male and female UK voice.

    in reply to: Large Number Grammar #13081
    Halle Winkler
    Politepix

    To learn about how an acoustic model is adapted you probably want to check out the CMU Sphinx project. Since acoustic model adaptation isn’t part of OpenEars, I can’t support it from here beyond pointing you to the docs at the CMU project: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    The corpus of speech you would want to use in order to adapt to a UK accent for your particular application would have a number of different speakers with the desired UK accents saying the words for which you want more accuracy (I would have them say all of the words in your language model). Basically you will want to make recordings of your speakers saying the words and then you will use the acoustic model adaptation method linked above to integrate their speech into the acoustic model. The result ought to be that your adapted acoustic model will get better at recognizing/distinguishing between those words in the accents you include. The acoustic model you end up with can be used with OpenEars just like the default acoustic model.

    in reply to: Large Number Grammar #13078
    Halle Winkler
    Politepix

    Looks like a good start. There might be an accent bias hurting accuracy since the default acoustic model consists of US speech. You might want to adapt the model to a variety of UK accents using your number set as the speech corpus. This may get you some improvement with the thirty/fifty/eighty issue.

    in reply to: Large Number Grammar #13075
    Halle Winkler
    Politepix

    I’ve never thought about this task so this is not coming from a position of experience with it, but if the maximum is (for instance) 999,999 this seems to me that it would need [0-9], a set of tens incrementing by ten going up to “90”, a set of hundreds incrementing by 100 going up to “900”, and a set of thousands incrementing by 1000 going up to “9000”, so a model with a base set of 40 unigrams which have equal probability of being found in a particular bigram or trigram. Out of that you can make 999,999 with the available words “nine hundred”, “ninety” “nine thousand” “nine hundred” “ninety” “nine”. It seems that interpreting this back into digits should be possible to construct a ruleset for since there are only a few variations on correct statement of a number in English. I can also see why you would want a grammar, however, to have a rules-based recognition that you can be more confident about processing backwards into digits.
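    As a very rough sketch of assembling that kind of base vocabulary and turning it into a model (the word list is abbreviated, and the LanguageModelGenerator call reflects the 1.x-era signature, so double-check it against your version’s documentation):

    NSMutableArray *numberWords = [NSMutableArray arrayWithArray:@[@"ZERO", @"ONE", @"TWO", @"THREE", @"FOUR", @"FIVE", @"SIX", @"SEVEN", @"EIGHT", @"NINE"]];
    [numberWords addObjectsFromArray:@[@"TEN", @"TWENTY", @"THIRTY", @"FORTY", @"FIFTY", @"SIXTY", @"SEVENTY", @"EIGHTY", @"NINETY"]];
    [numberWords addObjectsFromArray:@[@"HUNDRED", @"THOUSAND"]]; // Plus the teens and anything else your number phrasing needs
    LanguageModelGenerator *languageModelGenerator = [[LanguageModelGenerator alloc] init];
    NSError *error = [languageModelGenerator generateLanguageModelFromArray:numberWords withFilesNamed:@"NumbersModel"];
    if([error code] != noErr) NSLog(@"Language model generation error: %@", error);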

    in reply to: Large Number Grammar #13073
    Halle Winkler
    Politepix

    Hello,

    I’m not aware of a pre-rolled grammar for large numbers, sorry. I generally recommend not using JSGF due to slow performance and what seems like slightly buggy recognition in the engine. Have you tried generating a text corpus of number words and creating your own ARPA language model (like in this blog post: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/)?

    in reply to: Recording OpenEars Audio Input to File #13066
    Halle Winkler
    Politepix

    Hello,

    Maybe SaveThatWave does what you’d like: https://www.politepix.com/shop/savethatwavedemo. It wraps up starting and stopping non-blocking WAV saving from incoming speech, sends a notification when a file has been recorded (with the path to the recording included in the notification), handles the logic for suspending/resuming and other corner cases, and wraps up deleting a single recording or deleting all recordings. You can use the tutorial tool to integrate it very easily: https://www.politepix.com/openears/tutorial

    in reply to: Getting Output level for NeatSpeech #13011
    Halle Winkler
    Politepix

    This is fixed in OpenEars 1.2.4 out today.

    in reply to: Getting Output level for NeatSpeech #12775
    Halle Winkler
    Politepix

    Whoops, this may have been an oversight in version 1.0, I will check it out. If you want to try to make a quick change to OpenEars in the meantime that might help, find this line in the project OpenEars.xcodeproj in the class FliteController:

    - (Float32) fliteOutputLevel {
    if(self.speechInProgress == TRUE) {

    and change it to:

    - (Float32) fliteOutputLevel {
    if([self.audioPlayer isPlaying] == TRUE) {

    Then recompile the OpenEars framework and use it in your app. Let me know if that results in audio output being restored.

    in reply to: OpenEars and the main thread #12660
    Halle Winkler
    Politepix

    Yup, I would expect that to take a bit of time. Is there a requirement that they be dynamic, or would this blog post help: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/

    My other question is about the kind of multithreading. Have you experimented with doing it with a block (or even a timer) if you’ve been using NSThread? I do a lot of multithreaded coding and I used to use NSThread and I’ve entirely switched over to blocks/GCD because it’s so much cleaner.
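    For example, a minimal GCD sketch for backgrounding the generation (the words array, file name and completion handling are placeholders):

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        LanguageModelGenerator *languageModelGenerator = [[LanguageModelGenerator alloc] init];
        NSError *error = [languageModelGenerator generateLanguageModelFromArray:wordsArray withFilesNamed:@"MyLanguageModel"];
        dispatch_async(dispatch_get_main_queue(), ^{
            // Back on the main thread: check [error code], grab the generated paths, and start listening here.
        });
    });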

    in reply to: interrupting speech #12537
    Halle Winkler
    Politepix

    Super, glad it is working.

    in reply to: Error while integrating RapidEarsDemo framework #12513
    Halle Winkler
    Politepix

    Super, glad to hear it!

    in reply to: Error while integrating RapidEarsDemo framework #12508
    Halle Winkler
    Politepix

    You are using an old version of OpenEars, just update to the current OpenEars version.

    in reply to: Playing voice out left channel only #12478
    Halle Winkler
    Politepix

    Hi Josh,

    I honestly have no idea! I’ve never considered channel-splitting on the new iPads. But, if it is possible to do this at all, it is most likely controlled by the audio session setting, so the good news is that it probably isn’t dependent on NeatSpeech but instead something that could (possibly) be achieved by changing OpenEars’ general audio session settings in AudioSessionManager. “Is it possible to route audio output out of a single channel on a stereo iPad?” might be an interesting question for Stack Overflow.

    in reply to: Change pitch without recreating voice obj #12474
    Halle Winkler
    Politepix

    Hello,

    In NeatSpeech, voice characteristics have to be set during initialization.

    in reply to: OpenEars and the main thread #12183
    Halle Winkler
    Politepix

    Nope, it doesn’t make any notable continuous use of the disk. Have you completely ruled out something going on in your app? If you move the operations in the sample app to another thread, do you see the same results? I would try to simplify the test case as much as possible until you can replicate the issue with only one code addition.

    in reply to: interrupting speech #12159
    Halle Winkler
    Politepix

    Side note: Great tool. It was easy to install and worked the first time.

    Thanks! Always nice to hear. Hmm, this is more something that I’ve thought of as being solvable via UI (for instance, however you signal the user that you are listening, don’t signal that when the app is talking) but I can definitely see the advantage to being able to stop playback programmatically.

    This would really not be hard for you to hack in quickly, because you are only trying to call FliteController’s method interruptTalking directly. You’d just need to check whether FliteController exists and if so, call interruptTalking.
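    In code (assuming the usual fliteController property from the tutorial), that quick check is just a couple of lines:

    if(self.fliteController) { // Make sure the FliteController instance exists
        [self.fliteController interruptTalking]; // Stop the speech that is currently playing
    }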

    If you wanted to bring the complete OpenEars architecture to bear on it so everything is nicely decoupled and standard, you could create a new OpenEarsEventsObserver delegate method (fliteSpeechRequestedToInterruptTalking might be a self-describing name) and look at the various examples of how notifications are sent to OpenEarsEventsObserver throughout the framework to see how to signal it when you want to request that playing speech is interrupted (presumably this would happen as a result of a new method for PocketsphinxController which could be named something like “interruptTalking”). Then you can add the new delegate callback to FliteController since it already inherits OpenEarsEventsObserver and when it is called, call interruptTalking. If/when I add this feature it will look something like that. The advantage is that the parts of the framework can remain ignorant of each other’s status.

    in reply to: Audio driver issues in combination with AirPlay #12078
    Halle Winkler
    Politepix

    Interesting, thank you for the follow-up.

    in reply to: OpenEars and the main thread #12046
    Halle Winkler
    Politepix

    OpenEars expects to be launched on mainThread and to handle its own multithreading, but I wouldn’t automatically expect a problem with launching it on a secondary thread. Certain operations will always be returned to mainThread (notifications sent to OpenEarsEventsObserver and delegate callbacks of OpenEarsEventsObserver will always go to mainThread, I think that audio playing in FliteController must be initiated on mainThread IIRC, and maybe there are some other examples of things which expect to initiate from or return to mainThread). But if all you are doing is running LanguageModelGenerator and you are running it on a background thread, my guess is that the issue is in the actual threading, because LanguageModelGenerator works as a series of single-threaded activities so it should be possible to background it. Are you using blocks, NSTimer or NSThread?

    in reply to: Minimum volume threshold to detect speech #11785
    Halle Winkler
    Politepix

    It’s tricky — having noise cancellation and noise suppression is obviously a good thing, but the VAD in OpenEars is partial to non-noise-suppressed sources. I personally don’t use VoiceProcessingIO because it doesn’t seem to get the same degree of QA as RemoteIO (or it just has a lot more options that need QA attention) and I’ve had it stop working in a couple of minor OS updates on a couple of devices, which is a little bit too much of a needle in the haystack situation for maintaining a framework.

    Actually, thinking of the VAD and its issues with noise-suppressed sources, I wonder if any of the command-line options in PocketsphinxRunConfig.h would help you. It might be worth a quick look in there to see if there is anything relevant you can turn on (maybe AGC or dithering or something).

    Something I’ve noticed is that as a developer you will tend to try to find a test space with the least ambient noise possible because it’s unproductive to test speech applications in an uncontrolled environment, but real user environments are almost always noisier, so you might not need to worry too much about the corner case of an extremely quiet environment that has a slightly-less-quiet noise in it. I used to have a very similar issue when testing AllEars (which ended up providing the OpenEars code) that a sparrow would visit my balcony and the relatively quiet cheeping would totally ruin recognition, but I haven’t received user reports of similar scenarios.

    in reply to: Minimum volume threshold to detect speech #11780
    Halle Winkler
    Politepix

    Hiya,

    I meant not overriding the audio session or audio unit settings, since I think the OpenEars defaults are less likely to zero out low noise buffers, meaning that there will be less of a difference between a low noise buffer and a not-that-much noise buffer, which theoretically might help the VAD to not overreact to not-that-much-noise buffers.

    in reply to: Audio driver issues in combination with AirPlay #11778
    Halle Winkler
    Politepix

    Hi Ferm, check out this thread for an example of code which stops listening after an arbitrary interval — I think you asked about how to do this so maybe it will help:

    https://www.politepix.com/forums/topic/kill-voice-recognition-thread

    in reply to: Minimum volume threshold to detect speech #11776
    Halle Winkler
    Politepix

    Hiya,

    Sorry, there is no trivial way to do this. You can only attempt to hack ContinuousADModule.mm, with the proviso that wrong values forced into the VAD usually cause crashes. You might have better results going with the slightly less-sensitive default settings, because the noise reduction might be causing an artificially wide differential between minor noises and noise-suppressed quiet.

    in reply to: Speed of speech #11753
    Halle Winkler
    Politepix

    You can just set the property after initializing the voice and FliteController, and before messaging fliteController with say:withVoice:.

    self.fliteController.duration_stretch = 0.5;
    [self.fliteController say:@"something" withVoice:self.slt];

    in reply to: Audio driver issues in combination with AirPlay #11751
    Halle Winkler
    Politepix

    I don’t really think those things are correlated though. This is the first example of this issue and it is happening when using a technology that has a network buffering requirement, so it is probably related to the network buffering.

    in reply to: Audio driver issues in combination with AirPlay #11750
    Halle Winkler
    Politepix

    Well, in my opinion that isn’t a bug, it’s expected behavior when moving an app to the front that has its own audio session. Mixing with another session is opt-in behavior for OpenEars but it would be a questionable default behavior since all development would have to proceed from the possibility that there might be other unknown incoming audio while running recognition.

    in reply to: TTS say phonemes #11747
    Halle Winkler
    Politepix

    Just wanted to follow up on this issue with the TTS voice quality and mention that there is now a plugin for OpenEars which lets it use better TTS voices which are as fast as the Flite voices but much clearer, and it can process long statements and multiple statements in a row much faster than the Flite voices. It’s called NeatSpeech and you can read more about it here: https://www.politepix.com/neatspeech

    in reply to: Acoustic model issue in app #11739
    Halle Winkler
    Politepix

    Super, glad to hear everything is working for you.

    in reply to: Acoustic model issue in app #11737
    Halle Winkler
    Politepix

    This happens because the Framework folder wasn’t dragged into the app, or because it was dragged into the app with “Create folder references for any added folders” selected.

    in reply to: Acoustic model issue in app #11736
    Halle Winkler
    Politepix

    No, no issue with ARC, you just need to make sure that the contents of the Framework folder are all really added to your app. From the tutorial:

    “Inside your downloaded OpenEars distribution there is a folder called “Frameworks”. Drag that folder into your app project in Xcode. Make absolutely sure that in the add dialog “Create groups for any added folders” is selected and NOT “Create folder references for any added folders” because the wrong setting here will prevent your app from working.”

    in reply to: Acoustic model issue in app #11733
    Halle Winkler
    Politepix

    BTW, please don’t test recognition on the Simulator.

    in reply to: Acoustic model issue in app #11732
    Halle Winkler
    Politepix

    Your acoustic model isn’t in your app:

    ERROR: “acmod.c”, line 91: Folder ‘/Users/c5163090/Library/Application Support/iPhone Simulator/5.1/Applications/45AFBC47-26D1-4BA0-A26A-F9722E4BC34C/SaskPower.app’ does not contain acoustic model definition ‘mdef’

    in reply to: Acoustic model issue in app #11713
    Halle Winkler
    Politepix

    Right here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/

    It is also in the documentation and the FAQ if you need more info.

    in reply to: Acoustic model issue in app #11710
    Halle Winkler
    Politepix

    Thank you for confirming with me that the issue is not with the sample app, that would have started a huge round of re-testing.

    OK, there are a couple of possibilities. The first possibility is that your version of Xcode is too old or you are compiling with GCC rather than LLVM. The only fix for this is to use a recent Xcode and to use LLVM. The other possibility is that there is a problem with your integration into your app (I think this is more likely), so please just do the regular steps of turning on OpenEarsLogging and turning on verbosePocketSphinx and posting the output here. It’s very likely that your acoustic model simply wasn’t moved into your app and you are getting a crash when speech detection is attempted and there is no acoustic model, but this will be shown as an error in OpenEarsLogging so please turn it on.

    in reply to: Acoustic model issue in app #11704
    Halle Winkler
    Politepix

    Hello,

    Which version of Xcode are you using?

    in reply to: Compatibility issue with CMU JSGF example #11690
    Halle Winkler
    Politepix

    OK, I will put this down as a bug since it should also be able to parse CMU’s version, but I’m glad it’s working for you.

    in reply to: Compatibility issue with CMU JSGF example #11687
    Halle Winkler
    Politepix

    OK, let me know if the issue persists with your own app.

    in reply to: Compatibility issue with CMU JSGF example #11684
    Halle Winkler
    Politepix

    Does it work if you use a grammar and dictionary combination that doesn’t result in this error from the logs:

    ERROR: “fsg_search.c”, line 334: The word ‘QUIDNUNC’ is missing in the dictionary

    in reply to: Compatibility issue with CMU JSGF example #11682
    Halle Winkler
    Politepix

    OK, I will check it out on Friday, but can you post the output of OpenEarsLogging and verbosePocketSphinx here so I can get a head start?

    in reply to: Accuracy with Irish accent in speech recognition #11667
    Halle Winkler
    Politepix

    No problem, the best way to replicate for an accent that is not your own accent is to obtain WAV recordings of speech which should work (i.e. it contains the words in your language model) and run it through - (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF;

    This should show very quickly whether the issue is in the language model/dictionary and/or is due to an issue with how the app is being tested.
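    For example, a minimal call might look like this (the three paths are placeholders for your test WAV recording and the language model and dictionary you generated):

    [self.pocketsphinxController runRecognitionOnWavFileAtPath:pathToTestWav usingLanguageModelAtPath:pathToLanguageModel dictionaryAtPath:pathToDictionary languageModelIsJSGF:FALSE];
    // The hypothesis should come back through your normal OpenEarsEventsObserver delegate callbacks.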

    in reply to: Accuracy with Irish accent in speech recognition #11665
    Halle Winkler
    Politepix

    Basically, you know you have bad data because there are misspelled and non-English entries in the phonetic dictionary, and a test that really had a 3% accuracy rate has to have been somehow mis-administered or was done on too little data to be meaningful. So if you start to adapt the language model based on this bad data, you will get more bad data out. The first step is getting rid of the issues you already know about and then getting into the accent adaptation once you are seeing tests with believable results (for me this would be better than 40% accuracy rate at least).

    in reply to: Accuracy with Irish accent in speech recognition #11664
    Halle Winkler
    Politepix

    I don’t think you have enough information yet to commit to going down that path since you haven’t replicated the issue in-house with real numbers under known-working conditions, which means you don’t have a way of measuring improvement. But you can adapt the language model to an Irish accent — ask at the CMU board for specifics since it is a question about Pocketsphinx and SphinxTrain. They will also ask for concrete accuracy numbers and directly observed behavior if you ask about accuracy so it’s a good idea to set up your tests locally before getting started with that.

    in reply to: Accuracy with Irish accent in speech recognition #11662
    Halle Winkler
    Politepix

    OK, for future reference it would be helpful to lead with this information, or give it in response to the questions asked about it. It’s been some work for me to find out what accents you are really trying to recognize and what accuracy levels you are seeing. I would not expect great accuracy for Irish accent recognition with the default language model, which is entirely made up of US accents.

    This sounds like a subjective report: “An Irish female tester reported that accuracy was as low as 3% when she was holding iPhone as one normally would. When she increased the distance between herself and device, the accuracy got better.”

    The reason it sounds like a subjective report is that I doubt she tested 100 times (and if she did, is it 100 repetitions of the same phrase or 100 different phrases?), so 3% is more likely to be a qualitative statement for “recognition was bad in my testing round”. Reasons for this could be diverse — it could have something to do with your UI, it could have something to do with the environment she is testing in, it could have to do with the non-English (MEATH ) and misspelled (VETINARY ) words in your vocabulary which will have pronunciation entries in the dictionary which will never be spoken, it could have to do with her expectations of what can be recognized (most end users don’t realize the vocabulary is limited and/or that saying a lot of extra things that are outside of the vocabulary will affect recognition quality).

    The symptom that recognition is worse when she is close to the phone is unlikely to be strictly true since closeness to the phone improves recognition under normal testing, so what is more likely is that the other variables mentioned above changed at the time that she got farther from the phone. I’m sure there is something to it but it is an isolated data point that is unexpected so it needs replication from your side.

    I can’t really remote bugfix an issue you are receiving as a remote report — at some point, someone needs to make a first-hand observation of the issue and test it in an organized way and replicate. If something was wrong with her test session (she was saying words that aren’t in the vocabulary, or it was really noisy, or the UI in the app was giving her the impression it was listening at a time that it wasn’t listening) it’s harmful to try to adapt your approach to that limited data.

    My recommendation is to obtain some WAV recordings of your speakers saying phrases that should be possible to recognize with your vocabulary and put them through PocketsphinxController to find out what the accuracy levels are for them. It is important to check the words in your OpenEarsDynamicGrammar.dic file that were not found in cmu07a.dic and make sure that the phonetic transcription in there is a real description of how someone would say those words, and to remove any typos, because if you have “VETINARY” and someone says “VETRINARY”, not only will you not recognize “VETRINARY” but it will hurt the recognition of any other words in the statement since you now have an out-of-vocabulary utterance in the middle of the statement.

    in reply to: Accuracy with Irish accent in speech recognition #11658
    Halle Winkler
    Politepix

    Can you be very specific with me about what this means: “Language is English, closest to American dialect I believe, although neither of the speakers are native Americans.” Where are the speakers from and what is their native language and why did you choose them for evaluating accuracy levels for English speech recognition?

    The reason I ask is because it’s unusual to say that a non-native speaker has a native dialect or close to one — that’s an exceedingly unusual outcome in language learning. As an example, I speak German as a non-native speaker and the regional accent of German that I speak is probably closest to the Northwestern German pronunciation, but no one from that region would say that I had a Northwestern German dialect because my US accent is at least as strong as a regional German accent in my speech. I would not be a good test subject for evaluating accuracy of German speech recognition.

    in reply to: Accuracy with Irish accent in speech recognition #11657
    Halle Winkler
    Politepix

    My suggestion is to see which words are being put in your OpenEarsDynamicGrammar.dic file which are not present in cmu07a.dic and look at the pronunciations that are listed therein and make sure that they are accurate descriptions of the way those words are pronounced. If they are not, those words will never be successfully recognized.

    >At worst, we would like to have at least 60% accuracy (that is, 6 successes from 10 experiments). 80% and higher accuracy would be good enough.

    Could you please tell me your current accuracy rate? It is improbable that you will get 80% for non-native speakers.

    I would turn on logging (both verboseLanguageModelGenerator and OpenEarsLogging) and see if there are any error or warning messages.

    in reply to: Accuracy with Irish accent in speech recognition #11654
    Halle Winkler
    Politepix

    You can’t use synthesized text for testing recognition, it has to be a real speaker.

    I’m confused about the idea of single word phrases — are they single words, or phrases?

    Unfortunately you are always going to see reduced accuracy when the speaker has an accent, unfair as it is. What is the accuracy rate you are seeing?

    Take a look at the .dic file that is output for the words which aren’t found in the cmu dictionary, because if the fallback method gets the pronunciation wrong, it won’t be recognized correctly.

    in reply to: Accuracy with Irish accent in speech recognition #11652
    Halle Winkler
    Politepix

    Welcome,

    Can you tell me a bit more about your application? Language, OpenEars version, which device, which kind of audio recording input, are you using any other media playback objects in your app like AVPlayer or MPMoviePlayerController, what is the accent, gender and age of the parties you are testing with, what is the accuracy rate you are seeing, have you verified that you are never sending any messages to the audio session or AVAudioSession, are the phrases found in the lookup dictionary or not (it’s the file CMU07a.dic that ships with the framework)?

    Recognition should ideally work better when users are closer to the device, so the result you describe is a sign that there could be an implementation issue.

    in reply to: Selectively enabling or disabling Rapid Ears #11644
    Halle Winkler
    Politepix

    Welcome,

    I would try RapidEars first to see if the accuracy is an issue, but if it is, you can use them both in the same app because they are started with separate start commands. The issue is that you can’t switch between them right in the middle of a recognition session, so you have to stop the recognition of one and then start a new recognition with the other. The only real downside to this is that it is necessary to recalibrate.

    in reply to: Pocketsphinx disables vibrate? #11638
    Halle Winkler
    Politepix

    The audio session that is required for an app that plays back and records audio disables system sounds, unfortunately. You can try enabling OpenEars’ AudioSessionManager’s soundMixing property as described in this thread: https://www.politepix.com/forums/topic/flitecontroller-pauses-music/

    in reply to: Limit the number of words to detect #11627
    Halle Winkler
    Politepix

    Are you seeing that in your profiling?

    in reply to: Limit the number of words to detect #11624
    Halle Winkler
    Politepix

    I don’t think the timer will get good results, people speak at different speeds and words have different lengths. How about ignoring anything after the first space in the hypothesis?

    in reply to: Placing code in app delegate instead of view controller #11622
    Halle Winkler
    Politepix

    No problem, this is what I had in mind with the design so I’m happy when it’s used this way.

    in reply to: Placing code in app delegate instead of view controller #11620
    Halle Winkler
    Politepix

    Sure, if you like, or for a more standard architecture you can also put it in a root-level view controller and send commands back to it via delegate callbacks from the child controllers. The only thing I don’t support is putting it in a singleton because that usually goes badly due to all of the multithreading.

    in reply to: [Resolved] Does the current SDK support iOS6? #11617
    Halle Winkler
    Politepix

    Hi Greg,

    I have the definitive answer to why this is sometimes happening — OpenEars requires that everything added to mainBundle is added at the root level (just because I don’t want to open up the can of worms of either trying to figure out at what level some crucial file is added, or forcing the developer to declare it for every file that the framework uses) and in the “add file” dialog in Xcode, if the option “Create folder references for any added folders” is selected, Xcode will create a folder inside the app’s mainBundle and put the files in there. So the acoustic model files were almost certainly in your app, but they probably couldn’t be found by the framework due to being in their own directory. I have improved the documentation to make sure this is clear and I appreciate your drawing my attention to the fact that it was a pitfall.

    in reply to: ConvertInput error in pocketsphinxDidReceiveHypothesis #11615
    Halle Winkler
    Politepix

    That is a little bit unlikely as the underlying cause, since OpenEars itself has at least two OpenEarsEventsObservers instantiated internally; add your first one and you’re up to a minimum of three out of the gate.

    Not debating that it is helping the issue you are seeing, just that it is probably ultimately due to something else. I haven’t seen the full logs and I imagine there are nuances to the code that can’t be put across via small snippets but it would be great if you could email me a stripped-down test case. Looking at this report of the same issue, it really looks like the audio session category is being overridden somewhere:

    http://stackoverflow.com/questions/5215919/convertinput-using-invalid-anchor-time-error-received-when-recording-on-device

    in reply to: Audio driver issues in combination with AirPlay #11608
    Halle Winkler
    Politepix

    >Where in the code would I look at the buffers?

    All the buffer code is in the ContinuousAudioUnit module. The buffer callback is at the top of the implementation. Unfortunately it isn’t an easy implementation to get to grips with since it’s in C and has a custom ringbuffer, but I think I would just focus on the callback and the initial configuration in OpenAudioDevice.

    You can also try experimenting with the settings in AudioSessionManager and research whether there are any particular pitfalls with certain audio session settings other developers are encountering with AirPlay.

    I forgot to ask — are you sending any messages to the shared audio session anywhere? This is a very common cause of issues like this. You can check by doing a case-insensitive search for “audiosession” in your code.

    in reply to: ConvertInput error in pocketsphinxDidReceiveHypothesis #11607
    Halle Winkler
    Politepix

    Interesting. We know that there isn’t an inherent conflict between a class inheriting the delegate protocols of OpenEarsEventsObserver and AVAudioPlayer because FliteController inherits the delegate protocols of both. But I don’t think your interpretation is offbase because there is a lot going on there — the recognition audio unit, FliteController’s audio and its callbacks, and your audio and its callbacks.

    Do you want to show me some logs featuring OpenEarsLogging and verbosePocketSphinx for the timeframe in which this is occurring? Maybe there are some hints.

    Something to just double-check is that a lot of AVAudioPlayer sample code contains unneeded calls to AVAudioSession and a lot of developers who asked me about similar issues found calls to the audio session in their code that were responsible. It might be worth a quick project-wide search for “audiosession” without case matching to make sure that some audio session overrides didn’t sneak in there.

    in reply to: Audio driver issues in combination with AirPlay #11606
    Halle Winkler
    Politepix

    I can only speculate at the moment and I’m not in front of the code, but IIRC the audio unit has both an input and an output and the audio session category is also input and output, meaning that the audio unit callback buffer is doing double duty. In Core Audio the callback buffer size is just a request, not a contract, so my suspicion is about whether a backlog of unsent output buffers or a changing buffer size is having an effect on the input buffer size or ability to call back. There is headroom in the ringbuffer for changing input buffer sizes but ultimately it’s a fixed amount.

    In this case I would expect more reported errors in the callback, but if I could check this out right now, this is where I would be looking. You can experiment with turning off the output in the audio unit to see if it helps (you basically already know that disabling output helps because using a different output from AirPlay helps, so that is where I’d look).

    in reply to: Audio driver issues in combination with AirPlay #11602
    Halle Winkler
    Politepix

    Probably, but I just did a number of OpenEars updates and this is the only open bug right now and it needs a lot of testing and setup to fix, so I can’t promise I will be able to get to it soon. You’re also welcome to send a patch if you can see the fundamental issue since you are already using the setup on which it is occurring.

    in reply to: Audio driver issues in combination with AirPlay #11600
    Halle Winkler
    Politepix

    That’s what I was suggesting — you should be able to detect programmatically ahead of time when you have a detection loop that isn’t working by checking whether the input level is stuck as it is in your loop above.

    in reply to: Audio driver issues in combination with AirPlay #11598
    Halle Winkler
    Politepix

    This is related to AirPlay; it’s the reason there is a buffer underrun. If you read pocketsphinxInputLevel on mainThread it will block.

    in reply to: Audio driver issues in combination with AirPlay #11596
    Halle Winkler
    Politepix

    > Do you mean that the audio recognizer is not started in the case where (1) happens?

    I think it’s started but there isn’t enough of a stream to keep up with the buffer looping and the calibrator is then reading zeroes and never getting to the point that it can identify a silence state or a speech state.

    in reply to: Audio driver issues in combination with AirPlay #11595
    Halle Winkler
    Politepix

    I would expect that more than a tenth of a second of a stuck input level is probably a bad loop.

    in reply to: Audio driver issues in combination with AirPlay #11594
    Halle Winkler
    Politepix

    I think that if you want to catch the lack of recognition in progress right now, you should look to see if pocketsphinxInputLevel is just returning zeroes or a fixed number. Remember to check it on a thread other than mainThread.
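    A rough sketch of that kind of check, where the background queue, the polling interval and the stuck-level threshold are all just illustrative choices:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{ // Never read this on mainThread
        Float32 lastLevel = self.pocketsphinxController.pocketsphinxInputLevel;
        int unchangedSamples = 0;
        for(int i = 0; i < 50 && unchangedSamples < 5; i++) { // Sample ten times a second for up to five seconds
            [NSThread sleepForTimeInterval:0.1];
            Float32 currentLevel = self.pocketsphinxController.pocketsphinxInputLevel;
            unchangedSamples = (currentLevel == lastLevel) ? unchangedSamples + 1 : 0;
            lastLevel = currentLevel;
        }
        if(unchangedSamples >= 5) { // The level has been frozen for roughly half a second
            dispatch_async(dispatch_get_main_queue(), ^{
                // React to the bad loop here, for instance by stopping and restarting listening.
            });
        }
    });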

    in reply to: Audio driver issues in combination with AirPlay #11593
    Halle Winkler
    Politepix

    Ah, I see your issue, you aren’t receiving the error callback during the loop in which you have no recognition. I think that that must be a buffer underrun. I’ll look into it.

    in reply to: Audio driver issues in combination with AirPlay #11588
    Halle Winkler
    Politepix

    The error isn’t really that it stops detecting speech; it is that the audio unit that feeds audio to the recognizer doesn’t start at the top of a particular recognition loop, which means that when the engine gets to the audio, the engine (most likely the VAD) eventually crashes because it’s expecting information.

    So (1) isn’t the bug or the event, it’s just one result of the audio unit being unexpectedly stopped when the loop restarts. I’m entering it as an AirPlay issue but I can’t give you a timeframe for fixing it.

    pocketSphinxContinuousSetupDidFail is there so that you can react to an error state, have you just tried stopping recognition when you receive it so that you don’t keep running the recognizer until it crashes due to not having a calibrated state?

    in reply to: Audio driver issues in combination with AirPlay #11586
    Halle Winkler
    Politepix

    OK, a quick look at the error you’re getting here:

    http://www.google.com/search?hl=en&as_q=airplay+AUIOClient_StartIO+failed

    Makes me think that this is happening when you don’t have connectivity to the network device at the exact moment that the audio unit needs to start.

    I would recommend going to ContinuousAudioUnit.mm’s function:

    int32 startRecording(PocketsphinxAudioDevice * audioDevice);

    And seeing if you can patch in a test for the airplay connection before this line:

    OSStatus startAudioUnitOutputStatus = AudioOutputUnitStart(audioDriver->audioUnit);

    which is where the attempt is made to start the audio unit. I don’t have an AirPlay device, so for me to fix it, it would have to wait until whatever point in the future I have one.
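    Since I can’t test against AirPlay myself, here is only a sketch of the shape such a patch could take; at minimum you could check the OSStatus from that call and bail out instead of continuing into the listening loop (how you would actually test for the AirPlay route beforehand is left open):

    OSStatus startAudioUnitOutputStatus = AudioOutputUnitStart(audioDriver->audioUnit);
    if(startAudioUnitOutputStatus != noErr) {
        // The audio unit could not be started (for instance because the AirPlay route wasn't reachable at
        // this exact moment), so report the failure back to the caller in whatever error convention
        // startRecording already uses rather than letting the recognition loop run without audio.
        NSLog(@"AudioOutputUnitStart failed with status %d", (int)startAudioUnitOutputStatus);
        return startAudioUnitOutputStatus;
    }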

    in reply to: Audio driver issues in combination with AirPlay #11584
    Halle Winkler
    Politepix

    OK, I don’t see the beginning of the OpenEarsLogging logs there, which version of OpenEars is that?

    in reply to: Audio driver issues in combination with AirPlay #11581
    Halle Winkler
    Politepix

    Welcome,

    Can you please post your complete logging output with verbosePocketSphinx on and OpenEarsLogging on? I don’t have a way of helping you unless you do and I’ve never received a report of this behavior so I’d like to see everything.

    > Secondly if (1) happends I feel I should still be able to programatically stop and then start listening.

    Sorry, it isn’t possible to assure any kind of functionality when there is unknown behavior that might be due to a bug. This is most likely because AirPlay is unknown territory, but it might become clearer when you post the logs.

    in reply to: Using open Ears Framework in my app #11562
    Halle Winkler
    Politepix

    Hi Vignesh,

    Welcome. There is no problem with app acceptance and OpenEars, it has public APIs and only uses public APIs. This is covered in the FAQ but you can also see several well-known apps that use OpenEars to the right of the main OpenEars page at https://www.politepix.com/openears so you don’t have to take my word for it.

    Reducing your application size is also covered in the FAQ so give it a read:

    https://www.politepix.com/openears/support

    OpenEars should only increase your app size by about 7 MB if you are using speech recognition and maybe another 5 MB per voice used for TTS.

    By the way, the App Store doesn’t limit your app size to 20 MB; there are apps that are hundreds of MB in size. But I agree that it is friendly to your users to create the smallest app possible.

    in reply to: Commercial use #11558
    Halle Winkler
    Politepix

    No prob, good luck with your app.

    in reply to: Commercial use #11556
    Halle Winkler
    Politepix

    Here is an example of the CMU license:

    http://www.speech.cs.cmu.edu/flite/doc/flite_2.html#SEC2

    in reply to: Commercial use #11555
    Halle Winkler
    Politepix

    Ah, sorry, I over-edited the FAQ recently. OK, I can’t give you permission or advice for things relating to CMU because I am neither part of that program nor a lawyer. However, as another developer who uses their libraries I have the subjective impression that it is sufficient to credit each library and link to their agreement since the terms of their license are analogous to the MIT license. Something like this in your about appears to me as a good-faith attempt to meet CMU’s crediting request and mine:

    $MYAPP uses CMU Sphinx, CMU Flite, CMU CLMTK [link here to the CMU agreement] and Politepix’s OpenEars [link here to politepix.com/openears].

    in reply to: Commercial use #11553
    Halle Winkler
    Politepix

    Sure, you can use it commercially for the purposes described in the license.

    Powered by Politepix is nice and I would find that fine for my own credit if there was some kind of link so curious parties could find out what that meant. It would still be necessary to credit CMU so take a look at that part of the FAQ: https://www.politepix.com/openears/support

    in reply to: [Resolved] Does the current SDK support iOS6? #11535
    Halle Winkler
    Politepix

    Having tested this again, I wanted to follow up that I can’t replicate an issue with adding the Framework folder itself as shown in the tutorial so I don’t think this issue is precisely that it is necessary to import that folder’s contents individually.

    It’s likely that re-importing the acoustic model files individually fixes something minor that has gone wrong a bit earlier in the process with the addition of that folder (for instance if the target was accidentally unchecked while adding or it was added as a reference and the reference was somehow problematic or became so later) so I strongly endorse gregquinn’s method described above as a troubleshooting step if you see a crash when listening starts.

    in reply to: Recognizing acronyms and numbers #11534
    Halle Winkler
    Politepix

    This is unfortunately something that doesn’t work that wonderfully with speech recognition — accuracy for this application of the functionality is never great. But it shouldn’t be any problem using the dynamic generation to create a language model that recognizes letters and numbers. In what sense is there no success? Do you have low accuracy (that is unfortunately to be expected for the specified requirement), or do the language models lack entries for your letters and numbers?

    One thing you can try if you are dealing with entire acronyms that are known at the time of language model creation (versus arbitrary combinations of letters) is to use the whole acronym as the corpus or array entry and then edit the phonetic dictionary.

    So, instead of having these in the array: @"A", @"B", @"C"

    You would have this in the array: @"ABC"

    And then you would need to edit the phonetic dictionary which is created so that the pronunciation associated with the word “ABC” is the correct phoneme sounds for “A”, “B” and “C” in sequence.
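    For example, the edited entry in the generated .dic file might look like this (the phoneme transcription is my own illustration, so verify it against the entries for the individual letters in cmu07a.dic):

    ABC	EY B IY S IY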

    To work with already-created language model and dictionary files instead of making new ones at runtime you can follow these instructions from the docs:

    If you need to create a fixed language model ahead of time instead of creating it dynamically in your app, just use this method (or generateLanguageModelFromTextFile:withFilesNamed:) to submit your full language model using the Simulator and then use the Simulator documents folder script to get the language model and dictionary file out of the documents folder and add it to your app bundle, referencing it from there.

    in reply to: [Resolved] Does the current SDK support iOS6? #11532
    Halle Winkler
    Politepix

    No problem, glad it’s working for you.

    in reply to: [Resolved] Does the current SDK support iOS6? #11530
    Halle Winkler
    Politepix

    OK, it doesn’t actually have to be added in any particular way, however what’s important is that the entire Framework folder found at the root of the OpenEars distribution folder is dragged from the Finder into your project. That has to be the original Framework folder with the full contents in it that are found at the time of downloading the distribution. You can either add the contents as a reference or add them by copying, but if you add them as a reference, the folder has to stay where it was at the time the reference is added.

    in reply to: [Resolved] Does the current SDK support iOS6? #11528
    Halle Winkler
    Politepix

    To clarify — are you saying that with [OpenEarsLogging startOpenEarsLogging] and pocketsphinxController.verbosePocketSphinx = TRUE set you receive no logging at all up to the time of the crash?

    in reply to: [Resolved] Does the current SDK support iOS6? #11527
    Halle Winkler
    Politepix

    Hmm, yes it definitely does. That kind of sounds like a missing acoustic model or a missing part of the acoustic model (meaning perhaps something going wrong when dragging the “Framework” folder into the app). Can you run the sample app in iOS6?

    in reply to: OpenEars intigartion with camera control #11493
    Halle Winkler
    Politepix

    You’re welcome, good luck with your app.

    in reply to: OpenEars intigartion with camera control #11491
    Halle Winkler
    Politepix

    Solving this would be an advanced undertaking that would require you to thoroughly research the iOS audio session and do a lot of self-guided experimentation in order to learn what is needed. Maybe it’s possible but it isn’t something I can walk you through unfortunately.

    in reply to: OpenEars intigartion with camera control #11489
    Halle Winkler
    Politepix

    OK, I think the issue is simply that the audio stream is not provided to PocketsphinxController since it is used by the video picker.

    in reply to: Start & Stop Listening #11486
    Halle Winkler
    Politepix

    OK, I don’t actually support simulator-only issues, not because I don’t care at all but because the simulator just hosts the local audio devices of the desktop or laptop, meaning that any individual simulator-only issue can potentially be due to just one particular arrangement of hardware that is on the local machine, meaning that any fix might only fix something that is special to one desktop or laptop version or audio conversion device.

    Since the end-user of the app will never encounter the issue, I made the judgement call that it would be better to ignore simulator-only issues than to put heavy debugging time into issues that don’t affect app users. I hope that’s understandable and that you’ll let me know if you see issues with enduser devices.

    in reply to: Start & Stop Listening #11484
    Halle Winkler
    Politepix

    The logging is from the simulator, have you ever seen this on a device? I haven’t seen it in my device testing of iOS6.

    in reply to: Start & Stop Listening #11482
    Halle Winkler
    Politepix

    Hello,

    Is this with the unchanged sample app or a version that has changes?
