| Author | Posts |
|---|---|
| July 27, 2011 at 6:11 pm #7401 | |
|
darbienapp |
I downloaded the sample app and tested it on my iPhone, and the program worked perfectly. I do have one question, though: if I want the recognizer to allow only { MONDAY | TUESDAY } and not allow repeated words, should I switch to JSGF? It seems the sample app accepts utterances such as “MONDAY MONDAY” or “TUESDAY MONDAY”. If that’s the case, where can I find resources on using JSGF? I would like to learn how to create the JSGF .gram and .dic files. Thanks! |
| July 27, 2011 at 7:22 pm #7402 | |
|
Halle |
Welcome. Correct, it should be possible to construct a JSGF grammar that will require an utterance within those parameters. Here is the Sphinx project JSGF page: http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsgf/JSGFGrammar.html And here is the JSGF spec: I don’t think that the feature of dynamically importing other grammars is supported in OpenEars (or at least I haven’t tested it personally and I know of one user who had trouble with it), but I believe that every other feature that is supported by Pocketsphinx is supported by OpenEars. You can also download OpenEars 0.902 in order to see a known-working grammar, but keep in mind when trying that out that the OpenEars API changed a bit in the newer versions, so you can’t drag and drop the implementation code out of that version. |
| July 27, 2011 at 10:40 pm #7403 | |
|
darbienapp |
Thanks Halle, I will check out the page and try to figure out how to get JSGF working. Another question I have is about PocketsphinxController (or OpenEarsEventsObserver): is there a way I can limit the time allowed for user input? For example, can my app allow the user 10 seconds to say something before prompting them again? I suppose I can set up a timer and let it time out if pocketsphinxDidDetectSpeech is not called; I’m just wondering if there’s a cleaner way of doing it. |
| July 27, 2011 at 10:58 pm #7404 | |
|
Halle |
There’s no API method for limiting the time, but I think if it were my task I’d probably start an NSTimer in pocketsphinxDidStartListening that is cancelled in pocketsphinxDidReceiveHypothesis if it is still valid at that time. |
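The start-then-cancel pattern Halle describes can be sketched independently of OpenEars. The following is only an illustration in Python, using `threading.Timer` as a stand-in for NSTimer; the class and method names are hypothetical, and the two methods merely mirror the points where pocketsphinxDidStartListening and pocketsphinxDidReceiveHypothesis would be called.

```python
import threading


class ListeningTimeout:
    """Call on_timeout if no hypothesis arrives within `seconds` of listening starting.

    Hypothetical helper for illustration; it is not part of OpenEars.
    """

    def __init__(self, seconds, on_timeout):
        self.seconds = seconds
        self.on_timeout = on_timeout
        self._timer = None

    def did_start_listening(self):
        # Mirrors starting the timer in pocketsphinxDidStartListening.
        self.cancel()
        self._timer = threading.Timer(self.seconds, self.on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def did_receive_hypothesis(self):
        # Mirrors cancelling the timer in pocketsphinxDidReceiveHypothesis.
        self.cancel()

    def cancel(self):
        # Safe to call when no timer is pending or after it has already fired.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
```

In an app, the timeout callback would be the place to replay the prompt; a 10-second value would match the example in the question above.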
| July 28, 2011 at 8:28 am #7405 | |
|
darbienapp |
OK, I tried JSGF with a simple grammar : and specified the .gram and .dic files in the method “startListeningWithLanguageModelAtPath” and it worked like a charm! However, a couple of issues: - The score that is returned in “pocketsphinxDidReceiveHypothesis” seems very weird. Sometimes when it correctly detects something it gets a score of 0, and at other times when I just say some gibberish it returns a very high score. Is there a good way of finding out how certain the detection result is? |
| July 28, 2011 at 10:39 am #7406 | |
|
Halle |
Correct, there’s no dynamic switching or generation for JSGF. I’m not sure if it’s ever going to be added. Is it possible that you overlooked that the ‘high score’ is actually a negative number? I don’t know off the top of my head if the scoring works correctly for JSGF, but the usual scale is that the highest probability of correctness is zero and lower probability is represented by a negative number. |
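To make the scale Halle describes concrete: if zero is the best possible score and more negative means less likely, then comparing scores is an ordinary numeric comparison, and a number that looks large may simply be strongly negative. A minimal sketch in Python follows; the function names, sample scores, and threshold are made up for illustration, and, as noted above, it is not confirmed that scores are meaningful for JSGF.

```python
def best_hypothesis(hypotheses):
    """Return the (text, score) pair with the score closest to zero.

    Assumes scores are zero or negative, with zero the most probable.
    """
    return max(hypotheses, key=lambda pair: pair[1])


def passes_threshold(score, threshold):
    """Accept a result only if its score is at or above a chosen (negative) cutoff."""
    return score >= threshold


# Made-up scores: -1200 is the better result even though 8500 is the larger magnitude.
candidates = [("YES", -1200), ("NO", -8500)]
winner = best_hypothesis(candidates)
```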
| July 29, 2011 at 11:40 am #7416 | |
|
dave |
Hello, I’ve been doing some experimentation with a JSGF grammar and have had much success. I’ve dynamically created a JSGF grammar just by building up an NSString and writing this to a file, which is then referenced instead of the file provided. However, the main issue I’m having is responsiveness: I’m getting around 2-3 seconds response time, which isn’t bad but is not ideal for a real-time recognition system that updates the UI. I’ve been doing user research around our prototype and every user has commented on the length of time it takes to get feedback. We’re using an iPad 2. Of these 2-3 seconds I think the breakdown is as follows: 1 second silence detection. Halle, have you any suggestions on how to reduce this processing time? From my work using the other model supported, it’s possible to reduce the silence detection time; however, is this supported with JSGF? It doesn’t seem to be at the moment. Thank you so much again for all your responsiveness and support for this project, we are all greatly indebted to you. |
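Building a grammar as a string and writing it to a file, as Dave describes, can be sketched as follows. This is an illustration in Python rather than the NSString code from the thread; the function names, grammar name, and rule name are hypothetical. It produces a single rule with plain alternatives and no repetition operator, which is the shape needed for the one-word MONDAY or TUESDAY case raised at the start of the thread. The matching .dic pronunciation file is not generated here.

```python
def build_jsgf(grammar_name, rule_name, words):
    """Return JSGF text whose public rule accepts exactly one of `words`."""
    if not words:
        raise ValueError("at least one word is required")
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


def write_grammar(path, text):
    """Write the grammar text to a .gram file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


weekday_grammar = build_jsgf("weekdays", "day", ["MONDAY", "TUESDAY"])
```

Because the rule has no `*` or `+` operator, an utterance such as “MONDAY MONDAY” falls outside the grammar.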
| July 29, 2011 at 12:47 pm #7417 | |
|
Halle |
Hi Dave, Which version of OpenEars are you using? And can you give me an idea of the size of the grammar? 1-2 seconds processing time seems like quite a bit on an iPad 2. |
| July 29, 2011 at 3:59 pm #7418 | |
|
dave |
Hi Halle, I’m using the latest version (0.912) and my grammar only has around 50 words. Thanks again. |
| July 29, 2011 at 4:18 pm #7419 | |
|
Halle |
Hi Dave, No problem. I’m very surprised to hear that JSGF is that much slower than ARPA. First thing is that there should be nothing preventing shortening of the detected silence length. Are you changing it in kSecondsOfSilenceToDetect in OpenEarsConfig.h or somewhere else? Try a clean all/delete DerivedData and rebuild if the change in silence setting doesn’t take. Are your rules particularly complex? What OS version is it? IIRC there is one incremental version of iOS that results in slow recognition on the iPad. I don’t have a lot of ideas here, but two things that could help: • Test it with compile for thumb both checked and unchecked and see if you get a better result. This will have no effect on accuracy. • Uncomment this line: //#define kDS @"2" // -ds Frame GMM computation downsampling ratio, defaults to 1. This will reduce accuracy (which you may or may not notice in your standard usage) but it should improve response time. |
| July 30, 2011 at 6:38 am #7422 | |
|
darbienapp |
Halle/Dave, This is an interesting discussion. I just noticed that I’m seeing about the same response time with version 0.912 (around 2 seconds of processing time) as well. I’m using an iPhone 3GS and the ARPA language model only has two choices: YES and NO. I will try out the kDS parameter to see if it helps the processing time. Thanks for the info! |
| July 30, 2011 at 7:12 am #7423 | |
|
Halle |
Your ARPA model is taking 2 seconds, or your JSGF is taking 2 seconds and it is based on an ARPA model that small? |
| July 30, 2011 at 7:12 pm #7424 | |
|
darbienapp |
Halle, I’m only using ARPA files (no gram file). The files I’m using are the “.dic” and the “.languagemodel” file. The corpus is composed of just “YES” and “NO”. I did try the options you suggested. For my iPhone I didn’t notice any significant change in the compile for thumb option. However changing the downsample ratio seems to decrease recognition time by about 1 second when set to 2. Not sure if it changes the accuracy much, I suppose for the simple model I’m using it’s probably okay. |
| July 30, 2011 at 8:51 pm #7425 | |
|
Halle |
Would you mind outputting the exact processing time (without uncommenting -ds) by uncommenting OPENEARSLOGGING and showing the time difference in the console from: - (void) pocketsphinxDidDetectFinishedSpeech to - (void) pocketsphinxDidReceiveHypothesis: ? I see processing times that are a fraction of that on slower phones with larger language models and it might be a sign of a configuration issue, something blocking processing, or a bug. I agree with you that -ds shouldn’t hurt you much if you are only listening for two different-sounding words, although my understanding is that generally accuracy is better with somewhat larger models. I’d be slightly curious if the extreme smallness of the model might be slowing recognition down for some reason. |
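The measurement Halle asks for is the interval between two callbacks. Besides reading the console timestamps, the same interval can be captured directly by stamping a monotonic clock in each callback. The sketch below is an illustration in Python with hypothetical names; the two methods stand in for pocketsphinxDidDetectFinishedSpeech and pocketsphinxDidReceiveHypothesis.

```python
import time


class ProcessingStopwatch:
    """Measure the time from end-of-speech detection to hypothesis delivery."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the logic can be checked without waiting
        self._finished_speech_at = None
        self.last_elapsed = None

    def did_detect_finished_speech(self):
        # Record when the recognizer decided the user stopped talking.
        self._finished_speech_at = self._clock()

    def did_receive_hypothesis(self):
        # Return seconds since the matching end-of-speech event, or None if there was none.
        if self._finished_speech_at is None:
            return None
        self.last_elapsed = self._clock() - self._finished_speech_at
        self._finished_speech_at = None
        return self.last_elapsed
```

Measuring at these two points keeps later work, such as speech synthesis or UI updates, out of the recognition figure.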
| August 2, 2011 at 6:09 pm #7428 | |
|
darbienapp |
Halle, Actually, by coincidence, a couple of days ago I was experimenting with different things and noticed that in the sample project, if I comment out the TTS processing in pocketsphinxDidReceiveHypothesis (the call to [flitecontroller say:...]), the response time is significantly reduced, to well under a second. The response time I measured here is the time between when it says detectFinishSpeech and the display of “YOU said”. So I think you are right that the time between pocketsphinxDidDetectFinishedSpeech and pocketsphinxDidReceiveHypothesis may actually be very small. I think I was wrong to base the response time on how long it takes to hear the TTS response, which can involve some TTS processing time. Nonetheless, I will try it out tonight with the default downsampling ratio to see if that makes any difference. On a side note, for the IVR application I’m building I plan to use pre-recorded prompts (either wav or mp3), so I suspect the TTS processing time will be eliminated. Do you have any recommendation on how I should implement the prompt-recognition loop? I’m deciding whether to use AVAudioPlayer or OpenAL; the reason is that some of the prompts are concatenated together and I’m not sure if there are delay issues with using AVAudioPlayer this way. |
| August 2, 2011 at 6:18 pm #7429 | |
|
Halle |
OK, I’m glad to hear that! Given that this is within my expectations for how long an ARPA model should take to process, I would not recommend that you take the accuracy hit by using -ds but instead just decrease kSecondsOfSilenceToDetect by whatever amount will make it feel responsive enough to you/your users. If they are only saying yes or no you don’t have to hang around and wait very long before evaluating their speech, it’s not like when they are saying a sentence and you have to distinguish between pauses, commas, and being finished. I strongly recommend using AVAudioPlayer because it is 100% happy to operate side by side with OpenEars and I don’t know if that is likely with OpenAL. If you are concerned about AVAudioPlayer latency, use prepareToPlay to prime it at some time in advance. |
| August 3, 2011 at 7:33 pm #7452 | |
|
darbienapp |
Halle, Sorry to take up so much bandwidth on this thread, but I did some more experimentation and it seems that using “prepareToPlay” does not really improve the delay. This is especially true when I’m dynamically playing prerecorded mp3 files back to back, for example “three” + “hundred” + “and” + “seventy” + “four”. The gap between files is around 1 sec whether I call “prepareToPlay” beforehand or just when each of them is about to be played. I did some googling and it looks like quite a lot of people have a similar problem, so some people just resort to using OpenAL. Some of them suggested using wav format instead but it seems to make little difference. However, I think there are some workarounds I can explore: 1. Concatenate mp3 files together dynamically and load the big file at run time. I’m not sure how to even approach this since I’m new to Objective-C, I suppose a temporary file |
| August 3, 2011 at 8:01 pm #7453 | |
|
Halle |
Sorry, I don’t think I have any great advice for you because this is fairly advanced however you slice it. FliteController creates a virtual wav in memory, which I’m sure would also be a possibility for you, but it is not at all easy to get working. I guess I might recommend that you try to take some time to understand it, though, since if you can use the same mechanism you can pretty much be guaranteed of the fact that you won’t end up with a conflict. I have no idea if OpenAL will play nice, but my gut says it’s probably going to cause at least minor issues. |
| August 3, 2011 at 10:33 pm #7456 | |
|
Halle |
Actually, maybe an easier approach for you would be to use Extended Audio File Services and just try to write out your entire audio file to disk and then play it back. That will also be a drop-in approach for AVAudioPlayer that should cause no conflicts. There’s some audio file code in version 0.902 of OpenEars in the audio driver that you can take a look at. |
| August 5, 2011 at 7:27 pm #7463 | |
|
darbienapp |
Halle, The approach I ended up taking was to create a virtual wav file (NSMutableData), similar to what you have in FliteController, and append each wav file’s data to the end. Finally, I had to modify the number-of-bytes field in the main wav header. I found that was the easier approach and it seems to work as expected. It was really helpful for me that FliteController is very well commented and easy to follow, good job by the way! |
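The append-and-patch approach described here can be sketched in a few lines. This is an illustration in Python, not the poster's NSMutableData code, and it rests on an assumption worth stating loudly: every input is a canonical PCM WAV with a 44-byte header (RIFF, a 16-byte fmt chunk, then the data chunk) and all inputs share one sample rate, channel count, and bit depth. Files with extra chunks would need real chunk parsing. The function name is hypothetical.

```python
import struct

HEADER_SIZE = 44  # assumption: canonical PCM header with no extra chunks


def concatenate_wavs(wavs):
    """Join WAV byte strings of identical format into one in-memory WAV."""
    if not wavs:
        raise ValueError("need at least one WAV")
    header = bytearray(wavs[0][:HEADER_SIZE])
    samples = bytearray()
    for wav in wavs:
        if (len(wav) < HEADER_SIZE or wav[0:4] != b"RIFF"
                or wav[8:12] != b"WAVE" or wav[36:40] != b"data"):
            raise ValueError("expected a canonical 44-byte WAV header")
        if wav[12:36] != header[12:36]:
            raise ValueError("all inputs must share one audio format")
        # Bytes 40-43 hold the size of this file's sample data.
        (data_size,) = struct.unpack_from("<I", wav, 40)
        samples += wav[HEADER_SIZE:HEADER_SIZE + data_size]
    # Patch both size fields: RIFF chunk size (offset 4) and data size (offset 40).
    struct.pack_into("<I", header, 4, 36 + len(samples))
    struct.pack_into("<I", header, 40, len(samples))
    return bytes(header) + bytes(samples)
```

Two size fields need updating in this layout: the overall RIFF size near the start of the header and the data chunk size just before the samples. Leaving either one stale can cause a player to stop after the first clip.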
| August 5, 2011 at 8:56 pm #7464 | |
|
Halle |
Very cool! I’m glad that was a good approach for you. |