Forum Replies Created
Thanks for the great answer, Joseph. My hope was to create a large JSGF grammar at runtime. It sounds like unless I can dynamically generate the FSG, I’m out of luck.
I tried with the plain implementation and the results were the same. I dropped the grammar down to about 60 words and it’s still much slower and less accurate than my ARPA model with thousands of phrases. Is this typical, or am I likely doing something wrong? Because most of my users will be using the app in an environment without internet access, I won’t be able to do a server-side implementation.
Let me give you an example to better illustrate what I want. For argument's sake, let's say a valid command would be something of the form "<number> POUNDS <numberTwo> OUNCES" or "<numberTwo> OUNCES <number> POUNDS." I want to reject, at the speech detection level, any input that doesn't meet these qualifications.
So my grammar then was simply: <otherKeywords> | <number> POUNDS <numberTwo> OUNCES | <numberTwo> OUNCES <number> POUNDS
<otherKeywords> contained about 1000 phrases, averaging 3 words per phrase. <number> was defined to match any number up to 1000, and <numberTwo> was 0-15.
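For what it's worth, a grammar like that doesn't have to be written out by hand; the rule bodies can be generated at runtime. Here's a rough sketch of the idea in Python for brevity (the real thing would be Objective-C on iOS) — the rule names match my grammar, but the number-spelling helper is simplified and only illustrative:

```python
# Illustrative sketch of building a JSGF grammar string at runtime.
# Rule names (<number>, <numberTwo>, <otherKeywords>) match my grammar;
# the helper below is a simplified stand-in, not a real number-to-words library.

def number_words(n):
    """Spell out 0..n as uppercase words (simplified: handles 0-99 only)."""
    ones = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN",
            "EIGHT", "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN",
            "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
    tens = ["TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
            "EIGHTY", "NINETY"]
    words = []
    for i in range(n + 1):
        if i < 20:
            words.append(ones[i])
        else:
            t, o = divmod(i, 10)
            words.append(tens[t - 2] if o == 0 else f"{tens[t - 2]} {ones[o]}")
    return words

def build_grammar():
    number = " | ".join(number_words(99))      # stand-in for the 0-1000 rule
    number_two = " | ".join(number_words(15))  # ounces: 0-15
    return (
        "#JSGF V1.0;\n"
        "grammar weights;\n"
        f"<number> = {number};\n"
        f"<numberTwo> = {number_two};\n"
        "public <command> = <number> POUNDS <numberTwo> OUNCES"
        " | <numberTwo> OUNCES <number> POUNDS;\n"
    )

grammar = build_grammar()
```

The resulting string could then be written to a .gram file and handed to the recognizer at startup instead of shipping a hand-written grammar.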
I’ve made no modifications to the library except the changes suggested in this thread to keep system sounds. I haven’t tried using the grammar with the sample app yet; I’ll give that a try in a moment.
What I meant was: when the hypothesis is calculated, will it allow the given phrases to be recombined? I’ve since figured out that the answer is “only if I want them to be.”
I spent the day switching over to JSGF, and overall it works significantly worse than the ARPA model. The accuracy is actually worse, and the processing takes 8-10 times as long. Even if the accuracy were perfect, the processing time alone rules it out for me. Also, when I test on an iPod Touch 4G, I get a memory warning right about the time PocketSphinx begins calibrating, and I’ve already simplified my grammar as much as possible. So now I’m going back to ARPA, and I’m facing the same problem I had when I started this thread: cleaning phrases that weren’t included in the grammar out of the hypothesis. I don’t know whether I’ll be able to complete my project with satisfactory results using a post-processing method, but unless you can offer additional tips, I believe it may be my last shot at getting this thing to work.
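To be concrete about the post-processing I have in mind, it's roughly this (sketched in Python for brevity; the phrase list and function name are hypothetical): scan the hypothesis, keep only the longest runs of words that form phrases actually in the language model, and drop everything else.

```python
# Hypothetical sketch of cleaning an ARPA hypothesis: keep only words that
# belong to phrases the language model actually contains, preferring the
# longest match at each position. Phrase set below is a tiny stand-in.

ALLOWED_PHRASES = {
    "FIVE POUNDS", "TEN OUNCES", "SET THE WEIGHT",  # illustrative entries
}

def clean_hypothesis(hypothesis, allowed=ALLOWED_PHRASES, max_len=3):
    words = hypothesis.split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in allowed:
                out.append(candidate)
                i += n
                break
        else:
            i += 1  # word belongs to no allowed phrase; drop it
    return " ".join(out)
```

For example, `clean_hypothesis("UM FIVE POUNDS AND TEN OUNCES")` would strip the stray "UM" and "AND" and keep the two valid phrases.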
Thanks so much for your help
Thanks for the response, Halle. I’m going to finish reading the resources you provided, and also spend some time playing with sample JSGF grammars when I get on my development machine. I had a hunch that JSGF would be the way to go, but the added delay involved in stopping and starting the listener may rule it out for me, since my current implementation uses 2 distinct ARPA models (about 1200 phrases each) that I frequently switch between. I think I could only make it work for my project if I combined both models into a single JSGF grammar, thus eliminating the need to switch language models. My dilemma, then, is deciding whether I’d get better results from one large JSGF grammar or from 2 medium-sized ARPA models plus some post-processing of the hypothesis. I’m currently working on the latter option, and I think I might be able to process the hypothesis well enough to produce a valid NSString for my purposes. One issue that comes up constantly is that the library gets tripped up by similar-sounding words; for example, it might interpret EIGHTY as EIGHT or vice versa. Would changing the grammar to JSGF help avoid this at all?
Also, it is my understanding that a JSGF grammar would allow multiple language model phrases in a single hypothesis, which would be necessary for my project. Is this the case, or would every possible hypothesis have to be spelled out explicitly in the language model?
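In case it helps clarify what I mean by rejecting invalid combinations: since an ARPA model can't constrain the decoder itself, I'd validate the command shape after the fact, roughly like this Python sketch (the word list is trimmed way down; the real pattern would cover numbers up to 1000):

```python
import re

# Illustrative sketch: accept a hypothesis only if it matches one of the two
# valid command shapes from my grammar. NUMBER is a trimmed stand-in for the
# full 0-1000 word list.

NUMBER = r"(?:ZERO|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN)"
COMMAND = re.compile(
    rf"^{NUMBER} POUNDS {NUMBER} OUNCES$"
    rf"|^{NUMBER} OUNCES {NUMBER} POUNDS$"
)

def is_valid_command(hypothesis):
    """Return True only for '<number> POUNDS <number> OUNCES' or the reverse."""
    return COMMAND.match(hypothesis) is not None
```

So "FIVE POUNDS TEN OUNCES" would pass, while a recombined fragment like "POUNDS FIVE" would be rejected.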