February 21, 2013 at 11:03 pm #1015706
I can’t get the program to recognize any of these: Q, QUEUE or CUE.
Can you offer any hints? In a business usage, we will be using Q1, Q2, Q3 and Q4.
The program does recognize “QUARTER”, but we can’t force users to say that.
We’re buying Rejecto next week to help eliminate wrong words, but right now we’re having trouble with the right words.
Thanks!
February 21, 2013 at 11:15 pm #1015707
Hmm, single letters that rhyme with other single letters are very challenging for recognition.
Since you already know the number of required quarters, something sneaky you can try is to make “Q1” etc. the entire word. That is, instead of trying to recognize the combination of Q and 1, you put a word “Q1” in your language model and edit the dictionary so that the entry for Q1 reads as follows:
Q1 K Y UW W AH N
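As a rough sketch of the same idea for all four quarters: each fused entry is just the phonemes for “Q” followed by the phonemes for the number. The Python below is only an illustration. The helper name `fused_entry` and the small phoneme table are assumptions made for this example, not part of OpenEars or pocketsphinx, and the tab separator follows what a later post in this thread found necessary.

```python
# Phonemes for the letter Q and for each number, written in the same
# CMU-style phoneme set as the "Q1" entry above. This table is typed in
# by hand for illustration; it is not read from a real dictionary file.
Q_PHONES = ["K", "Y", "UW"]
NUMBER_PHONES = {
    "1": ["W", "AH", "N"],
    "2": ["T", "UW"],
    "3": ["TH", "R", "IY"],
    "4": ["F", "AO", "R"],
}


def fused_entry(prefix, prefix_phones, suffix, suffix_phones):
    """Build one dictionary line for a fused word such as "Q1".

    Hypothetical helper: joins the two spellings into one word and the
    two phoneme lists into one pronunciation, separated by a tab.
    """
    word = prefix + suffix
    return word + "\t" + " ".join(prefix_phones + suffix_phones)


quarter_entries = [
    fused_entry("Q", Q_PHONES, number, phones)
    for number, phones in NUMBER_PHONES.items()
]

for entry in quarter_entries:
    print(entry)
```

The resulting lines would still need to be added to the dictionary by hand, and “Q1” through “Q4” added to the language model as ordinary words.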
You’d do this for each quarter. Having the multiple syllable/sound combinations available for distinguishing between the quarters should make them recognizable.
February 21, 2013 at 11:17 pm #1015708
Here’s the skinny on making custom language models before runtime:
and editing your phonetic dictionary:
February 22, 2013 at 3:25 pm #1015711
Here’s what (and where) I put in the .dic file:
Q.S K Y UW Z
Q1 K Y UW W AH N
Q2 K Y UW T UW
Q3 K Y UW TH R IY
Q4 K Y UW F AO R
QANA K AA N AH
I had to edit the .dic file in Hex since at first Xcode put spaces instead of a tab.
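For anyone hitting the same spaces-versus-tab problem, a small script can rewrite the lines instead of a hex editor. This is only a sketch in Python: `normalize_dic_line` is a made-up helper name, and it assumes that the first whitespace-delimited token on each line is the word and everything after it is the pronunciation.

```python
def normalize_dic_line(line):
    """Return 'WORD<TAB>PH1 PH2 ...' for one dictionary line.

    Assumption: the word is the first whitespace-delimited token and
    the remaining tokens are phonemes. Any run of spaces or tabs
    between them is replaced by a single tab after the word and single
    spaces between phonemes.
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("not a dictionary entry: %r" % line)
    word, phones = parts[0], parts[1:]
    return word + "\t" + " ".join(phones)


# One entry typed with spaces, one that already uses a tab.
entries = ["Q1    K Y UW W AH N", "Q2\tK Y UW T UW"]
fixed = [normalize_dic_line(entry) for entry in entries]

for line in fixed:
    print(repr(line))  # repr() makes the tab visible as \t
```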
It still doesn’t recognize Q1, Q2, Q3 or Q4.
It comes out “Two One” or “U One” no matter how clearly I speak.
I need both “Two” and “U” (U.S.) in my recognition file.
Any other thoughts? I don’t know what else to do. Would Rejecto help?
February 22, 2013 at 3:49 pm #1015712
Recognition of several individual words that are only a syllable long and all rhyme with each other is not a satisfactorily solved problem in speech recognition. This is another variation of the general problem of recognizing the English alphabet, which you can find people trying to fix in every speech-recognition-related resource, unfortunately. There is no contextual cue for which one is the “real one” in the case you’re describing, so as soon as there is any distance from the mic, the sounds are going to get mixed up.
The strategy for dealing with it is going to be some combination of removing confusing words from the model and fusing multiple words together that you know will be spoken together.
An example is that you don’t need the loose letter “U” if its presence there is just in order to let “U.S.” be recognized. In that case, make the word “U.S.”:
U.S. Y UW AH S
This will also improve the accuracy of words that are spoken near utterances of “U.S.”.
The next issue I see is that the “Q1” etc segment has a couple of obscure words before and after it, which suggests to me that this is a big language model. Do you have the opportunity to switch between smaller, more contextually-specific language models?
Can you do counting in either its own language model that you switch to, or with some kind of prefix? e.g. “Category 2” instead of just “2”.
The last thing is that you haven’t shown the entry in the language model or the pocketsphinx logging output, so I don’t know for sure whether your alteration is actually in your language model as far as pocketsphinx is concerned. If you remove “U” and “2”, are you able to recognize “Q1”? If not, there might be an issue in the language model in general.
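One way to check this outside the app is to look for each word in both generated files. The sketch below is plain Python rather than anything from OpenEars. It assumes the language model is a text ARPA file with a `\1-grams:` section and that each dictionary line starts with the word. The function names and the sample data are invented for illustration.

```python
def arpa_unigrams(arpa_text):
    """Collect the words listed in the \\1-grams: section of an ARPA model."""
    words = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # next section, e.g. \2-grams: or \end\
                break
            fields = line.split()
            if len(fields) >= 2:       # fields: log-prob, word, optional backoff
                words.add(fields[1])
    return words


def dic_words(dic_text):
    """Collect the first token of every non-empty dictionary line."""
    return {line.split()[0] for line in dic_text.splitlines() if line.strip()}


def missing_words(required, arpa_text, dic_text):
    """Report, for each required word, which file (if any) lacks it."""
    lm, dic = arpa_unigrams(arpa_text), dic_words(dic_text)
    return {
        word: {"in_lm": word in lm, "in_dic": word in dic}
        for word in required
        if word not in lm or word not in dic
    }


# Invented sample data standing in for the generated files.
SAMPLE_ARPA = """\\data\\
ngram 1=4

\\1-grams:
-0.7782 </s> -0.3010
-0.7782 <s> -0.3010
-0.7782 Q1 -0.3010
-0.7782 TWO -0.3010

\\end\\
"""
SAMPLE_DIC = "Q1\tK Y UW W AH N\nTWO\tT UW\n"

report = missing_words(["Q1", "Q2"], SAMPLE_ARPA, SAMPLE_DIC)
print(report)
```

An empty report means every required word appears in both files. It says nothing about whether the app is actually loading those files rather than older copies.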
In case you have confirmed that the language model is OK, and none of these approaches are options for you (although they are almost always options for an app that you can make design decisions about), the last possibility is to do it as a JSGF ruleset rather than a statistical ARPA model. Searching this forum for JSGF should help you get started.
February 22, 2013 at 3:55 pm #1015713
I also can’t get it to recognize our company name (IMS) which I also added to the .dic file as shown here:
IMRIE IH M ER IY
IMS AY EH M EH S
IMUS AY M AH S
Any suggestions for this one?
Thanks!
February 22, 2013 at 3:59 pm #1015714
I’m adding these entries to the cmu07a.dic file and then generating my .dic file by adding Q1, Q2, Q3, Q4 and IMS to the language array. I’ll download my language file to make sure they ended up there…
February 22, 2013 at 4:01 pm #1015715
Yes, step one is definitely making sure that these new words are present in your language model and phonetic dictionary. Also, turn on verbosePocketsphinx so you receive any complaints from pocketsphinx about your language model or dictionary.
February 22, 2013 at 4:02 pm #1015716
Also turn on OpenEarsLogging and verboseCMUCLMTK so you get any relevant output from the process of generating the language models.
February 22, 2013 at 4:18 pm #1015717
Even though I was generating a new .dic file, the app was failing to copy it to the proper folder and was still using the old one! Once I discovered that, your suggestions for Q1, Q2, Q3, Q4 and IMS are all working!
Thanks!
February 22, 2013 at 4:24 pm #1015718
Love to hear that :) .