Won't Recognize Q, CUE, or QUEUE


Viewing 11 posts - 1 through 11 (of 11 total)

  • Author
    Posts
  • #1015706
    giebler
    Participant

    I can’t get the program to recognize any of these: Q, QUEUE or CUE.

    Can you offer any hints? In our business use, users will be saying Q1, Q2, Q3 and Q4.

    The program does recognize “QUARTER”, but we can’t force users to say that.

    We’re buying Rejecto next week to help eliminate wrong words, but right now we’re having trouble with the right words.

    Thanks!

    #1015707
    Halle Winkler
    Politepix

    Hmm, single letters that rhyme with other single letters are very challenging for recognition.

    Since you already know the number of required quarters, something sneaky you can try is to make “Q1”, etc., the entire word. That is, instead of trying to recognize the combination of “Q” and “1”, you put a single word “Q1” in your language model and edit the dictionary so that the entry for Q1 reads as follows:

    Q1   K Y UW W AH N
    

    You’d do this for each quarter. Having the multiple syllable/sound combinations available for distinguishing between the quarters should make them recognizable.
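A minimal sketch of generating those fused entries, assuming the phoneme strings that appear in the dictionary lines posted later in this thread and a tab between the word and its pronunciation. The names `Q_PHONES`, `DIGIT_PHONES` and `fused_quarter_entries` are hypothetical helpers, not part of OpenEars.

```python
# Phonemes copied from the dictionary entries quoted in this thread.
Q_PHONES = ["K", "Y", "UW"]
DIGIT_PHONES = {
    "1": ["W", "AH", "N"],
    "2": ["T", "UW"],
    "3": ["TH", "R", "IY"],
    "4": ["F", "AO", "R"],
}


def fused_quarter_entries():
    """Return one tab-separated dictionary line per quarter (Q1..Q4)."""
    lines = []
    for digit in sorted(DIGIT_PHONES):
        word = "Q" + digit
        # The fused word is pronounced as the letter Q followed by the digit.
        phones = " ".join(Q_PHONES + DIGIT_PHONES[digit])
        lines.append(word + "\t" + phones)
    return lines


if __name__ == "__main__":
    print("\n".join(fused_quarter_entries()))
```

Each fused word then also has to appear as a single token in the language model; the sketch only covers the dictionary side.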

    #1015708
    Halle Winkler
    Politepix
    #1015711
    giebler
    Participant

    Here’s what (and where) I put in the .dic file:

    Q.S K Y UW Z
    Q1 K Y UW W AH N
    Q2 K Y UW T UW
    Q3 K Y UW TH R IY
    Q4 K Y UW F AO R
    QANA K AA N AH

    I had to edit the .dic file in a hex editor, since Xcode at first inserted spaces instead of a tab.

    It still doesn’t recognize Q1, Q2, Q3 or Q4.

    It comes out “Two One” or “U One” no matter how clearly I speak.

    I need both “Two” and “U” (U.S.) in my recognition file.

    Any other thoughts? I don’t know what else to do. Would Rejecto help?
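The spaces-versus-tab problem described in this post can be checked without a hex editor. The following is a rough sketch, assuming (as the post implies) that each dictionary line is a word, one tab, then space-separated phonemes; `find_untabbed_entries` is a hypothetical helper name.

```python
def find_untabbed_entries(dic_text):
    """Return (line_number, line) pairs whose word and phonemes are not split by a tab."""
    problems = []
    for number, line in enumerate(dic_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        word, sep, phones = line.partition("\t")
        # Flag lines with no tab, a space inside the word field, or no phonemes.
        if not sep or " " in word or not phones.strip():
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    sample = "Q1\tK Y UW W AH N\nQ2 K Y UW T UW\n"
    for number, line in find_untabbed_entries(sample):
        print("line", number, "has no tab separator:", repr(line))
```

Running it on the text of the edited .dic file lists the lines an editor has silently converted to spaces.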

    #1015712
    Halle Winkler
    Politepix

    Recognizing several individual words that are each only one syllable long and all rhyme with each other is not a satisfactorily solved problem in speech recognition. It is another variation of the general problem of recognizing the English alphabet, which unfortunately you can find people looking for fixes for in every speech-recognition-related resource. There is no contextual cue for which one is the “real one” in the case you’re describing, so as soon as there is any distance from the mic, the sounds are going to get mixed up.

    The strategy for dealing with it is going to be some combination of removing confusing words from the model and fusing multiple words together that you know will be spoken together.

    An example is that you don’t need the loose letter “U” if its presence there is just in order to let “U.S.” be recognized. In that case, make the word “U.S.”:

    U.S. Y UW EH S

    This will also improve the accuracy of words that are spoken near utterances of “U.S.”.

    The next issue I see is that the “Q1” etc segment has a couple of obscure words before and after it, which suggests to me that this is a big language model. Do you have the opportunity to switch between smaller, more contextually-specific language models?

    Can you do counting in either its own language model that you switch to, or with some kind of prefix? e.g. “Category 2” instead of just “2”.

    The last thing is that you haven’t shown the entry in the language model or the pocketsphinx logging output, so I don’t know for sure whether your alteration is actually in your language model as far as pocketsphinx is concerned. If you remove “U” and “2”, are you able to recognize “Q1”? If not, there might be an issue in the language model in general.

    In case you have confirmed that the language model is OK, and none of these approaches are options for you (although they are almost always options for an app that you can make design decisions about), the last possibility is to do it as a JSGF ruleset rather than a statistical ARPA model. Searching this forum for JSGF should help you get started.
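For orientation, here is a small sketch of what a JSGF ruleset restricted to the four quarters could look like, built as a string in Python. The grammar name `quarters` and rule name `<quarter>` are hypothetical, and how the grammar is loaded into OpenEars is not shown in this thread, so that step is left out.

```python
def quarters_jsgf(grammar_name="quarters", words=("Q1", "Q2", "Q3", "Q4")):
    """Build a JSGF grammar whose only legal utterance is one of `words`."""
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        "grammar " + grammar_name + ";\n"
        "public <quarter> = " + alternatives + ";\n"
    )


if __name__ == "__main__":
    print(quarters_jsgf(), end="")
```

Unlike a statistical ARPA model, a grammar like this admits only the listed alternatives, so “Two One” is not a possible hypothesis unless it is added as a rule.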

    #1015713
    giebler
    Participant

    I also can’t get it to recognize our company name (IMS) which I also added to the .dic file as shown here:

    IMRIE IH M ER IY
    IMS AY EH M EH S
    IMUS AY M AH S

    Any suggestions for this one?

    Thanks!

    #1015714
    giebler
    Participant

    I’m adding these entries to the cmu07a.dic file, then adding Q1, Q2, Q3, Q4 and IMS to the language array and generating my .dic file from it. I’ll download my language file to make sure they ended up there…

    #1015715
    Halle Winkler
    Politepix

    Yes, step one is definitely making sure that these new words are present in your language model and phonetic dictionary. Also, turn on verbosePocketsphinx so you receive any complaints from pocketsphinx about your language model or dictionary.
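The presence check can also be done offline on the generated files. This is a sketch under the assumption that the language model is a plain-text ARPA file with a `\1-grams:` section and that the dictionary's first field on each line is the word; all function names are hypothetical.

```python
def arpa_unigrams(arpa_text):
    """Collect the words listed in the 1-grams section of an ARPA model."""
    words = set()
    in_unigrams = False
    for line in arpa_text.splitlines():
        line = line.strip()
        if line.startswith("\\"):
            # Section markers all start with a backslash; only 1-grams matter here.
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                words.add(fields[1])  # fields: log-prob, word, optional backoff
    return words


def dic_words(dic_text):
    """Collect the headwords (first field) of a phonetic dictionary."""
    return {line.split()[0] for line in dic_text.splitlines() if line.strip()}


def missing_words(expected, arpa_text, dic_text):
    """Map each expected word to the files ("lm", "dic") it is missing from."""
    lm, dic = arpa_unigrams(arpa_text), dic_words(dic_text)
    report = {}
    for word in expected:
        absent = [name for name, found in (("lm", lm), ("dic", dic)) if word not in found]
        if absent:
            report[word] = absent
    return report
```

Calling `missing_words(["Q1", "Q2", "Q3", "Q4", "IMS"], arpa_text, dic_text)` with the contents of the two generated files returns an empty dict when every word made it into both.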

    #1015716
    Halle Winkler
    Politepix

    Also turn on OpenEarsLogging and verboseCMUCLMTK so you get any relevant output from the process of generating the language models.

    #1015717
    giebler
    Participant

    Even though I was generating a new .dic file, it was not being copied to the proper folder, so the old one was still being used! Once I discovered that, your suggestions for Q1, Q2, Q3, Q4 and IMS all worked!
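A stale copy like this can be caught by comparing the freshly generated file with the one the app actually loads. The sketch below assumes both files are reachable as ordinary paths (for example in a simulator or build folder); the paths and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_stale_copy(generated_path, deployed_path):
    """True when the deployed file is missing or differs from the generated one."""
    deployed = Path(deployed_path)
    if not deployed.exists():
        return True
    return file_digest(generated_path) != file_digest(deployed)
```

If `is_stale_copy` returns `True` after a rebuild, the recognizer is still reading an old dictionary, and edits to the new one will appear to have no effect.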

    Thanks!

    #1015718
    Halle Winkler
    Politepix

    Love to hear that :) .
