Multi digit number recognition

    #1019021
    montage
    Participant

    I can’t figure out how to get multi-digit numbers recognized correctly. I need to recognize the numbers one (1) through thirty-five (35). RapidEars recognizes one (1) through twenty (20), but from twenty-one upward it returns the individual components, e.g. “twenty-one” is recognized as ‘twenty’ and ‘one’. I’ve tried adding, for example, ‘TWENTYONE’ and ‘TWENTY ONE’ to the language model, and both are ignored.

    Has anyone successfully implemented recognition for multi-digit numbers?

    #1019026
    Halle Winkler
    Politepix

    Welcome,

    The issue here is one with speech recognition in general. If you have the words twenty and one in your model, there is no difference in phonemes between the user utterance “twenty-one” and “twenty, one” from the perspective of the engine (we can hear some elision in the compound version but that is probably not accounted for much in the general-purpose acoustic model and it isn’t going to be written out differently in the phonetic dictionary). It’s a tossup which version will be recognized and it sounds like there is a bias for the non-compound version.

    Something that could help would be to manually increase the probability of the compound number in your language model. If you show the .arpa contents here I can give some hints about what kind of alterations to make. This is kind of hacky IMO but I’ve had good results with it as an approach to similar puzzles in the past.

    #1019034
    montage
    Participant

    Thanks for the prompt response. Here is the .arpa file. I am currently only using TWENTYONE (21) and TWENTYTWO (22) as test cases. If this works I will need to have the full range from TWENTYONE (21) to THIRTYFIVE (35).

    Beginning of data mark: \data\
    ngram 1=nr # number of 1-grams
    ngram 2=nr # number of 2-grams
    ngram 3=nr # number of 3-grams

    \1-grams:
    p_1 wd_1 bo_wt_1
    \2-grams:
    p_2 wd_1 wd_2 bo_wt_2
    \3-grams:
    p_3 wd_1 wd_2 wd_3

    end of data mark: \end\

    \data\
    ngram 1=29
    ngram 2=54
    ngram 3=27

    \1-grams:
    -98.8186 </s> -1.1303
    -0.3010 <s> 0.0000
    -1.7324 BLOCKED -0.3010
    -1.7324 EIGHT -0.3010
    -1.7324 EIGHTTEEN -0.3010
    -1.7324 ELEVEN -0.3010
    -1.7324 FIFTEEN -0.3010
    -1.7324 FIVE -0.3010
    -1.7324 FOUR -0.3010
    -1.7324 FOURTEEN -0.3010
    -1.7324 GOAL -0.3010
    -1.7324 MISS -0.3010
    -1.7324 NINE -0.3010
    -1.7324 NINETEEN -0.3010
    -1.7324 ONE -0.3010
    -1.7324 SEVEN -0.3010
    -1.7324 SEVENTEEN -0.3010
    -1.7324 SHOT -0.3010
    -1.7324 SIX -0.3010
    -1.7324 SIXTEEN -0.3010
    -1.7324 TEN -0.3010
    -1.7324 THIRTEEN -0.3010
    -1.7324 THIRTY -0.3010
    -1.7324 THREE -0.3010
    -1.7324 TWELVE -0.3010
    -1.7324 TWENTY -0.3010
    -1.7324 TWENTYONE -0.3010
    -1.7324 TWENTYTWO -0.3010
    -1.7324 TWO -0.3010

    \2-grams:
    -1.7324 <s> BLOCKED 0.0000
    -1.7324 <s> EIGHT 0.0000
    -1.7324 <s> EIGHTTEEN 0.0000
    -1.7324 <s> ELEVEN 0.0000
    -1.7324 <s> FIFTEEN 0.0000
    -1.7324 <s> FIVE 0.0000
    -1.7324 <s> FOUR 0.0000
    -1.7324 <s> FOURTEEN 0.0000
    -1.7324 <s> GOAL 0.0000
    -1.7324 <s> MISS 0.0000
    -1.7324 <s> NINE 0.0000
    -1.7324 <s> NINETEEN 0.0000
    -1.7324 <s> ONE 0.0000
    -1.7324 <s> SEVEN 0.0000
    -1.7324 <s> SEVENTEEN 0.0000
    -1.7324 <s> SHOT 0.0000
    -1.7324 <s> SIX 0.0000
    -1.7324 <s> SIXTEEN 0.0000
    -1.7324 <s> TEN 0.0000
    -1.7324 <s> THIRTEEN 0.0000
    -1.7324 <s> THIRTY 0.0000
    -1.7324 <s> THREE 0.0000
    -1.7324 <s> TWELVE 0.0000
    -1.7324 <s> TWENTY 0.0000
    -1.7324 <s> TWENTYONE 0.0000
    -1.7324 <s> TWENTYTWO 0.0000
    -1.7324 <s> TWO 0.0000
    -0.3010 BLOCKED </s> 1.1303
    -0.3010 EIGHT </s> 1.1303
    -0.3010 EIGHTTEEN </s> 1.1303
    -0.3010 ELEVEN </s> 1.1303
    -0.3010 FIFTEEN </s> 1.1303
    -0.3010 FIVE </s> 1.1303
    -0.3010 FOUR </s> 1.1303
    -0.3010 FOURTEEN </s> 1.1303
    -0.3010 GOAL </s> 1.1303
    -0.3010 MISS </s> 1.1303
    -0.3010 NINE </s> 1.1303
    -0.3010 NINETEEN </s> 1.1303
    -0.3010 ONE </s> 1.1303
    -0.3010 SEVEN </s> 1.1303
    -0.3010 SEVENTEEN </s> 1.1303
    -0.3010 SHOT </s> 1.1303
    -0.3010 SIX </s> 1.1303
    -0.3010 SIXTEEN </s> 1.1303
    -0.3010 TEN </s> 1.1303
    -0.3010 THIRTEEN </s> 1.1303
    -0.3010 THIRTY </s> 1.1303
    -0.3010 THREE </s> 1.1303
    -0.3010 TWELVE </s> 1.1303
    -0.3010 TWENTY </s> 1.1303
    -0.3010 TWENTYONE </s> 1.1303
    -0.3010 TWENTYTWO </s> 1.1303
    -0.3010 TWO </s> 1.1303

    \3-grams:
    -0.3010 <s> BLOCKED </s>
    -0.3010 <s> EIGHT </s>
    -0.3010 <s> EIGHTTEEN </s>
    -0.3010 <s> ELEVEN </s>
    -0.3010 <s> FIFTEEN </s>
    -0.3010 <s> FIVE </s>
    -0.3010 <s> FOUR </s>
    -0.3010 <s> FOURTEEN </s>
    -0.3010 <s> GOAL </s>
    -0.3010 <s> MISS </s>
    -0.3010 <s> NINE </s>
    -0.3010 <s> NINETEEN </s>
    -0.3010 <s> ONE </s>
    -0.3010 <s> SEVEN </s>
    -0.3010 <s> SEVENTEEN </s>
    -0.3010 <s> SHOT </s>
    -0.3010 <s> SIX </s>
    -0.3010 <s> SIXTEEN </s>
    -0.3010 <s> TEN </s>
    -0.3010 <s> THIRTEEN </s>
    -0.3010 <s> THIRTY </s>
    -0.3010 <s> THREE </s>
    -0.3010 <s> TWELVE </s>
    -0.3010 <s> TWENTY </s>
    -0.3010 <s> TWENTYONE </s>
    -0.3010 <s> TWENTYTWO </s>
    -0.3010 <s> TWO </s>

    \end\

    #1019036
    Halle Winkler
    Politepix

    OK, so to increase the probability you want the values to approach zero, which means incrementally adding positive values to the negative values (the -0.3010 and -1.7324 entries), just for the 1-grams, 2-grams and 3-grams of your numbers above TWENTY. Edit them directly in the file shown above and pass that edited .arpa file, rather than the .DMP, to startListeningWithLanguageModelAtPath:. I would set up a test to find the threshold where the bias towards the individual words stops: add the same small value (say 0.0100) to all of them, and repeat until they are large enough that the recognition behavior changes to your satisfaction. You can test against the same input each time by using the pathToTestFile test capability of PocketsphinxController.
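The incremental adjustment described above can be scripted instead of edited by hand. The following is a minimal Python sketch, not part of OpenEars: the function name `boost_arpa` and the sample text are hypothetical, and it assumes each n-gram entry starts with a log10 probability followed by the words and an optional backoff weight, as in the posted file. It adds a fixed delta to every entry containing a target word and caps the result at 0.0. It does not renormalize the model, so it is the same hack as the manual edit, only repeatable.

```python
import re

def boost_arpa(arpa_text, words, delta):
    """Return arpa_text with `delta` added to the log10 probability of every
    n-gram entry that contains one of `words` (result capped at 0.0)."""
    targets = set(words)
    in_ngrams = False
    out = []
    for line in arpa_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\\\d+-grams:", stripped):
            in_ngrams = True           # entering a \N-grams: section
        elif stripped == "\\end\\":
            in_ngrams = False          # end of the model data
        fields = stripped.split()
        # fields[0] is the probability; the words follow it
        if in_ngrams and len(fields) >= 2 and targets.intersection(fields[1:]):
            try:
                boosted = min(0.0, float(fields[0]) + delta)
            except ValueError:
                out.append(line)       # not a numeric entry; leave untouched
                continue
            indent = line[: len(line) - len(line.lstrip())]
            fields[0] = f"{boosted:.4f}"
            line = indent + " ".join(fields)
        out.append(line)
    return "\n".join(out)

# Hypothetical excerpt in the same layout as the posted model
SAMPLE = "\n".join([
    "\\1-grams:",
    "-1.7324 TWENTY -0.3010",
    "-1.7324 TWENTYONE -0.3010",
    "",
    "\\2-grams:",
    "-1.7324 <s> TWENTYONE 0.0000",
    "-0.3010 TWENTYONE </s> 1.1303",
    "",
    "\\end\\",
])

print(boost_arpa(SAMPLE, {"TWENTYONE"}, 0.01))
```

Running it repeatedly with a growing delta, and replaying the same test audio after each step, gives the threshold search described in the post.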

    #1019065
    montage
    Participant

    So the edited .arpa file path replaces the language model path in:

    startRealtimeListeningWithLanguageModelAtPath:
    dictionaryAtPath:
    acousticModelAtPath:

    ?

    #1019066
    Halle Winkler
    Politepix

    startRealtimeListeningWithLanguageModelAtPath:

    It replaces the argument you normally pass the DMP file to.

    #1019076
    montage
    Participant

    Interesting: by adjusting the probability values I was able to get TwentyOne recognized properly, but I am unable to get TwentyTwo recognized. RapidEars insists on returning two terms when TWENTYTWO is spoken. I’m wondering if there is a different way of specifying the dictionary terms that might lend itself to more accuracy.

    #1019089
    Halle Winkler
    Politepix

    There will be some tipping point at which the probability of TWENTYTWO is higher than the two words TWENTY and TWO. Make sure you’re catching all of the places in which TWENTYTWO is referenced in the file — it will appear in the 1-grams, 2-grams and 3-grams. If “TWENTYTWO” never matches that could be due to something wrong with its phonetic dictionary (.dic) entry.
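To check that no occurrence has been missed, the entries that mention a word can be listed per section. This is a small Python sketch under the same assumptions about the file layout as the posted .arpa; `find_entries` and the sample text are hypothetical names used only for illustration.

```python
def find_entries(arpa_text, word):
    """Return (section, entry) pairs for every n-gram entry containing `word`."""
    section = None
    hits = []
    for line in arpa_text.splitlines():
        s = line.strip()
        if s.startswith("\\") and s.endswith("-grams:"):
            section = s                      # e.g. \2-grams:
        elif s == "\\end\\":
            section = None
        elif section and word in s.split()[1:]:
            # skip the leading probability, match whole words only
            hits.append((section, s))
    return hits

# Hypothetical excerpt in the same layout as the posted model
SAMPLE = "\n".join([
    "\\1-grams:",
    "-1.7324 TWENTYTWO -0.3010",
    "-1.7324 TWO -0.3010",
    "\\2-grams:",
    "-1.7324 <s> TWENTYTWO 0.0000",
    "-0.3010 TWENTYTWO </s> 1.1303",
    "\\3-grams:",
    "-0.3010 <s> TWENTYTWO </s>",
    "\\end\\",
])

for section, entry in find_entries(SAMPLE, "TWENTYTWO"):
    print(section, entry)
```

In the posted model each word has one 1-gram, two 2-grams and one 3-gram, so four hits per compound number would be expected.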

    Silly question — why can’t you make use of the hypothesis “TWENTY TWO” in order to call the method you would call for the hypothesis “TWENTYTWO”?

    #1019122
    montage
    Participant

    That is what I have had to end up doing. Tuning the probability wasn’t giving me the accuracy I required so I ended up putting more effort into the hypothesis processor and it appears to be working better.
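The thread does not show the hypothesis processor itself. As an illustration of the idea only, here is a minimal Python sketch that merges a tens word followed by a units word into one number. The function name `merge_numbers` and the word tables are assumptions, and the logic would need porting to the app's own language.

```python
UNITS = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
         "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9}
TENS = {"TWENTY": 20, "THIRTY": 30}
SINGLE = {
    **UNITS, **TENS,
    "TEN": 10, "ELEVEN": 11, "TWELVE": 12, "THIRTEEN": 13, "FOURTEEN": 14,
    "FIFTEEN": 15, "SIXTEEN": 16, "SEVENTEEN": 17, "EIGHTEEN": 18,
    "EIGHTTEEN": 18,  # spelling used in the posted language model
    "NINETEEN": 19,
}

def merge_numbers(hypothesis):
    """Turn a hypothesis such as "SHOT TWENTY TWO" into ["SHOT", "22"]."""
    tokens = hypothesis.upper().split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in TENS and nxt in UNITS:
            out.append(str(TENS[tok] + UNITS[nxt]))  # TWENTY + TWO -> 22
            i += 2
        elif tok in SINGLE:
            out.append(str(SINGLE[tok]))
            i += 1
        else:
            out.append(tok)                          # non-number word
            i += 1
    return out

print(merge_numbers("SHOT TWENTY TWO"))
print(merge_numbers("GOAL SEVEN"))
```

One limitation of this approach: it cannot tell a spoken “twenty-two” from “twenty” followed by a separate “two”, and it does not enforce the 1 to 35 range, so an application with those requirements would need extra checks.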
