Detecting single letters in the alphabet


Viewing 16 posts - 1 through 16 (of 16 total)

    #3986
    jeff-kelley
    Participant

    I’m trying to implement an app that reads letters and numbers. I’ve had some inaccuracies with certain letter combinations. For instance, “AB” is consistently heard as “8Y,” and I’m wondering whether there are any configuration options that might help. I’ve had some success replacing letters with spelled-out equivalents: ‘b’ with “bee,” etc. I have had much more success using the NATO alphabet (alpha, bravo, etc.), but we can’t expect our users to know it. So… what’s the best way to approach single letters? Thanks in advance.

    #3987
    Halle Winkler
    Politepix

    Does the app read letters and numbers or recognize them in the user’s speech?

    #3988
    jeff-kelley
    Participant

    The goal is for the user to read letters and numbers to be recognized by the app.

    #3989
    Halle Winkler
    Politepix

    Would it be possible for you to show me your language model?

    #3990
    jeff-kelley
    Participant

    Sure. We used the lmtool available on CMU’s website with this corpus:

    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
    R
    S
    T
    U
    V
    W
    X
    Y
    Z
    
    zero
    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten

    We got this language model:

    Language model created by QuickLM on Tue Apr 26 11:48:40 EDT 2011
    Copyright (c) 1996-2010 Carnegie Mellon University and Alexander I. Rudnicky
    
    The model is in standard ARPA format, designed by Doug Paul while he was at MITRE.
    
    The code that was used to produce this language model is available in Open Source.
    Please visit http://www.speech.cs.cmu.edu/tools/ for more information
    
    The (fixed) discount mass is 0.5. The backoffs are computed using the ratio method.
    This model based on a corpus of 37 sentences and 39 words
    
    \data\
    ngram 1=39
    ngram 2=74
    ngram 3=37
    
    \1-grams:
    -0.7782 </s> -0.3010
    -0.7782 <s> -0.2218
    -2.3464 A -0.2218
    -2.3464 B -0.2218
    -2.3464 C -0.2218
    -2.3464 D -0.2218
    -2.3464 E -0.2218
    -2.3464 EIGHT -0.2218
    -2.3464 F -0.2218
    -2.3464 FIVE -0.2218
    -2.3464 FOUR -0.2218
    -2.3464 G -0.2218
    -2.3464 H -0.2218
    -2.3464 I -0.2218
    -2.3464 J -0.2218
    -2.3464 K -0.2218
    -2.3464 L -0.2218
    -2.3464 M -0.2218
    -2.3464 N -0.2218
    -2.3464 NINE -0.2218
    -2.3464 O -0.2218
    -2.3464 ONE -0.2218
    -2.3464 P -0.2218
    -2.3464 Q -0.2218
    -2.3464 R -0.2218
    -2.3464 S -0.2218
    -2.3464 SEVEN -0.2218
    -2.3464 SIX -0.2218
    -2.3464 T -0.2218
    -2.3464 TEN -0.2218
    -2.3464 THREE -0.2218
    -2.3464 TWO -0.2218
    -2.3464 U -0.2218
    -2.3464 V -0.2218
    -2.3464 W -0.2218
    -2.3464 X -0.2218
    -2.3464 Y -0.2218
    -2.3464 Z -0.2218
    -2.3464 ZERO -0.2218
    
    \2-grams:
    -1.8692 <s> A 0.0000
    -1.8692 <s> B 0.0000
    -1.8692 <s> C 0.0000
    -1.8692 <s> D 0.0000
    -1.8692 <s> E 0.0000
    -1.8692 <s> EIGHT 0.0000
    -1.8692 <s> F 0.0000
    -1.8692 <s> FIVE 0.0000
    -1.8692 <s> FOUR 0.0000
    -1.8692 <s> G 0.0000
    -1.8692 <s> H 0.0000
    -1.8692 <s> I 0.0000
    -1.8692 <s> J 0.0000
    -1.8692 <s> K 0.0000
    -1.8692 <s> L 0.0000
    -1.8692 <s> M 0.0000
    -1.8692 <s> N 0.0000
    -1.8692 <s> NINE 0.0000
    -1.8692 <s> O 0.0000
    -1.8692 <s> ONE 0.0000
    -1.8692 <s> P 0.0000
    -1.8692 <s> Q 0.0000
    -1.8692 <s> R 0.0000
    -1.8692 <s> S 0.0000
    -1.8692 <s> SEVEN 0.0000
    -1.8692 <s> SIX 0.0000
    -1.8692 <s> T 0.0000
    -1.8692 <s> TEN 0.0000
    -1.8692 <s> THREE 0.0000
    -1.8692 <s> TWO 0.0000
    -1.8692 <s> U 0.0000
    -1.8692 <s> V 0.0000
    -1.8692 <s> W 0.0000
    -1.8692 <s> X 0.0000
    -1.8692 <s> Y 0.0000
    -1.8692 <s> Z 0.0000
    -1.8692 <s> ZERO 0.0000
    -0.3010 A </s> -0.3010
    -0.3010 B </s> -0.3010
    -0.3010 C </s> -0.3010
    -0.3010 D </s> -0.3010
    -0.3010 E </s> -0.3010
    -0.3010 EIGHT </s> -0.3010
    -0.3010 F </s> -0.3010
    -0.3010 FIVE </s> -0.3010
    -0.3010 FOUR </s> -0.3010
    -0.3010 G </s> -0.3010
    -0.3010 H </s> -0.3010
    -0.3010 I </s> -0.3010
    -0.3010 J </s> -0.3010
    -0.3010 K </s> -0.3010
    -0.3010 L </s> -0.3010
    -0.3010 M </s> -0.3010
    -0.3010 N </s> -0.3010
    -0.3010 NINE </s> -0.3010
    -0.3010 O </s> -0.3010
    -0.3010 ONE </s> -0.3010
    -0.3010 P </s> -0.3010
    -0.3010 Q </s> -0.3010
    -0.3010 R </s> -0.3010
    -0.3010 S </s> -0.3010
    -0.3010 SEVEN </s> -0.3010
    -0.3010 SIX </s> -0.3010
    -0.3010 T </s> -0.3010
    -0.3010 TEN </s> -0.3010
    -0.3010 THREE </s> -0.3010
    -0.3010 TWO </s> -0.3010
    -0.3010 U </s> -0.3010
    -0.3010 V </s> -0.3010
    -0.3010 W </s> -0.3010
    -0.3010 X </s> -0.3010
    -0.3010 Y </s> -0.3010
    -0.3010 Z </s> -0.3010
    -0.3010 ZERO </s> -0.3010
    
    \3-grams:
    -0.3010 <s> A </s>
    -0.3010 <s> B </s>
    -0.3010 <s> C </s>
    -0.3010 <s> D </s>
    -0.3010 <s> E </s>
    -0.3010 <s> EIGHT </s>
    -0.3010 <s> F </s>
    -0.3010 <s> FIVE </s>
    -0.3010 <s> FOUR </s>
    -0.3010 <s> G </s>
    -0.3010 <s> H </s>
    -0.3010 <s> I </s>
    -0.3010 <s> J </s>
    -0.3010 <s> K </s>
    -0.3010 <s> L </s>
    -0.3010 <s> M </s>
    -0.3010 <s> N </s>
    -0.3010 <s> NINE </s>
    -0.3010 <s> O </s>
    -0.3010 <s> ONE </s>
    -0.3010 <s> P </s>
    -0.3010 <s> Q </s>
    -0.3010 <s> R </s>
    -0.3010 <s> S </s>
    -0.3010 <s> SEVEN </s>
    -0.3010 <s> SIX </s>
    -0.3010 <s> T </s>
    -0.3010 <s> TEN </s>
    -0.3010 <s> THREE </s>
    -0.3010 <s> TWO </s>
    -0.3010 <s> U </s>
    -0.3010 <s> V </s>
    -0.3010 <s> W </s>
    -0.3010 <s> X </s>
    -0.3010 <s> Y </s>
    -0.3010 <s> Z </s>
    -0.3010 <s> ZERO </s>
    
    \end\

    The trouble is that it’s just not accurate enough at distinguishing letters. I’m very new to OpenEars/PocketSphinx, so I really don’t know how to approach improving accuracy.
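
    The numbers in this model can be reproduced by hand, which shows why the model alone cannot separate the letters. A sketch, assuming (from the header) that QuickLM removes a fixed discount mass of 0.5 from every observed n-gram probability; the formula P = (1 - 0.5) * count / context_count is inferred from the printed values, not taken from the QuickLM source:

```python
import math

DISCOUNT = 0.5  # "The (fixed) discount mass is 0.5" in the model header

def discounted_log10(count: int, context_count: int) -> float:
    """log10 of (1 - DISCOUNT) * count / context_count, rounded to 4 places like the ARPA file."""
    return round(math.log10((1 - DISCOUNT) * count / context_count), 4)

sentences = 37            # the corpus has one word per line, 37 lines
tokens = sentences * 3    # each sentence is <s> word </s>, so 111 tokens in total

unigram_letter = discounted_log10(1, tokens)          # every letter/number occurs once
unigram_start = discounted_log10(sentences, tokens)   # <s> (and </s>) occur 37 times
bigram_start_letter = discounted_log10(1, sentences)  # P(A | <s>)
bigram_letter_end = discounted_log10(1, 1)            # P(</s> | A)

print(unigram_letter, unigram_start, bigram_start_letter, bigram_letter_end)
```

    This prints -2.3464 -0.7782 -1.8692 -0.301, matching the 1-gram, <s>, 2-gram and </s> entries above. Because every symbol occurs exactly once, each one gets identical scores, so the language model gives the decoder no reason to prefer A over K; that choice rests entirely on the acoustics.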

    #3991
    Halle Winkler
    Politepix

    Can I also see the dictionary? I’m surprised to hear that it is recognizing EIGHT Y for A B; the EIGHT isn’t surprising but the Y is. Is EIGHT Y an accurate transcription of what Pocketsphinx heard? What is the hypothesis (verbatim)?

    #3992
    jeff-kelley
    Participant

    Sure, here’s the dictionary:

    A	AH
    A(2)	EY
    B	B IY
    C	S IY
    D	D IY
    E	IY
    EIGHT	EY T
    F	EH F
    FIVE	F AY V
    FOUR	F AO R
    G	JH IY
    H	EY CH
    I	AY
    J	JH EY
    K	K EY
    L	EH L
    M	EH M
    N	EH N
    NINE	N AY N
    O	OW
    ONE	W AH N
    ONE(2)	HH W AH N
    P	P IY
    Q	K Y UW
    R	AA R
    S	EH S
    SEVEN	S EH V AH N
    SIX	S IH K S
    T	T IY
    TEN	T EH N
    THREE	TH R IY
    TWO	T UW
    U	Y UW
    V	V IY
    W	D AH B AH L Y UW
    X	EH K S
    Y	W AY
    Z	Z IY
    ZERO	Z IH R OW
    ZERO(2)	Z IY R OW

    EIGHT Y is accurate, but I don’t have the verbatim hypothesis from before. I’ll try to get it to happen again and post back here.

    #3993
    jeff-kelley
    Participant

    With the dictionary/language model here, it’s giving me KB more frequently than AB (I was speaking “A B” each time):

    2011-04-26 13:35:23.315 OpenEarsSampleProject[2328:707] Pocketsphinx calibration has started.
    2011-04-26 13:35:23.368 OpenEarsSampleProject[2328:707] Pocketsphinx calibration is complete.
    2011-04-26 13:35:23.374 OpenEarsSampleProject[2328:707] Pocketsphinx has stopped listening.
    2011-04-26 13:35:23.382 OpenEarsSampleProject[2328:707] Pocketsphinx is starting up.
    2011-04-26 13:35:23.835 OpenEarsSampleProject[2328:707] Pocketsphinx calibration has started.
    2011-04-26 13:35:27.405 OpenEarsSampleProject[2328:707] Pocketsphinx calibration is complete.
    2011-04-26 13:35:27.418 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:30.269 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:32.267 OpenEarsSampleProject[2328:707] The received hypothesis is U THREE with a score of -495 and an ID of 000000000
    2011-04-26 13:35:32.328 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:38.896 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:40.254 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -13365 and an ID of 000000001
    2011-04-26 13:35:40.344 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:42.953 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:45.211 OpenEarsSampleProject[2328:707] The received hypothesis is J V with a score of -16556 and an ID of 000000002
    2011-04-26 13:35:45.284 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:47.264 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:49.059 OpenEarsSampleProject[2328:707] The received hypothesis is K P with a score of -15090 and an ID of 000000003
    2011-04-26 13:35:49.115 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:52.527 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:54.067 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -14514 and an ID of 000000004
    2011-04-26 13:35:54.136 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:55.405 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:57.607 OpenEarsSampleProject[2328:707] The received hypothesis is K V with a score of -25266 and an ID of 000000005
    2011-04-26 13:35:57.661 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:59.633 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:01.581 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -12390 and an ID of 000000006
    2011-04-26 13:36:01.654 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:03.892 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:05.588 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -6112 and an ID of 000000007
    2011-04-26 13:36:05.978 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:08.074 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:09.750 OpenEarsSampleProject[2328:707] The received hypothesis is K B O with a score of -37412 and an ID of 000000008
    2011-04-26 13:36:09.828 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:12.141 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:13.764 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -16145 and an ID of 000000009
    2011-04-26 13:36:13.821 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:16.829 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:18.718 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -18317 and an ID of 000000010
    2011-04-26 13:36:18.790 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
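
    To check a claim like this against the log rather than by eye, the hypotheses can be tallied with a short script. A sketch, assuming the log is pasted as plain text in the format shown above (timestamps trimmed here); the regular expression is written for these lines only and is not part of OpenEars:

```python
import re
from collections import Counter

# Matches the hypothesis lines in the OpenEars log format shown above.
HYPOTHESIS = re.compile(r"The received hypothesis is (.+?) with a score of (-?\d+)")

def parse_hypotheses(log_text: str) -> list[tuple[str, int]]:
    """Return (hypothesis, score) pairs in the order they appear in the log."""
    return [(m.group(1), int(m.group(2))) for m in HYPOTHESIS.finditer(log_text)]

# Hypothesis lines from the log above; "A B" was spoken each time.
log_text = """
The received hypothesis is U THREE with a score of -495 and an ID of 000000000
The received hypothesis is A B with a score of -13365 and an ID of 000000001
The received hypothesis is J V with a score of -16556 and an ID of 000000002
The received hypothesis is K P with a score of -15090 and an ID of 000000003
The received hypothesis is K B with a score of -14514 and an ID of 000000004
The received hypothesis is K V with a score of -25266 and an ID of 000000005
The received hypothesis is A B with a score of -12390 and an ID of 000000006
The received hypothesis is K B with a score of -6112 and an ID of 000000007
The received hypothesis is K B O with a score of -37412 and an ID of 000000008
The received hypothesis is A B with a score of -16145 and an ID of 000000009
The received hypothesis is K B with a score of -18317 and an ID of 000000010
"""

results = parse_hypotheses(log_text)
exact = Counter(hyp for hyp, _ in results)                  # whole hypotheses
first_letter = Counter(hyp.split()[0] for hyp, _ in results)  # what "A" became
print(exact.most_common(3))
print(first_letter)
```

    On these eleven utterances, A B comes back exactly three times, the same as K B, but K is the first letter six times against three for A. The confusion is mostly in the first letter.
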
    #3994
    Halle Winkler
    Politepix

    OK, I think the first easy step is to remove the dictionary pronunciations that you definitely don’t want to recognize. I realize this isn’t at all self-evident, so I’ll explain briefly. Look at these lines from the dictionary:

    A AH
    A(2) EY

    That means that the language model tool gave you back two possible pronunciations for the word A. The first one is the particular NA pronunciation of the article “a”, as in “a dog barked”, which rhymes with “huh”. The alphabet character is never pronounced that way, so you never want to recognize that pronunciation, and you should erase it from your dictionary.

    The (2) in parentheses just means that it is the second pronunciation of the word, so the way you would want to replace

    A AH
    A(2) EY

    is with the line

    A EY

    deleting the first pronunciation, and removing the (2) from the second pronunciation since it is now the only pronunciation you are going to accept.
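
    The same edit can be scripted when there are many entries to clean up. A minimal sketch, assuming the tab- or space-separated “WORD(n) PHONES” layout shown above; `prune_dictionary` is a hypothetical helper, and which pronunciations count as unwanted is still a manual decision passed in as a set:

```python
def prune_dictionary(lines, unwanted):
    """Drop unwanted (word, phones) pairs and renumber the remaining alternates.

    lines:    dictionary lines such as "A(2)\tEY"
    unwanted: set of (WORD, "PHONES") pairs chosen by hand, e.g. {("A", "AH")}
    """
    kept = {}  # word -> surviving pronunciations, in their original order
    for line in lines:
        if not line.strip():
            continue
        head, phones = line.split(None, 1)
        word = head.split("(")[0]          # "A(2)" -> "A"
        phones = " ".join(phones.split())  # normalise tabs and extra spaces
        if (word, phones) not in unwanted:
            kept.setdefault(word, []).append(phones)

    pruned = []
    for word, prons in kept.items():
        for i, phones in enumerate(prons):
            # The first surviving pronunciation loses its "(n)" suffix.
            label = word if i == 0 else f"{word}({i + 1})"
            pruned.append(f"{label}\t{phones}")
    return pruned

dictionary = ["A\tAH", "A(2)\tEY", "B\tB IY", "ZERO\tZ IH R OW", "ZERO(2)\tZ IY R OW"]
for entry in prune_dictionary(dictionary, {("A", "AH")}):
    print(entry)
```

    Here A keeps only EY, now unnumbered, while ZERO keeps both of its pronunciations untouched.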

    The next thing that you can do is to make the sentence “A B” part of your corpus. The corpus can contain individual words, but it can also contain combinations of words. Combinations of words that you have made part of your corpus automatically get a higher probability of being detected.

    So, the corpus would say something like this:

    A
    B
    A B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
    R
    S
    T
    U
    V
    W
    X
    Y
    Z
    ONE
    TWO
    THREE
    FOUR
    FIVE
    SIX
    SEVEN
    EIGHT
    NINE
    ZERO

    You can do this for all of the possible combinations if you want to, or just for the ones whose probability of being detected you want to raise. When you look at the language model that is output, you will see that there is a 2-gram entry for A B and that it has a raised probability.
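
    Writing the combinations by hand quickly becomes tedious, so the corpus can be generated instead. A sketch, assuming the uppercase words used in the corpus above; `build_corpus` is a hypothetical helper, and its output is meant to be submitted to lmtool like any other corpus:

```python
from itertools import product

LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
DIGITS = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]

def build_corpus(symbols, pairs=None):
    """Every symbol on its own line, followed by two-symbol sentences.

    If pairs is None, every ordered pair is added (len(symbols) ** 2 lines);
    otherwise only the given (first, second) pairs are added.
    """
    lines = list(symbols)
    if pairs is None:
        pairs = product(symbols, repeat=2)
    lines.extend(f"{a} {b}" for a, b in pairs)
    return lines

targeted = build_corpus(LETTERS + DIGITS, pairs=[("A", "B")])  # the corpus above
everything = build_corpus(LETTERS + DIGITS)                     # all 36 * 36 pairs

print("\n".join(targeted))
```

    With all ordered pairs the corpus grows to 1,332 lines. Adding every pair gives each pair the same weight again, so it probably only tells the model that two-symbol utterances are expected; a targeted list is what actually raises some pairs above others.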

    #3995
    jeff-kelley
    Participant

    Interesting. I’ll try pruning ambiguous pronunciations from the dictionary where possible. I’d like to do more with combinations like “A B”, but the characters being read to this application are random, so there won’t really be a pattern to which ones get combined more often.

    Thanks for this help; I’ll report back if it’s more successful.

    #3996
    jeff-kelley
    Participant

    I’m still getting a lot of Ks where I should be getting As, but I think I’m going down the right path. Thanks again.

    #3997
    Halle Winkler
    Politepix

    Just as an update on this, I’ve been gradually learning that individual letters/syllables are a challenging case and expectations for accuracy should probably be lower than for whole word or phrase recognition.

    #7496

    Letter recognition is only really practical in conjunction with a spelling application. In other words, if you have a list of street names, spelling

    W O O D W A R D

    will work if you use an N-best list and search the dictionary for the results. As long as your task can be constrained by a dictionary, even a huge dictionary, you’re OK. You’ll have to patch OpenEars for N-best output, though, and build an FSG or LM. The LM will work better if you build it from your dictionary.
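
    The dictionary search over an N-best list can be sketched independently of the recognizer. This assumes you already have the N-best hypotheses as plain strings (which, per the post above, needs a patched OpenEars) and uses plain Levenshtein distance over letters; a confusion-aware cost, for example making K-for-A substitutions cheap, would likely do better but is not shown. `edit_distance` and `best_match` are illustrative names:

```python
def edit_distance(a, b):
    """Levenshtein distance between two letter sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def best_match(nbest, names):
    """Return (name, distance) for the name closest to any N-best hypothesis.

    nbest: non-empty list of hypotheses such as "W O O D W A R D", best first.
    names: non-empty list of allowed spellings, e.g. street names.
    Ties on distance go to the higher-ranked hypothesis.
    """
    best = None
    for rank, hyp in enumerate(nbest):
        letters = hyp.split()
        for name in names:
            key = (edit_distance(letters, list(name)), rank)
            if best is None or key < best[0]:
                best = (key, name)
    (distance, _), name = best
    return name, distance

nbest = ["W O O D W A R E", "W O O D W A R D", "W O O T W A R D"]
streets = ["WOODWARD", "WOODLAND", "WARD"]
print(best_match(nbest, streets))
```

    Even when the top hypothesis is wrong, a lower-ranked one, or the nearest dictionary entry, can still recover the intended name, which is why the constraint matters more than the raw letter accuracy.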

    If your letter sequences truly are random, you’re dealing with something that’s beyond the state of the art. It’s beyond the state of the art for human listeners, too. Give it a try: read some random letter sequences to people and see how many they get wrong.

    Is this something you’re still working on?

    #9916
    alexl
    Participant

    I am working on a similar project and having the same issues. In fact, recognition results are very poor. My biggest problems are with the letters E, D, P and A. Below is the dictionary file that I compiled. In many cases it recognizes E as P, D or C, and A as 8 or H.

    What techniques would you recommend for improving accuracy? Doesn’t adding more pronunciations for the same letter help?

    0 Z IY R OW
    1 W AH N
    2 T UW
    3 TH R IY
    4 F OW R
    5 F AY V
    6 S IH K S
    7 S EH V AH N
    8 EY T
    9 N AY N
    A AH
    A(2) EY
    B B IY
    C S IY
    D D IY
    E IY
    E(2) EH
    E(3) IH
    F EH F
    G JH IY
    H EY CH
    H(2) EY JH
    I AY
    J JH EY
    K K EY
    L EH L
    M EH M
    M(2) AE M
    N EH N
    N(2) AE N
    O OW
    P P IY
    Q K Y UW
    R AA R
    S EH S
    S(2) AE S
    T T IY
    U Y UW
    V V IY
    W D AH B AH L Y UW
    X EH K S
    X(2) AE K S
    Y W AY
    Z Z IY
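
    Grouping these entries by their vowels makes the reported confusions easier to see. A sketch, assuming the space-separated ARPAbet layout above and treating the phones listed in `VOWELS` as the vowel set: E, D, P and C all carry IY and differ only in a brief initial consonant (the letters sharing IY are often called the E-set), while A, 8 and H all carry EY. Note also that E(2) EH is the opening sound of F, L, M, N, S and X, so each extra alternate is probably one more way for the decoder to pick the wrong letter rather than a help:

```python
from collections import defaultdict

# ARPAbet vowels (this dictionary carries no stress markers).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def group_by_vowel(dictionary_lines):
    """Map each vowel sequence to the set of words whose pronunciations share it."""
    groups = defaultdict(set)
    for line in dictionary_lines:
        if not line.strip():
            continue
        head, *phones = line.split()
        word = head.split("(")[0]  # "A(2)" -> "A"
        vowels = tuple(p for p in phones if p in VOWELS)
        groups[vowels].add(word)
    return groups

sample = """8 EY T
A AH
A(2) EY
B B IY
C S IY
D D IY
E IY
E(2) EH
E(3) IH
H EY CH
K K EY
P P IY""".splitlines()

for vowels, words in sorted(group_by_vowel(sample).items()):
    print(" ".join(vowels), sorted(words))
```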

    #9960
    Halle Winkler
    Politepix

    This is not a good application of the library, unfortunately.

    #10584
    ader
    Participant

    “As long as your task can be constrained by a dictionary”

    Joseph, is there a way we can do this? For example, only recognise “sentences” such as “w o o d w a r d” and not individual words such as “w”.

    I’m attempting to create such a spelling app.
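
    Following the earlier suggestion that the LM “will work better if you build it from your dictionary”, here is a sketch that turns a word list into a corpus where each whole spelling is one sentence, ready for lmtool like the corpora earlier in this thread. `spelled_corpus` is a hypothetical helper, and dropping everything outside A to Z is an assumption about how the names are stored. An n-gram LM built this way favours the listed spellings but can still back off to other letter sequences; strictly ruling out a lone “W” would need a grammar (FSG) instead:

```python
def spelled_corpus(words):
    """One sentence per word, spelled as space-separated capital letters.

    "Woodward" -> "W O O D W A R D". Characters outside A-Z (spaces,
    punctuation, accented letters) are skipped; duplicates are dropped
    while keeping the original order.
    """
    seen = set()
    lines = []
    for word in words:
        sentence = " ".join(ch for ch in word.upper() if "A" <= ch <= "Z")
        if sentence and sentence not in seen:
            seen.add(sentence)
            lines.append(sentence)
    return lines

streets = ["Woodward", "Main St.", "woodward"]
print("\n".join(spelled_corpus(streets)))
```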
