jeff-kelley

Forum Replies Created

Viewing 6 posts - 1 through 6 (of 6 total)

    in reply to: Detecting single letters in the alphabet #3996
    jeff-kelley
    Participant

    I’m still getting a lot of Ks where I should be getting As, but I think I’m going down the right path. Thanks again.

    in reply to: Detecting single letters in the alphabet #3995
    jeff-kelley
    Participant

    Interesting. I’ll try pruning the dictionary of ambiguous pronunciations where possible. I’d like to do more with combinations like “A B”, but the characters being read to this application are random, so there won’t really be a pattern to which ones get combined more often.

    Thanks for this help; I’ll report back if it’s more successful.
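A minimal sketch of what pruning alternate pronunciations could look like, assuming the dictionary is the plain `WORD PHONES` text posted in this thread and that alternates carry a `(2)`-style suffix. The function name `prune_dictionary`, the `keep` argument, and the choice to keep `EY` for the letter A are illustrative assumptions, not something confirmed in the thread.

```python
import re

# A few entries copied from the dictionary posted in this thread.
DICTIONARY = """\
A AH
A(2) EY
B B IY
K K EY
ONE W AH N
ONE(2) HH W AH N
"""

def prune_dictionary(text, keep):
    """Keep exactly one pronunciation per word.

    `keep` (hypothetical parameter) maps a word to the pronunciation to
    retain. Words not listed keep their first pronunciation. Alternate
    markers such as "(2)" are dropped so each survivor is a base entry.
    """
    chosen = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, phones = line.split(None, 1)
        base = re.sub(r"\(\d+\)$", "", word)
        phones = phones.strip()
        if base in keep:
            # Only accept the pronunciation the caller asked for.
            if phones == keep[base]:
                chosen[base] = phones
        else:
            # First pronunciation wins for words with no stated preference.
            chosen.setdefault(base, phones)
    return "\n".join(f"{word}\t{phones}" for word, phones in chosen.items())

# Assumption: when reading letters aloud, A is said as "EY", not "AH".
pruned = prune_dictionary(DICTIONARY, keep={"A": "EY"})
print(pruned)
```

The surviving entry is rewritten as a base entry (`A`, not `A(2)`) so the pruned file never contains an alternate without its base word.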

    in reply to: Detecting single letters in the alphabet #3993
    jeff-kelley
    Participant

    With the dictionary/language model here, it’s giving me KB more frequently than AB (I was speaking “A B” each time):

    2011-04-26 13:35:23.315 OpenEarsSampleProject[2328:707] Pocketsphinx calibration has started.
    2011-04-26 13:35:23.368 OpenEarsSampleProject[2328:707] Pocketsphinx calibration is complete.
    2011-04-26 13:35:23.374 OpenEarsSampleProject[2328:707] Pocketsphinx has stopped listening.
    2011-04-26 13:35:23.382 OpenEarsSampleProject[2328:707] Pocketsphinx is starting up.
    2011-04-26 13:35:23.835 OpenEarsSampleProject[2328:707] Pocketsphinx calibration has started.
    2011-04-26 13:35:27.405 OpenEarsSampleProject[2328:707] Pocketsphinx calibration is complete.
    2011-04-26 13:35:27.418 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:30.269 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:32.267 OpenEarsSampleProject[2328:707] The received hypothesis is U THREE with a score of -495 and an ID of 000000000
    2011-04-26 13:35:32.328 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:38.896 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:40.254 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -13365 and an ID of 000000001
    2011-04-26 13:35:40.344 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:42.953 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:45.211 OpenEarsSampleProject[2328:707] The received hypothesis is J V with a score of -16556 and an ID of 000000002
    2011-04-26 13:35:45.284 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:47.264 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:49.059 OpenEarsSampleProject[2328:707] The received hypothesis is K P with a score of -15090 and an ID of 000000003
    2011-04-26 13:35:49.115 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:52.527 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:54.067 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -14514 and an ID of 000000004
    2011-04-26 13:35:54.136 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:55.405 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:35:57.607 OpenEarsSampleProject[2328:707] The received hypothesis is K V with a score of -25266 and an ID of 000000005
    2011-04-26 13:35:57.661 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:35:59.633 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:01.581 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -12390 and an ID of 000000006
    2011-04-26 13:36:01.654 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:03.892 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:05.588 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -6112 and an ID of 000000007
    2011-04-26 13:36:05.978 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:08.074 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:09.750 OpenEarsSampleProject[2328:707] The received hypothesis is K B O with a score of -37412 and an ID of 000000008
    2011-04-26 13:36:09.828 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:12.141 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:13.764 OpenEarsSampleProject[2328:707] The received hypothesis is A B with a score of -16145 and an ID of 000000009
    2011-04-26 13:36:13.821 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
    2011-04-26 13:36:16.829 OpenEarsSampleProject[2328:707] Pocketsphinx has detected speech.
    2011-04-26 13:36:18.718 OpenEarsSampleProject[2328:707] The received hypothesis is K B with a score of -18317 and an ID of 000000010
    2011-04-26 13:36:18.790 OpenEarsSampleProject[2328:707] Pocketsphinx is now listening.
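To put a number on "KB more frequently than AB", the hypothesis lines in the log above can be tallied. This is only a sketch: the regular expression assumes the exact message wording shown in this log, and `tally` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothesis lines copied from the log above (timestamps trimmed).
LOG = """\
The received hypothesis is U THREE with a score of -495 and an ID of 000000000
The received hypothesis is A B with a score of -13365 and an ID of 000000001
The received hypothesis is J V with a score of -16556 and an ID of 000000002
The received hypothesis is K P with a score of -15090 and an ID of 000000003
The received hypothesis is K B with a score of -14514 and an ID of 000000004
The received hypothesis is K V with a score of -25266 and an ID of 000000005
The received hypothesis is A B with a score of -12390 and an ID of 000000006
The received hypothesis is K B with a score of -6112 and an ID of 000000007
The received hypothesis is K B O with a score of -37412 and an ID of 000000008
The received hypothesis is A B with a score of -16145 and an ID of 000000009
The received hypothesis is K B with a score of -18317 and an ID of 000000010
"""

PATTERN = re.compile(r"The received hypothesis is (.+?) with a score of")

def tally(log_text):
    """Count whole hypotheses and the first word of each hypothesis."""
    hypotheses = Counter()
    first_words = Counter()
    for match in PATTERN.finditer(log_text):
        words = match.group(1)
        hypotheses[words] += 1
        first_words[words.split()[0]] += 1
    return hypotheses, first_words

hypotheses, first_words = tally(LOG)
print(hypotheses.most_common())
print(first_words.most_common())
```

On these eleven hypotheses, "A B" and "K B" each appear three times, while K is the first word six times and A three times.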
    in reply to: Detecting single letters in the alphabet #3992
    jeff-kelley
    Participant

    Sure, here’s the dictionary:

    A	AH
    A(2)	EY
    B	B IY
    C	S IY
    D	D IY
    E	IY
    EIGHT	EY T
    F	EH F
    FIVE	F AY V
    FOUR	F AO R
    G	JH IY
    H	EY CH
    I	AY
    J	JH EY
    K	K EY
    L	EH L
    M	EH M
    N	EH N
    NINE	N AY N
    O	OW
    ONE	W AH N
    ONE(2)	HH W AH N
    P	P IY
    Q	K Y UW
    R	AA R
    S	EH S
    SEVEN	S EH V AH N
    SIX	S IH K S
    T	T IY
    TEN	T EH N
    THREE	TH R IY
    TWO	T UW
    U	Y UW
    V	V IY
    W	D AH B AH L Y UW
    X	EH K S
    Y	W AY
    Z	Z IY
    ZERO	Z IH R OW
    ZERO(2)	Z IY R OW

    EIGHT Y is accurate, though I don’t have the transcription from before. I’ll try to reproduce it and post back here.
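As a rough way to see which letters in this dictionary are likely to be confused, the entries can be grouped by their final phoneme. Treating a shared final phoneme as a sign of confusability is my own heuristic, not something PocketSphinx reports, and `group_by_final_phone` is a hypothetical name.

```python
from collections import defaultdict

# Letter entries copied from the dictionary above (digits omitted).
LETTERS = """\
A AH
A(2) EY
B B IY
C S IY
D D IY
E IY
F EH F
G JH IY
H EY CH
I AY
J JH EY
K K EY
L EH L
M EH M
N EH N
O OW
P P IY
Q K Y UW
R AA R
S EH S
T T IY
U Y UW
V V IY
W D AH B AH L Y UW
X EH K S
Y W AY
Z Z IY
"""

def group_by_final_phone(text):
    """Map each final phoneme to the dictionary words that end in it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        word, phones = line.split(None, 1)
        groups[phones.split()[-1]].append(word)
    return groups

groups = group_by_final_phone(LETTERS)
for phone, words in groups.items():
    if len(words) > 1:
        print(phone, words)
```

The `EY` group contains A(2), J and K, and the `IY` group contains B, C, D, E, G, P, T, V and Z. Hypotheses such as J V, K P and K V, returned when "A B" was spoken, stay within those two groups.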

    in reply to: Detecting single letters in the alphabet #3990
    jeff-kelley
    Participant

    Sure. We used the lmtool available on CMU’s website with this corpus:

    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
    R
    S
    T
    U
    V
    W
    X
    Y
    Z
    
    zero
    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten

    We got this language model:

    Language model created by QuickLM on Tue Apr 26 11:48:40 EDT 2011
    Copyright (c) 1996-2010 Carnegie Mellon University and Alexander I. Rudnicky
    
    The model is in standard ARPA format, designed by Doug Paul while he was at MITRE.
    
    The code that was used to produce this language model is available in Open Source.
    Please visit http://www.speech.cs.cmu.edu/tools/ for more information
    
    The (fixed) discount mass is 0.5. The backoffs are computed using the ratio method.
    This model based on a corpus of 37 sentences and 39 words
    
    \data\
    ngram 1=39
    ngram 2=74
    ngram 3=37
    
    \1-grams:
    -0.7782 </s> -0.3010
    -0.7782 <s> -0.2218
    -2.3464 A -0.2218
    -2.3464 B -0.2218
    -2.3464 C -0.2218
    -2.3464 D -0.2218
    -2.3464 E -0.2218
    -2.3464 EIGHT -0.2218
    -2.3464 F -0.2218
    -2.3464 FIVE -0.2218
    -2.3464 FOUR -0.2218
    -2.3464 G -0.2218
    -2.3464 H -0.2218
    -2.3464 I -0.2218
    -2.3464 J -0.2218
    -2.3464 K -0.2218
    -2.3464 L -0.2218
    -2.3464 M -0.2218
    -2.3464 N -0.2218
    -2.3464 NINE -0.2218
    -2.3464 O -0.2218
    -2.3464 ONE -0.2218
    -2.3464 P -0.2218
    -2.3464 Q -0.2218
    -2.3464 R -0.2218
    -2.3464 S -0.2218
    -2.3464 SEVEN -0.2218
    -2.3464 SIX -0.2218
    -2.3464 T -0.2218
    -2.3464 TEN -0.2218
    -2.3464 THREE -0.2218
    -2.3464 TWO -0.2218
    -2.3464 U -0.2218
    -2.3464 V -0.2218
    -2.3464 W -0.2218
    -2.3464 X -0.2218
    -2.3464 Y -0.2218
    -2.3464 Z -0.2218
    -2.3464 ZERO -0.2218
    
    \2-grams:
    -1.8692 <s> A 0.0000
    -1.8692 <s> B 0.0000
    -1.8692 <s> C 0.0000
    -1.8692 <s> D 0.0000
    -1.8692 <s> E 0.0000
    -1.8692 <s> EIGHT 0.0000
    -1.8692 <s> F 0.0000
    -1.8692 <s> FIVE 0.0000
    -1.8692 <s> FOUR 0.0000
    -1.8692 <s> G 0.0000
    -1.8692 <s> H 0.0000
    -1.8692 <s> I 0.0000
    -1.8692 <s> J 0.0000
    -1.8692 <s> K 0.0000
    -1.8692 <s> L 0.0000
    -1.8692 <s> M 0.0000
    -1.8692 <s> N 0.0000
    -1.8692 <s> NINE 0.0000
    -1.8692 <s> O 0.0000
    -1.8692 <s> ONE 0.0000
    -1.8692 <s> P 0.0000
    -1.8692 <s> Q 0.0000
    -1.8692 <s> R 0.0000
    -1.8692 <s> S 0.0000
    -1.8692 <s> SEVEN 0.0000
    -1.8692 <s> SIX 0.0000
    -1.8692 <s> T 0.0000
    -1.8692 <s> TEN 0.0000
    -1.8692 <s> THREE 0.0000
    -1.8692 <s> TWO 0.0000
    -1.8692 <s> U 0.0000
    -1.8692 <s> V 0.0000
    -1.8692 <s> W 0.0000
    -1.8692 <s> X 0.0000
    -1.8692 <s> Y 0.0000
    -1.8692 <s> Z 0.0000
    -1.8692 <s> ZERO 0.0000
    -0.3010 A </s> -0.3010
    -0.3010 B </s> -0.3010
    -0.3010 C </s> -0.3010
    -0.3010 D </s> -0.3010
    -0.3010 E </s> -0.3010
    -0.3010 EIGHT </s> -0.3010
    -0.3010 F </s> -0.3010
    -0.3010 FIVE </s> -0.3010
    -0.3010 FOUR </s> -0.3010
    -0.3010 G </s> -0.3010
    -0.3010 H </s> -0.3010
    -0.3010 I </s> -0.3010
    -0.3010 J </s> -0.3010
    -0.3010 K </s> -0.3010
    -0.3010 L </s> -0.3010
    -0.3010 M </s> -0.3010
    -0.3010 N </s> -0.3010
    -0.3010 NINE </s> -0.3010
    -0.3010 O </s> -0.3010
    -0.3010 ONE </s> -0.3010
    -0.3010 P </s> -0.3010
    -0.3010 Q </s> -0.3010
    -0.3010 R </s> -0.3010
    -0.3010 S </s> -0.3010
    -0.3010 SEVEN </s> -0.3010
    -0.3010 SIX </s> -0.3010
    -0.3010 T </s> -0.3010
    -0.3010 TEN </s> -0.3010
    -0.3010 THREE </s> -0.3010
    -0.3010 TWO </s> -0.3010
    -0.3010 U </s> -0.3010
    -0.3010 V </s> -0.3010
    -0.3010 W </s> -0.3010
    -0.3010 X </s> -0.3010
    -0.3010 Y </s> -0.3010
    -0.3010 Z </s> -0.3010
    -0.3010 ZERO </s> -0.3010
    
    \3-grams:
    -0.3010 <s> A </s>
    -0.3010 <s> B </s>
    -0.3010 <s> C </s>
    -0.3010 <s> D </s>
    -0.3010 <s> E </s>
    -0.3010 <s> EIGHT </s>
    -0.3010 <s> F </s>
    -0.3010 <s> FIVE </s>
    -0.3010 <s> FOUR </s>
    -0.3010 <s> G </s>
    -0.3010 <s> H </s>
    -0.3010 <s> I </s>
    -0.3010 <s> J </s>
    -0.3010 <s> K </s>
    -0.3010 <s> L </s>
    -0.3010 <s> M </s>
    -0.3010 <s> N </s>
    -0.3010 <s> NINE </s>
    -0.3010 <s> O </s>
    -0.3010 <s> ONE </s>
    -0.3010 <s> P </s>
    -0.3010 <s> Q </s>
    -0.3010 <s> R </s>
    -0.3010 <s> S </s>
    -0.3010 <s> SEVEN </s>
    -0.3010 <s> SIX </s>
    -0.3010 <s> T </s>
    -0.3010 <s> TEN </s>
    -0.3010 <s> THREE </s>
    -0.3010 <s> TWO </s>
    -0.3010 <s> U </s>
    -0.3010 <s> V </s>
    -0.3010 <s> W </s>
    -0.3010 <s> X </s>
    -0.3010 <s> Y </s>
    -0.3010 <s> Z </s>
    -0.3010 <s> ZERO </s>
    
    \end\
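The log10 probabilities in the model above are consistent with a simple reading of the header: every relative frequency is multiplied by the fixed discount mass of 0.5. The sketch below checks that arithmetic. It is my interpretation of the figures, not a description of QuickLM's code, and it does not reproduce the backoff weights; `log10_prob` is a hypothetical helper.

```python
import math

SENTENCES = 37          # one word per sentence, per the model header
DISCOUNT = 0.5          # "The (fixed) discount mass is 0.5"
TOKENS = SENTENCES * 3  # each sentence contributes <s>, the word, and </s>

def log10_prob(count, context_count):
    """Discounted relative frequency, expressed as a base-10 logarithm."""
    return math.log10(DISCOUNT * count / context_count)

# Each word occurs once among all 111 tokens.
unigram_word = log10_prob(1, TOKENS)
# <s> and </s> each occur once per sentence.
unigram_boundary = log10_prob(SENTENCES, TOKENS)
# Each word follows <s> once out of 37 sentence starts.
bigram_start = log10_prob(1, SENTENCES)
# Each word is followed by </s> every time it occurs.
bigram_end = log10_prob(1, 1)

for name, value in [
    ("unigram word", unigram_word),
    ("unigram <s> or </s>", unigram_boundary),
    ("bigram <s> WORD", bigram_start),
    ("bigram WORD </s>", bigram_end),
]:
    print(f"{name}: {value:.4f}")
```

The four values print as -2.3464, -0.7782, -1.8692 and -0.3010, matching the listing. Because every word has the same probability, the language model cannot prefer A over K; any preference comes from the acoustic match alone.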

    The trouble is that it’s just not accurate enough at distinguishing letters. I’m very new to OpenEars/PocketSphinx, so I really don’t know how to approach improving accuracy.

    in reply to: Detecting single letters in the alphabet #3988
    jeff-kelley
    Participant

    The goal is for the user to read letters and numbers to be recognized by the app.
