Is it difficult to recognize kids’ voices?

  • #1032805

    casual1001
    Participant

    Hi Support team,

    I recorded a child saying only the word “Danny”. I generated the language model from the array “a/apple/alligator/astronaut/danny”, but OpenEars always recognizes it as “a”. The child’s voice is very clear. Isn’t the OpenEars engine good at recognizing kids’ voices?
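    For reference, this is roughly how I generate the language model and start listening. It is a simplified sketch following the standard OpenEars tutorial, with the word list taken from the dictionary shown in the log, so my real code may differ slightly:

    #import <OpenEars/OELanguageModelGenerator.h>
    #import <OpenEars/OEAcousticModel.h>
    #import <OpenEars/OEPocketsphinxController.h>

    // Generate a small dynamic language model from the word list (lowercase, matching the
    // stock English acoustic model's dictionary) and start the Pocketsphinx listening loop.
    OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
    NSArray *words = @[@"a", @"alligator", @"ant", @"apple", @"astronaut", @"danny"];
    NSString *name = @"WordListOpenEarsDynamicLanguageModel";
    NSString *acousticModelPath = [OEAcousticModel pathToModel:@"AcousticModelEnglish"];

    NSError *error = [lmGenerator generateLanguageModelFromArray:words
                                                   withFilesNamed:name
                                           forAcousticModelAtPath:acousticModelPath];
    if (error == nil) {
        NSString *lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        NSString *dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];

        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                         dictionaryAtPath:dicPath
                                                                      acousticModelAtPath:acousticModelPath
                                                                      languageModelIsJSGF:FALSE];
    }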

    The logs are below:

    Thanks
    Casual1001

    2019-04-09 11:39:30.636023+0800 myapp[6281:1497205] Listening.
    2019-04-09 11:39:30.637106+0800 myapp[6281:1497205] Project has these words or phrases in its dictionary:
    a
    a(2)
    alligator
    ant
    apple
    astronaut
    danny
    2019-04-09 11:39:30.637193+0800 myapp[6281:1497205] Recognition loop has started
    2019-04-09 11:39:30.671645+0800 myapp[6281:1496948] Local callback: Pocketsphinx is now listening.
    2019-04-09 11:39:30.671741+0800 myapp[6281:1496948] Local callback: Pocketsphinx started.
    2019-04-09 11:39:30.886866+0800 myapp[6281:1497205] Speech detected…
    2019-04-09 11:39:30.887020+0800 myapp[6281:1496948] Local callback: Pocketsphinx has detected speech.
    2019-04-09 11:39:30.887399+0800 myapp[6281:1497205] Pocketsphinx heard “” with a score of (-1063) and an utterance ID of 0.
    2019-04-09 11:39:30.887455+0800 myapp[6281:1497205] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:30.983813+0800 myapp[6281:1497014] Pocketsphinx heard “” with a score of (-2542) and an utterance ID of 1.
    2019-04-09 11:39:30.983911+0800 myapp[6281:1497014] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.137978+0800 myapp[6281:1497014] Pocketsphinx heard “” with a score of (-4256) and an utterance ID of 2.
    2019-04-09 11:39:31.138265+0800 myapp[6281:1497014] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.262036+0800 myapp[6281:1497020] Pocketsphinx heard “” with a score of (-5036) and an utterance ID of 3.
    2019-04-09 11:39:31.262143+0800 myapp[6281:1497020] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.365322+0800 myapp[6281:1497020] Pocketsphinx heard “” with a score of (-6393) and an utterance ID of 4.
    2019-04-09 11:39:31.365799+0800 myapp[6281:1497020] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.536734+0800 myapp[6281:1497014] Pocketsphinx heard “” with a score of (-7830) and an utterance ID of 5.
    2019-04-09 11:39:31.536907+0800 myapp[6281:1497014] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.639908+0800 myapp[6281:1497014] Pocketsphinx heard “” with a score of (-9355) and an utterance ID of 6.
    2019-04-09 11:39:31.640088+0800 myapp[6281:1497014] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.769522+0800 myapp[6281:1497014] Pocketsphinx heard “” with a score of (-10886) and an utterance ID of 7.
    2019-04-09 11:39:31.770096+0800 myapp[6281:1497014] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2019-04-09 11:39:31.879908+0800 myapp[6281:1497014] End of speech detected…
    2019-04-09 11:39:31.881080+0800 myapp[6281:1496948] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 39.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 32.80 -7.68 -3.56 3.97 3.77 8.48 5.28 5.29 7.50 6.09 2.89 0.05 0.30 >
    INFO: ngram_search_fwdtree.c(1553): 1057 words recognized (10/fr)
    INFO: ngram_search_fwdtree.c(1555): 7288 senones evaluated (67/fr)
    INFO: ngram_search_fwdtree.c(1559): 2422 channels searched (22/fr), 520 1st, 1383 last
    INFO: ngram_search_fwdtree.c(1562): 1112 words for which last channels evaluated (10/fr)
    INFO: ngram_search_fwdtree.c(1564): 7 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.07 CPU 0.065 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.12 wall 1.036 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 5 words
    INFO: ngram_search_fwdflat.c(948): 1009 words recognized (9/fr)
    INFO: ngram_search_fwdflat.c(950): 6241 senones evaluated (58/fr)
    INFO: ngram_search_fwdflat.c(952): 2781 channels searched (25/fr)
    INFO: ngram_search_fwdflat.c(954): 1104 words searched (10/fr)
    INFO: ngram_search_fwdflat.c(957): 224 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.01 CPU 0.014 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.02 wall 0.019 xRT
    INFO: ngram_search.c(1290): lattice start node <s>.0 end node </s>.73
    INFO: ngram_search.c(1320): Eliminated 6 nodes before end node
    INFO: ngram_search.c(1445): Lattice has 599 nodes, 3149 links
    INFO: ps_lattice.c(1380): Bestpath score: -8239
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:73:106) = 1349611
    INFO: ps_lattice.c(1441): Joint P(O,S) = 1294913 P(S|O) = -54698
    INFO: ngram_search.c(901): bestpath 0.01 CPU 0.005 xRT
    INFO: ngram_search.c(904): bestpath 0.01 wall 0.012 xRT
    2019-04-09 11:39:31.916035+0800 myapp[6281:1497014] Pocketsphinx heard “a” with a score of (25514) and an utterance ID of 8.
    2019-04-09 11:39:31.916536+0800 myapp[6281:1496948] rapid finish: a score:25514
    2019-04-09 11:39:32.488264+0800 myapp[6281:1496948] Local callback: Pocketsphinx has suspended recognition.
    2019-04-09 11:39:36.889735+0800 myapp[6281:1496948] StartClass
    INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -compallsen no no
    -debug 0
    -dict /var/mobile/Containers/Data/Application/2A0905A0-BD45-4783-9F4B-25B1FFB68CE3/Library/Caches/WordListOpenEarsDynamicLanguageModel0.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/noisedict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/feat.params
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm /var/mobile/Containers/Data/Application/2A0905A0-BD45-4783-9F4B-25B1FFB68CE3/Library/Caches/WordListOpenEarsDynamicLanguageModel0.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/mdef
    -mean /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/means
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/transition_matrices
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/variances
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4112 * 32 bytes (128 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/2A0905A0-BD45-4783-9F4B-25B1FFB68CE3/Library/Caches/WordListOpenEarsDynamicLanguageModel0.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 7 words read
    INFO: dict.c(358): Reading filler dictionary: /var/containers/Bundle/Application/7F3A3C95-817E-4EC3-A10E-A1218709513A/myapp.app/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(424): Trying to read LM in bin format
    INFO: ngram_model_trie.c(457): Header doesn’t match
    INFO: ngram_model_trie.c(180): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(71): No \data\ mark in LM file
    INFO: ngram_model_trie.c(537): Trying to read LM in DMP format
    INFO: ngram_model_trie.c(632): ngrams 1=8, 2=12, 3=6
    INFO: lm_trie.c(317): Training quantizer
    INFO: lm_trie.c(323): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 5 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 12 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 12 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 144
    INFO: ngram_search_fwdtree.c(339): after: 5 root, 16 non-root channels, 11 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 44.17 -2.49 -2.09 -11.27 -15.94 4.16 9.40 2.00 -2.78 -4.84 -5.22 -6.34 -7.29 >
    INFO: ngram_search_fwdtree.c(1553): 2959 words recognized (10/fr)
    INFO: ngram_search_fwdtree.c(1555): 20367 senones evaluated (72/fr)
    INFO: ngram_search_fwdtree.c(1559): 6693 channels searched (23/fr), 1400 1st, 3132 last
    INFO: ngram_search_fwdtree.c(1562): 2998 words for which last channels evaluated (10/fr)
    INFO: ngram_search_fwdtree.c(1564): 97 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.03 CPU 0.010 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.06 wall 0.021 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 4 words
    INFO: ngram_search_fwdflat.c(948): 2616 words recognized (9/fr)
    INFO: ngram_search_fwdflat.c(950): 6943 senones evaluated (24/fr)
    INFO: ngram_search_fwdflat.c(952): 2684 channels searched (9/fr)
    INFO: ngram_search_fwdflat.c(954): 2684 words searched (9/fr)
    INFO: ngram_search_fwdflat.c(957): 191 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.01 CPU 0.005 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.01 wall 0.005 xRT
    INFO: ngram_search.c(1290): lattice start node <s>.0 end node </s>.253
    INFO: ngram_search.c(1320): Eliminated 4 nodes before end node
    INFO: ngram_search.c(1445): Lattice has 1031 nodes, 7971 links
    INFO: ps_lattice.c(1380): Bestpath score: -32704
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:253:282) = 152232
    INFO: ps_lattice.c(1441): Joint P(O,S) = 52032 P(S|O) = -100200
    INFO: ngram_search.c(901): bestpath 0.01 CPU 0.004 xRT
    INFO: ngram_search.c(904): bestpath 0.01 wall 0.005 xRT
    2019-04-09 11:39:37.016792+0800 myapp[6281:1496948] Pocketsphinx heard “a” with a score of (-100200) and an utterance ID of 9.
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.03 CPU 0.010 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.06 wall 0.021 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.01 CPU 0.005 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.01 wall 0.005 xRT
    INFO: ngram_search.c(308): TOTAL bestpath 0.01 CPU 0.004 xRT
    INFO: ngram_search.c(311): TOTAL bestpath 0.01 wall 0.005 xRT

    #1032806

    Halle Winkler
    Politepix

    Hello,

    Yes, it is generally expected that an offline speech recognition engine with a small acoustic model will not recognize both adult and child speech without acoustic model adaptation or training a new model.
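    If you do adapt the model or train a child-specific one, the OpenEars side stays the same: you point the language model generation and listening calls at the adapted bundle instead of the stock one, roughly like this (a sketch only; the bundle name below is a placeholder for a model you would have to create yourself, since it does not ship with OpenEars):

    // "AcousticModelEnglishKids" is a hypothetical bundle produced by your own adaptation or
    // training; only the stock English and Spanish models ship with OpenEars.
    NSString *adaptedModelPath = [OEAcousticModel pathToModel:@"AcousticModelEnglishKids"];

    // The language model and dictionary should be generated against the same acoustic model
    // you listen with, so both calls take the adapted path (lmGenerator, words and name are
    // set up as in the code in the first post):
    NSError *error = [lmGenerator generateLanguageModelFromArray:words
                                                   withFilesNamed:name
                                           forAcousticModelAtPath:adaptedModelPath];
    if (error == nil) {
        NSString *lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        NSString *dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                         dictionaryAtPath:dicPath
                                                                      acousticModelAtPath:adaptedModelPath
                                                                      languageModelIsJSGF:FALSE];
    }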
