Getting invalid "pocketsphinxDidDetectSpeech" at the beginning of recognition


  • #1027127
    dianto
    Participant

    Hi,

    First, I cannot thank you enough for making this technology available to us on mobile devices, and I greatly appreciate your patience and the time you take to answer each of the questions that come your way.

    I am hoping you can help me. The situation that I want to describe is the following:

    I am trying to use the “pocketsphinxDidDetectSpeech” event to signal that some speech or sound has been detected; I have no immediate need to know the recognized utterance. What I am seeing is that this event is always generated within the first second of the utterance, even though there is clearly no speech there. After that, it is only generated when there actually is sound. So only the first event is invalid, but it causes an issue in my application, and I cannot simply ignore it because it could be actual speech from a user who spoke too soon.

    I can reproduce this issue consistently, and the easiest way is to use the “testThatPocketsphinxCanPerformNormalRecognition” XCTest function that comes with your framework and testing code. I use this function to perform recognition on your provided audio file “word_statement_etc_short.wav”, and I use your provided “OEPocketSphinxControllerTests.m” file to run the test. I have also tried this with live audio and get the same behavior.

    My questions are:
    – Why is this happening?
    – Are there any parameters that I can update to make this first event go away when there is only silence at the beginning?

    Below is the log output, which shows two “pocketsphinxDidDetectSpeech” events. The first one is invalid, while the second one is valid. This log is from running recognition on the audio file “word_statement_etc_short.wav”, using your latest and greatest release, 2.041.

    Here is the output log:
    [spoiler]
    Test Suite ‘Selected tests’ started at 2015-10-28 10:06:55.624
    Test Suite ‘OEPocketsphinxControllerTests’ started at 2015-10-28 10:06:55.626
    Test Case ‘-[OEPocketsphinxControllerTests testThatPocketsphinxCanPerformNormalRecognition]’ started.

    INFO: cmd_ln.c(703): Parsing command line:
    \
    -lm /Users/tony/Library/Developer/CoreSimulator/Devices/D14FCD70-16F9-4799-94B6-03D49036AD02/data/Library/Caches/WordLanguageModel.DMP \
    -vad_prespeech 10 \
    -vad_postspeech 69 \
    -vad_threshold 2.000000 \
    -remove_noise yes \
    -remove_silence yes \
    -bestpath yes \
    -lw 6.500000 \
    -dict /Users/tony/Library/Developer/CoreSimulator/Devices/D14FCD70-16F9-4799-94B6-03D49036AD02/data/Library/Caches/WordLanguageModel.dic \
    -hmm /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /Users/tony/Library/Developer/CoreSimulator/Devices/D14FCD70-16F9-4799-94B6-03D49036AD02/data/Library/Caches/WordLanguageModel.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm /Users/tony/Library/Developer/CoreSimulator/Devices/D14FCD70-16F9-4799-94B6-03D49036AD02/data/Library/Caches/WordLanguageModel.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(703): Parsing command line:
    \
    -nfilt 25 \
    -lowerf 130 \
    -upperf 6800 \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -agc none \
    -cmn current \
    -varnorm no \
    -transform dct \
    -lifter 22 \
    -cmninit 40

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 22
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02

    INFO: acmod.c(266): Parsed model-specific feature parameters from /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/feat.params
    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(173): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(124): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
    INFO: acmod.c(126): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4112 * 32 bytes (128 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /Users/tony/Library/Developer/CoreSimulator/Devices/D14FCD70-16F9-4799-94B6-03D49036AD02/data/Library/Caches/WordLanguageModel.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 7 words read
    INFO: dict.c(358): Reading filler dictionary: /Users/tony/Library/Developer/Xcode/DerivedData/OpenEars-ekarbrqtbdjiembzdtzjntpkuayh/Build/Products/Debug-iphonesimulator/OpenEarsTests.xctest/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(220): ngrams 1=8, 2=11, 3=6
    INFO: ngram_model_dmp.c(266): 8 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(312): 11 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(338): 6 = LM.trigrams read
    INFO: ngram_model_dmp.c(363): 3 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(403): 2 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(487): 8 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 5 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 12 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 12 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 143
    INFO: ngram_search_fwdtree.c(339): after: 5 root, 15 non-root channels, 11 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    2015-10-28 10:06:57.113 xctest[65723:3294736] micPermissionCheckCompleted:TRUE
    2015-10-28 10:06:57.113 xctest[65723:3294736] pocketsphinxDidStartListening
    2015-10-28 10:06:57.114 xctest[65723:3294736] pocketsphinxRecognitionLoopDidStart
    2015-10-28 10:06:57.322 xctest[65723:3294736] pocketsphinxDidDetectSpeech
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 27.12 -8.61 -0.62 2.95 -1.22 2.19 -1.45 0.19 -3.44 0.63 -2.10 -0.27 -0.46 >
    2015-10-28 10:06:58.337 xctest[65723:3294736] pocketsphinxDidDetectFinishedSpeech
    INFO: ngram_search_fwdtree.c(1553): 932 words recognized (8/fr)
    INFO: ngram_search_fwdtree.c(1555): 7054 senones evaluated (62/fr)
    INFO: ngram_search_fwdtree.c(1559): 2024 channels searched (17/fr), 550 1st, 1213 last
    INFO: ngram_search_fwdtree.c(1562): 1213 words for which last channels evaluated (10/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.06 CPU 0.057 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.16 wall 1.013 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 813 words recognized (7/fr)
    INFO: ngram_search_fwdflat.c(950): 2319 senones evaluated (20/fr)
    INFO: ngram_search_fwdflat.c(952): 993 channels searched (8/fr)
    INFO: ngram_search_fwdflat.c(954): 993 words searched (8/fr)
    INFO: ngram_search_fwdflat.c(957): 55 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.01 CPU 0.010 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.01 wall 0.010 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.3
    INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 453 nodes, 1 links
    INFO: ps_lattice.c(1380): Bestpath score: -21450
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:3:112) = -1817116
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1817117 P(S|O) = -1
    INFO: ngram_search.c(899): bestpath 0.00 CPU 0.001 xRT
    INFO: ngram_search.c(902): bestpath 0.00 wall 0.001 xRT
    2015-10-28 10:06:58.351 xctest[65723:3294736] hypothesis for test ‘Confirms “WORD STATEMENT SOMEONE’S OTHER WORD A PHRASE” is recognized w/score better than |-1200000| is “”‘ with a score of -1
    2015-10-28 10:07:01.921 xctest[65723:3294736] pocketsphinxDidDetectSpeech
    2015-10-28 10:07:06.439 xctest[65723:3294736] pocketsphinxTestRecognitionCompleted
    INFO: cmn_prior.c(131): cmn_prior_update: from < 27.12 -8.61 -0.62 2.95 -1.22 2.19 -1.45 0.19 -3.44 0.63 -2.10 -0.27 -0.46 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 36.49 2.59 7.35 7.35 4.34 2.97 -15.20 -4.60 -10.92 5.85 -2.93 -2.82 5.84 >
    2015-10-28 10:07:06.535 xctest[65723:3294736] pocketsphinxDidDetectFinishedSpeech
    INFO: ngram_search_fwdtree.c(1553): 3868 words recognized (8/fr)
    INFO: ngram_search_fwdtree.c(1555): 51602 senones evaluated (110/fr)
    INFO: ngram_search_fwdtree.c(1559): 21581 channels searched (45/fr), 2170 1st, 16529 last
    INFO: ngram_search_fwdtree.c(1562): 4879 words for which last channels evaluated (10/fr)
    INFO: ngram_search_fwdtree.c(1564): 385 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.39 CPU 0.084 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 8.19 wall 1.742 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 9 words
    INFO: ngram_search_fwdflat.c(948): 3298 words recognized (7/fr)
    INFO: ngram_search_fwdflat.c(950): 48994 senones evaluated (104/fr)
    INFO: ngram_search_fwdflat.c(952): 23079 channels searched (49/fr)
    INFO: ngram_search_fwdflat.c(954): 5710 words searched (12/fr)
    INFO: ngram_search_fwdflat.c(957): 1083 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.10 CPU 0.022 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.10 wall 0.022 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.440
    INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 1273 nodes, 5863 links
    INFO: ps_lattice.c(1380): Bestpath score: -58838
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:440:468) = -3076824
    INFO: ps_lattice.c(1441): Joint P(O,S) = -3260383 P(S|O) = -183559
    INFO: ngram_search.c(899): bestpath 0.02 CPU 0.005 xRT
    INFO: ngram_search.c(902): bestpath 0.02 wall 0.005 xRT
    2015-10-28 10:07:06.662 xctest[65723:3294736] hypothesis for test ‘Confirms “WORD STATEMENT SOMEONE’S OTHER WORD A PHRASE” is recognized w/score better than |-1200000| is “WORD STATEMENT SOMEONE’S OTHER WORD A PHRASE”‘ with a score of -183559
    2015-10-28 10:07:06.663 xctest[65723:3294736] Passing test because the first recognition has recognized the spoken phrase WORD STATEMENT SOMEONE’S OTHER WORD A PHRASE and with a score of -183559 which is better than the minimum of -1200000
    INFO: cmn_prior.c(131): cmn_prior_update: from < 36.49 2.59 7.35 7.35 4.34 2.97 -15.20 -4.60 -10.92 5.85 -2.93 -2.82 5.84 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 36.49 2.59 7.35 7.35 4.34 2.97 -15.20 -4.60 -10.92 5.85 -2.93 -2.82 5.84 >
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.46 CPU 0.080 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 9.50 wall 1.632 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.12 CPU 0.020 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.12 wall 0.020 xRT
    INFO: ngram_search.c(307): TOTAL bestpath 0.02 CPU 0.004 xRT
    INFO: ngram_search.c(310): TOTAL bestpath 0.02 wall 0.004 xRT
    INFO: fe_noise.c(315): Low SNR [21] frames; Low volume [89] frames
    Test Case ‘-[OEPocketsphinxControllerTests testThatPocketsphinxCanPerformNormalRecognition]’ passed (11.197 seconds).
    Test Suite ‘OEPocketsphinxControllerTests’ passed at 2015-10-28 10:07:06.824.
    Executed 1 test, with 0 failures (0 unexpected) in 11.197 (11.198) seconds
    Test Suite ‘Selected tests’ passed at 2015-10-28 10:07:06.825.
    Executed 1 test, with 0 failures (0 unexpected) in 11.197 (11.200) seconds
    Program ended with exit code: 0
    [/spoiler]

    #1027128
    Halle Winkler
    Politepix

    Welcome,

    Thanks for showing your logs. Unfortunately, this test output can’t be used to report an issue because it reflects simulator behavior, and simulator behavior can’t be used to report issues because the simulator uses a different audio driver, mic, etc. There isn’t a known issue like this on any device. What happens when you test on a real device?

    #1027129
    dianto
    Participant

    I have not tested on a real device yet. I am still in the simulator world, and it will be a few weeks before I move to a real device. I have read many of your posts and I understand the difference between simulator mode and real device mode. I can wait and see what happens when I run on a real device.

    One last observation: the logs I gave you are for recognition on a file, without using the audio driver. So I am not sure whether using a simulator affects recognition on your provided audio file. If anything, I would expect PocketSphinx’s front-end speech detector to work flawlessly, simulator or not, on a file that begins with 3-4 seconds of silence.

    Would it be possible for you to run this simple test case on that file, with the test function I mentioned above, in a simulator and send me the output log?

    Thank you very much.

    #1027130
    Halle Winkler
    Politepix

    Hi,

    Sorry, you need to test live input on a device before reporting any issues. It isn’t the same as Pocketsphinx’s audio file reading interface, and the test function tests the driver, the behavior of which is affected by whether it is a real device or a simulated host.

    #1027132
    dianto
    Participant

    Ok. Thank you very much for your time. I do want to let you know that this one has baffled me for quite some time, so I got the PocketSphinx and LibSphinx source code and ran it independently of the OpenEars platform, sending the audio file content directly to “acmod”, “fe” and “fe_noise.c”. I see exactly the same behavior as with OpenEars: the “fe_track_snr” function reports a high SNR for the first 70 frames, then drops to silence and works accurately for the remainder of the audio data in the file.

    Anyway, I think this level of discussion is outside the scope of this support forum. Once again, thank you very much for taking the time to help. I will revisit this after I run the application on a real device, or if I run into any other OpenEars challenges.

    Thank you.

    #1027134
    Halle Winkler
    Politepix

    I appreciate your understanding; we can look into it once it’s confirmed to happen in live mode with a device mic. If it’s always about 70 frames, it could be some kind of VAD calibration artifact, since that sounds vaguely like the window size.
