Reduce false positives

  • #1027632
    doron
    Participant

    Hi,
    I have a dictionary with a single made-up word (Nexar, the company name), which OpenEars detects successfully. My problem is a very large number of false positives: a 10-minute test file of rapid American radio talk yielded 18 of them. Can you give me any pointers for reducing those?

    Some additional information:
    1) I’m using Rejecto with RapidEars
    2) Playing with the parameters, I found that a VAD threshold of 2.5 and a rejection weight of 1.085 gave me the best results so far.
    3) Aside from the rejection tokens, my dictionary file consists of a single entry: NECKSAHR N EH K S AA R

    Below are the logs of a short false-positive detection:

    /Users/user/Library/Caches/AppCode33/DerivedData/VoiceCommandTest-8c9721f1/Build/Products/Debug-iphonesimulator/VoiceCommandTest.app
    Simulator session started with process 72311
    2015-12-28 12:22:46.147 VoiceCommandTest[72311:6341951] Starting OpenEars logging for OpenEars version 2.041 on 64-bit device (or build): iPhone running iOS version: 9.200000
    2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] The word NECKSAHR was not found in the dictionary /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
    2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] Now using the fallback method to look up the word NECKSAHR
    2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words.
    2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] Using convertGraphemes for the word or phrase NECKSAHR which doesn’t appear in the dictionary
    2015-12-28 12:22:46.182 VoiceCommandTest[72311:6341951] I’m done running performDictionaryLookup and it took 0.023127 seconds
    2015-12-28 12:22:46.183 VoiceCommandTest[72311:6341951] I’m done running performDictionaryLookup and it took 0.025443 seconds
    2015-12-28 12:22:46.183 VoiceCommandTest[72311:6341951] Starting dynamic language model generation
    wfreq2vocab : Done.
    text2idngram
    Vocab : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab
    Output idngram : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.idngram
    N-gram buffer size : 10
    Hash table size : 5000
    Temp directory : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/cmuclmtk-KcWuFB
    Max open files : 20
    FOF size : 10
    n : 3
    Initialising hash table…
    Reading vocabulary…
    Allocating memory for the n-gram buffer…
    Reading text into the n-gram buffer…
    20,000 n-grams processed for each “.”, 1,000,000 for each line.

    Sorting n-grams…
    Writing sorted n-grams to temporary file /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/cmuclmtk-KcWuFB/1
    Merging 1 temporary files…

    ## Vocab generated by v2 of the CMU-Cambridge Statistcal
    ## Language Modeling toolkit.
    ##
    2-grams occurring: N times > N times Sug. -spec_num value
    0 81 91
    1 80 1 11
    ## Includes 42 words ##
    2 0 1 11
    3 0 1 11
    4 0 1 11
    5 0 1 11
    6 0 1 11
    7 0 1 11
    8 0 1 11
    9 0 1 11
    10 0 1 11

    3-grams occurring: N times > N times Sug. -spec_num value
    0 120 131
    1 120 0 10
    2 0 0 10
    3 0 0 10
    4 0 0 10
    5 0 0 10
    6 0 0 10
    7 0 0 10
    8 0 0 10
    9 0 0 10
    10 0 0 10
    text2idngram : Done.
    read_wlist_into_siht: a list of 42 words was read from “/Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab”.
    read_wlist_into_array: a list of 42 words was read from “/Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab”.

    Unigram was renormalized to absorb a mass of 0.5
    prob[UNK] = 1e-99
    ARPA-style 3-gram will be written to /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
    idngram2lm : Done.
    INFO: cmd_ln.c(703): Parsing command line:
    sphinx_lm_convert \
    -i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa \
    -o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
    -debug 10

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -case
    -debug 10
    -help no no
    -i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
    -ofmt

    INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(542): 42 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(560): 80 = #bigrams created
    INFO: ngram_model_arpa.c(561): 3 = #prob2 entries
    INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(582): 40 = #trigrams created
    INFO: ngram_model_arpa.c(583): 2 = #prob3 entries
    INFO: ngram_model_dmp.c(518): Building DMP model…
    INFO: ngram_model_dmp.c(548): 42 = #unigrams created
    INFO: ngram_model_dmp.c(649): 80 = #bigrams created
    INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
    INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
    INFO: ngram_model_dmp.c(661): 40 = #trigrams created
    INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
    2015-12-28 12:22:46.191 VoiceCommandTest[72311:6341951] Done creating language model with CMUCLMTK in 0.007703 seconds.
    INFO: cmd_ln.c(703): Parsing command line:
    sphinx_lm_convert \
    -i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa \
    -o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
    -debug 10

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -case
    -debug 10
    -help no no
    -i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
    -ofmt

    INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(542): 42 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(560): 80 = #bigrams created
    INFO: ngram_model_arpa.c(561): 5 = #prob2 entries
    INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(582): 40 = #trigrams created
    INFO: ngram_model_arpa.c(583): 3 = #prob3 entries
    INFO: ngram_model_dmp.c(518): Building DMP model…
    INFO: ngram_model_dmp.c(548): 42 = #unigrams created
    INFO: ngram_model_dmp.c(649): 80 = #bigrams created
    INFO: ngram_model_dmp.c(650): 5 = #prob2 entries
    INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
    INFO: ngram_model_dmp.c(661): 40 = #trigrams created
    INFO: ngram_model_dmp.c(662): 3 = #prob3 entries
    2015-12-28 12:22:46.199 VoiceCommandTest[72311:6341951] I’m done running dynamic language model generation and it took 0.047171 seconds
    2015-12-28 12:22:46.200 VoiceCommandTest[72311:6341951] [[OEPocketsphinxController sharedInstance] stopListening] was called while listening was not in progress. This is not necessarily an exception, just a notification that that a request to stop a listening session was ignored because there was no active listening session to stop.
    2015-12-28 12:22:46.201 VoiceCommandTest[72311:6341951] User gave mic permission for this app.
    2015-12-28 12:22:46.201 VoiceCommandTest[72311:6341951] setSecondsOfSilence wasn’t set, using default of 0.700000.
    2015-12-28 12:22:46.201 VoiceCommandTest[72311:6342018] Starting listening.
    2015-12-28 12:22:46.201 VoiceCommandTest[72311:6342018] about to set up audio session
    2015-12-28 12:22:46.202 VoiceCommandTest[72311:6342018] Creating audio session with default settings.
    2015-12-28 12:22:46.347 VoiceCommandTest[72311:6342018] done starting audio unit
    INFO: cmd_ln.c(703): Parsing command line:
    \
    -lm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
    -vad_prespeech 10 \
    -vad_postspeech 69 \
    -vad_threshold 3.000000 \
    -remove_noise yes \
    -remove_silence yes \
    -bestpath yes \
    -lw 6.500000 \
    -dict /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic \
    -hmm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 3.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(703): Parsing command line:
    \
    -nfilt 25 \
    -lowerf 130 \
    -upperf 6800 \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -agc none \
    -cmn current \
    -varnorm no \
    -transform dct \
    -lifter 22 \
    -cmninit 40

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 22
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 3.000000e+00
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02

    INFO: acmod.c(252): Parsed model-specific feature parameters from /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/feat.params
    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(124): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
    INFO: acmod.c(126): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4145 * 32 bytes (129 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 40 words read
    INFO: dict.c(358): Reading filler dictionary: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(220): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_dmp.c(266): 42 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(312): 80 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(338): 40 = LM.trigrams read
    INFO: ngram_model_dmp.c(363): 5 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(403): 3 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(487): 42 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 1 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 49 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 49 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 132
    INFO: ngram_search_fwdtree.c(339): after: 1 root, 4 non-root channels, 48 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Listening.
    2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Project has these words or phrases in its dictionary:
    ___REJ_ZH
    ___REJ_Z
    ___REJ_Y
    ___REJ_W
    ___REJ_V
    ___REJ_UW
    ___REJ_UH
    ___REJ_TH
    ___REJ_T
    ___REJ_SH
    ___REJ_S
    ___REJ_R
    ___REJ_P
    ___REJ_OY
    ___REJ_OW
    ___REJ_NG
    ___REJ_N
    ___REJ_M
    ___REJ_L
    ___REJ_K
    ___REJ_JH
    ___REJ_IY
    ___REJ_IH
    ___REJ_HH
    ___REJ_G
    ___REJ_F
    ___REJ_EY
    ___REJ_ER
    ___REJ_EH
    ___REJ_DH
    ___REJ_D
    …and 10 more.
    2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Recognition loop has started
    2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342018] Speech detected…
    2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342017] Pocketsphinx heard ” ” with a score of (-7525) and an utterance ID of 0.
    2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342017] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2015-12-28 12:22:47.468 VoiceCommandTest[72311:6342018] Pocketsphinx heard “NECKSAHR” with a score of (-15012) and an utterance ID of 1.
    INFO: ngram_search.c(463): Resized backpointer table to 10000 entries
    2015-12-28 12:22:47.963 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-22823) and an utterance ID of 2.
    2015-12-28 12:22:48.461 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-30895) and an utterance ID of 3.
    2015-12-28 12:22:48.948 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-35850) and an utterance ID of 4.
    2015-12-28 12:22:49.451 VoiceCommandTest[72311:6342017] End of speech detected…
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 58.35 12.25 -3.21 4.82 -11.31 -13.14 -17.84 3.35 -3.89 3.06 16.72 -4.16 -4.84 >
    INFO: ngram_search_fwdtree.c(1553): 7933 words recognized (30/fr)
    INFO: ngram_search_fwdtree.c(1555): 40252 senones evaluated (153/fr)
    INFO: ngram_search_fwdtree.c(1559): 14421 channels searched (54/fr), 259 1st, 13544 last
    INFO: ngram_search_fwdtree.c(1562): 12090 words for which last channels evaluated (45/fr)
    INFO: ngram_search_fwdtree.c(1564): 49 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.08 CPU 0.032 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 2.50 wall 0.950 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 42 words
    INFO: ngram_search_fwdflat.c(948): 6133 words recognized (23/fr)
    INFO: ngram_search_fwdflat.c(950): 34144 senones evaluated (130/fr)
    INFO: ngram_search_fwdflat.c(952): 12777 channels searched (48/fr)
    INFO: ngram_search_fwdflat.c(954): 9961 words searched (37/fr)
    INFO: ngram_search_fwdflat.c(957): 6854 word transitions (26/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.02 CPU 0.007 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.02 wall 0.007 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.197
    INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 2731 nodes, 64952 links
    INFO: ps_lattice.c(1380): Bestpath score: -31208
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:197:261) = -1802783
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1989268 P(S|O) = -186485
    INFO: ngram_search.c(899): bestpath 0.17 CPU 0.066 xRT
    INFO: ngram_search.c(902): bestpath 0.17 wall 0.065 xRT
    2015-12-28 12:22:49.640 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-37522) and an utterance ID of 5.
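
When comparing parameter settings, it can help to tally the keyword hypotheses in a captured log rather than count them by eye. The following is a minimal Python sketch, not part of OpenEars: the function names are made up for illustration, and the regular expression assumes the "Pocketsphinx heard ... with a score of (...)" line format shown in the log above. Consecutive hypotheses in the excerpt appear to belong to a single stretch of speech, so how to group them into one false positive is left as a choice for the reader.

```python
import re

# Matches hypothesis lines such as:
#   Pocketsphinx heard "NECKSAHR" with a score of (-15012) and an utterance ID of 1.
# Straight and curly quotes are both accepted, since forum software
# often converts one into the other.
HEARD = re.compile(
    r'Pocketsphinx heard ["“”](?P<text>.*?)["“”] '
    r'with a score of \((?P<score>-?\d+)\) '
    r'and an utterance ID of (?P<utt>\d+)'
)


def keyword_hypotheses(log_text, keyword):
    """Return (utterance_id, score) for each hypothesis containing keyword."""
    hits = []
    for match in HEARD.finditer(log_text):
        # Split on whitespace so an empty or blank hypothesis never matches.
        if keyword in match.group("text").split():
            hits.append((int(match.group("utt")), int(match.group("score"))))
    return hits


def false_positives_per_minute(count, minutes):
    """Rate used to compare runs of different lengths."""
    return count / minutes
```

For the test described in the first post, `false_positives_per_minute(18, 10)` gives 1.8 false positives per minute, which is a convenient single number to track while varying the VAD threshold and rejection weight.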

    #1027637
    Halle Winkler
    Politepix

    Welcome,

    The logs look a little unusual; have changes been made to the library? Unfortunately, I can’t help with the issue as reported because it is being tested on the Simulator. Can you test on a real device and see whether you still have the same problem? As general advice, make the keyword you want to spot a little longer and more distinctive, e.g. instead of:

    NEXAR N EH K S AA R

    You could try something like:

    YONEXAR Y OW N EH K S AA R

    and the user can say “Yo, Nexar” (substituting whatever form of address you prefer for your product). Two syllables might be a bit too brief.
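
To make the length comparison concrete, here is a small Python sketch that counts the phonemes and syllables in a pronunciation entry. It is an illustration only, not an OpenEars or Rejecto feature: the helper name `describe_entry` is hypothetical, and the syllable count is approximated as the number of ARPAbet vowel phonemes in the entry.

```python
# ARPAbet vowel phonemes; each vowel is treated as one syllable nucleus.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def describe_entry(entry):
    """Return (word, phoneme_count, syllable_count) for 'WORD PH1 PH2 ...'."""
    word, *phones = entry.split()
    # Drop optional stress digits (e.g. "AA1") before classifying phonemes.
    bare = [p.rstrip("012") for p in phones]
    syllables = sum(1 for p in bare if p in ARPABET_VOWELS)
    return word, len(bare), syllables


for entry in ("NEXAR N EH K S AA R", "YONEXAR Y OW N EH K S AA R"):
    print(describe_entry(entry))
```

By this count the first entry has 6 phonemes and 2 syllables, while the second has 8 phonemes and 3 syllables, so the longer wake phrase gives the recognizer more acoustic material to match against.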

    #1027643
    doron
    Participant

    Hi Halle,
    Switching to “Hey, Nexar” did reduce the false-positive rate, but I’m trying to reduce it even further. Do you have anything else up your sleeve that might be useful?

    I’m pasting newly generated logs, this time from a run on a device. I did not make any changes to the library, so I wonder why the logs are different.

    /private/var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/VoiceCommandTest
    2015-12-29 22:04:56.598 VoiceCommandTest[586:238056] Starting OpenEars logging for OpenEars version 2.041 on 64-bit device (or build): iPhone running iOS version: 9.100000
    2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] The word HEYNECKSAHR was not found in the dictionary /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
    2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] Now using the fallback method to look up the word HEYNECKSAHR
    2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words.
    2015-12-29 22:04:56.786 VoiceCommandTest[586:238056] Using convertGraphemes for the word or phrase HEYNECKSAHR which doesn’t appear in the dictionary
    2015-12-29 22:04:56.803 VoiceCommandTest[586:238056] I’m done running performDictionaryLookup and it took 0.066977 seconds
    2015-12-29 22:04:56.804 VoiceCommandTest[586:238056] I’m done running performDictionaryLookup and it took 0.079497 seconds
    2015-12-29 22:04:56.817 VoiceCommandTest[586:238056] Starting dynamic language model generation
    ## Vocab generated by v2 of the CMU-Cambridge Statistcal
    ## Language Modeling toolkit.
    ##
    ## Includes 42 words ##
    wfreq2vocab : Done.
    text2idngram
    Vocab : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab
    Output idngram : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.idngram
    N-gram buffer size : 10
    Hash table size : 5000
    Temp directory : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/cmuclmtk-NxiDCy
    Max open files : 20
    FOF size : 10
    n : 3
    Initialising hash table…
    Reading vocabulary…
    Allocating memory for the n-gram buffer…
    Reading text into the n-gram buffer…
    20,000 n-grams processed for each “.”, 1,000,000 for each line.

    Sorting n-grams…
    Writing sorted n-grams to temporary file /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/cmuclmtk-NxiDCy/1
    Merging 1 temporary files…

    2-grams occurring: N times > N times Sug. -spec_num value
    0 81 91
    1 80 1 11
    2 0 1 11
    3 0 1 11
    4 0 1 11
    5 0 1 11
    6 0 1 11
    7 0 1 11
    8 0 1 11
    9 0 1 11
    10 0 1 11

    3-grams occurring: N times > N times Sug. -spec_num value
    0 120 131
    1 120 0 10
    2 0 0 10
    3 0 0 10
    4 0 0 10
    5 0 0 10
    6 0 0 10
    7 0 0 10
    8 0 0 10
    9 0 0 10
    10 0 0 10
    text2idngram : Done.

    read_wlist_into_siht: a list of 42 words was read from “/var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab”.
    read_wlist_into_array: a list of 42 words was read from “/var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab”.
    Unigram was renormalized to absorb a mass of 0.5
    prob[UNK] = 1e-99
    ARPA-style 3-gram will be written to /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
    idngram2lm : Done.
    INFO: cmd_ln.c(703): Parsing command line:
    sphinx_lm_convert \
    -i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa \
    -o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
    -debug 10

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -case
    -debug 10
    -help no no
    -i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
    -ofmt

    INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(542): 42 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(560): 80 = #bigrams created
    INFO: ngram_model_arpa.c(561): 3 = #prob2 entries
    INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(582): 40 = #trigrams created
    INFO: ngram_model_arpa.c(583): 2 = #prob3 entries
    INFO: ngram_model_dmp.c(518): Building DMP model…
    INFO: ngram_model_dmp.c(548): 42 = #unigrams created
    INFO: ngram_model_dmp.c(649): 80 = #bigrams created
    INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
    INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
    INFO: ngram_model_dmp.c(661): 40 = #trigrams created
    INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
    2015-12-29 22:04:56.896 VoiceCommandTest[586:238056] Done creating language model with CMUCLMTK in 0.079343 seconds.
    INFO: cmd_ln.c(703): Parsing command line:
    sphinx_lm_convert \
    -i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa \
    -o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
    -debug 10

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -case
    -debug 10
    -help no no
    -i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
    -ofmt

    INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(542): 42 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(560): 80 = #bigrams created
    INFO: ngram_model_arpa.c(561): 5 = #prob2 entries
    INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(582): 40 = #trigrams created
    INFO: ngram_model_arpa.c(583): 3 = #prob3 entries
    INFO: ngram_model_dmp.c(518): Building DMP model…
    INFO: ngram_model_dmp.c(548): 42 = #unigrams created
    INFO: ngram_model_dmp.c(649): 80 = #bigrams created
    INFO: ngram_model_dmp.c(650): 5 = #prob2 entries
    INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
    INFO: ngram_model_dmp.c(661): 40 = #trigrams created
    INFO: ngram_model_dmp.c(662): 3 = #prob3 entries
    2015-12-29 22:04:56.929 VoiceCommandTest[586:238056] I’m done running dynamic language model generation and it took 0.296972 seconds
    2015-12-29 22:04:56.930 VoiceCommandTest[586:238056] [[OEPocketsphinxController sharedInstance] stopListening] was called while listening was not in progress. This is not necessarily an exception, just a notification that that a request to stop a listening session was ignored because there was no active listening session to stop.
    2015-12-29 22:04:56.937 VoiceCommandTest[586:238056] User gave mic permission for this app.
    2015-12-29 22:04:56.938 VoiceCommandTest[586:238056] setSecondsOfSilence wasn’t set, using default of 0.700000.
    2015-12-29 22:04:56.938 VoiceCommandTest[586:238100] Starting listening.
    2015-12-29 22:04:56.939 VoiceCommandTest[586:238100] about to set up audio session
    2015-12-29 22:04:56.942 VoiceCommandTest[586:238100] Creating audio session with default settings.
    2015-12-29 22:04:57.517 VoiceCommandTest[586:238115] Audio route has changed for the following reason:
    2015-12-29 22:04:57.519 VoiceCommandTest[586:238115] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
    2015-12-29 22:04:57.526 VoiceCommandTest[586:238115] This is not a case in which OpenEars notifies of a route change. At the close of this function, the new audio route is —SpeakerMicrophoneBuiltIn—. The previous route before changing to this route was <AVAudioSessionRouteDescription: 0x1475a4ec0,
    inputs = (null);
    outputs = (
    “<AVAudioSessionPortDescription: 0x1475a4ef0, type = Speaker; name = Speaker; UID = Speaker; selectedDataSource = (null)>”
    )>.
    2015-12-29 22:04:57.598 VoiceCommandTest[586:238100] done starting audio unit
    INFO: cmd_ln.c(703): Parsing command line:
    \
    -lm /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
    -vad_prespeech 10 \
    -vad_postspeech 69 \
    -vad_threshold 2.500000 \
    -remove_noise yes \
    -remove_silence yes \
    -bestpath yes \
    -lw 6.500000 \
    -dict /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic \
    -hmm /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.500000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(703): Parsing command line:
    \
    -nfilt 25 \
    -lowerf 130 \
    -upperf 6800 \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -agc none \
    -cmn current \
    -varnorm no \
    -transform dct \
    -lifter 22 \
    -cmninit 40

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 22
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.500000e+00
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02

    INFO: acmod.c(252): Parsed model-specific feature parameters from /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/feat.params
    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(124): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
    INFO: acmod.c(126): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4145 * 32 bytes (129 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 40 words read
    INFO: dict.c(358): Reading filler dictionary: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(220): ngrams 1=42, 2=80, 3=40
    INFO: ngram_model_dmp.c(266): 42 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(312): 80 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(338): 40 = LM.trigrams read
    INFO: ngram_model_dmp.c(363): 5 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(403): 3 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(487): 42 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 1 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 49 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 49 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 134
    INFO: ngram_search_fwdtree.c(339): after: 1 root, 6 non-root channels, 48 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    2015-12-29 22:04:57.707 VoiceCommandTest[586:238100] Listening.
    2015-12-29 22:04:57.708 VoiceCommandTest[586:238100] Project has these words or phrases in its dictionary:
    ___REJ_ZH
    ___REJ_Z
    ___REJ_Y
    ___REJ_W
    ___REJ_V
    ___REJ_UW
    ___REJ_UH
    ___REJ_TH
    ___REJ_T
    ___REJ_SH
    ___REJ_S
    ___REJ_R
    ___REJ_P
    ___REJ_OY
    ___REJ_OW
    ___REJ_NG
    ___REJ_N
    ___REJ_M
    ___REJ_L
    ___REJ_K
    ___REJ_JH
    ___REJ_IY
    ___REJ_IH
    ___REJ_HH
    ___REJ_G
    ___REJ_F
    ___REJ_EY
    ___REJ_ER
    ___REJ_EH
    ___REJ_DH
    ___REJ_D
    …and 10 more.
    2015-12-29 22:04:57.709 VoiceCommandTest[586:238100] Recognition loop has started
    2015-12-29 22:04:58.287 VoiceCommandTest[586:238100] Speech detected…
    2015-12-29 22:04:58.288 VoiceCommandTest[586:238100] Pocketsphinx heard ” ” with a score of (-4684) and an utterance ID of 0.
    2015-12-29 22:04:58.288 VoiceCommandTest[586:238100] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2015-12-29 22:04:58.749 VoiceCommandTest[586:238209] Pocketsphinx heard ” ” with a score of (-12048) and an utterance ID of 1.
    2015-12-29 22:04:59.244 VoiceCommandTest[586:238098] Pocketsphinx heard “HEYNECKSAHR” with a score of (-20524) and an utterance ID of 2.
    INFO: ngram_search.c(463): Resized backpointer table to 10000 entries
    2015-12-29 22:04:59.735 VoiceCommandTest[586:238098] Pocketsphinx heard “HEYNECKSAHR” with a score of (-28153) and an utterance ID of 3.
    2015-12-29 22:05:00.183 VoiceCommandTest[586:238100] Pocketsphinx heard “HEYNECKSAHR” with a score of (-32965) and an utterance ID of 4.
    2015-12-29 22:05:00.615 VoiceCommandTest[586:238100] End of speech detected…
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 59.48 12.16 -3.65 4.21 -11.22 -13.39 -19.59 3.30 -5.11 2.89 17.80 -4.60 -5.12 >
    INFO: ngram_search_fwdtree.c(1553): 7314 words recognized (30/fr)
    INFO: ngram_search_fwdtree.c(1555): 37297 senones evaluated (153/fr)
    INFO: ngram_search_fwdtree.c(1559): 13178 channels searched (54/fr), 240 1st, 12134 last
    INFO: ngram_search_fwdtree.c(1562): 11232 words for which last channels evaluated (46/fr)
    INFO: ngram_search_fwdtree.c(1564): 33 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.30 CPU 0.123 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 2.43 wall 0.995 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 42 words
    INFO: ngram_search_fwdflat.c(948): 5550 words recognized (23/fr)
    INFO: ngram_search_fwdflat.c(950): 28035 senones evaluated (115/fr)
    INFO: ngram_search_fwdflat.c(952): 10161 channels searched (41/fr)
    INFO: ngram_search_fwdflat.c(954): 8612 words searched (35/fr)
    INFO: ngram_search_fwdflat.c(957): 5748 word transitions (23/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.06 CPU 0.025 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.06 wall 0.026 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.186
    INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 2442 nodes, 61543 links
    INFO: ps_lattice.c(1380): Bestpath score: -30077
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:186:242) = -1668390
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1819560 P(S|O) = -151170
    INFO: ngram_search.c(899): bestpath 0.41 CPU 0.169 xRT
    INFO: ngram_search.c(902): bestpath 0.41 wall 0.170 xRT
    2015-12-29 22:05:01.095 VoiceCommandTest[586:238100] Pocketsphinx heard ” ” with a score of (-35071) and an utterance ID of 5.
    2015-12-29 22:05:01.095 VoiceCommandTest[586:238100] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.

    #1027644
    Halle Winkler
    Politepix

    Have anything else up your sleeve that might be useful?

    Not sure, but I would start by adding the entry I showed above to the file LanguageModelGeneratorLookupList.text in the acoustic model, in the alphabetically correct position, so that the pronunciation used is the correct one rather than one generated by the fallback method. Make sure the word and pronunciation are separated by a tab, not by spaces.

    #1027645
    doron
    Participant

    Thanks Halle, will give that a try.
    Anything I should be doing to cause the modified LanguageModelGeneratorLookupList.text to be picked up and used?
    I modified it, but I see a different pronunciation in the generated dictionary file (*.dic file).

    #1027646
    Halle Winkler
    Politepix

    You’re welcome. Do you also see the new entry in the LanguageModelGeneratorLookupList.text that is in your app bundle? If it is in there, is in the correct alphabetical position, and matches the other entries in having a tab between the uppercase word and the uppercase pronunciation (where each part of the pronunciation is separated by exactly one space), it ought to be picked up with no issue. If you’re using the sample app build and the acoustic model is being rebuilt each time, make sure you’re making your original change to the copy of the text file that is in the acoustic model project. If you aren’t, you should be able to directly edit the file that is in the acoustic model bundle that has been added to your app.
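
The format rules above can be checked mechanically before rebuilding. This is only a rough sketch in Python, not part of OpenEars: the function name `check_entry` is made up for illustration, and it tests only the properties described in this post (one tab, uppercase word, uppercase pronunciation, single spaces between the parts of the pronunciation).

```python
def check_entry(line):
    """Return a list of problems found in one lookup-list line (empty if none).

    Hypothetical helper: it encodes only the rules described in the post,
    not any official OpenEars specification of the file format.
    """
    line = line.rstrip("\n")
    # The word and the pronunciation must be separated by exactly one tab.
    if line.count("\t") != 1:
        return ["expected exactly one tab between word and pronunciation"]
    word, pron = line.split("\t")
    problems = []
    if not word or " " in word:
        problems.append("word must be non-empty and contain no spaces")
    if word != word.upper():
        problems.append("word must be uppercase")
    if not pron:
        problems.append("pronunciation must be non-empty")
    if pron != pron.upper():
        problems.append("pronunciation must be uppercase")
    # Collapsing whitespace must change nothing if the spacing is already
    # exactly one space between parts, with none leading or trailing.
    if pron != " ".join(pron.split()):
        problems.append("phonemes must be separated by exactly one space")
    return problems
```

Running it over the new line copied out of the file that is actually inside the app bundle shows whether a stray space or a missing tab is the reason the entry is not being picked up.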

    Keep in mind that there can’t be a space between “HEY” and “NEXAR”; they have to be one word in the lookup list, i.e.:

    HEYNE	HH EY N
    HEYNEXAR	HH EY N EH K S AA R
    HEYS	HH EY Z
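
Finding the alphabetically correct position by hand is error-prone in a large list, so here is a minimal Python sketch of doing it with a binary search. It rests on an assumption not confirmed anywhere in this thread: that the list is ordered by plain string comparison of the word before the tab. The name `insert_entry` is hypothetical, and reading and writing the actual file is left out.

```python
import bisect


def insert_entry(lines, word, phonemes):
    """Return a new list of lookup lines with `word` added in sorted position.

    Assumption (not confirmed by the source): entries are ordered by plain
    string comparison of the word that precedes the tab.
    """
    lines = list(lines)
    # Word, one tab, then the phonemes joined by exactly one space each.
    entry = word + "\t" + " ".join(phonemes)
    keys = [line.split("\t", 1)[0] for line in lines]
    if word in keys:
        return lines  # leave an existing entry untouched
    pos = bisect.bisect_left(keys, word)
    return lines[:pos] + [entry] + lines[pos:]


# The neighbouring entries are the ones shown in the example above.
existing = ["HEYNE\tHH EY N", "HEYS\tHH EY Z"]
updated = insert_entry(existing, "HEYNEXAR", "HH EY N EH K S AA R".split())
```

With the two neighbouring entries from the example, the new line lands between HEYNE and HEYS. If the real file uses a different ordering, the position would need to be checked against its actual neighbours instead.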
    