reduce false positives

Tagged: rapidears falsepositives rejecto

This topic has 5 replies, 2 voices, and was last updated 8 years, 3 months ago by Halle Winkler.

December 28, 2015 at 11:25 am #1027632

doron
Participant

Hi,
I have a dictionary with a single made-up word (Nexar, the company name) that OpenEars detects successfully. My problem is a very large number of false positives: a 10-minute test file of fast-paced American talk radio yielded 18 false positives. Can you give me any pointers on reducing them?
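To compare test runs of different lengths, it can help to normalize the count to a rate. A minimal sketch using the figures above; the function names are illustrative and not part of OpenEars:

```python
def false_positives_per_hour(count, minutes):
    """Scale a false-positive count from a test recording to an hourly rate."""
    return count * 60.0 / minutes


def mean_seconds_between(count, minutes):
    """Average gap between false positives, in seconds of audio."""
    return minutes * 60.0 / count


# 18 false positives in a 10-minute recording
print(false_positives_per_hour(18, 10))        # 108.0
print(round(mean_seconds_between(18, 10), 1))  # 33.3
```

In other words, the reported result is roughly one false detection every 33 seconds of audio.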

Some additional information:
1) I’m using Rejecto with RapidEars.
2) Playing with parameters, I found that a VAD threshold of 2.5 and a rejection weight of 1.085 gave me the best results so far.
3) Aside from the rejection tokens, my dictionary file consists of a single entry: NECKSAHR N EH K S AA R
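One way to make the parameter search in point 2 systematic is a small grid sweep that scores every (VAD threshold, rejection weight) pair on the same test recording. This is a sketch under assumptions: `evaluate` is a hypothetical callback you would implement yourself by running the recognizer over the recording and counting false positives and missed keywords; it is not an OpenEars or Rejecto API. Tracking misses matters because settings that reject more aggressively can also start rejecting the real keyword, which requires test audio in which the keyword is actually spoken.

```python
from itertools import product


def sweep(vad_thresholds, weights, evaluate, max_misses=0):
    """Grid-search (VAD threshold, rejection weight) pairs.

    `evaluate(vad, weight)` is a hypothetical callback that runs one pass over
    the same test recording and returns (false_positives, missed_keywords).
    Returns (vad, weight, false_positives, misses) for the pair with the fewest
    false positives among those within `max_misses`, or None if none qualify.
    """
    best = None
    for vad, weight in product(vad_thresholds, weights):
        false_positives, misses = evaluate(vad, weight)
        if misses > max_misses:
            continue  # too strict: the real keyword is being rejected
        score = (false_positives, misses)  # ties broken by fewer misses
        if best is None or score < best[0]:
            best = (score, vad, weight)
    if best is None:
        return None
    (false_positives, misses), vad, weight = best
    return vad, weight, false_positives, misses
```

A call would look like `sweep([2.0, 2.5, 3.0], [1.0, 1.05, 1.085, 1.1], evaluate)`, where the candidate values are only illustrative.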

Below are the logs of a short false-positive detection:

/Users/user/Library/Caches/AppCode33/DerivedData/VoiceCommandTest-8c9721f1/Build/Products/Debug-iphonesimulator/VoiceCommandTest.app
Simulator session started with process 72311
2015-12-28 12:22:46.147 VoiceCommandTest[72311:6341951] Starting OpenEars logging for OpenEars version 2.041 on 64-bit device (or build): iPhone running iOS version: 9.200000
2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] The word NECKSAHR was not found in the dictionary /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] Now using the fallback method to look up the word NECKSAHR
2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words.
2015-12-28 12:22:46.177 VoiceCommandTest[72311:6341951] Using convertGraphemes for the word or phrase NECKSAHR which doesn’t appear in the dictionary
2015-12-28 12:22:46.182 VoiceCommandTest[72311:6341951] I’m done running performDictionaryLookup and it took 0.023127 seconds
2015-12-28 12:22:46.183 VoiceCommandTest[72311:6341951] I’m done running performDictionaryLookup and it took 0.025443 seconds
2015-12-28 12:22:46.183 VoiceCommandTest[72311:6341951] Starting dynamic language model generation
wfreq2vocab : Done.
text2idngram
Vocab : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab
Output idngram : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.idngram
N-gram buffer size : 10
Hash table size : 5000
Temp directory : /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/cmuclmtk-KcWuFB
Max open files : 20
FOF size : 10
n : 3
Initialising hash table…
Reading vocabulary…
Allocating memory for the n-gram buffer…
Reading text into the n-gram buffer…
20,000 n-grams processed for each “.”, 1,000,000 for each line.

Sorting n-grams…
Writing sorted n-grams to temporary file /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/cmuclmtk-KcWuFB/1
Merging 1 temporary files…

## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
2-grams occurring: N times > N times Sug. -spec_num value
0 81 91
1 80 1 11
## Includes 42 words ##
2 0 1 11
3 0 1 11
4 0 1 11
5 0 1 11
6 0 1 11
7 0 1 11
8 0 1 11
9 0 1 11
10 0 1 11

3-grams occurring: N times > N times Sug. -spec_num value
0 120 131
1 120 0 10
2 0 0 10
3 0 0 10
4 0 0 10
5 0 0 10
6 0 0 10
7 0 0 10
8 0 0 10
9 0 0 10
10 0 0 10
text2idngram : Done.
read_wlist_into_siht: a list of 42 words was read from “/Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab”.
read_wlist_into_array: a list of 42 words was read from “/Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.vocab”.

Unigram was renormalized to absorb a mass of 0.5
prob[UNK] = 1e-99
ARPA-style 3-gram will be written to /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
idngram2lm : Done.
INFO: cmd_ln.c(703): Parsing command line:
sphinx_lm_convert \
-i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa \
-o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
-debug 10

Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
-ofmt

INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(542): 42 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(560): 80 = #bigrams created
INFO: ngram_model_arpa.c(561): 3 = #prob2 entries
INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(582): 40 = #trigrams created
INFO: ngram_model_arpa.c(583): 2 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 42 = #unigrams created
INFO: ngram_model_dmp.c(649): 80 = #bigrams created
INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 40 = #trigrams created
INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
2015-12-28 12:22:46.191 VoiceCommandTest[72311:6341951] Done creating language model with CMUCLMTK in 0.007703 seconds.
INFO: cmd_ln.c(703): Parsing command line:
sphinx_lm_convert \
-i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa \
-o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
-debug 10

Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.arpa
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
-ofmt

INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(542): 42 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(560): 80 = #bigrams created
INFO: ngram_model_arpa.c(561): 5 = #prob2 entries
INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(582): 40 = #trigrams created
INFO: ngram_model_arpa.c(583): 3 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 42 = #unigrams created
INFO: ngram_model_dmp.c(649): 80 = #bigrams created
INFO: ngram_model_dmp.c(650): 5 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 40 = #trigrams created
INFO: ngram_model_dmp.c(662): 3 = #prob3 entries
2015-12-28 12:22:46.199 VoiceCommandTest[72311:6341951] I’m done running dynamic language model generation and it took 0.047171 seconds
2015-12-28 12:22:46.200 VoiceCommandTest[72311:6341951] [[OEPocketsphinxController sharedInstance] stopListening] was called while listening was not in progress. This is not necessarily an exception, just a notification that that a request to stop a listening session was ignored because there was no active listening session to stop.
2015-12-28 12:22:46.201 VoiceCommandTest[72311:6341951] User gave mic permission for this app.
2015-12-28 12:22:46.201 VoiceCommandTest[72311:6341951] setSecondsOfSilence wasn’t set, using default of 0.700000.
2015-12-28 12:22:46.201 VoiceCommandTest[72311:6342018] Starting listening.
2015-12-28 12:22:46.201 VoiceCommandTest[72311:6342018] about to set up audio session
2015-12-28 12:22:46.202 VoiceCommandTest[72311:6342018] Creating audio session with default settings.
2015-12-28 12:22:46.347 VoiceCommandTest[72311:6342018] done starting audio unit
INFO: cmd_ln.c(703): Parsing command line:
\
-lm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP \
-vad_prespeech 10 \
-vad_postspeech 69 \
-vad_threshold 3.000000 \
-remove_noise yes \
-remove_silence yes \
-bestpath yes \
-lw 6.500000 \
-dict /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic \
-hmm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 69
-vad_prespeech 20 10
-vad_startspeech 10 10
-vad_threshold 2.0 3.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: cmd_ln.c(703): Parsing command line:
\
-nfilt 25 \
-lowerf 130 \
-upperf 6800 \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-agc none \
-cmn current \
-varnorm no \
-transform dct \
-lifter 22 \
-cmninit 40

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 40
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 22
-logspec no no
-lowerf 133.33334 1.300000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-vad_postspeech 50 69
-vad_prespeech 20 10
-vad_startspeech 10 10
-vad_threshold 2.0 3.000000e+00
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02

INFO: acmod.c(252): Parsed model-specific feature parameters from /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/feat.params
INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/transition_matrices
INFO: acmod.c(124): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
INFO: acmod.c(126): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(294): 512×13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4145 * 32 bytes (129 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Data/Application/D74EFDBB-76D9-41AF-B1DE-7D5573E67A8F/Library/Caches/NexarLanguageModel_1.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 40 words read
INFO: dict.c(358): Reading filler dictionary: /Users/user/Library/Developer/CoreSimulator/Devices/082255CD-48F6-415A-98AE-A789F208A0A3/data/Containers/Bundle/Application/522BD74C-30D7-48F9-86A8-40300AE76F8F/VoiceCommandTest.app/AcousticModelEnglish.bundle/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 9 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(220): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_dmp.c(266): 42 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(312): 80 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(338): 40 = LM.trigrams read
INFO: ngram_model_dmp.c(363): 5 = LM.prob2 entries read
INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(403): 3 = LM.prob3 entries read
INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(487): 42 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 1 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 49 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 49 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 132
INFO: ngram_search_fwdtree.c(339): after: 1 root, 4 non-root channels, 48 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Listening.
2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Project has these words or phrases in its dictionary:
___REJ_ZH
___REJ_Z
___REJ_Y
___REJ_W
___REJ_V
___REJ_UW
___REJ_UH
___REJ_TH
___REJ_T
___REJ_SH
___REJ_S
___REJ_R
___REJ_P
___REJ_OY
___REJ_OW
___REJ_NG
___REJ_N
___REJ_M
___REJ_L
___REJ_K
___REJ_JH
___REJ_IY
___REJ_IH
___REJ_HH
___REJ_G
___REJ_F
___REJ_EY
___REJ_ER
___REJ_EH
___REJ_DH
___REJ_D
…and 10 more.
2015-12-28 12:22:46.386 VoiceCommandTest[72311:6342018] Recognition loop has started
2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342018] Speech detected…
2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342017] Pocketsphinx heard ” ” with a score of (-7525) and an utterance ID of 0.
2015-12-28 12:22:46.999 VoiceCommandTest[72311:6342017] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
2015-12-28 12:22:47.468 VoiceCommandTest[72311:6342018] Pocketsphinx heard “NECKSAHR” with a score of (-15012) and an utterance ID of 1.
INFO: ngram_search.c(463): Resized backpointer table to 10000 entries
2015-12-28 12:22:47.963 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-22823) and an utterance ID of 2.
2015-12-28 12:22:48.461 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-30895) and an utterance ID of 3.
2015-12-28 12:22:48.948 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-35850) and an utterance ID of 4.
2015-12-28 12:22:49.451 VoiceCommandTest[72311:6342017] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 58.35 12.25 -3.21 4.82 -11.31 -13.14 -17.84 3.35 -3.89 3.06 16.72 -4.16 -4.84 >
INFO: ngram_search_fwdtree.c(1553): 7933 words recognized (30/fr)
INFO: ngram_search_fwdtree.c(1555): 40252 senones evaluated (153/fr)
INFO: ngram_search_fwdtree.c(1559): 14421 channels searched (54/fr), 259 1st, 13544 last
INFO: ngram_search_fwdtree.c(1562): 12090 words for which last channels evaluated (45/fr)
INFO: ngram_search_fwdtree.c(1564): 49 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.08 CPU 0.032 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 2.50 wall 0.950 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 42 words
INFO: ngram_search_fwdflat.c(948): 6133 words recognized (23/fr)
INFO: ngram_search_fwdflat.c(950): 34144 senones evaluated (130/fr)
INFO: ngram_search_fwdflat.c(952): 12777 channels searched (48/fr)
INFO: ngram_search_fwdflat.c(954): 9961 words searched (37/fr)
INFO: ngram_search_fwdflat.c(957): 6854 word transitions (26/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.02 CPU 0.007 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.02 wall 0.007 xRT
INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.197
INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
INFO: ngram_search.c(1411): Lattice has 2731 nodes, 64952 links
INFO: ps_lattice.c(1380): Bestpath score: -31208
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:197:261) = -1802783
INFO: ps_lattice.c(1441): Joint P(O,S) = -1989268 P(S|O) = -186485
INFO: ngram_search.c(899): bestpath 0.17 CPU 0.066 xRT
INFO: ngram_search.c(902): bestpath 0.17 wall 0.065 xRT
2015-12-28 12:22:49.640 VoiceCommandTest[72311:6342017] Pocketsphinx heard “NECKSAHR” with a score of (-37522) and an utterance ID of 5.
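To tally detections over a longer test file, the hypothesis lines in a log like the one above can be parsed. A minimal sketch, assuming the log keeps the exact `Pocketsphinx heard “…” with a score of (…) and an utterance ID of …` wording shown here; the regular expression is written against this log, not against a documented format. In this excerpt a single “Speech detected…” segment produced five NECKSAHR hypotheses (utterance IDs 1 to 5), so the sketch reports both the number of matching hypotheses and the number of speech segments that contained at least one.

```python
import re

# Matches lines such as:
#   Pocketsphinx heard “NECKSAHR” with a score of (-15012) and an utterance ID of 1.
# The log uses typographic quotes (and ” ” for an empty hypothesis), so accept
# straight or curly quotes on either side.
HEARD = re.compile(
    r'Pocketsphinx heard [“”"](.*?)[“”"] with a score of \((-?\d+)\)'
    r' and an utterance ID of (\d+)'
)


def tally_detections(log_lines, keyword):
    """Return (hypotheses matching keyword, speech segments containing one)."""
    hits = []     # (segment index, utterance ID, score)
    segment = -1  # index of the most recent "Speech detected" line
    for line in log_lines:
        # "End of speech detected" uses a lowercase "speech", so it is not counted here.
        if "Speech detected" in line:
            segment += 1
            continue
        match = HEARD.search(line)
        if match and match.group(1).strip() == keyword:
            hits.append((segment, int(match.group(3)), int(match.group(2))))
    return len(hits), len({seg for seg, _, _ in hits})
```

Any iterable of lines works, so an open log file (for example one saved as a hypothetical `test_run.log`) can be passed directly together with the keyword `"NECKSAHR"`.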

December 29, 2015 at 12:45 pm #1027637

Halle Winkler
Politepix

Welcome,

The logs look a little unusual; have any changes been made to the library? Unfortunately, I can’t help with this issue while it is being tested on the Simulator. Can you test on a real device and see whether you still have the same problem? A good piece of general advice is to make the keyword you want to spot a little longer and more distinctive. For example, instead of:

NEXAR N EH K S AA R

You could try something like:

YONEXAR Y OW N EH K S AA R

and the user can then say “Yo, Nexar” (substituting whatever form of address you prefer for your product). Two syllables may be a bit too short.
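The effect of such a prefix can be checked mechanically by counting vowel phones in each pronunciation, which approximates the syllable count because each syllable has a vowel nucleus. A minimal sketch, assuming entries in the `WORD PHONE PHONE …` format used above and the standard ARPAbet vowel set (the rejection tokens in the logs, such as ___REJ_EY and ___REJ_ER, use the same phone names); the helper names are illustrative:

```python
# Standard ARPAbet vowel phones (stress digits, if present, are stripped below).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def parse_entry(line):
    """Split a 'WORD PH1 PH2 ...' dictionary line into (word, [phones])."""
    word, *phones = line.split()
    return word, phones


def syllable_estimate(phones):
    """Approximate the syllable count as the number of vowel phones."""
    return sum(1 for p in phones if p.rstrip("012") in ARPABET_VOWELS)


def prefixed_entry(prefix_word, prefix_phones, line):
    """Build a new entry by prepending a short word such as 'Yo' or 'Hey'."""
    word, phones = parse_entry(line)
    return " ".join([prefix_word + word, *prefix_phones, *phones])


base = "NEXAR N EH K S AA R"
longer = prefixed_entry("YO", ["Y", "OW"], base)
print(longer)                                     # YONEXAR Y OW N EH K S AA R
print(syllable_estimate(parse_entry(base)[1]))    # 2
print(syllable_estimate(parse_entry(longer)[1]))  # 3
```

A “Hey” prefix works the same way: `prefixed_entry("HEY", ["HH", "EY"], base)` gives `HEYNEXAR HH EY N EH K S AA R`, also three vowel phones.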

December 29, 2015 at 9:11 pm #1027643

doron
Participant

Hi Halle,
Switching to “Hey, Nexar” did reduce the false-positive rate, but I’m trying to find a way to reduce it even further. Do you have anything else up your sleeve that might help?

Below are newly generated logs, run on a device this time. I didn’t make any changes to the library, so I’m not sure why the logs look different.
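One way to see exactly which settings differ between runs is to parse the `Current configuration:` tables that Sphinx prints (columns `[NAME] [DEFLT] [VALUE]`) and compare them. A minimal sketch; the parsing rules are assumptions inferred from the tables in these logs rather than from a specification: a row with two values is read as default and value, a row with one value as a value with no printed default, and numbers are compared with a small tolerance because defaults and values are printed in different formats (e.g. `-lowerf 133.33334 1.333333e+02`). Pass only the rows of a single table.

```python
import math


def parse_config(lines):
    """Parse 'Current configuration:' rows into {option: (default, value)}.

    Assumed row shapes (inferred from the logs, not from a spec):
      -name default value   -> (default, value)
      -name value           -> (None, value)
      -name                 -> (None, None)
    """
    config = {}
    for line in lines:
        tokens = line.split()
        if not tokens or not tokens[0].startswith("-"):
            continue  # skip the [NAME] [DEFLT] [VALUE] header and other text
        name, rest = tokens[0], tokens[1:]
        if len(rest) >= 2:
            config[name] = (rest[0], rest[1])
        elif len(rest) == 1:
            config[name] = (None, rest[0])
        else:
            config[name] = (None, None)
    return config


def same(a, b):
    """Compare two printed values, treating numbers in different formats as equal."""
    if a == b:
        return True
    try:
        return math.isclose(float(a), float(b), rel_tol=1e-5)
    except (TypeError, ValueError):
        return False


def overrides(config):
    """Options whose value differs from the printed default."""
    return {n: (d, v) for n, (d, v) in config.items() if v is not None and not same(d, v)}


def diff_runs(a, b):
    """Options whose value differs between two parsed configurations."""
    result = {}
    for name in sorted(a.keys() | b.keys()):
        va = a.get(name, (None, None))[1]
        vb = b.get(name, (None, None))[1]
        if not same(va, vb):
            result[name] = (va, vb)
    return result
```

Applied to the decoder table in the first log, `overrides` would flag the paths (`-lm`, `-dict`, `-hmm`), `-debug`, and the three VAD settings: `-vad_prespeech` (20 to 10), `-vad_postspeech` (50 to 69) and `-vad_threshold` (2.0 to 3.000000e+00).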

/private/var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/VoiceCommandTest
2015-12-29 22:04:56.598 VoiceCommandTest[586:238056] Starting OpenEars logging for OpenEars version 2.041 on 64-bit device (or build): iPhone running iOS version: 9.100000
2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] The word HEYNECKSAHR was not found in the dictionary /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] Now using the fallback method to look up the word HEYNECKSAHR
2015-12-29 22:04:56.785 VoiceCommandTest[586:238056] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words.
2015-12-29 22:04:56.786 VoiceCommandTest[586:238056] Using convertGraphemes for the word or phrase HEYNECKSAHR which doesn’t appear in the dictionary
2015-12-29 22:04:56.803 VoiceCommandTest[586:238056] I’m done running performDictionaryLookup and it took 0.066977 seconds
2015-12-29 22:04:56.804 VoiceCommandTest[586:238056] I’m done running performDictionaryLookup and it took 0.079497 seconds
2015-12-29 22:04:56.817 VoiceCommandTest[586:238056] Starting dynamic language model generation
## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
## Includes 42 words ##
wfreq2vocab : Done.
text2idngram
Vocab : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab
Output idngram : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.idngram
N-gram buffer size : 10
Hash table size : 5000
Temp directory : /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/cmuclmtk-NxiDCy
Max open files : 20
FOF size : 10
n : 3
Initialising hash table…
Reading vocabulary…
Allocating memory for the n-gram buffer…
Reading text into the n-gram buffer…
20,000 n-grams processed for each “.”, 1,000,000 for each line.

Sorting n-grams…
Writing sorted n-grams to temporary file /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/cmuclmtk-NxiDCy/1
Merging 1 temporary files…

2-grams occurring: N times > N times Sug. -spec_num value
0 81 91
1 80 1 11
2 0 1 11
3 0 1 11
4 0 1 11
5 0 1 11
6 0 1 11
7 0 1 11
8 0 1 11
9 0 1 11
10 0 1 11

3-grams occurring: N times > N times Sug. -spec_num value
0 120 131
1 120 0 10
2 0 0 10
3 0 0 10
4 0 0 10
5 0 0 10
6 0 0 10
7 0 0 10
8 0 0 10
9 0 0 10
10 0 0 10
text2idngram : Done.

read_wlist_into_siht: a list of 42 words was read from “/var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab”.
read_wlist_into_array: a list of 42 words was read from “/var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.vocab”.
Unigram was renormalized to absorb a mass of 0.5
prob[UNK] = 1e-99
ARPA-style 3-gram will be written to /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
idngram2lm : Done.
INFO: cmd_ln.c(703): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa \
-o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
-debug 10

Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
-ofmt

INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(542): 42 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(560): 80 = #bigrams created
INFO: ngram_model_arpa.c(561): 3 = #prob2 entries
INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(582): 40 = #trigrams created
INFO: ngram_model_arpa.c(583): 2 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 42 = #unigrams created
INFO: ngram_model_dmp.c(649): 80 = #bigrams created
INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 40 = #trigrams created
INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
2015-12-29 22:04:56.896 VoiceCommandTest[586:238056] Done creating language model with CMUCLMTK in 0.079343 seconds.
INFO: cmd_ln.c(703): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa \
-o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
-debug 10

Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.arpa
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
-ofmt

INFO: ngram_model_arpa.c(503): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(542): 42 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(560): 80 = #bigrams created
INFO: ngram_model_arpa.c(561): 5 = #prob2 entries
INFO: ngram_model_arpa.c(569): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(582): 40 = #trigrams created
INFO: ngram_model_arpa.c(583): 3 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 42 = #unigrams created
INFO: ngram_model_dmp.c(649): 80 = #bigrams created
INFO: ngram_model_dmp.c(650): 5 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 40 = #trigrams created
INFO: ngram_model_dmp.c(662): 3 = #prob3 entries
2015-12-29 22:04:56.929 VoiceCommandTest[586:238056] I’m done running dynamic language model generation and it took 0.296972 seconds
2015-12-29 22:04:56.930 VoiceCommandTest[586:238056] [[OEPocketsphinxController sharedInstance] stopListening] was called while listening was not in progress. This is not necessarily an exception, just a notification that that a request to stop a listening session was ignored because there was no active listening session to stop.
2015-12-29 22:04:56.937 VoiceCommandTest[586:238056] User gave mic permission for this app.
2015-12-29 22:04:56.938 VoiceCommandTest[586:238056] setSecondsOfSilence wasn’t set, using default of 0.700000.
2015-12-29 22:04:56.938 VoiceCommandTest[586:238100] Starting listening.
2015-12-29 22:04:56.939 VoiceCommandTest[586:238100] about to set up audio session
2015-12-29 22:04:56.942 VoiceCommandTest[586:238100] Creating audio session with default settings.
2015-12-29 22:04:57.517 VoiceCommandTest[586:238115] Audio route has changed for the following reason:
2015-12-29 22:04:57.519 VoiceCommandTest[586:238115] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
2015-12-29 22:04:57.526 VoiceCommandTest[586:238115] This is not a case in which OpenEars notifies of a route change. At the close of this function, the new audio route is ---SpeakerMicrophoneBuiltIn---. The previous route before changing to this route was <AVAudioSessionRouteDescription: 0x1475a4ec0,
inputs = (null);
outputs = (
"<AVAudioSessionPortDescription: 0x1475a4ef0, type = Speaker; name = Speaker; UID = Speaker; selectedDataSource = (null)>"
)>.
2015-12-29 22:04:57.598 VoiceCommandTest[586:238100] done starting audio unit
INFO: cmd_ln.c(703): Parsing command line:
\
-lm /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP \
-vad_prespeech 10 \
-vad_postspeech 69 \
-vad_threshold 2.500000 \
-remove_noise yes \
-remove_silence yes \
-bestpath yes \
-lw 6.500000 \
-dict /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic \
-hmm /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 69
-vad_prespeech 20 10
-vad_startspeech 10 10
-vad_threshold 2.0 2.500000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: cmd_ln.c(703): Parsing command line:
\
-nfilt 25 \
-lowerf 130 \
-upperf 6800 \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-agc none \
-cmn current \
-varnorm no \
-transform dct \
-lifter 22 \
-cmninit 40

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 40
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 22
-logspec no no
-lowerf 133.33334 1.300000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-vad_postspeech 50 69
-vad_prespeech 20 10
-vad_startspeech 10 10
-vad_threshold 2.0 2.500000e+00
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02

INFO: acmod.c(252): Parsed model-specific feature parameters from /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/feat.params
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/mdef
INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/transition_matrices
INFO: acmod.c(124): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
INFO: acmod.c(126): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4145 * 32 bytes (129 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/5CCD0147-A992-44E8-B452-E77466EB39DA/Library/Caches/NexarLanguageModel_1.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 40 words read
INFO: dict.c(358): Reading filler dictionary: /var/mobile/Containers/Bundle/Application/9330DE76-900F-4AA2-A3F9-132CF807C6B8/VoiceCommandTest.app/AcousticModelEnglish.bundle/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 9 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(220): ngrams 1=42, 2=80, 3=40
INFO: ngram_model_dmp.c(266): 42 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(312): 80 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(338): 40 = LM.trigrams read
INFO: ngram_model_dmp.c(363): 5 = LM.prob2 entries read
INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(403): 3 = LM.prob3 entries read
INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(487): 42 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 1 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 49 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 49 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 134
INFO: ngram_search_fwdtree.c(339): after: 1 root, 6 non-root channels, 48 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
2015-12-29 22:04:57.707 VoiceCommandTest[586:238100] Listening.
2015-12-29 22:04:57.708 VoiceCommandTest[586:238100] Project has these words or phrases in its dictionary:
___REJ_ZH
___REJ_Z
___REJ_Y
___REJ_W
___REJ_V
___REJ_UW
___REJ_UH
___REJ_TH
___REJ_T
___REJ_SH
___REJ_S
___REJ_R
___REJ_P
___REJ_OY
___REJ_OW
___REJ_NG
___REJ_N
___REJ_M
___REJ_L
___REJ_K
___REJ_JH
___REJ_IY
___REJ_IH
___REJ_HH
___REJ_G
___REJ_F
___REJ_EY
___REJ_ER
___REJ_EH
___REJ_DH
___REJ_D
…and 10 more.
2015-12-29 22:04:57.709 VoiceCommandTest[586:238100] Recognition loop has started
2015-12-29 22:04:58.287 VoiceCommandTest[586:238100] Speech detected…
2015-12-29 22:04:58.288 VoiceCommandTest[586:238100] Pocketsphinx heard " " with a score of (-4684) and an utterance ID of 0.
2015-12-29 22:04:58.288 VoiceCommandTest[586:238100] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
2015-12-29 22:04:58.749 VoiceCommandTest[586:238209] Pocketsphinx heard " " with a score of (-12048) and an utterance ID of 1.
2015-12-29 22:04:59.244 VoiceCommandTest[586:238098] Pocketsphinx heard "HEYNECKSAHR" with a score of (-20524) and an utterance ID of 2.
INFO: ngram_search.c(463): Resized backpointer table to 10000 entries
2015-12-29 22:04:59.735 VoiceCommandTest[586:238098] Pocketsphinx heard "HEYNECKSAHR" with a score of (-28153) and an utterance ID of 3.
2015-12-29 22:05:00.183 VoiceCommandTest[586:238100] Pocketsphinx heard "HEYNECKSAHR" with a score of (-32965) and an utterance ID of 4.
2015-12-29 22:05:00.615 VoiceCommandTest[586:238100] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 59.48 12.16 -3.65 4.21 -11.22 -13.39 -19.59 3.30 -5.11 2.89 17.80 -4.60 -5.12 >
INFO: ngram_search_fwdtree.c(1553): 7314 words recognized (30/fr)
INFO: ngram_search_fwdtree.c(1555): 37297 senones evaluated (153/fr)
INFO: ngram_search_fwdtree.c(1559): 13178 channels searched (54/fr), 240 1st, 12134 last
INFO: ngram_search_fwdtree.c(1562): 11232 words for which last channels evaluated (46/fr)
INFO: ngram_search_fwdtree.c(1564): 33 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.30 CPU 0.123 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 2.43 wall 0.995 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 42 words
INFO: ngram_search_fwdflat.c(948): 5550 words recognized (23/fr)
INFO: ngram_search_fwdflat.c(950): 28035 senones evaluated (115/fr)
INFO: ngram_search_fwdflat.c(952): 10161 channels searched (41/fr)
INFO: ngram_search_fwdflat.c(954): 8612 words searched (35/fr)
INFO: ngram_search_fwdflat.c(957): 5748 word transitions (23/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.06 CPU 0.025 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.06 wall 0.026 xRT
INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.186
INFO: ngram_search.c(1306): Eliminated 5 nodes before end node
INFO: ngram_search.c(1411): Lattice has 2442 nodes, 61543 links
INFO: ps_lattice.c(1380): Bestpath score: -30077
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:186:242) = -1668390
INFO: ps_lattice.c(1441): Joint P(O,S) = -1819560 P(S|O) = -151170
INFO: ngram_search.c(899): bestpath 0.41 CPU 0.169 xRT
INFO: ngram_search.c(902): bestpath 0.41 wall 0.170 xRT
2015-12-29 22:05:01.095 VoiceCommandTest[586:238100] Pocketsphinx heard " " with a score of (-35071) and an utterance ID of 5.
2015-12-29 22:05:01.095 VoiceCommandTest[586:238100] Hypothesis was null so we aren’t returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController’s property returnNullHypotheses to TRUE before starting OEPocketsphinxController.

December 29, 2015 at 9:17 pm #1027644

Halle Winkler
Politepix

> Have anything else up your sleeve that might be useful?

Not sure, but I would start by adding the entry I showed above to the file LanguageModelGeneratorLookupList.text in the acoustic model, in its alphabetically correct position, so that the pronunciation is read from the lookup list instead of being generated by the fallback method, and is therefore the correct one. Make sure the word and pronunciation are separated by a tab, not by spaces.
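Adding an entry this way amounts to a sorted insert into a plain-text list. The Python sketch below is not part of OpenEars; it assumes the lookup list holds one `WORD<TAB>PHONES` entry per line and is sorted by plain string comparison. That ordering is consistent with the `HEYNE` / `HEYNEXAR` / `HEYS` excerpt later in this thread but is not confirmed for the whole file. Both function names are hypothetical.

```python
import bisect


def insert_entry(lines, word, phones):
    """Return a copy of `lines` with WORD<TAB>PHONES inserted in sorted position.

    Assumption (not confirmed by OpenEars docs): `lines` holds one
    "WORD<TAB>PH ON ES" entry per line, sorted by plain string comparison.
    """
    word = word.upper()
    # Normalise the pronunciation to uppercase phones separated by exactly one space.
    pron = " ".join(p.upper() for p in phones.split())
    keys = [line.split("\t", 1)[0] for line in lines]
    if word in keys:
        raise ValueError(f"{word} is already in the lookup list")
    pos = bisect.bisect_left(keys, word)  # first index whose key sorts >= word
    return lines[:pos] + [f"{word}\t{pron}"] + lines[pos:]


def add_to_lookup_list(path, word, phones):
    """Rewrite the lookup-list file at `path` with the new entry inserted."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    lines = insert_entry(lines, word, phones)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


sample = ["HEYNE\tHH EY N", "HEYS\tHH EY Z"]
print(insert_entry(sample, "heynexar", "HH EY N EH K S AA R"))
# ['HEYNE\tHH EY N', 'HEYNEXAR\tHH EY N EH K S AA R', 'HEYS\tHH EY Z']
```

Refusing duplicates is a deliberate choice in this sketch: if a word is already present, editing its existing line is safer than adding a second line with the same key.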

December 29, 2015 at 10:31 pm #1027645

doron
Participant

Thanks Halle, I'll give that a try.
Is there anything I need to do to get the modified LanguageModelGeneratorLookupList.text picked up and used?
I modified it, but the generated dictionary file (the *.dic file) still shows a different pronunciation.

December 29, 2015 at 10:41 pm #1027646

Halle Winkler
Politepix

You're welcome. Do you also see the new entry in the copy of LanguageModelGeneratorLookupList.text that is inside your app bundle? If it is there, is in the correct alphabetical position, and matches the format of the other entries (the uppercase word, then a tab, then the uppercase pronunciation with each phone separated by exactly one space), it should be picked up without any issue. If you're using the sample app build and the acoustic model is rebuilt each time, make sure you make your change to the copy of the text file in the acoustic model project, so that the rebuilt bundle includes it. If the acoustic model isn't being rebuilt, you should be able to edit the file directly in the acoustic model bundle that has been added to your app.
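The checklist in the reply above (entry present in the bundled copy, alphabetical position, one tab between word and pronunciation, single spaces between phones) can be checked mechanically before rebuilding. This is a minimal Python sketch with hypothetical helper names. It assumes byte-wise ordering and the standard 39-phone CMU set without stress digits; the phone set is inferred from entries like `HH EY N EH K S AA R`, not stated in the thread.

```python
import re

# Assumption: the lookup list uses the 39-phone CMU set without stress digits,
# as entries like "HEYNEXAR<TAB>HH EY N EH K S AA R" suggest.
CMU_PHONES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
    "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY",
    "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}

# A word with no whitespace, one tab, then uppercase phones separated by single spaces.
ENTRY_RE = re.compile(r"(\S+)\t([A-Z]+(?: [A-Z]+)*)")


def check_entry(line):
    """Return a list of formatting problems for one lookup-list line (empty if fine)."""
    m = ENTRY_RE.fullmatch(line)
    if not m:
        return ["expected WORD<TAB>PHONES with single spaces between phones"]
    word, pron = m.groups()
    problems = []
    if word != word.upper():
        problems.append("word is not uppercase")
    unknown = [p for p in pron.split(" ") if p not in CMU_PHONES]
    if unknown:
        problems.append(f"unknown phones: {unknown}")
    return problems


def check_lookup_list(lines, word):
    """Check that `word` is present, well formed, and sorted relative to its neighbours."""
    keys = [line.split("\t", 1)[0] for line in lines]
    if word not in keys:
        return [f"{word} not found"]
    i = keys.index(word)
    problems = check_entry(lines[i])
    if i > 0 and keys[i - 1] > word:
        problems.append(f"out of order after {keys[i - 1]}")
    if i + 1 < len(keys) and keys[i + 1] < word:
        problems.append(f"out of order before {keys[i + 1]}")
    return problems


print(check_entry("HEY NEXAR\tHH EY N EH K S AA R"))
# ['expected WORD<TAB>PHONES with single spaces between phones']
```

Run `check_lookup_list` on the lines of the file inside the built app, not only on the source copy, since that bundled file is the one the reply asks about. An empty list means the entry passes all of these checks.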

Keep in mind that there can't be a space between “HEY” and “NEXAR”; they have to be a single word in the lookup list, like this:
```
HEYNE	HH EY N
HEYNEXAR	HH EY N EH K S AA R
HEYS	HH EY Z
```
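So a multi-word phrase becomes one entry whose spelling is the words run together and whose pronunciation is their phones concatenated in order. Here is a small Python sketch of that rule. `compound_entry` is hypothetical. The `HEY` pronunciation is inferred from the `HEYNE` / `HEYS` lines above, and `NEXAR` reuses the N EH K S AA R pronunciation from the original post.

```python
def compound_entry(words, prons):
    """Build one lookup-list line for a multi-word phrase.

    `prons` maps each uppercase word to its space-separated phones. The words
    are run together with no space (HEY + NEXAR -> HEYNEXAR) and their phones
    are concatenated in order; no cross-word pronunciation changes are modelled.
    """
    keys = [w.upper() for w in words]
    missing = [k for k in keys if k not in prons]
    if missing:
        raise KeyError(f"no pronunciation for {missing}")
    return "".join(keys) + "\t" + " ".join(prons[k] for k in keys)


# HEY is inferred from the HEYNE/HEYS lines; NEXAR from the original N EH K S AA R entry.
prons = {"HEY": "HH EY", "NEXAR": "N EH K S AA R"}
print(repr(compound_entry(["hey", "Nexar"], prons)))
# 'HEYNEXAR\tHH EY N EH K S AA R'
```

Plain concatenation reproduces the `HEYNEXAR` line shown above exactly, which suggests no cross-word adjustment is expected for this kind of entry.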