Reply To: openEars detect the word without speaking it
January 20, 2015 at 11:52 am
#1024454
maheshvaghela
Participant
OK, I have updated to version 2.03 and also uncommented those two lines.
Here is the log output:
2015-01-20 16:12:19.509 OpenEarsSampleApp[5469:2180532] Starting OpenEars logging for OpenEars version 2.03 on 32-bit device (or build): iPad running iOS version: 8.000000
2015-01-20 16:12:19.512 OpenEarsSampleApp[5469:2180532] Creating shared instance of OEPocketsphinxController
2015-01-20 16:12:19.669 OpenEarsSampleApp[5469:2180532] Starting dynamic language model generation
INFO: cmd_ln.c(702): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.arpa \
-o /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 0
-help no no
-i /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(504): ngrams 1=22, 2=46, 3=32
INFO: ngram_model_arpa.c(137): Reading unigrams
INFO: ngram_model_arpa.c(543): 22 = #unigrams created
INFO: ngram_model_arpa.c(197): Reading bigrams
INFO: ngram_model_arpa.c(561): 46 = #bigrams created
INFO: ngram_model_arpa.c(562): 6 = #prob2 entries
INFO: ngram_model_arpa.c(570): 5 = #bo_wt2 entries
INFO: ngram_model_arpa.c(294): Reading trigrams
INFO: ngram_model_arpa.c(583): 32 = #trigrams created
INFO: ngram_model_arpa.c(584): 3 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model...
INFO: ngram_model_dmp.c(548): 22 = #unigrams created
INFO: ngram_model_dmp.c(649): 46 = #bigrams created
INFO: ngram_model_dmp.c(650): 6 = #prob2 entries
INFO: ngram_model_dmp.c(657): 5 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 32 = #trigrams created
INFO: ngram_model_dmp.c(662): 3 = #prob3 entries
2015-01-20 16:12:19.745 OpenEarsSampleApp[5469:2180532] Done creating language model with CMUCLMTK in 0.075357 seconds.
2015-01-20 16:12:19.912 OpenEarsSampleApp[5469:2180532] I'm done running performDictionaryLookup and it took 0.134176 seconds
2015-01-20 16:12:19.922 OpenEarsSampleApp[5469:2180532] I'm done running dynamic language model generation and it took 0.304282 seconds
2015-01-20 16:12:19.929 OpenEarsSampleApp[5469:2180532] Starting dynamic language model generation
INFO: cmd_ln.c(702): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/SecondOpenEarsDynamicLanguageModel.arpa \
-o /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 0
-help no no
-i /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/SecondOpenEarsDynamicLanguageModel.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(504): ngrams 1=12, 2=19, 3=10
INFO: ngram_model_arpa.c(137): Reading unigrams
INFO: ngram_model_arpa.c(543): 12 = #unigrams created
INFO: ngram_model_arpa.c(197): Reading bigrams
INFO: ngram_model_arpa.c(561): 19 = #bigrams created
INFO: ngram_model_arpa.c(562): 3 = #prob2 entries
INFO: ngram_model_arpa.c(570): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(294): Reading trigrams
INFO: ngram_model_arpa.c(583): 10 = #trigrams created
INFO: ngram_model_arpa.c(584): 2 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model...
INFO: ngram_model_dmp.c(548): 12 = #unigrams created
INFO: ngram_model_dmp.c(649): 19 = #bigrams created
INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 10 = #trigrams created
INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
2015-01-20 16:12:20.005 OpenEarsSampleApp[5469:2180532] Done creating language model with CMUCLMTK in 0.074512 seconds.
2015-01-20 16:12:20.178 OpenEarsSampleApp[5469:2180532] The word QUIDNUNC was not found in the dictionary /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-20 16:12:20.179 OpenEarsSampleApp[5469:2180532] Now using the fallback method to look up the word QUIDNUNC
2015-01-20 16:12:20.179 OpenEarsSampleApp[5469:2180532] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren't dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase. This can also happen if you submit words with punctuation attached – consider removing punctuation from language models or grammars you create before submitting them.
2015-01-20 16:12:20.181 OpenEarsSampleApp[5469:2180532] Using convertGraphemes for the word or phrase QUIDNUNC which doesn't appear in the dictionary
2015-01-20 16:12:20.239 OpenEarsSampleApp[5469:2180532] I'm done running performDictionaryLookup and it took 0.204421 seconds
2015-01-20 16:12:20.249 OpenEarsSampleApp[5469:2180532] I'm done running dynamic language model generation and it took 0.325570 seconds
2015-01-20 16:12:20.250 OpenEarsSampleApp[5469:2180532] Attempting to start listening session from startListeningWithLanguageModelAtPath:
2015-01-20 16:12:20.259 OpenEarsSampleApp[5469:2180532] User gave mic permission for this app.
2015-01-20 16:12:20.260 OpenEarsSampleApp[5469:2180532] Valid setSecondsOfSilence value of 0.200000 will be used.
2015-01-20 16:12:20.262 OpenEarsSampleApp[5469:2180532] Successfully started listening session from startListeningWithLanguageModelAtPath:
2015-01-20 16:12:20.262 OpenEarsSampleApp[5469:2180540] Starting listening.
2015-01-20 16:12:20.263 OpenEarsSampleApp[5469:2180540] about to set up audio session
2015-01-20 16:12:20.340 OpenEarsSampleApp[5469:2180549] Audio route has changed for the following reason:
2015-01-20 16:12:20.355 OpenEarsSampleApp[5469:2180549] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
2015-01-20 16:12:20.724 OpenEarsSampleApp[5469:2180540] done starting audio unit
INFO: cmd_ln.c(702): Parsing command line:
\
-lm /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP \
-vad_prespeech 10 \
-vad_postspeech 20 \
-vad_threshold 3.000000 \
-remove_noise yes \
-remove_silence yes \
-bestpath yes \
-lw 6.500000 \
-dict /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic \
-hmm /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-keyphrase
-kws
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 10000 10000
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-vad_postspeech 50 20
-vad_prespeech 10 10
-vad_threshold 2.0 3.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
2015-01-20 16:12:20.731 OpenEarsSampleApp[5469:2180549] This is not a case in which OpenEars notifies of a route change. At the close of this function, the new audio route is ---SpeakerMicrophoneBuiltIn---. The previous route before changing to this route was <AVAudioSessionRouteDescription: 0x17e6a8e0,
inputs = (null);
outputs = (
"<AVAudioSessionPortDescription: 0x17e69120, type = Speaker; name = Speaker; UID = Built-In Speaker; selectedDataSource = (null)>"
)>.
INFO: cmd_ln.c(702): Parsing command line:
\
-nfilt 25 \
-lowerf 130 \
-upperf 6800 \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-agc none \
-cmn current \
-varnorm no \
-transform dct \
-lifter 22 \
-cmninit 40
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 40
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 22
-logspec no no
-lowerf 133.33334 1.300000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-vad_postspeech 50 20
-vad_prespeech 10 10
-vad_threshold 2.0 3.000000e+00
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02
INFO: acmod.c(252): Parsed model-specific feature parameters from /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/feat.params
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/mdef
INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(294): 512x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(320): Allocating 4128 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/6A690B2A-2799-46D9-BF15-9037FDC56EF7/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 23 words read
INFO: dict.c(342): Reading filler dictionary: /private/var/mobile/Containers/Bundle/Application/03075EFA-2B35-4613-BBF0-2E3BD17986E8/OpenEarsSampleApp.app/AcousticModelEnglish.bundle/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 9 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 25576 bytes (24 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 25576 bytes (24 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(79): No \data\ mark in LM file
INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(220): ngrams 1=22, 2=46, 3=32
INFO: ngram_model_dmp.c(266): 22 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(312): 46 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(338): 32 = LM.trigrams read
INFO: ngram_model_dmp.c(363): 6 = LM.prob2 entries read
INFO: ngram_model_dmp.c(383): 5 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(403): 3 = LM.prob3 entries read
INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(487): 22 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 16 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 10 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 10 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 177
INFO: ngram_search_fwdtree.c(339): after: 16 root, 49 non-root channels, 9 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
2015-01-20 16:12:20.903 OpenEarsSampleApp[5469:2180540] There is no CMN plist so we are using the fresh CMN value 42.000000.
2015-01-20 16:12:20.904 OpenEarsSampleApp[5469:2180540] Listening.
2015-01-20 16:12:20.906 OpenEarsSampleApp[5469:2180540] Project has these words or phrases in its dictionary:
EIGHT
EIGHTEEN
ELEVEN
ELEVEN(2)
FIFTEEN
FIVE
FOUR
FOURTEEN
NINE
NINETEEN
ONE
ONE(2)
SEVEN
SEVENTEEN
SIX
SIXTEEN
TEN
THIRTEEN
THREE
TWELVE
TWENTY
TWENTY(2)
TWO
2015-01-20 16:12:20.907 OpenEarsSampleApp[5469:2180540] Recognition loop has started
2015-01-20 16:12:48.623 OpenEarsSampleApp[5469:2180544] Speech detected...
2015-01-20 16:12:48.881 OpenEarsSampleApp[5469:2180544] End of speech detected...
INFO: cmn_prior.c(131): cmn_prior_update: from < 42.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 64.65 9.76 3.14 5.08 -10.27 13.92 -8.86 0.30 -4.35 -7.43 7.40 -6.46 -4.60 >
INFO: ngram_search_fwdtree.c(1550): 385 words recognized (10/fr)
INFO: ngram_search_fwdtree.c(1552): 8544 senones evaluated (219/fr)
INFO: ngram_search_fwdtree.c(1556): 4700 channels searched (120/fr), 560 1st, 3013 last
INFO: ngram_search_fwdtree.c(1559): 431 words for which last channels evaluated (11/fr)
INFO: ngram_search_fwdtree.c(1561): 334 candidate words for entering last phone (8/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.45 CPU 3.724 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 27.81 wall 71.296 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 6 words
INFO: ngram_search_fwdflat.c(938): 298 words recognized (8/fr)
INFO: ngram_search_fwdflat.c(940): 8125 senones evaluated (208/fr)
INFO: ngram_search_fwdflat.c(942): 5526 channels searched (141/fr)
INFO: ngram_search_fwdflat.c(944): 453 words searched (11/fr)
INFO: ngram_search_fwdflat.c(947): 172 word transitions (4/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.02 CPU 0.043 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.02 wall 0.056 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using TWO.37 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node TWO.2
INFO: ngram_search.c(1294): Eliminated 54 nodes before end node
INFO: ngram_search.c(1399): Lattice has 63 nodes, 1 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(TWO:2:37) = -356386
INFO: ps_lattice.c(1403): Joint P(O,S) = -356386 P(S|O) = 0
INFO: ngram_search.c(890): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.001 xRT
2015-01-20 16:12:48.906 OpenEarsSampleApp[5469:2180544] Pocketsphinx heard "TWO" with a score of (0) and an utterance ID of 0.
2015-01-20 16:12:48.908 OpenEarsSampleApp[5469:2180532] Flite sending interrupt speech request.
2015-01-20 16:12:48.910 OpenEarsSampleApp[5469:2180532] I'm running flite
2015-01-20 16:12:49.110 OpenEarsSampleApp[5469:2180532] I'm done running flite and it took 0.198875 seconds
2015-01-20 16:12:49.111 OpenEarsSampleApp[5469:2180532] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-20 16:12:49.112 OpenEarsSampleApp[5469:2180532] Loading speech data for Flite concluded successfully.
2015-01-20 16:12:49.146 OpenEarsSampleApp[5469:2180532] Flite sending suspend recognition notification.
2015-01-20 16:12:50.355 OpenEarsSampleApp[5469:2180532] AVAudioPlayer did finish playing with success flag of 1
2015-01-20 16:12:50.508 OpenEarsSampleApp[5469:2180532] Flite sending resume recognition notification.
2015-01-20 16:12:51.016 OpenEarsSampleApp[5469:2180532] Valid setSecondsOfSilence value of 0.200000 will be used.
INFO: cmn_prior.c(131): cmn_prior_update: from < 64.65 9.76 3.14 5.08 -10.27 13.92 -8.86 0.30 -4.35 -7.43 7.40 -6.46 -4.60 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 64.65 9.76 3.14 5.08 -10.27 13.92 -8.86 0.30 -4.35 -7.43 7.40 -6.46 -4.60 >
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
2015-01-20 16:13:05.254 OpenEarsSampleApp[5469:2180540] Speech detected...
2015-01-20 16:13:05.494 OpenEarsSampleApp[5469:2180540] End of speech detected...
INFO: cmn_prior.c(131): cmn_prior_update: from < 64.65 9.76 3.14 5.08 -10.27 13.92 -8.86 0.30 -4.35 -7.43 7.40 -6.46 -4.60 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 69.03 2.31 -5.25 0.93 -12.55 10.68 -7.56 -3.24 -14.34 -1.36 8.05 -6.54 -1.84 >
INFO: ngram_search_fwdtree.c(1550): 228 words recognized (8/fr)
INFO: ngram_search_fwdtree.c(1552): 4294 senones evaluated (148/fr)
INFO: ngram_search_fwdtree.c(1556): 1923 channels searched (66/fr), 400 1st, 909 last
INFO: ngram_search_fwdtree.c(1559): 256 words for which last channels evaluated (8/fr)
INFO: ngram_search_fwdtree.c(1561): 143 candidate words for entering last phone (4/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.75 CPU 2.584 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 14.33 wall 49.420 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 3 words
INFO: ngram_search_fwdflat.c(938): 186 words recognized (6/fr)
INFO: ngram_search_fwdflat.c(940): 3059 senones evaluated (105/fr)
INFO: ngram_search_fwdflat.c(942): 1589 channels searched (54/fr)
INFO: ngram_search_fwdflat.c(944): 255 words searched (8/fr)
INFO: ngram_search_fwdflat.c(947): 82 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.01 CPU 0.045 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.01 wall 0.040 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using [COUGH].27 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node [COUGH].2
INFO: ngram_search.c(1294): Eliminated 62 nodes before end node
INFO: ngram_search.c(1399): Lattice has 66 nodes, 1 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha([COUGH]:2:27) = -4795305
INFO: ps_lattice.c(1403): Joint P(O,S) = -4795306 P(S|O) = -1
INFO: ngram_search.c(890): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.001 xRT
2015-01-20 16:13:05.508 OpenEarsSampleApp[5469:2180540] Pocketsphinx heard "" with a score of (-1) and an utterance ID of 1.
2015-01-20 16:13:05.509 OpenEarsSampleApp[5469:2180540] Hypothesis was null so we aren't returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController's property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
2015-01-20 16:13:11.532 OpenEarsSampleApp[5469:2180544] Speech detected...
2015-01-20 16:13:12.026 OpenEarsSampleApp[5469:2180544] End of speech detected...
INFO: cmn_prior.c(131): cmn_prior_update: from < 69.03 2.31 -5.25 0.93 -12.55 10.68 -7.56 -3.24 -14.34 -1.36 8.05 -6.54 -1.84 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 66.26 8.59 -4.91 -2.19 -13.70 11.91 -10.44 -1.99 -5.42 0.23 6.81 -2.35 -2.18 >
INFO: ngram_search_fwdtree.c(1550): 800 words recognized (12/fr)
INFO: ngram_search_fwdtree.c(1552): 18873 senones evaluated (282/fr)
INFO: ngram_search_fwdtree.c(1556): 10567 channels searched (157/fr), 1008 1st, 7260 last
INFO: ngram_search_fwdtree.c(1559): 878 words for which last channels evaluated (13/fr)
INFO: ngram_search_fwdtree.c(1561): 594 candidate words for entering last phone (8/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.38 CPU 0.571 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.52 wall 9.732 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 10 words
INFO: ngram_search_fwdflat.c(938): 642 words recognized (10/fr)
INFO: ngram_search_fwdflat.c(940): 21996 senones evaluated (328/fr)
INFO: ngram_search_fwdflat.c(942): 15153 channels searched (226/fr)
INFO: ngram_search_fwdflat.c(944): 1039 words searched (15/fr)
INFO: ngram_search_fwdflat.c(947): 426 word transitions (6/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.03 CPU 0.051 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.03 wall 0.051 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using <sil>.65 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node <sil>.54
INFO: ngram_search.c(1294): Eliminated 13 nodes before end node
INFO: ngram_search.c(1399): Lattice has 115 nodes, 683 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(<sil>:54:65) = -486219
INFO: ps_lattice.c(1403): Joint P(O,S) = -486443 P(S|O) = -224
INFO: ngram_search.c(890): bestpath 0.00 CPU 0.004 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.006 xRT
2015-01-20 16:13:12.069 OpenEarsSampleApp[5469:2180544] Pocketsphinx heard "FOUR" with a score of (-224) and an utterance ID of 2.
2015-01-20 16:13:12.070 OpenEarsSampleApp[5469:2180532] Flite sending interrupt speech request.
2015-01-20 16:13:12.073 OpenEarsSampleApp[5469:2180532] I'm running flite
2015-01-20 16:13:12.264 OpenEarsSampleApp[5469:2180532] I'm done running flite and it took 0.190912 seconds
2015-01-20 16:13:12.265 OpenEarsSampleApp[5469:2180532] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-20 16:13:12.266 OpenEarsSampleApp[5469:2180532] Loading speech data for Flite concluded successfully.
2015-01-20 16:13:12.367 OpenEarsSampleApp[5469:2180532] Flite sending suspend recognition notification.
2015-01-20 16:13:13.575 OpenEarsSampleApp[5469:2180532] AVAudioPlayer did finish playing with success flag of 1
2015-01-20 16:13:13.727 OpenEarsSampleApp[5469:2180532] Flite sending resume recognition notification.
2015-01-20 16:13:14.235 OpenEarsSampleApp[5469:2180532] Valid setSecondsOfSilence value of 0.200000 will be used.
INFO: cmn_prior.c(131): cmn_prior_update: from < 66.26 8.59 -4.91 -2.19 -13.70 11.91 -10.44 -1.99 -5.42 0.23 6.81 -2.35 -2.18 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 66.26 8.59 -4.91 -2.19 -13.70 11.91 -10.44 -1.99 -5.42 0.23 6.81 -2.35 -2.18 >
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
2015-01-20 16:13:25.601 OpenEarsSampleApp[5469:2180544] Speech detected...
2015-01-20 16:13:26.249 OpenEarsSampleApp[5469:2180544] End of speech detected...
INFO: cmn_prior.c(131): cmn_prior_update: from < 66.26 8.59 -4.91 -2.19 -13.70 11.91 -10.44 -1.99 -5.42 0.23 6.81 -2.35 -2.18 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 64.91 9.58 -3.64 -3.85 -14.94 12.37 -11.15 -0.83 -3.95 1.28 5.64 -0.12 -2.27 >
INFO: ngram_search_fwdtree.c(1550): 728 words recognized (11/fr)
INFO: ngram_search_fwdtree.c(1552): 16222 senones evaluated (239/fr)
INFO: ngram_search_fwdtree.c(1556): 8497 channels searched (124/fr), 1024 1st, 5287 last
INFO: ngram_search_fwdtree.c(1559): 812 words for which last channels evaluated (11/fr)
INFO: ngram_search_fwdtree.c(1561): 533 candidate words for entering last phone (7/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.65 CPU 0.954 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 11.91 wall 17.514 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 8 words
INFO: ngram_search_fwdflat.c(938): 544 words recognized (8/fr)
INFO: ngram_search_fwdflat.c(940): 17484 senones evaluated (257/fr)
INFO: ngram_search_fwdflat.c(942): 11455 channels searched (168/fr)
INFO: ngram_search_fwdflat.c(944): 933 words searched (13/fr)
INFO: ngram_search_fwdflat.c(947): 315 word transitions (4/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.03 CPU 0.045 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.03 wall 0.047 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using <sil>.66 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node <sil>.21
INFO: ngram_search.c(1294): Eliminated 103 nodes before end node
INFO: ngram_search.c(1399): Lattice has 152 nodes, 171 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(<sil>:21:66) = -589713
INFO: ps_lattice.c(1403): Joint P(O,S) = -589713 P(S|O) = 0
INFO: ngram_search.c(890): bestpath 0.01 CPU 0.008 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.001 xRT
2015-01-20 16:13:26.284 OpenEarsSampleApp[5469:2180544] Pocketsphinx heard "ONE" with a score of (0) and an utterance ID of 3.
2015-01-20 16:13:26.285 OpenEarsSampleApp[5469:2180532] Flite sending interrupt speech request.
2015-01-20 16:13:26.287 OpenEarsSampleApp[5469:2180532] I'm running flite
2015-01-20 16:13:26.488 OpenEarsSampleApp[5469:2180532] I'm done running flite and it took 0.199754 seconds
2015-01-20 16:13:26.489 OpenEarsSampleApp[5469:2180532] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-20 16:13:26.489 OpenEarsSampleApp[5469:2180532] Loading speech data for Flite concluded successfully.
2015-01-20 16:13:26.579 OpenEarsSampleApp[5469:2180532] Flite sending suspend recognition notification.
2015-01-20 16:13:27.786 OpenEarsSampleApp[5469:2180532] AVAudioPlayer did finish playing with success flag of 1
2015-01-20 16:13:27.938 OpenEarsSampleApp[5469:2180532] Flite sending resume recognition notification.
2015-01-20 16:13:28.446 OpenEarsSampleApp[5469:2180532] Valid setSecondsOfSilence value of 0.200000 will be used.
INFO: cmn_prior.c(131): cmn_prior_update: from < 64.91 9.58 -3.64 -3.85 -14.94 12.37 -11.15 -0.83 -3.95 1.28 5.64 -0.12 -2.27 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 64.91 9.58 -3.64 -3.85 -14.94 12.37 -11.15 -0.83 -3.95 1.28 5.64 -0.12 -2.27 >
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
And this is the ViewController.m code:
#import "ViewController.h"
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEFliteController.h>
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OELogging.h>
#import <OpenEars/OEAcousticModel.h>
#import <Slt/Slt.h>
@interface ViewController()
// UI actions, not specifically related to OpenEars other than the fact that they invoke OpenEars methods.
- (IBAction) stopButtonAction;
- (IBAction) startButtonAction;
- (IBAction) suspendListeningButtonAction;
- (IBAction) resumeListeningButtonAction;
// Example for reading out the input audio levels without locking the UI using an NSTimer
- (void) startDisplayingLevels;
- (void) stopDisplayingLevels;
// These are the important OpenEars objects that this class demonstrates the use of.
@property (nonatomic, strong) Slt *slt;
@property (nonatomic, strong) OEEventsObserver *openEarsEventsObserver;
@property (nonatomic, strong) OEPocketsphinxController *pocketsphinxController;
@property (nonatomic, strong) OEFliteController *fliteController;
// Some UI, not specifically related to OpenEars.
@property (nonatomic, strong) IBOutlet UIButton *stopButton;
@property (nonatomic, strong) IBOutlet UIButton *startButton;
@property (nonatomic, strong) IBOutlet UIButton *suspendListeningButton;
@property (nonatomic, strong) IBOutlet UIButton *resumeListeningButton;
@property (nonatomic, strong) IBOutlet UITextView *statusTextView;
@property (nonatomic, strong) IBOutlet UITextView *heardTextView;
@property (nonatomic, strong) IBOutlet UILabel *pocketsphinxDbLabel;
@property (nonatomic, strong) IBOutlet UILabel *fliteDbLabel;
@property (nonatomic, assign) BOOL usingStartingLanguageModel;
@property (nonatomic, assign) int restartAttemptsDueToPermissionRequests;
@property (nonatomic, assign) BOOL startupFailedDueToLackOfPermissions;
// Things which help us show off the dynamic language features.
@property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedLanguageModel;
@property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedDictionary;
@property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedLanguageModel;
@property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedDictionary;
// Our NSTimer that will help us read and display the input and output levels without locking the UI
@property (nonatomic, strong) NSTimer *uiUpdateTimer;
@end
@implementation ViewController
#define kLevelUpdatesPerSecond 18 // We'll have the ui update 18 times a second to show some fluidity without hitting the CPU too hard.
//#define kGetNbest // Uncomment this if you want to try out nbest
#pragma mark -
#pragma mark Memory Management
- (void)dealloc {
[self stopDisplayingLevels];
}
#pragma mark - int to word converter
-(NSString *)GetWordOfInteger:(int)anInt
{
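NSNumber *numberValue = [NSNumber numberWithInt:anInt]; // NSNumberFormatter works on an NSNumber, not on a bare int.
NSNumberFormatter *numberFormatter = [[NSNumberFormatter alloc] init];
[numberFormatter setLocale:[[NSLocale alloc] initWithLocaleIdentifier:@"en_US"]]; // Spell the number out in English regardless of the device locale, since the acoustic model used below is English.
[numberFormatter setNumberStyle:NSNumberFormatterSpellOutStyle];
NSString *wordNumber = [numberFormatter stringFromNumber:numberValue];
wordNumber = [[wordNumber stringByReplacingOccurrencesOfString:@"-" withString:@" "] uppercaseString]; // For example, "twenty-one" becomes "TWENTY ONE".
//NSLog(@"Answer: %@", wordNumber);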
return wordNumber;
}
#pragma mark View Lifecycle
- (void)viewDidLoad {
[super viewDidLoad];
self.fliteController = [[OEFliteController alloc] init];
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
self.openEarsEventsObserver.delegate = self;
self.slt = [[Slt alloc] init];
self.restartAttemptsDueToPermissionRequests = 0;
self.startupFailedDueToLackOfPermissions = FALSE;
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this once before setting any OEPocketsphinxController properties, which is why it now comes before the settings below.
[OELogging startOpenEarsLogging]; // Turns on OELogging, which is verbose logging about internal OpenEars operations such as audio settings. If you have issues, show this logging in the forums.
[OEPocketsphinxController sharedInstance].verbosePocketSphinx = TRUE; // Turns on much more verbose speech recognition engine output. If you have issues, show this logging in the forums.
[[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.2];
[[OEPocketsphinxController sharedInstance] setVadThreshold:3.0];
[self.openEarsEventsObserver setDelegate:self]; // Make this class the delegate of OEEventsObserver so we can get all of the messages about what OpenEars is doing.
// This is the vocabulary for the language model we're going to start up with: the numbers 1 through 26, spelled out as words.
// The generated paths are stored in properties only because they are reused several times in this example; you can also pass the paths directly to OEPocketsphinxController's startListeningWithLanguageModelAtPath:dictionaryAtPath:acousticModelAtPath:languageModelIsJSGF:
NSMutableArray *numberArray = [[NSMutableArray alloc] init];
for(int i=1;i<=26;i++)
{
[numberArray addObject:[self GetWordOfInteger:i]];
}
NSArray *firstLanguageArray = [[NSArray alloc] initWithArray:numberArray];
/* NSArray *firstLanguageArray = @[@"BACKWARD",
@"CHANGE",
@"FORWARD",
@"GO",
@"LEFT",
@"MODEL",
@"RIGHT",
@"TURN"];*/
OELanguageModelGenerator *languageModelGenerator = [[OELanguageModelGenerator alloc] init];
// languageModelGenerator.verboseLanguageModelGenerator = TRUE; // Uncomment me for verbose language model generator debug output.
NSError *error = [languageModelGenerator generateLanguageModelFromArray:firstLanguageArray withFilesNamed:@"FirstOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]]; // Change "AcousticModelEnglish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English.
if(error) {
//NSLog(@"Dynamic language generator reported error %@", [error description]);
} else {
self.pathToFirstDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
self.pathToFirstDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
}
self.usingStartingLanguageModel = TRUE; // This is not an OpenEars thing, this is just so I can switch back and forth between the two models in this sample app.
// Here is an example of dynamically creating an in-app grammar.
// We want it to be able to respond to the speech "CHANGE MODEL" and a few other things. Items we want to have recognized as a whole phrase (like "CHANGE MODEL")
// we put into the array as one string (e.g. "CHANGE MODEL" instead of "CHANGE" and "MODEL"). This increases the probability that they will be recognized as a phrase. This works even better starting with version 1.0 of OpenEars.
NSArray *secondLanguageArray = @[@"SUNDAY",
@"MONDAY",
@"TUESDAY",
@"WEDNESDAY",
@"THURSDAY",
@"FRIDAY",
@"SATURDAY",
@"QUIDNUNC",
@"CHANGE MODEL"];
// The last entry, quidnunc, is an example of a word which will not be found in the lookup dictionary and will be passed to the fallback method. The fallback method is slower,
// so, for instance, creating a new language model from dictionary words will be pretty fast, but a model that has a lot of unusual names in it or invented/rare/recent-slang
// words will be slower to generate. You can use this information to give your users good UI feedback about what the expectations for wait times should be.
// I don't think it's beneficial to lazily instantiate OELanguageModelGenerator because you only need to give it a single message and then release it.
// If you need to create a very large model or any size of model that has many unusual words that have to make use of the fallback generation method,
// you will want to run this on a background thread so you can give the user some UI feedback that the task is in progress.
// generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: returns an NSError. The check below treats nil as success and any non-nil value as a specific failure.
error = [languageModelGenerator generateLanguageModelFromArray:secondLanguageArray withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]]; // Change "AcousticModelEnglish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English.
// NSError *error = [languageModelGenerator generateLanguageModelFromTextFile:[NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEarsCorpus.txt"] withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]]; // Try this out to see how generating a language model from a corpus works.
if(error) {
//NSLog(@"Dynamic language generator reported error %@", [error description]);
} else {
self.pathToSecondDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"]; // We'll set our new .languagemodel file to be the one to get switched to when the words "CHANGE MODEL" are recognized.
self.pathToSecondDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"]; // We'll set our new dictionary to be the one to get switched to when the words "CHANGE MODEL" are recognized.
// Next, an informative message.
//NSLog(@"\n\nWelcome to the OpenEars sample project. This project starts with a dynamically-generated model which understands the numbers ONE through TWENTY SIX. Its second dynamically-generated model understands the words:\nCHANGE MODEL,\nMONDAY,\nTUESDAY,\nWEDNESDAY,\nTHURSDAY,\nFRIDAY,\nSATURDAY,\nSUNDAY,\nQUIDNUNC");
// This is how to start the continuous listening loop of an available instance of OEPocketsphinxController. We won't do this if the language generation failed since it will be listening for a command to change over to the generated language.
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this once before setting properties of the OEPocketsphinxController instance.
// [OEPocketsphinxController sharedInstance].pathToTestFile = [[NSBundle mainBundle] pathForResource:@"change_model_short" ofType:@"wav"]; // This is how you could use a test WAV (mono/16-bit/16k) rather than live recognition
if(![OEPocketsphinxController sharedInstance].isListening) {
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
}
// [self startDisplayingLevels] is not an OpenEars method, just a very simple approach for level reading
// that I've included with this sample app. My example implementation does make use of two OpenEars
// methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel
// method of fliteController.
//
// The example is meant to show one way that you can read those levels continuously without locking the UI,
// by using an NSTimer, but the OpenEars level-reading methods
// themselves do not include multithreading code since I believe that you will want to design your own
// code approaches for level display that are tightly-integrated with your interaction design and the
// graphics API you choose.
[self startDisplayingLevels];
// Here is some UI stuff that has nothing specifically to do with OpenEars implementation
self.startButton.hidden = TRUE;
self.stopButton.hidden = TRUE;
self.suspendListeningButton.hidden = TRUE;
self.resumeListeningButton.hidden = TRUE;
}
}
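// A minimal sketch, not part of the original sample app, of the background-thread approach that the comments in viewDidLoad suggest for large models or models with many unusual words.
// The method name, its parameters and the completion block are hypothetical; the OELanguageModelGenerator and OEAcousticModel calls are the same ones used in viewDidLoad.
// The completion block is delivered on the main queue so that it can safely update the UI.
- (void) generateLanguageModelInBackgroundFromArray:(NSArray *)words withFilesNamed:(NSString *)name completion:(void (^)(NSError *error, NSString *languageModelPath, NSString *dictionaryPath))completion {
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSError *error = [generator generateLanguageModelFromArray:words withFilesNamed:name forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
NSString *languageModelPath = nil;
NSString *dictionaryPath = nil;
if(!error) { // Only ask for the paths if generation succeeded.
languageModelPath = [generator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
dictionaryPath = [generator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
}
dispatch_async(dispatch_get_main_queue(), ^{
if(completion) {
completion(error, languageModelPath, dictionaryPath);
}
});
});
}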
#pragma mark -
#pragma mark OEEventsObserver delegate methods
// What follows are all of the delegate methods you can optionally use once you've instantiated an OEEventsObserver and set its delegate to self.
// I've provided some pretty granular information about the exact phase of the Pocketsphinx listening loop, the Audio Session, and Flite, but I'd expect
// that the ones that will really be needed by most projects are the following:
//
//- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID;
//- (void) audioSessionInterruptionDidBegin;
//- (void) audioSessionInterruptionDidEnd;
//- (void) audioRouteDidChangeToRoute:(NSString *)newRoute;
//- (void) pocketsphinxDidStartListening;
//- (void) pocketsphinxDidStopListening;
//
// It isn't necessary to have an OEPocketsphinxController or an OEFliteController instantiated in order to use these methods. If there isn't anything instantiated that will
// send messages to an OEEventsObserver, all that will happen is that these methods will never fire. You also do not have to create an OEEventsObserver in
// the same class or view controller in which you are doing things with a OEPocketsphinxController or OEFliteController; you can receive updates from those objects in
// any class in which you instantiate an OEEventsObserver and set its delegate to self.
// This is an optional delegate method of OEEventsObserver which delivers the text of speech that Pocketsphinx heard and analyzed, along with its accuracy score and utterance ID.
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
//NSLog(@"Local callback: The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID); // Log it.
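if([hypothesis isEqualToString:@"CHANGE MODEL"]) { // If the user says "CHANGE MODEL", we will switch to the other model. A hypothesis can only contain words from the active language model, so this branch is only reachable while the active model contains CHANGE MODEL; the number model generated in viewDidLoad does not.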
// Here is an example of language model switching in OpenEars. Deciding on what logical basis to switch models is your responsibility.
// For instance, when you call a customer service line and get a response tree that takes you through different options depending on what you say to it,
// the models are being switched as you progress through it so that only relevant choices can be understood. The construction of that logical branching and
// how to react to it is your job, OpenEars just lets you send the signal to switch the language model when you've decided it's the right time to do so.
if(self.usingStartingLanguageModel) { // If we're on the starting model, switch to the dynamically generated one.
// You can only change language models with ARPA grammars in OpenEars (the ones that end in .languagemodel or .DMP).
// Trying to switch between JSGF models (the ones that end in .gram) will return no result.
[[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToSecondDynamicallyGeneratedLanguageModel withDictionary:self.pathToSecondDynamicallyGeneratedDictionary];
self.usingStartingLanguageModel = FALSE;
} else { // If we're on the dynamically generated model, switch to the start model (this is just an example of a trigger and method for switching models).
[[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToFirstDynamicallyGeneratedLanguageModel withDictionary:self.pathToFirstDynamicallyGeneratedDictionary];
self.usingStartingLanguageModel = TRUE;
}
}
self.heardTextView.text = [NSString stringWithFormat:@"Heard: \"%@\"", hypothesis]; // Show it in the status box.
// This is how to use an available instance of OEFliteController. We're going to repeat back the command that we heard with the voice we've chosen.
[self.fliteController say:[NSString stringWithFormat:@"You said %@",hypothesis] withVoice:self.slt];
}
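// A hedged sketch, not part of the original sample app: one way to discard low-scoring hypotheses before acting on them, which may help when words are reported that were never spoken.
// The method name and the minimumScore parameter are hypothetical. The score arrives as a string holding an integer (usually negative); whether any cutoff is useful,
// and which value, has to be found by testing with your own vocabulary and environment, so treat this as an experiment rather than a fix.
- (BOOL) hypothesisScore:(NSString *)recognitionScore meetsMinimumScore:(NSInteger)minimumScore {
if(recognitionScore.length == 0) {
return NO; // No score was delivered, so we treat the hypothesis as not acceptable.
}
return [recognitionScore integerValue] >= minimumScore;
}
// Possible use at the top of pocketsphinxDidReceiveHypothesis:recognitionScore:utteranceID: (the value -20000 is a placeholder, not a recommendation):
// if(![self hypothesisScore:recognitionScore meetsMinimumScore:-20000]) return;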
#ifdef kGetNbest
- (void) pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray { // Pocketsphinx has an n-best hypothesis dictionary.
//NSLog(@"Local callback: hypothesisArray is %@",hypothesisArray);
}
#endif
// An optional delegate method of OEEventsObserver which informs that there was an interruption to the audio session (e.g. an incoming phone call).
- (void) audioSessionInterruptionDidBegin {
//NSLog(@"Local callback: AudioSession interruption began."); // Log it.
self.statusTextView.text = @"Status: AudioSession interruption began."; // Show it in the status box.
NSError *error = nil;
if([OEPocketsphinxController sharedInstance].isListening) {
error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening (if it is listening) since it will need to restart its loop after an interruption.
if(error) {
//NSLog(@"Error while stopping listening in audioSessionInterruptionDidBegin: %@", error);
}
}
}
// An optional delegate method of OEEventsObserver which informs that the interruption to the audio session ended.
- (void) audioSessionInterruptionDidEnd {
//NSLog(@"Local callback: AudioSession interruption ended."); // Log it.
self.statusTextView.text = @"Status: AudioSession interruption ended."; // Show it in the status box.
// We're restarting the previously-stopped listening loop.
if(![OEPocketsphinxController sharedInstance].isListening){
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't currently listening.
}
}
// An optional delegate method of OEEventsObserver which informs that the audio input became unavailable.
- (void) audioInputDidBecomeUnavailable {
//NSLog(@"Local callback: The audio input has become unavailable"); // Log it.
self.statusTextView.text = @"Status: The audio input has become unavailable"; // Show it in the status box.
NSError *error = nil;
if([OEPocketsphinxController sharedInstance].isListening){
error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening since there is no available input (but only if we are listening).
if(error) {
//NSLog(@"Error while stopping listening in audioInputDidBecomeUnavailable: %@", error);
}
}
}
// An optional delegate method of OEEventsObserver which informs that the unavailable audio input became available again.
- (void) audioInputDidBecomeAvailable {
//NSLog(@"Local callback: The audio input is available"); // Log it.
self.statusTextView.text = @"Status: The audio input is available"; // Show it in the status box.
if(![OEPocketsphinxController sharedInstance].isListening) {
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition, but only if we aren't already listening.
}
}
// An optional delegate method of OEEventsObserver which informs that there was a change to the audio route (e.g. headphones were plugged in or unplugged).
- (void) audioRouteDidChangeToRoute:(NSString *)newRoute {
//NSLog(@"Local callback: Audio route change. The new audio route is %@", newRoute); // Log it.
self.statusTextView.text = [NSString stringWithFormat:@"Status: Audio route change. The new audio route is %@",newRoute]; // Show it in the status box.
NSError *error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling the Pocketsphinx loop to shut down and then start listening again on the new route
if(error){
//NSLog(@"Local callback: error while stopping listening in audioRouteDidChangeToRoute: %@",error);
}
if(![OEPocketsphinxController sharedInstance].isListening) {
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
}
}
// An optional delegate method of OEEventsObserver which informs that the Pocketsphinx recognition loop has entered its actual loop.
// This might be useful in debugging a conflict between another sound class and Pocketsphinx.
- (void) pocketsphinxRecognitionLoopDidStart {
//NSLog(@"Local callback: Pocketsphinx started."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx started."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is now listening for speech.
- (void) pocketsphinxDidStartListening {
//NSLog(@"Local callback: Pocketsphinx is now listening."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx is now listening."; // Show it in the status box.
self.startButton.hidden = TRUE; // React to it with some UI changes.
self.stopButton.hidden = FALSE;
self.suspendListeningButton.hidden = FALSE;
self.resumeListeningButton.hidden = TRUE;
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected speech and is starting to process it.
- (void) pocketsphinxDidDetectSpeech {
//NSLog(@"Local callback: Pocketsphinx has detected speech."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx has detected speech."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected a period of silence (secondsOfSilenceToDetect, set to 0.2 seconds in viewDidLoad), indicating the end of an utterance.
// This was added because developers requested being able to time the recognition speed without the speech time. The processing time is the time between
// this method being called and the hypothesis being returned.
- (void) pocketsphinxDidDetectFinishedSpeech {
//NSLog(@"Local callback: Pocketsphinx has detected a second of silence, concluding an utterance."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx has detected finished speech."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx has exited its recognition loop, most
// likely in response to the OEPocketsphinxController being told to stop listening via the stopListening method.
- (void) pocketsphinxDidStopListening {
//NSLog(@"Local callback: Pocketsphinx has stopped listening."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx has stopped listening."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop but it is not
// going to react to speech until listening is resumed. This can happen as a result of Flite speech being
// in progress on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to suspend recognition via the suspendRecognition method.
- (void) pocketsphinxDidSuspendRecognition {
//NSLog(@"Local callback: Pocketsphinx has suspended recognition."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx has suspended recognition."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop and after recognition
// having been suspended it is now resuming. This can happen as a result of Flite speech completing
// on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to resume recognition via the resumeRecognition method.
- (void) pocketsphinxDidResumeRecognition {
//NSLog(@"Local callback: Pocketsphinx has resumed recognition."); // Log it.
self.statusTextView.text = @"Status: Pocketsphinx has resumed recognition."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Pocketsphinx switched over to a new language model at the given path in the course of
// recognition. This does not imply that it is a valid file or that recognition will be successful using the file.
- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
//NSLog(@"Local callback: Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}
// An optional delegate method of OEEventsObserver which informs that Flite is speaking, most likely to be useful if debugging a
// complex interaction between sound classes. You don't have to do anything yourself in order to prevent Pocketsphinx from listening to Flite talk and trying to recognize the speech.
- (void) fliteDidStartSpeaking {
//NSLog(@"Local callback: Flite has started speaking"); // Log it.
self.statusTextView.text = @"Status: Flite has started speaking."; // Show it in the status box.
}
// An optional delegate method of OEEventsObserver which informs that Flite is finished speaking, most likely to be useful if debugging a
// complex interaction between sound classes.
- (void) fliteDidFinishSpeaking {
//NSLog(@"Local callback: Flite has finished speaking"); // Log it.
self.statusTextView.text = @"Status: Flite has finished speaking."; // Show it in the status box.
}
- (void) pocketSphinxContinuousSetupDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop startup. Turn on [OELogging startOpenEarsLogging] to learn why.
//NSLog(@"Local callback: Setting up the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
self.statusTextView.text = @"Status: Not possible to start recognition loop."; // Show it in the status box.
}
- (void) pocketSphinxContinuousTeardownDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop teardown. Turn on [OELogging startOpenEarsLogging] to learn why.
//NSLog(@"Local callback: Tearing down the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
self.statusTextView.text = @"Status: Not possible to cleanly end recognition loop."; // Show it in the status box.
}
- (void) testRecognitionCompleted { // A test file which was submitted for direct recognition via the audio driver is done.
//NSLog(@"Local callback: A test file which was submitted for direct recognition via the audio driver is done."); // Log it.
NSError *error = nil;
if([OEPocketsphinxController sharedInstance].isListening) { // If we're listening, stop listening.
error = [[OEPocketsphinxController sharedInstance] stopListening];
if(error)
{
//NSLog(@"Error while stopping listening in testRecognitionCompleted: %@", error);
}
}
}
/** Pocketsphinx couldn't start because it has no mic permissions (will only be returned on iOS7 or later).*/
- (void) pocketsphinxFailedNoMicPermissions {
//NSLog(@"Local callback: The user has never set mic permissions or denied permission to this app's mic, so listening will not start.");
self.startupFailedDueToLackOfPermissions = TRUE;
}
/** The user prompt to get mic permissions, or a check of the mic permissions, has completed with a TRUE or a FALSE result (will only be returned on iOS7 or later).*/
- (void) micPermissionCheckCompleted:(BOOL)result {
if(result) {
self.restartAttemptsDueToPermissionRequests++;
if(self.restartAttemptsDueToPermissionRequests == 1 && self.startupFailedDueToLackOfPermissions) { // If we get here because there was an attempt to start which failed due to lack of permissions, and now permissions have been requested and they returned true, we restart exactly once with the new permissions.
NSError *error = nil;
if([OEPocketsphinxController sharedInstance].isListening){
error = [[OEPocketsphinxController sharedInstance] stopListening]; // Stop listening if we are listening.
if(error) {
//NSLog(@"Error while stopping listening in micPermissionCheckCompleted: %@", error);
}
}
if(!error && ![OEPocketsphinxController sharedInstance].isListening) { // If there was no error and we aren't listening, start listening.
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition.
self.startupFailedDueToLackOfPermissions = FALSE;
}
}
}
}
#pragma mark -
#pragma mark UI
// This is not OpenEars-specific stuff, just some UI behavior
- (IBAction) suspendListeningButtonAction { // This is the action for the button which suspends listening without ending the recognition loop
[[OEPocketsphinxController sharedInstance] suspendRecognition];
self.startButton.hidden = TRUE;
self.stopButton.hidden = FALSE;
self.suspendListeningButton.hidden = TRUE;
self.resumeListeningButton.hidden = FALSE;
}
- (IBAction) resumeListeningButtonAction { // This is the action for the button which resumes listening if it has been suspended
[[OEPocketsphinxController sharedInstance] resumeRecognition];
self.startButton.hidden = TRUE;
self.stopButton.hidden = FALSE;
self.suspendListeningButton.hidden = FALSE;
self.resumeListeningButton.hidden = TRUE;
}
- (IBAction) stopButtonAction { // This is the action for the button which shuts down the recognition loop.
NSError *error = nil;
if([OEPocketsphinxController sharedInstance].isListening) { // Stop if we are currently listening.
error = [[OEPocketsphinxController sharedInstance] stopListening];
if(error){
//NSLog(@"Error stopping listening in stopButtonAction: %@", error);
}
}
self.startButton.hidden = FALSE;
self.stopButton.hidden = TRUE;
self.suspendListeningButton.hidden = TRUE;
self.resumeListeningButton.hidden = TRUE;
}
- (IBAction) startButtonAction { // This is the action for the button which starts up the recognition loop again if it has been shut down.
if(![OEPocketsphinxController sharedInstance].isListening) {
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
}
self.startButton.hidden = TRUE;
self.stopButton.hidden = FALSE;
self.suspendListeningButton.hidden = FALSE;
self.resumeListeningButton.hidden = TRUE;
}
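// A small sketch, not part of the original sample app: the same startListening call appears in six places in this file, so it could live in one helper.
// The method name is hypothetical; the body only reuses calls and properties that already appear above.
- (void) startListeningWithFirstModelIfNeeded {
if([OEPocketsphinxController sharedInstance].isListening) {
return; // Already listening, nothing to do.
}
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE];
}
// Each existing "if not listening, start listening" block could then be reduced to: [self startListeningWithFirstModelIfNeeded];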
#pragma mark -
#pragma mark Example for reading out Pocketsphinx and Flite audio levels without locking the UI by using an NSTimer
// What follows are not OpenEars methods, just an approach for level reading
// that I've included with this sample app. My example implementation does make use of two OpenEars
// methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel
// method of OEFliteController.
//
// The example is meant to show one way that you can read those levels continuously without locking the UI,
// by using an NSTimer, but the OpenEars level-reading methods
// themselves do not include multithreading code since I believe that you will want to design your own
// code approaches for level display that are tightly-integrated with your interaction design and the
// graphics API you choose.
//
// Please note that if you use my sample approach, you should pay attention to the way that the timer is always stopped in
// dealloc. This should prevent you from having any difficulties with deallocating a class due to a running NSTimer process.
- (void) startDisplayingLevels { // Start displaying the levels using a timer
[self stopDisplayingLevels]; // We never want more than one timer valid so we'll stop any running timers first.
self.uiUpdateTimer = [NSTimer scheduledTimerWithTimeInterval:1.0/kLevelUpdatesPerSecond target:self selector:@selector(updateLevelsUI) userInfo:nil repeats:YES];
}
- (void) stopDisplayingLevels { // Stop displaying the levels by stopping the timer if it's running.
if(self.uiUpdateTimer && [self.uiUpdateTimer isValid]) { // If there is a running timer, we'll stop it here.
[self.uiUpdateTimer invalidate];
self.uiUpdateTimer = nil;
}
}
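// A hedged sketch, not part of the original sample app: a scheduled, repeating NSTimer keeps a strong reference to its target until the timer is invalidated,
// so while uiUpdateTimer is running this view controller is kept alive and the stopDisplayingLevels call in dealloc is not reached.
// One option is to tie the timer to the visibility of the view instead. This assumes ViewController does not already override these two methods elsewhere.
- (void) viewWillAppear:(BOOL)animated {
[super viewWillAppear:animated];
[self startDisplayingLevels]; // Safe to call repeatedly, since it stops any running timer first.
}
- (void) viewDidDisappear:(BOOL)animated {
[super viewDidDisappear:animated];
[self stopDisplayingLevels]; // Invalidating the timer releases its reference to this view controller.
}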
- (void) updateLevelsUI { // And here is how we obtain the levels. This method includes the actual OpenEars methods and uses their results to update the UI of this view controller.
self.pocketsphinxDbLabel.text = [NSString stringWithFormat:@"Pocketsphinx Input level:%f",[[OEPocketsphinxController sharedInstance] pocketsphinxInputLevel]]; //pocketsphinxInputLevel is an OpenEars method of the class OEPocketsphinxController.
if(self.fliteController.speechInProgress) {
self.fliteDbLabel.text = [NSString stringWithFormat:@"Flite Output level: %f",[self.fliteController fliteOutputLevel]]; // fliteOutputLevel is an OpenEars method of the class OEFliteController.
}
}
@end