Can utterances only bring back what is in dictionary?

  • #1032011
    jeffbonasso
    Participant

Hello. It seems that if I have a dictionary like the following:

[
"GO BACK",
"READ CURRENT ITEM",
"READ ITEM",
"CHECK",
"CHECK ITEM",
"NEXT",
"NEXT ITEM",
"SKIP",
"SKIP ITEM",
"LOCATE",
"LOCATE ITEM",
"EMERGENCY",
"EMERGENCY LIST",
"OPEN COMMENTS",
"CLOSE COMMENTS",
"SHOW COMMENTS",
"HIDE COMMENTS",
"OPEN NOTES",
"CLOSE NOTES",
"SHOW NOTES",
"HIDE NOTES",
"OPEN LOGBOOK",
"CLOSE LOGBOOK",
"SHOW LOGBOOK",
"HIDE LOGBOOK",
"OPEN CLOCK",
"CLOSE CLOCK",
"SHOW CLOCK",
"HIDE CLOCK",
"NEXT LIST",
"PREVIOUS LIST",
"NEXT SECTION",
"PREVIOUS SECTION",
"SPEAK",
"SILENCE",
"READ SECTION",
"READ CURRENT SECTION",
"CHECK SECTION",
"CHECK CURRENT SECTION",
"OPEN NAV OVERLAY",
"CLOSE NAV OVERLAY",
"SHOW NAV OVERLAY",
"HIDE NAV OVERLAY",
"QUIDNUNC"
];

…then when it is listening, it returns utterances that aren't in this dictionary, like "BACK", "CURRENT", "NAV NOTES", and "NAV ITEM NO OUT".

Is there a way to have it return only utterances that are in the dictionary, or to weight the ones in the dictionary higher than ones that aren't? In my case it returns an utterance like "BACK" when I say "CHECK", but I don't have "BACK" in the dictionary by itself; I specifically added "GO BACK" to try to keep it from making the wrong choice.

    #1032012
    Halle Winkler
    Politepix

    Welcome,

Yes, take a look in the docs for information about grammars in OpenEars (versus language models, which you are using above). After looking into that and trying it out, you may possibly also want to investigate RuleORama in case you need the grammar approach in realtime.
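As a very rough sketch of the difference (abbreviated; the grammar keys and generator methods are covered in the docs):

OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSString *acousticModelPath = [OEAcousticModel pathToModel:@"AcousticModelEnglish"];

// A statistical language model (what you are using above) is generated from a
// flat array and is permissive: it can hypothesize combinations you never listed.
NSError *lmError = [generator generateLanguageModelFromArray:@[@"GO BACK", @"CHECK ITEM"] withFilesNamed:@"MyLanguageModel" forAcousticModelAtPath:acousticModelPath];

// A grammar is a ruleset that only permits the phrases you define.
NSError *grammarError = [generator generateGrammarFromDictionary:@{ThisWillBeSaidOnce : @[@{OneOfTheseWillBeSaidOnce : @[@"GO BACK", @"CHECK ITEM", @"NEXT ITEM"]}]} withFilesNamed:@"MyGrammar" forAcousticModelAtPath:acousticModelPath];

When you start listening with a grammar, pass languageModelIsJSGF:TRUE to startListeningWithLanguageModelAtPath:dictionaryAtPath:acousticModelAtPath:languageModelIsJSGF:.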

    #1032013
    jeffbonasso
    Participant

Thanks! I am using Rejecto, which seems to have only the method…

    generateRejectingLanguageModelFromArray

    Is there a similar one for generating a grammar while using Rejecto?

    #1032014
    Halle Winkler
    Politepix

    No, Rejecto works with language models only. I would first start with the stock OpenEars grammar methods and then check out RuleORama if you need to use that grammar approach in realtime.

    #1032022
    jeffbonasso
    Participant

Thanks Halle. I am now using a grammar instead of a language model, and when I say the phrases in the dictionary it is far more accurate than before, which is great. One thing I am seeing, though, is that without Rejecto it seems overly aggressive about returning something in the dictionary even when what is being said is not even close to any of the phrases. Even when I say a single one-syllable word it will return phrases. Are there any strategies to mimic what Rejecto does, or am I missing something that would keep speech outside the dictionary from always tripping an utterance?

    #1032023
    Halle Winkler
    Politepix

    Hi,

    Make sure that the vadThreshold is set correctly.
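For example, before starting listening:

[OEPocketsphinxController sharedInstance].vadThreshold = 3.2; // the stock value is 2.3 (see -vad_threshold in logging output); raise it in small steps

There is no single correct value – it needs to be tested against your app's devices and environments.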

    #1032024
    jeffbonasso
    Participant

Yes. I put a slider in my app so I can change it at any time from 1 to 5 in 0.1 increments. At 4.2 or above it doesn't recognize most words. At 4 it usually recognizes words, but it still trips up a lot on words that are nowhere near any of the phrases.

    #1032025
    Halle Winkler
    Politepix

    OK, that’s surprising, but vadThreshold would be the available way to address this. If the utterances you are using in the grammar are particularly short, you may wish to make them a bit longer so they are more distinct from each other and less easily substituted for other utterances.

    #1032026
    jeffbonasso
    Participant

Almost every phrase is two or more words. This is one dictionary…

"AFFIRMATIVE",
"NEGATIVE",
"GO BACK",
"READ CURRENT ITEM",
"READ ITEM",
"CHECK",
"CHECK ITEM",
"NEXT",
"NEXT ITEM",
"SKIP",
"SKIP ITEM",
"LOCATE",
"LOCATE ITEM",
"EMERGENCY",
"EMERGENCY LIST",
"OPEN COMMENTS",
"CLOSE COMMENTS",
"SHOW COMMENTS",
"HIDE COMMENTS",
"OPEN NOTES",
"CLOSE NOTES",
"SHOW NOTES",
"HIDE NOTES",
"OPEN LOGBOOK",
"CLOSE LOGBOOK",
"SHOW LOGBOOK",
"HIDE LOGBOOK",
"OPEN CLOCK",
"CLOSE CLOCK",
"SHOW CLOCK",
"HIDE CLOCK",
"NEXT LIST",
"PREVIOUS LIST",
"NEXT SECTION",
"PREVIOUS SECTION",
"MIRA SILENCE",
"MIRA SHUT UP",
"MIRA BE QUIET",
"READ SECTION",
"READ CURRENT SECTION",
"CHECK SECTION",
"CHECK CURRENT SECTION",
"RESET CHECKLIST",
"RESET CURRENT CHECKLIST",
"RESET LIST",
"RESET CURRENT LIST",
"RESET SECTION",
"RESET CURRENT SECTION",
"OPEN NAV OVERLAY",
"CLOSE NAV OVERLAY",
"SHOW NAV OVERLAY",
"HIDE NAV OVERLAY",
"OPEN BIG CHECK OVERLAY",
"CLOSE BIG CHECK OVERLAY",
"SHOW BIG CHECK OVERLAY",
"HIDE BIG CHECK OVERLAY",
"QUIDNUNC",
"MIRA PREFLIGHT",
"MIRA INITIAL",
"MIRA EXTERIOR",
"MIRA INTERIOR",
"MIRA START",
"MIRA TAXI",
"MIRA RUN-UP",
"MIRA INFLIGHT",
"MIRA PRE-TAKEOFF",
"MIRA TAKEOFF",
"MIRA CLIMB",
"MIRA CRUISE",
"MIRA DESCENT",
"MIRA PRE-LANDING",
"MIRA LANDING",
"MIRA GO-AROUND",
"MIRA POSTFLIGHT",
"MIRA AFTER LANDING",
"MIRA SECURING",
"MIRA SPEEDS",
"MIRA QUICK SPEEDS",
"MIRA NORMAL OPERATION",
"MIRA REFERENCE",
"MIRA SPECS",
"MIRA FREQUENCIES",
"MIRA EMERGENCY",
"MIRA HELP",
"MIRA MAYDAY",
"MIRA POWER LOSS ON TAKEOFF",
"MIRA TAKEOFF POWER LOSS",
"MIRA POWER LOSS INFLIGHT",
"MIRA INFLIGHT POWER LOSS",
"MIRA NO RESTART WITH TIME",
"MIRA ELECTRICAL FIRE",
"MIRA ENGINE FIRE ON STARTUP",
"MIRA STARTUP ENGINE FIRE",
"MIRA ENGINE FIRE INFLIGHT",
"MIRA INFLIGHT ENGINE FIRE",
"MIRA ICING",
"MIRA EXCESS CHARGE",
"MIRA LOW VOLTAGE",
"MIRA RADIO OUT"

I do understand I could use the following format to build more of a hierarchy, but I am not sure whether doing that will improve the results…

@{
    ThisWillBeSaidOnce : @[
        @{OneOfTheseCanBeSaidOnce : @[@"HELLO COMPUTER", @"GREETINGS ROBOT"]},
        @{OneOfTheseWillBeSaidOnce : @[@"DO THE FOLLOWING", @"INSTRUCTION"]},
        @{OneOfTheseWillBeSaidOnce : @[@"GO", @"MOVE"]},
        @{ThisWillBeSaidWithOptionalRepetitions : @[
            @{OneOfTheseWillBeSaidOnce : @[@"10", @"20", @"30"]},
            @{OneOfTheseWillBeSaidOnce : @[@"LEFT", @"RIGHT", @"FORWARD"]}
        ]},
        @{OneOfTheseWillBeSaidOnce : @[@"EXECUTE", @"DO IT"]},
        @{ThisCanBeSaidOnce : @[@"THANK YOU"]}
    ]
};

    #1032027
    Halle Winkler
    Politepix

    Hi,

    Only the part at the end of your previous post beginning with ThisWillBeSaidOnce is a grammar; the other is still a language model.

    #1032028
    jeffbonasso
    Participant

I just copied the strings from the debug output. I am passing those to the grammar-generation method in an NSDictionary under a ThisWillBeSaidOnce key, and setting the JSGF flag to YES when listening.
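Roughly like this, abbreviated (MyGrammar is just a placeholder name here; the full phrase list is the one above):

NSDictionary *grammar = @{ThisWillBeSaidOnce : @[
    @{OneOfTheseWillBeSaidOnce : @[@"AFFIRMATIVE", @"NEGATIVE", @"GO BACK" /* …the rest of the phrases above… */]}
]};
OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
NSError *error = [generator generateGrammarFromDictionary:grammar withFilesNamed:@"MyGrammar" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];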

    #1032029
    Halle Winkler
    Politepix

    OK, I would get rid of the one-syllable single-word entries and see if it improves.

    #1032032
    jeffbonasso
    Participant

OK, I have done that. Still very weird behavior. I am testing in pretty ideal conditions, with no background noise and a very good headset with a noise-cancelling mic. When I say the words that appear in the dictionary it works flawlessly. When I am just saying other long phrases, it trips almost every time.

    “THAT IS NOT VERY GOOD” and it came back with “NEGATIVE”
    “THIS IS A TEST” and it came back with “MIRA SECURING”
    “THIS IS A TEST” and it came back with “MIRA SPEEDS”
    “THERE REALLY IS SOMETHING WRONG” and it came back with “MIRA RADIO-OUT”
    “THAT IS WEIRD” and it came back with “READ ITEM”

    #1032033
    Halle Winkler
    Politepix

    That is unexpected, but I’m afraid I don’t have more suggestions.

    #1032034
    jeffbonasso
    Participant

Since there is no Rejecto for grammars, are there any strategies that could simulate Rejecto? One thing I just tried was adding every letter of the alphabet to the dictionary, and now almost any time it hears anything it picks one of those letters unless I say a phrase that is specifically in the dictionary, which is definitely helping a lot. A sketch of the workaround is below.

I am still confused why, when using a grammar, it always seems to want to match something in the dictionary. When I say almost anything, it now comes back with one of the letters. It seems like there should be a mechanism so that if what was said between the start of recognition and the silence delay clearly contained many syllables and words, it wouldn't match something with one syllable.
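The workaround looks roughly like this (commandPhrases here is a placeholder for the phrase list above; the letters act as catch-all entries that absorb out-of-grammar speech):

NSMutableArray *phrases = [commandPhrases mutableCopy];
for (char letter = 'A'; letter <= 'Z'; letter++) {
    [phrases addObject:[NSString stringWithFormat:@"%c", letter]]; // catch-all entry
}
NSDictionary *grammar = @{ThisWillBeSaidOnce : @[@{OneOfTheseWillBeSaidOnce : phrases}]};

…and then in the hypothesis callback I discard any single-letter result:

if (hypothesis.length == 1) return; // treat a catch-all letter as rejected speech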

    #1032035
    Halle Winkler
    Politepix

Sorry, as I said, that is an unexpected result and I don't have further suggestions for it. Take a look at the post "Please read before you post – how to troubleshoot and provide logging info" here so you can see the info needed in order to request any more in-depth troubleshooting.

    #1032186
    olegnaumenko
    Participant

I am also currently using the grammar method and having the same problem (wrong results when a phrase that is not in the dictionary is said).

Would probability/score numbers help here? Is there any way to get a confidence value for the current hypothesis/utterance? That would help greatly.

Will any of your paid plugins help improve this?

    #1032187
    Halle Winkler
    Politepix

    Welcome,

As with the post before yours, I would need logging output in order to help with this – please take a look at the link I provided in my previous response, thanks.

    #1032188
    olegnaumenko
    Participant

OpenEars 2.506, iPhone 5s, speaking into the built-in mic from an 8–10 inch distance, using grammar mode with this vocabulary:

@{OneOfTheseWillBeSaidOnce : @[@"HELLO ROBOT",
                               @"HEY THERE",
                               @"GREEN CROCODILE",
                               @"HELLO PEOPLE",
                               @"HEY YOU NERD",
                               @"EMERGENCY SITUATION"]}

When I say "I didn't say that", I get "HEY THERE". Often when I pronounce "HELLO PEOPLE" I get "HEY THERE". I understand that these sound similar. Is there a way to get the probability for a detection so I can filter out hypotheses with low credibility? Or is probability unavailable in JSGF mode?
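For reference, I am reading the hypothesis and score in the standard OEEventsObserver delegate method, as in the sample app:

- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
    // In my JSGF tests the score always arrives as 0, as in the log below.
    NSLog(@"The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID);
}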

The log follows:

    2018-01-03 18:40:15.732414+0200 OpenEarsTest[2672:954570] Starting OpenEars logging for OpenEars version 2.506 on 64-bit device (or build): iPhone running iOS version: 10.300000
    2018-01-03 18:40:15.756330+0200 OpenEarsTest[2672:954570] Since there is no cached version, loading the language model lookup list for the acoustic model called AcousticModelEnglish
    2018-01-03 18:40:15.813194+0200 OpenEarsTest[2672:954570] I’m done running performDictionaryLookup and it took 0.039237 seconds
    2018-01-03 18:40:15.856756+0200 OpenEarsTest[2672:954570] Creating shared instance of OEPocketsphinxController
    2018-01-03 18:40:15.873651+0200 OpenEarsTest[2672:954570] Attempting to start listening session from startListeningWithLanguageModelAtPath:
    2018-01-03 18:40:15.880222+0200 OpenEarsTest[2672:954570] User gave mic permission for this app.
    2018-01-03 18:40:15.882161+0200 OpenEarsTest[2672:954570] setSecondsOfSilence wasn’t set, using default of 0.700000.
    2018-01-03 18:40:15.883578+0200 OpenEarsTest[2672:954626] Starting listening.
    2018-01-03 18:40:15.883754+0200 OpenEarsTest[2672:954626] About to set up audio session
    2018-01-03 18:40:16.130446+0200 OpenEarsTest[2672:954626] Creating audio session with default settings.
    2018-01-03 18:40:16.130549+0200 OpenEarsTest[2672:954626] Done setting audio session category.
    2018-01-03 18:40:16.131371+0200 OpenEarsTest[2672:954638] Audio route has changed for the following reason:
    2018-01-03 18:40:16.135606+0200 OpenEarsTest[2672:954638] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
2018-01-03 18:40:16.200989+0200 OpenEarsTest[2672:954638] This is not a case in which OpenEars notifies of a route change. At the close of this method, the new audio route will be <Input route or routes: "MicrophoneBuiltIn". Output route or routes: "Speaker">. The previous route before changing to this route was "<AVAudioSessionRouteDescription: 0x174007e80,
inputs = (
"<AVAudioSessionPortDescription: 0x174008160, type = MicrophoneBuiltIn; name = iPhone Microphone; UID = Built-In Microphone; selectedDataSource = Front>"
);
outputs = (
"<AVAudioSessionPortDescription: 0x174007d40, type = Speaker; name = Speaker; UID = Speaker; selectedDataSource = (null)>"
)>".
    2018-01-03 18:40:16.208455+0200 OpenEarsTest[2672:954626] Done setting preferred sample rate to 16000.000000 – now the real sample rate is 16000.000000
    2018-01-03 18:40:16.209213+0200 OpenEarsTest[2672:954626] number of channels is already the preferred number of 1 so not setting it.
    2018-01-03 18:40:16.210498+0200 OpenEarsTest[2672:954626] Done setting session’s preferred I/O buffer duration to 0.128000 – now the actual buffer duration is 0.128000
    2018-01-03 18:40:16.210616+0200 OpenEarsTest[2672:954626] Done setting up audio session
    2018-01-03 18:40:16.212290+0200 OpenEarsTest[2672:954638] Audio route has changed for the following reason:
    2018-01-03 18:40:16.214772+0200 OpenEarsTest[2672:954626] About to set up audio IO unit in a session with a sample rate of 16000.000000, a channel number of 1 and a buffer duration of 0.128000.
    2018-01-03 18:40:16.234177+0200 OpenEarsTest[2672:954638] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
2018-01-03 18:40:16.239235+0200 OpenEarsTest[2672:954638] This is not a case in which OpenEars notifies of a route change. At the close of this method, the new audio route will be <Input route or routes: "MicrophoneBuiltIn". Output route or routes: "Speaker">. The previous route before changing to this route was "<AVAudioSessionRouteDescription: 0x174007d20,
inputs = (
"<AVAudioSessionPortDescription: 0x174007e90, type = MicrophoneBuiltIn; name = iPhone Microphone; UID = Built-In Microphone; selectedDataSource = Bottom>"
);
outputs = (
"<AVAudioSessionPortDescription: 0x170008640, type = Receiver; name = Receiver; UID = Built-In Receiver; selectedDataSource = (null)>"
)>".
    2018-01-03 18:40:16.248024+0200 OpenEarsTest[2672:954626] Done setting up audio unit
    2018-01-03 18:40:16.248110+0200 OpenEarsTest[2672:954626] About to start audio IO unit
    2018-01-03 18:40:16.528044+0200 OpenEarsTest[2672:954626] Done starting audio unit
    INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -compallsen no no
    -debug 0
    -dict /var/mobile/Containers/Data/Application/2DD76771-D5C9-4303-AE69-7DC22AB5849E/Library/Caches/FirstGrammarModel.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/noisedict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/feat.params
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf /var/mobile/Containers/Data/Application/2DD76771-D5C9-4303-AE69-7DC22AB5849E/Library/Caches/FirstGrammarModel.gram
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 1.000000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/mdef
    -mean /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/means
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/transition_matrices
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.300000e+00
    -var /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/variances
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn’t match number of ciphones, doesn’t look like PTM: 1 != 46
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(294): 512×13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4119 * 32 bytes (128 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/2DD76771-D5C9-4303-AE69-7DC22AB5849E/Library/Caches/FirstGrammarModel.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 14 words read
    INFO: dict.c(358): Reading filler dictionary: /var/containers/Bundle/Application/A9C4195E-904F-44F1-906A-F193DC56484D/OpenEarsTest.app/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 51152 bytes (49 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 51152 bytes (49 KiB) for single-phone word triphones
    INFO: jsgf.c(691): Defined rule: <FirstGrammarModel.g00000>
    INFO: jsgf.c(691): Defined rule: PUBLIC <FirstGrammarModel.rule_0>
    INFO: fsg_model.c(215): Computing transitive closure for null transitions
    INFO: fsg_model.c(277): 0 null transitions added
    INFO: fsg_search.c(227): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -5, pip: 0)
    INFO: fsg_model.c(428): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for [BREATH] to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for [COUGH] to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for [NOISE] to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for [SMACK] to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_model.c(428): Adding silence transitions for [UH] to FSG
    INFO: fsg_model.c(448): Added 9 silence word transitions
    INFO: fsg_search.c(173): Added 4 alternate word transitions
    INFO: fsg_lextree.c(110): Allocated 846 bytes (0 KiB) for left and right context phones
    INFO: fsg_lextree.c(256): 134 HMM nodes in lextree (81 leaves)
    INFO: fsg_lextree.c(259): Allocated 19296 bytes (18 KiB) for all lextree nodes
    INFO: fsg_lextree.c(262): Allocated 11664 bytes (11 KiB) for lextree leafnodes
    2018-01-03 18:40:16.733045+0200 OpenEarsTest[2672:954626] There is no CMN plist so we are using the fresh CMN value 40.000000.
    2018-01-03 18:40:16.733680+0200 OpenEarsTest[2672:954626] Listening.
    2018-01-03 18:40:16.734432+0200 OpenEarsTest[2672:954626] Project has these words or phrases in its dictionary:
    CROCODILE
    EMERGENCY
    EMERGENCY(2)
    GREEN
    HELLO
    HELLO(2)
    HEY
    NERD
    PEOPLE
    ROBOT
    ROBOT(2)
    SITUATION
    THERE
    YOU
    2018-01-03 18:40:16.734542+0200 OpenEarsTest[2672:954626] Recognition loop has started
    2018-01-03 18:40:16.734957+0200 OpenEarsTest[2672:954570] Successfully started listening session from startListeningWithLanguageModelAtPath:
    2018-01-03 18:40:16.752577+0200 OpenEarsTest[2672:954570] Pocketsphinx is now listening.
    2018-01-03 18:40:17.022659+0200 OpenEarsTest[2672:954626] Speech detected…
    2018-01-03 18:40:17.023302+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected speech.
    2018-01-03 18:40:18.126128+0200 OpenEarsTest[2672:954626] End of speech detected…
    2018-01-03 18:40:18.127059+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected a period of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 45.87 15.93 -4.69 -5.94 -3.90 -3.12 -2.24 1.58 -1.77 -1.60 -4.82 -5.68 0.49 >
    INFO: fsg_search.c(843): 115 frames, 5325 HMMs (46/fr), 10570 senones (91/fr), 439 history entries (3/fr)

    ERROR: “fsg_search.c”, line 913: Final result does not match the grammar in frame 115
2018-01-03 18:40:18.130912+0200 OpenEarsTest[2672:954626] Pocketsphinx heard "" with a score of (0) and an utterance ID of 0.
2018-01-03 18:40:18.131202+0200 OpenEarsTest[2672:954626] Hypothesis was null so we aren't returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController's property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    2018-01-03 18:40:20.958072+0200 OpenEarsTest[2672:954624] Speech detected…
    2018-01-03 18:40:20.959169+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected speech.
    2018-01-03 18:40:22.609818+0200 OpenEarsTest[2672:954626] End of speech detected…
    2018-01-03 18:40:22.610542+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected a period of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 45.87 15.93 -4.69 -5.94 -3.90 -3.12 -2.24 1.58 -1.77 -1.60 -4.82 -5.68 0.49 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 49.01 10.38 1.63 1.36 2.62 -2.81 -8.39 1.88 -3.63 -1.06 -3.82 -2.39 -1.53 >
    INFO: fsg_search.c(843): 167 frames, 11050 HMMs (66/fr), 17759 senones (106/fr), 1380 history entries (8/fr)

2018-01-03 18:40:22.614643+0200 OpenEarsTest[2672:954626] Pocketsphinx heard "HEY THERE" with a score of (0) and an utterance ID of 1.
    2018-01-03 18:40:22.634868+0200 OpenEarsTest[2672:954570] The received hypothesis is HEY THERE with a score of 0 and an ID of 1
    2018-01-03 18:40:25.315471+0200 OpenEarsTest[2672:954624] Speech detected…
    2018-01-03 18:40:25.316027+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected speech.
    2018-01-03 18:40:27.083584+0200 OpenEarsTest[2672:954627] End of speech detected…
    2018-01-03 18:40:27.084296+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected a period of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 49.01 10.38 1.63 1.36 2.62 -2.81 -8.39 1.88 -3.63 -1.06 -3.82 -2.39 -1.53 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 50.11 9.56 2.45 2.06 4.52 -5.13 -9.10 1.93 -4.76 -0.55 -3.99 -0.50 -2.36 >
    INFO: fsg_search.c(843): 189 frames, 14014 HMMs (74/fr), 21753 senones (115/fr), 1671 history entries (8/fr)

2018-01-03 18:40:27.087052+0200 OpenEarsTest[2672:954627] Pocketsphinx heard "HEY THERE" with a score of (0) and an utterance ID of 2.
    2018-01-03 18:40:27.093560+0200 OpenEarsTest[2672:954570] The received hypothesis is HEY THERE with a score of 0 and an ID of 2
    2018-01-03 18:40:29.268660+0200 OpenEarsTest[2672:954625] Speech detected…
    2018-01-03 18:40:29.269468+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected speech.
    2018-01-03 18:40:31.318108+0200 OpenEarsTest[2672:954625] End of speech detected…
    2018-01-03 18:40:31.319683+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected a period of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 50.11 9.56 2.45 2.06 4.52 -5.13 -9.10 1.93 -4.76 -0.55 -3.99 -0.50 -2.36 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 49.19 10.69 3.28 1.81 3.41 -5.30 -9.70 2.74 -3.79 -1.62 -3.66 -0.23 -1.29 >
    INFO: fsg_search.c(843): 215 frames, 4460 HMMs (20/fr), 9700 senones (45/fr), 587 history entries (2/fr)

2018-01-03 18:40:31.321215+0200 OpenEarsTest[2672:954625] Pocketsphinx heard "HELLO PEOPLE" with a score of (0) and an utterance ID of 3.
    2018-01-03 18:40:31.326913+0200 OpenEarsTest[2672:954570] The received hypothesis is HELLO PEOPLE with a score of 0 and an ID of 3
    2018-01-03 18:40:32.592207+0200 OpenEarsTest[2672:954627] Speech detected…
    2018-01-03 18:40:32.592810+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected speech.
    INFO: cmn_prior.c(99): cmn_prior_update: from < 49.19 10.69 3.28 1.81 3.41 -5.30 -9.70 2.74 -3.79 -1.62 -3.66 -0.23 -1.29 >
    INFO: cmn_prior.c(116): cmn_prior_update: to < 50.16 10.24 3.44 2.45 4.35 -5.52 -9.80 2.17 -4.13 -1.48 -3.39 -0.10 -1.88 >
    2018-01-03 18:40:34.129945+0200 OpenEarsTest[2672:954627] End of speech detected…
    2018-01-03 18:40:34.130669+0200 OpenEarsTest[2672:954570] Pocketsphinx has detected a period of silence, concluding an utterance.
    INFO: cmn_prior.c(131): cmn_prior_update: from < 50.16 10.24 3.44 2.45 4.35 -5.52 -9.80 2.17 -4.13 -1.48 -3.39 -0.10 -1.88 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 48.14 10.65 3.09 2.29 3.56 -5.10 -9.35 2.25 -3.69 -1.62 -2.82 -0.10 -1.23 >
    INFO: fsg_search.c(843): 154 frames, 7536 HMMs (48/fr), 13619 senones (88/fr), 653 history entries (4/fr)

2018-01-03 18:40:34.133792+0200 OpenEarsTest[2672:954627] Pocketsphinx heard "HEY THERE" with a score of (0) and an utterance ID of 4.
    2018-01-03 18:40:34.138785+0200 OpenEarsTest[2672:954570] The received hypothesis is HEY THERE with a score of 0 and an ID of 4

    #1032191
    Halle Winkler
    Politepix

Thanks for the logging. This is a bit unusual in my experience, so I'm trying to pin down whether there are any contributing factors – pardon my questions. How close is your implementation to the sample app which ships with the distribution? Do you get the same results when just altering the sample app to support this grammar? Is there anything about the environment (or, I guess, even the speaker) which could contribute to the results here?

    #1032204
    olegnaumenko
    Participant

Thank you for the reply.
Everything is standard, as in the example, except that for this log I changed the mode to grammar and supplied several phrases I would like it to recognize (as at the top of the log). Xcode 9.2, iPhone 5s. I am not a native English speaker, but I speak it reasonably well. The point of the experiment is to fire only when the proper phrase is said, in proper English.
