 dave
|
Hi,
We’re looking to enable a set of commands, e.g.
“Command1 Command1a”
“Command1 Command1b”
“Command2 Command2a”
“Command2 Command2b”
etc.
We’ve found that placing these phrases in the dynamically generated dictionary results in words being individually recognised (e.g. “Command2a” on its own). Is there a way to force phrase recognition so that the whole phrase is taken into consideration rather than individual words?
Thanks :)
|
 Halle
|
Hi Dave,
If you are putting in the entire phrase as your NSString when doing the dynamic generation (i.e. @"My Entire Phrase" rather than @"My", @"Entire", @"Phrase") and that nonetheless isn’t doing the job of giving you whole-phrase recognition, you can require that the entire phrase is recognized by using a JSGF grammar that has the whole phrase instead of an ARPA LM. The .dic contents will be the same for either approach.
If you are using dynamic model generation and want to keep that ability, you can just dynamically create the model, use the dictionary that is created, don’t use the ARPA language model that is created, and then programmatically create your own .gram with your restrictive JSGF grammar according to the (reasonably simple) rules of JSGF generation. There is a JSGF example in the sample app and there’s lots of info about the rules of JSGF online (but ignore the parts about dynamically linking to other public grammar files since OpenEars doesn’t support that).
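A minimal sketch of that programmatic step, written in plain C so it can also be dropped into an Objective-C file. It is not part of OpenEars: the function name build_phrase_grammar, the grammar name and the rule name are all hypothetical, and it assumes every word in the phrases is spelled exactly as it appears in the generated .dic. Each whole phrase becomes one alternative of a single public rule, so only complete phrases can match:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a JSGF grammar into `out` in which every phrase is one complete
 * alternative of a single public rule, for example:
 *   public <command> = COMMAND1 COMMAND1A | COMMAND1 COMMAND1B;
 * The grammar name and rule name are hypothetical and supplied by the caller.
 * Returns the number of characters written, or -1 if `out` is too small.
 */
int build_phrase_grammar(char *out, size_t cap,
                         const char *grammar_name, const char *rule_name,
                         const char *const *phrases, size_t count)
{
    assert(out != NULL && cap > 0 && count > 0);

    /* Header, grammar declaration and the start of the public rule. */
    int n = snprintf(out, cap, "#JSGF V1.0;\ngrammar %s;\npublic <%s> = ",
                     grammar_name, rule_name);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        /* Alternatives are separated by vertical bars. */
        n = snprintf(out + used, cap - used, "%s%s",
                     i == 0 ? "" : " | ", phrases[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    /* Every JSGF rule definition ends with a semicolon. */
    n = snprintf(out + used, cap - used, ";\n");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;
    return (int)used;
}
```

The resulting text would then be written to a file with the .gram suffix and that path passed in place of the ARPA language model path.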
If you move over to JSGF you will lose the ability to dynamically switch between grammars on the fly in OpenEars. If that is important to you, you would do better to stick with an ARPA LM because I can’t give an ETA on when/if I will extend switching to support JSGF.
To improve whole-phrase recognition with an ARPA language model, you can just raise the probability value of the entire trigram in the language model (this will again require that you create the LM yourself, since LanguageModelGenerator has to work on the assumption that you want evenly-distributed detection probabilities).
Let me know if you want me to elaborate on any specifics.
|
 dave
|
Hi Halle,
Thanks for your prompt reply.
When I attempt to use JSGF with version 0.91, I get an EXC_BAD_ACCESS crash in pocketsphinx.c, in this method: ps_start_utt(ps_decoder_t *ps, char const *uttid)
The crash occurs on this line:
if (ps->search == NULL) {
My grammar looks like:
#JSGF V1.0;
grammar samplegrammar;
public = ONE | TWO | THREE | FOUR;
and is included as follows:
useJSGF = TRUE;
self.pathToDynamicallyGeneratedGrammar = [NSString stringWithFormat:@"%@/%@", [[NSBundle mainBundle] resourcePath], @"OpenEars.gram"];
Thanks, as always, for your help.
|
 ramsegal
|
Hello,
I started using OpenEars with an ARPA language model,
and everything works smoothly.
Now I am facing the same issue: I need to recognize phrases rather than individual words, so I read this thread and tried to convert my dynamically created ARPA language model to a JSGF grammar as you advised.
I started out creating a simple example.
I created a .gram file with the following content:
#JSGF V1.0;
public = [RAM SEGAL];
I added it to my project as explained in the tutorial, and I changed the array I use for building the .dic to:
NSArray *languageArray = [[NSArray alloc] initWithArray:[NSArray arrayWithObjects: @"RAM SEGAL", nil]];
and I changed the call to startListeningWithLanguageModelAtPath (I set languageModelIsJSGF to TRUE).
I get the following error when running:
ERROR: syntax error
JSGF parse of /var/mobile/Applications/0635A160-D4B0-4866-8CA8-6B35962D46E4/Documents/OpenEarsDynamicGrammar.languagemodel failed
Can you give me a simple example of how to create a .gram file?
Thanks!
Ram
|
 Halle
|
Hi Ram,
Sure — I think you may have two issues here but they are easy to fix. Issue 1 is the syntax, as you said. I think you just need to get rid of the square brackets, but you can see an example of a known-working .gram file if you download the old version of OpenEars here and unzip the project and look in its language model folder:
http://www.politepix.com/wp-content/uploads/0.9.02.zip
I think the Sphinx project introduction to JSGF should also be helpful and give some ideas about creating restrictive grammars:
http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsgf/JSGFGrammar.html
The second issue is that you say you created a .gram file but from the error you got, the file that PocketsphinxController is being told to use has the suffix .languagemodel, so it looks like you may not have given your .gram file as the input in your call to start PocketsphinxController. If your JSGF grammar has the suffix .languagemodel, I’d recommend changing it both for clarity when seeking support and because the dynamic generator will always create both an ARPA language model file and dictionary with the same name and different suffixes when you use it, which means that it may be overwriting your JSGF grammar with an ARPA language model. If you name the JSGF grammar with the .gram suffix there is no confusion over whether the language model generator might be overwriting it.
|
 ramsegal
|
Hi Halle,
Thanks for the quick and helpful answer!
I did have two issues: the syntax, which I fixed using the example from the old OpenEars, and the path to the .gram file, which I forgot to change (I had changed only the file name).
I got it working thanks to you.
Thanks,
Ram.
|
 Halle
|
Excellent, glad that worked.
|
 scderek
|
Hi Halle,
I seem to have the exact same problem as Dave (Post #7047) while running .912
I was just wondering if there was any offline help that you gave to Dave as his posting trail runs cold. I initially feel somewhat hesitant giving you the full logging dump in hopes that I might be doing something you have already addressed…. Otherwise logging/.gram/.dic contents it is!
BTW, thank you very much. This website/tutorials + source code + your attentiveness to forum users are all greatly appreciated!
|
 Halle
|
“I was just wondering if there was any offline help that you gave to Dave as his posting trail runs cold.”
Nope, he just decided not to post his logs.
|
 scderek
|
ok, Logging info for ver .912:
Quick Facts:
* Using your current sampleproject just to figure out this bug/error I am doing. Everything was ok on startup/installation.
* Modifications: in OpenEarsSampleProjectViewController.mm, all instances of languageModelIsJSGF are set to true, and the path that originally pointed to “OpenEars.languagemodel” was changed to “OpenEars.gram”.
* OpenEars.gram file includes:
#JSGF V1.0;
public = Backward Forward;
* Same result on the simulator and on an iPhone device. It crashes in pocketsphinx.c at line #619 once it has started listening to your voice.
The .gram file does not seem to be loaded in correctly. Stalking Dave’s posts some more, he seems to have success in generating a .gram file dynamically. I’ll likely try that out or possibly revert back to .902 (saw success on that in other posts) if I can’t find a solution to this…
1970-02-06 14:34:23.-839 OpenEarsSampleProject[743:707] OPENEARSLOGGING: The audio session has never been initialized so we will do that now.
1970-02-06 14:34:23.-783 OpenEarsSampleProject[743:707] OPENEARSLOGGING: AudioSessionManager startAudioSession has reached the end of the initialization.
1970-02-06 14:34:23.-778 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Exiting startAudioSession.
1970-02-06 14:34:23.-643 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Starting dynamic language model generation
1970-02-06 14:34:23.-638 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Running MITLM
1970-02-06 14:34:23.-269 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Using convertGraphemes for the word or phrase QUIDNUNC which doesn’t appear in the dictionary
1970-02-06 14:34:24.-779 OpenEarsSampleProject[743:707] OPENEARSLOGGING: I’m done running dynamic language model generation and it took 0.859269 seconds
1970-02-06 14:34:24.-773 OpenEarsSampleProject[743:707] Dynamic language generator completed successfully, you can find your new files OpenEarsDynamicGrammar.languagemodel
and
OpenEarsDynamicGrammar.dic
at the paths
/var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/Documents/OpenEarsDynamicGrammar.languagemodel
and
/var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/Documents/OpenEarsDynamicGrammar.dic
1970-02-06 14:34:24.-767 OpenEarsSampleProject[743:707]
INFO: cmd_ln.c(512): Parsing command line:
\
-jsgf /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars.gram \
-dict /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars1.dic \
-fdict /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/noisedict \
-hmm /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app \
-maxhmmpf 3000 \
-maxwpf 5
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/noisedict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app
-input_endian little little
-jsgf /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars.gram
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 3000
-maxnewoov 20 20
-maxwpf -1 5
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 39 \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 39
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/feat.params
INFO: feat.c(848): Initializing feature stream to type: ’1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/mdef
INFO: bin_mdef.c(508): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
INFO: ms_gauden.c(358): 0 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(294): Allocating 4115 * 20 bytes (80 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars1.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 8 words read
INFO: dict.c(314): Reading filler dictionary: /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
1970-02-06 14:34:24.-35 OpenEarsSampleProject[743:707] Pocketsphinx is starting up.
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: fsg_search.c(139): FSG(beam: -1105112, pbeam: -1105112, wbeam: -648215; wip: -25842, pip: 0)
ERROR: syntax error
JSGF parse of /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars.gram failed
1970-02-06 14:34:25.-831 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Starting openAudioDevice on the device.
1970-02-06 14:34:25.-828 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Audio unit wrapper successfully created.
1970-02-06 14:34:25.-810 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Set audio route to SpeakerAndMicrophone
1970-02-06 14:34:25.-805 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Setting the variables for the device and starting it.
1970-02-06 14:34:25.-802 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
1970-02-06 14:34:25.-158 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Started audio output unit.
1970-02-06 14:34:25.-151 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Calibration has started
1970-02-06 14:34:25.-149 OpenEarsSampleProject[743:707] Pocketsphinx calibration has started.
1970-02-06 14:34:30.-922 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Calibration has completed
1970-02-06 14:34:30.-916 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Project has these words in its dictionary:
BACKWARD
CHANGE
FORWARD
GO
LEFT
MODEL
RIGHT
TURN
1970-02-06 14:34:30.-914 OpenEarsSampleProject[743:6103] OPENEARSLOGGING: Listening.
1970-02-06 14:34:30.-921 OpenEarsSampleProject[743:707] Pocketsphinx calibration is complete.
1970-02-06 14:34:30.-907 OpenEarsSampleProject[743:707] OPENEARSLOGGING: I’m running flite
1970-02-06 14:34:32.-697 OpenEarsSampleProject[743:707] OPENEARSLOGGING: I’m done running flite and it took 2.204869 seconds
1970-02-06 14:34:32.-694 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
1970-02-06 14:34:32.-690 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
1970-02-06 14:34:32.-625 OpenEarsSampleProject[743:707] Pocketsphinx is now listening.
1970-02-06 14:34:32.-616 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
1970-02-06 14:34:32.-612 OpenEarsSampleProject[743:707] Flite has started speaking
1970-02-06 14:34:34.-696 OpenEarsSampleProject[743:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
1970-02-06 14:34:34.-542 OpenEarsSampleProject[743:707] OPENEARSLOGGING: Flite sending resume recognition notification.
1970-02-06 14:34:34.-32 OpenEarsSampleProject[743:707] Flite has finished speaking
|
 Joseph S. Wisniewski
|
I can see two things going on. Sphinx is case sensitive, so if your dictionary contains BACKWARD and FORWARD, and your JSGF contains Backward and Forward, you’re in trouble.
This isn’t seen when using LMs, because both the MIT LM tool built into OpenEars and the CMU web based tool convert both the words in the LM and the prons in the CMU dictionary to uppercase. It comes up to bite you when using either JSGF or FSG grammars, because Pocketsphinx retains case.
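To illustrate the case-sensitivity point, here is a small sketch in plain C (not anything from OpenEars or Pocketsphinx; the function name is hypothetical). It assumes you already have the grammar's words and the dictionary's words as string arrays, and it reports the first grammar word that has no exact, case-sensitive match in the dictionary:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the first grammar word that has no exact, case-sensitive match
 * in the dictionary word list, or NULL when every grammar word is present.
 */
const char *first_word_missing_from_dict(const char *const *grammar_words,
                                         size_t grammar_count,
                                         const char *const *dict_words,
                                         size_t dict_count)
{
    assert(grammar_words != NULL && dict_words != NULL);

    for (size_t g = 0; g < grammar_count; g++) {
        int found = 0;
        for (size_t d = 0; d < dict_count; d++) {
            /* strcmp is case-sensitive, so "Backward" != "BACKWARD". */
            if (strcmp(grammar_words[g], dict_words[d]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found) return grammar_words[g];
    }
    return NULL;
}
```

With a dictionary holding BACKWARD and FORWARD, a grammar using Backward and Forward would be flagged at Backward, while the all-uppercase spelling passes.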
The second thing is that this doesn’t look like valid JSGF…
#JSGF V1.0;
public = Backward Forward;
I can see three things that look problematic.
1 – You need to give the grammar a name:
2 – You need to declare at least one rule. If you’ve looked at JSGF files, those are in greater than/less than signs.
3 – You’ve declared a phrase “forward backward” when you want to give the person a choice between the two. That’s vertical bars.
This runs. I just tried it.
#JSGF V1.0;
grammar dereks_grammar;
public = backward | forward;
You can define more rules, and use them like this…
#JSGF V1.0;
grammar dereks_grammar;
public = [go] ;
= backward | forward;
The is a rule. The [go] is just something optional. You can wrap anything in [].
It’s quite useful to have a computer (Windows or Linux) alongside your Mac, so you can run the standalone Sphinx demo program pocketsphinx_continuous, feed it an FSG, JSGF, or LM, and watch it run.
|
 scderek
|
Joseph, thanks a lot for the reply.
Unfortunately, adding the name and matching capitalization didn’t solve my error, but it greatly helps to know that it is in fact a programming error on my part if you are able to get it to work.
|
 Halle
|
Hi scderek,
Pretty sure it’s just about the validity of your JSGF syntax:
ERROR: syntax error
JSGF parse of /var/mobile/Applications/2AADAACF-4CAB-453C-9238-90822EE06D41/OpenEarsSampleProject.app/OpenEars.gram failed
There is a known-working .gram in .902 which you can download here:
http://www.politepix.com/wp-content/uploads/0.9.02.zip
I would bring it into your current project and start creating your JSGF by selectively replacing the words in it, and build it up from there. Once you have the beginning of a known-working grammar you can start to expand on the rules:
http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsgf/JSGFGrammar.html
|
 scderek
|
Success… from a very stupid mistake I thought I tested way back when.
At least for me, rule names are required, not optional. So a working example was one I pulled from .902.
#JSGF V1.0;
grammar samplegrammar;
public = FORWARD | BACKWARD;
Thanks to both of you guys for the regex syntax help.
|
 scderek
|
oh hahaha….
Joseph, I think you gave me the correct answer initially, but the rule name became omitted because this site parsed it out thinking it was an html tag. Same happened to me.
So going back to HTML character entities…
public <RequiredWord> = FORWARD | BACKWARD;
This reply was modified 267 days ago by Halle. Reason: Helped with HTML entities
|
 Joseph S. Wisniewski
|
I did. I didn’t notice that wordpress absorbed my tags…
It should have read
#JSGF V1.0;
grammar dereks_grammar;
public <dereks_rule> = backward | forward;
The second example, with a public and a private rule should have read
#JSGF V1.0;
grammar dereks_grammar;
public <main> = <direction>[go] ;
<direction> = backward | forward;
This reply was modified 266 days ago by Joseph S. Wisniewski. Reason: trying to get HTML entities right
|
 Joseph S. Wisniewski
|
I ended up using the &lt; and &gt; entities, so &lt;main&gt; renders as <main>.
|
 Halle
|
That’s kind of a PITA, I’ll see if I can open up brackets without it being a security thing.
|