This topic has 4 voices, contains 14 replies, and was last updated by Isaac Squires 149 days ago.
| Author | Posts |
|---|---|
| October 13, 2011 at 5:47 pm #7688 | |
|
slowfoxtrot |
I would like to add this recently developed keyword-spotting library to the version of CMU Sphinx you are using: http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/long-audio-aligner/KeyWordSpotting/ How difficult would it be to add to the current library? I basically want a one-word dictionary that matches that one word when spoken and ignores everything else. I don’t care about any other word spoken but that one. Any ideas? Brock Haymond |
| October 13, 2011 at 6:51 pm #7690 | |
|
Halle |
Hi Brock, Unless I’m mistaken that library is in Java, so it would have to be ported to C or something similar before you could give it a try integrating it with your OpenEars install. Even at that point, it would be a difficult question to answer since I don’t really know much about your experience with C or iOS static libraries, etc. |
| October 14, 2011 at 12:02 am #7692 | |
|
Halle |
Have you tried just creating a JSGF grammar with a single word in it? |
| October 14, 2011 at 5:32 am #7693 | |
|
slowfoxtrot |
How does the processing logic differ for a JSGF grammar as compared to an ARPA model? Does JSGF ignore words not in the dictionary? |
| October 14, 2011 at 6:06 am #7694 | |
|
slowfoxtrot |
OK, I have successfully tested it with a JSGF grammar and I noticed it is actually significantly faster. However, with the single word “siri” I can still say “tree” and it thinks I said “siri.” Here is the logging: 2011-10-13 23:01:27.086 OpenEarsSampleProject[281:6203] OPENEARSLOGGING: Speech detected… INFO: fsg_search.c(1395): Start node ++CLICK++.0:3:39 Any other ideas? It seems to be a simple problem but I can’t figure it out yet. I just want it to throw away everything it hears that isn’t that one word. Thanks for your help! |
| October 14, 2011 at 6:13 am #7695 | |
|
slowfoxtrot |
Also, interestingly enough, I thought I would play around with the kFILLPROB value, and I have raised it to “1.0” without seeing any difference in performance in the app… Am I doing something wrong? I thought it wouldn’t recognize anything with a value that high for the noise dictionary? Thanks, |
| October 14, 2011 at 3:59 pm #7696 | |
|
Halle |
Since your issue doesn’t relate to noise levels I’d leave kFILLPROB alone (actually, it’s supposed to be left alone under all circumstances, but occasionally it’s worth a try when the issue is very high noise levels and nothing else is working). What happens if you add word representations of the phonemes that aren’t in SIRI to your JSGF grammar and dictionary, and ignore them when they are perceived (I have no idea if this will be effective, it’s a guess)? JSGF weighting is also an option, but the only other person on this board who tried it said it wasn’t working (apparently in OpenEars, but perhaps it is actually not working as expected in Pocketsphinx, since there is an open Pocketsphinx bug on this subject). Another option for you is to create an ARPA model that has your single word in it with a low probability. However, you can’t use LanguageModelGenerator to create it. I would just create it by hand and assign it a low probability relative to 0.0. This is also untested by me. |
| October 15, 2011 at 10:30 pm #7697 | |
|
slowfoxtrot |
It seems I accidentally found a solution! I continued to play around with kFILLPROB and it turns out to act differently than expected. Turning it up to 1.0 didn’t really show anything, so I turned it down to 0.00000000000001 and found that ANY sound I made was matched to “Siri”: any clap, crackle, anything that would trip the “Speech detected” event. So I decided to try going in the opposite direction. I set it up to 50000.0 (ya crazy high) and I discovered NOTHING I said would be recognized as Siri. What I discovered is that if I set kFILLPROB to about 1000.0 (I’m guessing it relates to the probability score; usually when I say “Siri” it is detected with a score below 1000), it recognizes “Siri” every time I say it and anything else is either kicked to “null” or “”. I can now adjust the sensitivity to the word by kicking it up a little to, say, 2000 or down a little to 10, depending on how strict I want the filtering to be. A stricter setting responds only to “Siri” but you have to say it very properly, and a looser one might result in more false positives. All in all I now have complete control over it responding to only one word and ignoring all else, including how sensitive I want it to be to the word! I noticed, however, that this kFILLPROB value only works with JSGF grammars; with ARPA it doesn’t seem to have any effect whatsoever. Thanks for your help! Brock Haymond |
| October 15, 2011 at 10:51 pm #7698 | |
|
Halle |
OK, well glad it works for you. Keep in mind that probability scores can vary due to device, microphone and environment so any fixed values need to be tested under multiple scenarios. |
| October 16, 2011 at 12:23 pm #7699 | |
|
Halle |
(Also variations among speakers such as age, gender and accent, in case it needs mentioning). |
| October 17, 2011 at 2:50 pm #7702 | |
|
Joseph S. Wisniewski |
> I set it up to 50000.0 (ya crazy high) Not at all. Many Sphinx parameters are large exponentials, which get turned immediately into large logs and added as integers inside Sphinx. By Sphinx standards, making something a million times larger or smaller is a “small” change. When I tuned the beams, I went up and down in trillions. The default was 1e-48, and I ran a script at 1e-12, 1e-24, 1e-36, 1e-48, 1e-60, 1e-72, 1e-84, and 1e-96, a range of 1e84 or a trillion trillion trillion trillion trillion trillion trillion. Take that, Carl Sagan! |
| October 17, 2011 at 3:03 pm #7703 | |
|
Joseph S. Wisniewski |
OK, do you want to spot your one word inside a huge mix of other words, or floating around alone? In other words, do you want a box that you set for the word “unto” and then you can feed in the Gettysburg address, “Four score and seven years ago, our forefathers brought forth unto this continent a great nation, conceived in liberty, etc etc etc”, and have it trip on the 11th word? Or is your environment going to be peaceful, with somebody lobbing the word “ginger” at the box every now and then? Sphinx recognition is “competitive”: if there’s nothing for your single word to compete against, it’s going to trip a lot, as it thinks up ways of “making itself believe” that “score” is really “unto”. The way I usually get around that is making a dictionary like… unto AH N T UW Then in the grammar, put “unto” into competition with “gibberish” like this: public (spot) = ( unto | gibberish+ )+ The “spot” should be in greater than/less than symbols, not parentheses, but the forum software consumed them, and I forgot how we worked around that the last time this happened…
|
| December 9, 2011 at 11:15 pm #8237 | |
|
Isaac Squires |
Hi Joseph, I’ve been thinking about this thread for a while, and just recently tried implementing what you describe in a JSGF grammar. What I realized afterwards is that I don’t know how to prevent gibberish from competing with the actual word that should match. In your example it looks like you included all phonemes as gibberish. I suspect I could probably remove the phonemes that match the target word, but that only works if there are only one or two words being matched. Do you have a trick here that I’m missing? |
| December 12, 2011 at 6:21 pm #8244 | |
|
Joseph S. Wisniewski |
Actually, I typically do remove the phonemes that occur in the word (I refer to it, humorously, as an anti-word), but it still performs surprisingly well if you don’t do this. Sphinx uses triphones, so it doesn’t just model “unto” in that example as ah/n/t/uw; it tries to model it as sil-ah-n / ah-n-t / n-t-uw / t-uw-sil. When there’s a “phoneme loop” like gibberish+ (one or more of the gibberish “word”), the word is made of “context independent” triphones, essentially “isolated” phonemes, so its version of “unto” maps to a more generic ah/n/t/uw, which isn’t as strong as the proper version. (I won’t get into Sphinx’s use of “interword” triphones as opposed to word triphones.) |
| December 20, 2011 at 11:41 pm #8253 | |
|
Isaac Squires |
Awesome – thanks for the additional info. |
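
For anyone trying the approach Joseph describes, here is a rough Python sketch of the two ideas in his posts: expanding a word into its in-context triphones, and building an "anti-word" filler from the phonemes the keyword does not use. It is only an illustration. The function names are made up for this example, the phone list is the 39-phone CMUdict set, and the filler dictionary layout (one single-phone alternate pronunciation per phoneme) is a guess at what the elided dictionary in the thread looked like, not something the thread confirms. Nothing here calls Pocketsphinx or OpenEars; it only produces text you could paste into a grammar and dictionary file.

```python
# Hypothetical helper names; none of this is part of Pocketsphinx or OpenEars.

# CMUdict's 39-phone inventory (stress markers omitted).
CMU_PHONES = """AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M
N NG OW OY P R S SH T TH UH UW V W Y Z ZH""".split()


def word_triphones(phones, boundary="SIL"):
    """Return left-centre-right context triples for a word spoken in isolation."""
    padded = [boundary] + list(phones) + [boundary]
    return ["-".join(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def anti_word_phones(keyword_phones, inventory=CMU_PHONES):
    """Phones left over for the filler once the keyword's phones are removed."""
    used = set(keyword_phones)
    return [p for p in inventory if p not in used]


def filler_dictionary(filler_word, phones):
    """Dictionary lines giving the filler one single-phone pronunciation each.

    Assumption: alternates are written CMUdict-style as word, word(2), word(3)...
    """
    lines = []
    for i, phone in enumerate(phones, start=1):
        head = filler_word if i == 1 else f"{filler_word}({i})"
        lines.append(f"{head} {phone}")
    return lines


def spotting_grammar(keyword, filler_word, name="spot"):
    """JSGF text that puts the keyword in competition with a filler loop."""
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <{name}> = ( {keyword} | {filler_word}+ )+;\n"
    )


if __name__ == "__main__":
    unto = ["AH", "N", "T", "UW"]
    print(" / ".join(word_triphones(unto)))
    print(spotting_grammar("unto", "gibberish"))
    print("unto " + " ".join(unto))
    print("\n".join(filler_dictionary("gibberish", anti_word_phones(unto))))
```

With more than one or two keywords the anti-word shrinks quickly, which is the problem Isaac raises. Passing the full `CMU_PHONES` list to `filler_dictionary` instead gives the "all phonemes" variant that Joseph says still performs surprisingly well.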