Tagged: grammar
| Author | Posts |
|---|---|
| November 15, 2011 at 9:51 pm #8071 | |
|
Pi |
Japanese works on a 9×5 grid: 9 consonants and 5 vowels, so 9×5 = 45 CV pairs. One CV pair (like ‘Fu’, ‘Ji’, ‘Ka’, ‘Zu’, ‘Mo’, ‘To’, etc.) is called a Kana (http://en.wikipedia.org/wiki/Kana). I would like to detect a stream of Kana in real time. Looking at the Sphinx documentation, it seems I just need to change the .dic and .languagemodel, which I did: http://cl.ly/BqZY. Now this gives an error in SphinxTrain.c, so I went onto the CMUSphinx channel on IRC, and the resident expert told me: “to load jsgf grammar you need to use -jsgf option instead of -lm option by default which is used by openears”. However, he then went on to say ‘actually from the log it seems openears creates grammar itself’. Looking through the OpenEars Xcode project, I cannot figure out how to make use of this information. I can’t see what I need to do. Can anyone help? π PS I would be willing to pay someone to get this working. Anyone interested, please e-mail me (sunfish7|gmail|c0m) |
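For illustration, the 9×5 grid described above can be enumerated in a few lines of Python. This is only a sketch: the consonant and vowel lists are an assumed, idealised romanisation (the real kana table has gaps and extra sounds), and the names `CONSONANTS`, `VOWELS`, `cv_pairs` and `GRID` are made up for the example.

```python
from itertools import product

# Assumed, idealised grid: 9 consonant rows x 5 vowels.
# The real kana table has gaps and additional sounds.
CONSONANTS = ["k", "s", "t", "n", "h", "m", "y", "r", "w"]
VOWELS = ["a", "i", "u", "e", "o"]


def cv_pairs(consonants, vowels):
    """Return every consonant+vowel syllable, one consonant row at a time."""
    return [c + v for c, v in product(consonants, vowels)]


GRID = cv_pairs(CONSONANTS, VOWELS)
print(len(GRID))  # number of cells in the grid
print(GRID[:5])   # first consonant row
```

Each entry of `GRID` would then need its own line in the .dic file and its own alternative in the grammar.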
| November 15, 2011 at 10:01 pm #8075 | |
|
Pi |
Somehow I seem to have lost the ability to edit my own post, so I’ll add more information here. From the screenshot you will be able to see how I’m attempting to catch a stream of Kana, using grammar KanaStream; public = BAH | BEH | BEE | BOR | BOO; public = … (I can’t write this out in full because the HTML highlighting gets confused by the angle brackets; it is in the screenshot.) I am not at all confident this is correct. I got it from reading http://cmusphinx.sourceforge.net/wiki/tutoriallm. Also, I’m not sure how such a system will output data. Will it actually output ‘BAH BEH BEE’ etc. as I speak these consonant-vowel pairs, or is some modification to the code required? π |
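Since the forum software stripped the angle brackets from the grammar above, here is a rough sketch of what a JSGF grammar for a stream of these five words might look like, generated from Python so the brackets survive. The rule names `<syllable>` and `<stream>` are hypothetical (the original rule names are only visible in the screenshot), and the `+` repetition operator is standard JSGF but should be checked against whatever parser the decoder actually uses.

```python
def build_jsgf(grammar_name, words):
    """Build a JSGF grammar that accepts one or more of `words` in sequence.

    The rule names <syllable> and <stream> are hypothetical placeholders.
    """
    alternatives = " | ".join(words)
    return "\n".join([
        "#JSGF V1.0;",
        f"grammar {grammar_name};",
        f"<syllable> = {alternatives};",
        "public <stream> = <syllable>+;",
        "",
    ])


GRAMMAR = build_jsgf("KanaStream", ["BAH", "BEH", "BEE", "BOR", "BOO"])
print(GRAMMAR)
```

The printed text would be saved as the grammar file; every word it lists also has to exist in the pronunciation dictionary.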
| November 15, 2011 at 10:13 pm #8078 | |
|
Halle |
Take a look at the OpenEars documentation to see how to load JSGF in OpenEars rather than a language model. It’s documented on this page: http://www.politepix.com/openears/yourapp. OpenEars does not create a .gram file itself; you must add it and set the startListening: method so that it knows JSGF is being used. I’d be very surprised if recognition functioned well with your consonant-vowel pairs, since those are only one or two phonemes, but the way it will work (if it works) is that recognition will be attempted after a detected silence on the part of the speaker. You can see how this works by running the sample app. |
| November 15, 2011 at 11:04 pm #8079 | |
|
Halle |
A secondary issue, besides the problem of trying to detect non-words using an acoustic model trained with phrases, is that the English-language acoustic model of OpenEars will not be able to detect Japanese phonemes, except in the rare cases where they both overlap with English phonemes and happen to have been successfully recognized. |
| November 15, 2011 at 11:22 pm #8080 | |
|
Pi |
Hi Halle, thanks for the replies, and thanks for this incredible work you have done. Looking through this project, I would never have managed to do this myself. It is a Herculean task! I am so stupid! A few weeks back I installed it and only read the help as far as getting the sample app working. I was unaware of the help page you linked, and it couldn’t have been any easier to find. My bad. OK, so at first sight it looks as if I just need to change a NO to a YES on the last parameter of [self.pocketsphinxController startListeningWithLanguageModelAtPath:self.pathToGrammarToStartAppWith … and it seems I need to do this five times or so, since there are several situations that warrant starting the engine up. However, this was still giving a runtime error. Then I noticed the documentation specifies that a grammar file shouldn’t have the extension .languagemodel but should instead be .gram, so I changed the file name. I also ran a search through the code and found one instance that required changing: self.pathToGrammarToStartAppWith = [NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEars1.gram"]; I also found a few other references but couldn’t figure out whether I should change them, so I didn’t. Now it runs, but as soon as I say something it breaks. Not sure how to push it forwards now… PS As regards your comments: firstly, I believe there is a good chance it will work. I am choosing a set of phonemes that are all linearly independent of one another, so to speak. So if I’m using k I will not use g; if I use t I will not use d, etc. That cuts the set down from, say, 35 to under 15. Also, by only using a minimal dictionary of CV pairs, I will not be using a lot of combinations. Hopefully the engine can look through the dictionary, pick out all the triphones being used, and only work with this set. If it does that, I’ll be using a very small fraction of the triphones of a full language model, maybe 2%.
Secondly, I’m not actually at all interested in Japanese. Sorry, misleading post title. I’m just interested in getting the computer to recognise which CV pair on the grid was spoken, so that I would be able to speak any combination and it would hit them spot on. It is for a speech keyboard (which I need for myself as I have chronic RSI). I am looking at training my own acoustic model if necessary, but I would like to try this experiment first; I don’t really fancy speaking gibberish into the microphone for five days solid, and I am daunted by the task of incorporating all of that into this framework. Even with this task I am out of my depth. I just put it as Japanese because I have recently discovered they use this grid for their language; I am fascinated. I was hoping to find some Kana recognition software, and I’m disappointed I can’t find any. It was still in my head when I was writing the post… |
| November 15, 2011 at 11:35 pm #8081 | |
|
Halle |
Thank you. If you want debugging help, you should read this for info about what logging is necessary to post: http://www.politepix.com/forums/topic/install-issues-and-their-solutions/ |
| November 16, 2011 at 12:03 am #8082 | |
|
Pi |
Here is the log: http://pastebin.com/hzqdYzAU EDIT: Oops, found it. Line 205: JSGF parse of /var/mobile/Applications/A97B621D-16D9-4BF8-B00B-85D3FF3635A6/OpenEarsSampleProject.app/OpenEars1.gram failed. So I need to go back to the grammar documentation: http://cmusphinx.sourceforge.net/wiki/tutoriallm. Nicolai has just told me that the latest version of PocketSphinx reports more detailed information…
|
| November 16, 2011 at 12:35 am #8084 | |
|
Pi |
OK, I don’t know what I was thinking the first time I wrote the grammar file; it was all wrong. However, I’ve gone through the documentation and I can’t see what is wrong with this: ERROR: “fsg_search.c”, line 322: The word ‘and’ is missing in the dictionary. I am following the syntax for a recursive grammar to the letter, and I can’t figure out why it is raising an error. Any ideas? (I can’t paste the grammar as text because it won’t display properly; it uses the same angle brackets as HTML formatting.) |
| November 16, 2011 at 12:53 am #8085 | |
|
Pi |
I am an idiot: I thought ‘and’ was a reserved keyword. Removing it, it works! Although it is indeed unusable. Even with only five words (Bah, Beh, Bii, Boh, Buu), it gets it right most of the time, say 70%, but for example I might say ‘Bee Bee Bah’ and it catches ‘Bee Bee Bah Beh’, which makes me scratch my head. It has completely invented what is to my ears a very unmistakable ‘B’ sound. It’s not as if it just got the phoneme wrong; it has actually got the wrong number of syllables. Tomorrow I will play around with changing the phonemes, but it is looking like game over :| |
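The ‘and’ error above makes sense once you treat every bare token in a grammar rule as a word that must appear in the .dic file. A small pre-flight check along these lines could catch that before the decoder does. This is a hedged sketch, not part of OpenEars or PocketSphinx: the function names, the sample grammar (a guess at a recursive rule with a stray `and`) and the phoneme transcriptions in the sample dictionary are all illustrative assumptions.

```python
import re


def grammar_words(jsgf_text):
    """Collect terminal tokens from rule bodies, ignoring <rule> references."""
    words = set()
    for statement in jsgf_text.split(";"):
        if "=" not in statement:
            continue  # header and grammar-name statements have no rule body
        body = statement.split("=", 1)[1]
        body = re.sub(r"<[^>]*>", " ", body)  # drop rule references
        words.update(re.findall(r"[A-Za-z']+", body))
    return words


def dictionary_words(dic_text):
    """First column of each non-empty line of a CMU-style dictionary."""
    return {line.split()[0] for line in dic_text.splitlines() if line.strip()}


def missing_words(jsgf_text, dic_text):
    """Grammar terminals that have no dictionary entry, sorted."""
    return sorted(grammar_words(jsgf_text) - dictionary_words(dic_text))


# Hypothetical grammar and dictionary, for illustration only.
SAMPLE_GRAMMAR = """#JSGF V1.0;
grammar KanaStream;
<syllable> = BAH | BEH | BEE | BOR | BOO;
public <stream> = <syllable> | <syllable> and <stream>;
"""
SAMPLE_DIC = "BAH B AA\nBEH B EH\nBEE B IY\nBOR B AO\nBOO B UW\n"

MISSING = missing_words(SAMPLE_GRAMMAR, SAMPLE_DIC)
print(MISSING)
```

Anything printed here is a token the decoder would try to look up as a spoken word. The comparison is case-sensitive on purpose, so `and` and `AND` count as different entries.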
| November 16, 2011 at 11:17 am #8087 | |
|
Halle |
For what it’s worth, 70% is a lot better than I would have expected for what is basically a phoneme detection application. |
| November 16, 2011 at 2:17 pm #8088 | |
|
Pi |
I am going to experiment a little further by switching to a Spanish model (Spanish only has 5 vowel phonemes, IIRC); I may be able to get a significant improvement. However, I am kind of disturbed by the fact that if I speak k syllables, the number of syllables that comes back is pretty much random. If I say ‘ba ba boo boo bee’ it may give ‘boo boo bee’; if I say ‘ba boo ba’ it might give ‘ba boo beh bor’. Without knowing the intricacies of the algorithm, I have no idea whether there is any possibility of getting decent recognition. PS Thanks for correcting the topic!
|
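One way to put numbers on the "wrong number of syllables" problem is to align what was said against what came back and count substitutions, deletions and insertions separately, as is commonly done when scoring recognizers. The following is a minimal sketch using a plain Levenshtein alignment; the function name `align_counts` is made up for this example and nothing here comes from OpenEars or Sphinx.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimal edit script."""
    ref, hyp = reference.split(), hypothesis.split()
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the bottom-right corner, classifying each step.
    subs = dels = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        mismatch = int(i > 0 and j > 0 and ref[i - 1] != hyp[j - 1])
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            subs += mismatch  # match (0) or substitution (1)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # a spoken syllable was dropped
            i -= 1
        else:
            ins += 1  # a syllable was invented
            j -= 1
    return subs, dels, ins


print(align_counts("ba ba boo boo bee", "boo boo bee"))
print(align_counts("ba boo ba", "ba boo beh bor"))
```

For the two utterances quoted above, the first comes out as two deletions and the second as one substitution plus one insertion. Tracking insertions and deletions separately from substitutions would show whether a change of phonemes or model is fixing the syllable count or only the syllable identity.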
| November 19, 2011 at 1:13 am #8142 | |
|
Joseph S. Wisniewski |
There’s not a Spanish model worth bothering with in Sphinx format, and the vowel coordinates are pretty far off. You’ll also find that there are a lot of Japanese CV pairs beyond the basic 5×9. Have a look at the phoneme set used for the Julius recognition system. Julius is a robust, free Japanese recognizer. There may be scripts out there to convert the Julius acoustic models to Sphinx format. Good luck. |
| November 19, 2011 at 10:40 am #8143 | |
|
Pi |
Hey Joseph, I’m starting to see the same people in all the speech recognition hangouts now :) I am working with an HTK engineer to build my own acoustic model, then I am going to test it in HTK and if it is decent I should be able to drop it into OpenEars ( hopefully?! ). I will post back to the thread when I get some result either way. PS I’m going to check out Julius as well…
|
| January 4, 2012 at 2:41 pm #8365 | |
|
aerialcombat |
I’m sort of having a similar problem. I would be very much interested to know how everything turned out. |
| January 4, 2012 at 3:50 pm #8366 | |
|
Halle |
This kind of task is unlikely to give really satisfactory results for a commercial app. The context of entire words and phrases, such as the HMM was built with, is not really dispensable if you want decent results. I think it’s more something to experiment with or do research on than to base an app concept on that you’re expecting to ship in the near term and that does a great job of recognizing different syllables. There are a few ideas like this that come up a lot here and in other similar places (although they get asked more here than elsewhere, I’ve noticed, I think because I hear about a large variety of small and practical commercial projects here) which are just very difficult challenges: keyword spotting, recognizing digits, syllable detection, pronunciation correctness rating. I would say that these are dangerous tasks to base an expensive project on (expensive either in terms of your income-productive development time as an indie dev, or a client’s or employer’s budget) and should probably be rethought in that case, but they are interesting to pursue in lower-stakes situations. |