November 18, 2012 at 4:25 pm #13072
I am working on a financial application where I would like the user to be able to input large numbers by voice as a natural phrase. For example, I would like a user to be able to input their salary as “twenty eight thousand five hundred” rather than “two eight five zero zero zero”.
I have looked around online for a number grammar which can support this but I have been unable to find one. As I imagine this is a common requirement I thought a grammar for this would be readily available. Could someone please point me in the right direction?
Thanks in advance.

November 18, 2012 at 4:28 pm #13073
I’m not aware of a pre-rolled grammar for large numbers, sorry. I generally recommend not using JSGF due to slow performance and what seems like slightly buggy recognition in the engine. Have you tried generating a text corpus of number words and creating your own ARPA language model (like in this blog post: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/)?

November 18, 2012 at 6:26 pm #13074
Thanks for the link. The text corpus to detect all of the possible numbers is going to be fairly large. Do you have any advice on then going back from the recognised strings to numbers?
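(Editor's note: generating such a corpus need not be done by hand. Below is a minimal sketch in Python, assuming a plain one-phrase-per-line text file as the corpus format and an arbitrary example range of salary values; the file name and sampling step are illustrative only.)

```python
# Sketch: generate a text corpus of spoken number phrases for
# language model training. The phrase style ("twenty eight thousand
# five hundred") matches the natural form the application wants.

UNITS = ["zero", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

def to_words(n):
    """Spell out 0..999999 in English number words."""
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + UNITS[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return UNITS[hundreds] + " hundred" + (" " + to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return to_words(thousands) + " thousand" + (" " + to_words(rest) if rest else "")

# Write one phrase per line, e.g. sampling salaries in steps of 500:
with open("numbers_corpus.txt", "w") as f:
    for n in range(0, 100000, 500):
        f.write(to_words(n) + "\n")
```

Sampling plausible values in steps rather than enumerating every integer keeps the corpus file to a manageable size while still covering all of the word combinations.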
Ben

November 18, 2012 at 9:24 pm #13075
I’ve never thought about this task, so this is not coming from a position of experience with it, but if the maximum is (for instance) 999,999, it seems to me that you would need [0-9], a set of tens incrementing by ten going up to “90”, a set of hundreds incrementing by 100 going up to “900”, and a set of thousands incrementing by 1000 going up to “9000”: a model with a base set of 40 unigrams which have equal probability of being found in a particular bigram or trigram. Out of that you can make 999,999 with the available words “nine hundred”, “ninety”, “nine thousand”, “nine hundred”, “ninety”, “nine”.

It seems that interpreting this back into digits should be possible to construct a ruleset for, since there are only a few variations on the correct statement of a number in English. I can also see why you would want a grammar, however, to have a rules-based recognition that you can be more confident about processing backwards into digits.

November 18, 2012 at 11:01 pm #13077
I have tried to implement something similar and it seems to be working fairly well.
I have included “and”, as this is often used within numbers: “nine hundred and eighty one”.
One issue I am having is that “thirty” “fifty” and “eighty” are often wrongly identified as each other.
I will try adding “one hundred”, “two hundred” … into the grammar as this should make it slightly easier to parse.
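(Editor's note: for the record, here is one way the reverse mapping could be sketched. This is an illustration rather than production code; it assumes the unit/teen/ten/hundred/thousand vocabulary discussed above and skips the filler word “and”.)

```python
# Sketch: turn recognised number words back into an integer.
# Accumulates a value per thousands group; "hundred" multiplies the
# group so far, "thousand" banks it into the running total.

WORD_VALUES = {}
for i, w in enumerate(["zero", "one", "two", "three", "four", "five",
                       "six", "seven", "eight", "nine"]):
    WORD_VALUES[w] = i
for i, w in enumerate(["ten", "eleven", "twelve", "thirteen", "fourteen",
                       "fifteen", "sixteen", "seventeen", "eighteen",
                       "nineteen"]):
    WORD_VALUES[w] = 10 + i
for i, w in enumerate(["twenty", "thirty", "forty", "fifty", "sixty",
                       "seventy", "eighty", "ninety"]):
    WORD_VALUES[w] = 20 + 10 * i

def words_to_number(text):
    """Parse e.g. 'twenty eight thousand five hundred' -> 28500."""
    total = 0    # completed thousands groups
    group = 0    # value accumulated since the last 'thousand'
    for word in text.lower().split():
        if word == "and":
            continue  # filler: "nine hundred and eighty one"
        elif word == "hundred":
            group *= 100
        elif word == "thousand":
            total += group * 1000
            group = 0
        else:
            group += WORD_VALUES[word]
    return total + group
```

Because there are only a few valid phrasings of a number in English, this kind of multiplier-based accumulator covers the whole 0–999,999 range in a handful of lines.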
AND

November 18, 2012 at 11:06 pm #13078
Looks like a good start. There might be an accent bias hurting accuracy, since the default acoustic model is composed of US speech. You might want to adapt the model to a variety of UK accents using your number set as the speech corpus. This may get you some improvement with the thirty/fifty/eighty issue.

November 19, 2012 at 12:07 am #13080
Halle, how would I go about using my number set as a speech corpus?

November 19, 2012 at 12:19 am #13081
To learn about how an acoustic model is adapted you probably want to check out the CMU Sphinx project, since that isn’t something I can support from here beyond pointing you to the docs at the CMU project since it isn’t part of OpenEars: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
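(Editor's note: as a small illustration of the bookkeeping the CMU adaptation tutorial asks for, the sketch below writes the .fileids and .transcription listing files in the format that tutorial describes. The phrases and file names here are placeholders for your own recordings.)

```python
# Sketch: generate the two listing files the CMU Sphinx adaptation
# tutorial expects -- a .fileids file naming each recording and a
# .transcription file giving what each speaker said.

phrases = [
    "twenty eight thousand five hundred",
    "nine hundred and eighty one",
    "thirty fifty eighty",  # the words being confused with each other
]

with open("numbers.fileids", "w") as f_ids, \
     open("numbers.transcription", "w") as f_txt:
    for i, phrase in enumerate(phrases):
        utt_id = "numbers_%03d" % i        # matches numbers_000.wav etc.
        f_ids.write(utt_id + "\n")
        # Transcription format: <s> WORDS </s> (utterance_id)
        f_txt.write("<s> %s </s> (%s)\n" % (phrase.upper(), utt_id))
```

You would record one wav file per line (named to match the utterance IDs) for each speaker, then feed these listings to the adaptation tools described in the tutorial linked above.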
The corpus of speech you would want to use in order to adapt to a UK accent for your particular application would have a number of different speakers with the desired UK accents saying the words for which you want more accuracy (I would have them say all of the words in your language model).

Basically you will want to make recordings of your speakers saying the words, and then you will use the acoustic model adaptation method linked above to integrate their speech into the acoustic model. The result ought to be that your adapted acoustic model will get better at recognizing/distinguishing between those words in the accents you include. The acoustic model you end up with can be used with OpenEars just like the default acoustic model.

November 19, 2012 at 1:44 am #13082
Thanks for the link. I will definitely look into that!
One more thing. Is there a way to queue things to be spoken?
Currently if I request the fliteController to say something whilst it is already talking, it ignores it. Ideally I’d like it to queue the request and start it when the previous speech has stopped. Will I need to manually implement this behaviour?

November 19, 2012 at 8:26 am #13083
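(Editor's note: the manual behaviour being asked about amounts to a small FIFO wrapper around the speech controller. Below is a platform-agnostic sketch in Python; the `speak` callback and `speech_did_finish` method are hypothetical stand-ins for FliteController's say call and its speech-finished delegate callback.)

```python
# Sketch of manual speech queueing: hold utterances while speech is in
# progress and start the next one when the controller reports that the
# previous utterance has finished.

from collections import deque

class SpeechQueue:
    def __init__(self, speak):
        self._speak = speak          # starts speaking one string
        self._pending = deque()
        self._busy = False

    def say(self, text):
        if self._busy:
            self._pending.append(text)   # queue instead of dropping
        else:
            self._busy = True
            self._speak(text)

    def speech_did_finish(self):
        # Call this from the "speech finished" delegate callback.
        if self._pending:
            self._speak(self._pending.popleft())
        else:
            self._busy = False

# Usage with a stand-in speak function that just records utterances:
spoken = []
q = SpeechQueue(spoken.append)
q.say("first"); q.say("second"); q.say("third")
q.speech_did_finish(); q.speech_did_finish()
```

The same shape translates directly to a small Objective-C wrapper: enqueue in the say method, dequeue in the finished-speaking delegate callback.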
This isn’t a feature of FliteController, but NeatSpeech operates with a queue and it renders the new speech in the background so that it generally starts playing instantly when the previous speech is complete, and it has a male and female UK voice.

April 21, 2014 at 3:31 pm #1020916
Please check out the new dynamic grammar generation for OpenEars added with version 1.7: https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/

April 24, 2014 at 6:07 pm #1021025
In addition to the dynamic grammar generation that has been added to stock OpenEars in version 1.7, there is also a new plugin called RuleORama which can use the same API in order to generate grammars which are a bit faster and compatible with RapidEars: https://www.politepix.com/ruleorama