Maximum number of words that can be added to the language dictionary

Viewing 4 posts - 1 through 4 (of 4 total)

  • Author
    Posts
  • #1019796
    hari
    Participant

    I am creating an iOS app for full language recognition. I want to know the maximum number of words that can be added to the language dictionary. Can anyone please reply?

    #1019797
    Halle Winkler
    Politepix

    Welcome,

    Sorry, you can’t use offline recognition for large vocabulary recognition. A good language model size for OpenEars is no more than ~1000 words and smaller models than that will perform better and be more accurate. This kind of recognition is used for a particular search domain, i.e. where you know something about the topic of the speech you are trying to recognize and can design a vocabulary that corresponds to it.
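
    As a rough illustration of designing a vocabulary for a particular search domain, here is a minimal Python sketch. It is not OpenEars code: the function name `build_vocabulary`, the constant `MAX_RECOMMENDED_WORDS`, and the sample phrases are all hypothetical. It collects the unique words from phrases users are expected to say and checks the count against the ~1000-word guideline above.

```python
import re

# Approximate upper bound from the guideline above; an assumption, not a hard limit.
MAX_RECOMMENDED_WORDS = 1000


def build_vocabulary(phrases):
    """Collect the unique words used across a list of expected phrases."""
    words = set()
    for phrase in phrases:
        # Keep letters and apostrophes; uppercase so duplicates collapse.
        words.update(re.findall(r"[A-Za-z']+", phrase.upper()))
    return sorted(words)


# Hypothetical phrases for a home-automation domain.
phrases = ["turn on the lights", "turn off the lights", "set a timer"]
vocabulary = build_vocabulary(phrases)
within_limit = len(vocabulary) <= MAX_RECOMMENDED_WORDS
```

    The point of the sketch is only that the vocabulary comes from the phrases you expect in your domain, and that its size is something you can measure before building a language model from it.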

    #1019799
    hari
    Participant

    Thanks for replying,

    I have the https://svn.code.sf.net/p/cmusphinx/code/trunk/pocketsphinx/model/lm/en_US/cmu07a.dic dictionary. I want to know whether it can be used on the iPhone. It has more than 50,000 words in it. Is it possible to use it with your SDK?

    #1019800
    Halle Winkler
    Politepix

    Hello Hari,

    It isn’t the dictionary that defines which words Pocketsphinx is able to recognize. The dictionary is used by the language model or grammar to find the pronunciation of a word which is already part of a language model or grammar. So the idea of using a dictionary with Pocketsphinx or OpenEars’ implementation of Pocketsphinx is not the right one, since the dictionary is not the vocabulary. How this works is going to be important to understand regardless of the size of the vocabulary you choose to go with. The OpenEars docs and the Pocketsphinx docs each cover this so they are worth a look.
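
    To make the distinction concrete, here is a small Python sketch of the idea. It is a conceptual illustration, not the Pocketsphinx or OpenEars API: the function names and sample entries are hypothetical, and it assumes CMU-style dictionary lines of the form "WORD PH1 PH2 ..." with alternate pronunciations marked as "WORD(2)". The vocabulary comes from the language model; the dictionary is only consulted to look up pronunciations for those words.

```python
def parse_pronunciation_dictionary(lines):
    """Parse CMU-style lines such as 'HELLO HH AH L OW' into {word: [phonemes, ...]}."""
    pronunciations = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # 'READ(2)' is an alternate pronunciation of 'READ'.
        word = parts[0].split("(")[0].upper()
        pronunciations.setdefault(word, []).append(parts[1:])
    return pronunciations


def recognizable_words(vocabulary, pronunciations):
    """Return pronunciations only for words that are in the vocabulary."""
    return {
        word.upper(): pronunciations[word.upper()]
        for word in vocabulary
        if word.upper() in pronunciations
    }


# Hypothetical dictionary excerpt: five lines covering four distinct words.
dictionary_lines = [
    "HELLO HH AH L OW",
    "WORLD W ER L D",
    "READ R IY D",
    "READ(2) R EH D",
    "ZEBRA Z IY B R AH",
]
pronunciations = parse_pronunciation_dictionary(dictionary_lines)

# The language model (here reduced to a word list) defines what can be recognized.
vocabulary = ["hello", "read"]
active = recognizable_words(vocabulary, pronunciations)
```

    In this sketch "WORLD" and "ZEBRA" are in the dictionary but would never be recognized, because they are not in the vocabulary. A 50,000-word dictionary therefore does not give you a 50,000-word recognizer.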

    Regarding your follow-up question, I did answer it above without any ambiguity. If you want to do large vocabulary recognition with Pocketsphinx, it will have to be done as or via a network service. Even if you do take that approach, there are no pre-rolled large vocabulary sets for Pocketsphinx which cover likely speech for an iPhone user in 2014, so in order to get good accuracy rates, you would have to create your own language model consisting of real phrases your users would say rather than dropping something in which has already been created, since an inappropriate language model will lead to low accuracy.

    BTW, you don’t need to add cmu07a.dic to an OpenEars app — it is already in the English acoustic model bundle under the name LanguageModelGeneratorLookupList.text.
