Duplicate words in dictionary


Viewing 3 posts - 1 through 3 (of 3 total)

  • #1027839
    touchapptech
    Participant

    I’ve found that if I add certain words, such as “CLOSE” or “RESUME”, to the array passed to generateLanguageModelFromArray or the dictionary passed to generateGrammarFromDictionary, then when Pocketsphinx starts listening, I get this:

    2016-02-06 13:42:30.115 myapp[1048:661228] Project has these words or phrases in its dictionary:
    CLOSE
    CLOSE(2)

    or

    2016-02-06 13:45:00.021 myapp[1060:662742] Project has these words or phrases in its dictionary:
    RESUME
    RESUME(2)
    RESUME(3)

    I even tested it by adding just the single word with no other words, and I still get the duplicates. Not all words trigger this, but others I’ve found that do are “FAVORITES”, “TV”, “ENTER” and “EXIT”, and I’m guessing there are more. I don’t know what I’m doing to cause this. I can give you the complete logs if you need them, but I thought you might know off the top of your head what causes this, and whether it is a problem or something I should just ignore. I’m trying to keep the language model as small as possible, so I’d like to figure out how to keep the duplicates out.

    Thanks!

    #1027840
    touchapptech
    Participant

    So I was just looking at the words again and realized there are two pronunciations of “close” and two (that I can think of) for “resume”, so I guess that accounts for some of it. But I’m not sure why it would cause duplicates for “exit”, “enter” or “favorites”. And even for a word like “close” (in my case, as in “close the door”), is there a way to keep it from generating the duplicates?

    Thanks!

    #1027841
    Halle Winkler
    Politepix

    Hi,

    This is correct behavior. If there is an alternate pronunciation in the dictionary, that means there is a common accent that uses it, so if you include only the ones that correspond to your own accent, other users will be excluded. It shouldn’t lead to any unwanted behavior. All recognitions of the various pronunciations will be returned as just the word itself in the hypothesis, without the (2) or (3), since the word text is managed by the grammar or language model (which has only the one textual representation) rather than by the pronunciation dictionary.
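
To illustrate the point about alternate pronunciations, here is a minimal Python sketch of how several dictionary entries can collapse to one textual word. It is not OpenEars or Pocketsphinx code. The helper names (`base_word`, `group_pronunciations`) are hypothetical, the phoneme strings in `SAMPLE` are illustrative guesses rather than output copied from a real run, and the one-entry-per-line layout is an assumption about how the pronunciation dictionary is written.

```python
import re
from collections import defaultdict

# Matches a trailing alternate-pronunciation marker such as "(2)" or "(3)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def base_word(entry):
    """Return a dictionary entry without its alternate-pronunciation marker."""
    return VARIANT_SUFFIX.sub("", entry)


def group_pronunciations(dictionary_text):
    """Map each base word to every phoneme string listed for it."""
    grouped = defaultdict(list)
    for line in dictionary_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: the entry, whitespace, then the phonemes.
        entry, phonemes = line.split(None, 1)
        grouped[base_word(entry)].append(phonemes)
    return dict(grouped)


# Illustrative entries only; the phonemes are assumptions, not real log output.
SAMPLE = """
CLOSE K L OW S
CLOSE(2) K L OW Z
EXIT EH G Z IH T
EXIT(2) EH K S IH T
"""

if __name__ == "__main__":
    for word, variants in group_pronunciations(SAMPLE).items():
        print(word, "has", len(variants), "pronunciation(s)")
```

In this sketch, every variant of a word is stored under the same key, which mirrors the description above: the hypothesis carries only the plain word, whichever pronunciation was matched.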
