Custom Acoustic Model – Missing g2p file

  • #1030187
    Powerkey
    Participant

    I am trying to create a custom AcousticModel for my app. I simply replaced the contents of the LMGLL.text file in the AcousticModelEnglish bundle with my manually generated list of names.

    My corpus text contains entries like…

    John-Smith
    Jane-Doe

    My modified LookupList contains entries like…

    John-Smith<tab>JH AA N S M IH TH
    Jane-Doe<tab>JH EY N D OW
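
The entry layout shown above (a word, one tab, then space-separated phoneme symbols) can be sanity-checked before it goes into a bundle. The following is a minimal sketch in Python, not part of OpenEars: `parse_lookup_line` is a hypothetical helper, and the one-tab-per-line layout is assumed only from the two example entries in this post.

```python
def parse_lookup_line(line):
    """Split one 'word<TAB>PH1 PH2 ...' line into (word, [phonemes]).

    Assumption: exactly the layout shown in the post, i.e. the first tab
    separates the word from its space-separated phoneme symbols.
    """
    word, sep, phones = line.rstrip("\n").partition("\t")
    if not sep or not word or not phones.strip():
        raise ValueError(f"expected 'word<TAB>phonemes', got {line!r}")
    return word, phones.split()


# The two example entries from the post, with a real tab character.
entries = [
    "John-Smith\tJH AA N S M IH TH",
    "Jane-Doe\tJH EY N D OW",
]
parsed = dict(parse_lookup_line(e) for e in entries)
```

A line that uses spaces where the tab should be raises `ValueError` here, which catches one easy mistake when a list is typed by hand.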

    When the app runs I get the message that the word John-Smith does not exist in the dictionary and that it is using the fallback method to generate graphemes.

    2016-04-25 13:43:46.908 TalkApp[9674:5082253] Starting OpenEars logging for OpenEars version 2.501 on 64-bit device (or build): iPhone running iOS version: 9.300000
    2016-04-25 13:43:46.971 TalkApp[9674:5082253] Starting dynamic language model generation
    .
    INFO: ngram_model_arpa_legacy.c(504): ngrams 1=1027, 2=2050, 3=1025
    INFO: ngram_model_arpa_legacy.c(136): Reading unigrams
    INFO: ngram_model_arpa_legacy.c(543):     1027 = #unigrams created
    INFO: ngram_model_arpa_legacy.c(196): Reading bigrams
    INFO: ngram_model_arpa_legacy.c(561):     2050 = #bigrams created
    INFO: ngram_model_arpa_legacy.c(562):        3 = #prob2 entries
    INFO: ngram_model_arpa_legacy.c(570):        3 = #bo_wt2 entries
    INFO: ngram_model_arpa_legacy.c(293): Reading trigrams
    INFO: ngram_model_arpa_legacy.c(583):     1025 = #trigrams created
    INFO: ngram_model_arpa_legacy.c(584):        2 = #prob3 entries
    INFO: ngram_model_dmp_legacy.c(521): Building DMP model...
    INFO: ngram_model_dmp_legacy.c(551):     1027 = #unigrams created
    INFO: ngram_model_dmp_legacy.c(652):     2050 = #bigrams created
    INFO: ngram_model_dmp_legacy.c(653):        3 = #prob2 entries
    INFO: ngram_model_dmp_legacy.c(660):        3 = #bo_wt2 entries
    INFO: ngram_model_dmp_legacy.c(664):     1025 = #trigrams created
    INFO: ngram_model_dmp_legacy.c(665):        2 = #prob3 entries
    2016-04-25 13:43:47.018 TalkApp[9674:5082253] Done creating language model with CMUCLMTK in 0.046206 seconds.
    2016-04-25 13:43:47.019 TalkApp[9674:5082253] Since there is no cached version, loading the language model lookup list for the acoustic model called AcousticModelEnglish
    2016-04-25 13:43:47.020 TalkApp[9674:5082253] The word John-Smith was not found in the dictionary of the acoustic model /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/F11E5786-82FE-4C84-8A6D-5DF547950513/TalkApp.app/AcousticModelEnglish.bundle. Now using the fallback method to look it up. If this is happening more frequently than you would expect, likely causes can be that you are entering words in another language from the one you are recognizing, or that there are symbols (including numbers) that need to be spelled out or cleaned up, or you are using your own acoustic model and there is an issue with either its phonetic dictionary or it lacks a g2p file. Please get in touch at the forums for assistance with the last two possible issues.
    2016-04-25 13:43:47.020 TalkApp[9674:5082253] Using convertGraphemes for the word or phrase john which doesn't appear in the dictionary
    2016-04-25 13:43:47.021 TalkApp[9674:5082253] Using convertGraphemes for the word or phrase smith which doesn't appear in the dictionary
    2016-04-25 13:43:47.022 TalkApp[9674:5082253] the graphemes "JH AA N S M IH TH" were created for the word John-Smith using the fallback method.

    My expectation is that if a word in the corpus matches a word in the LookupList, the pronunciation from the LookupList will be used instead of the fallback method. I seem to be missing something.

    I also tried creating an AcousticModelCustom (leaving the AcousticModelEnglish alone) with my custom names, but now I get more messages, this time regarding a missing g2p file.

    2016-04-25 14:04:30.109 TalkApp[9714:5137214] Starting OpenEars logging for OpenEars version 2.501 on 64-bit device (or build): iPhone running iOS version: 9.300000
    2016-04-25 14:04:30.173 TalkApp[9714:5137214] Starting dynamic language model generation
    .
    INFO: ngram_model_arpa_legacy.c(504): ngrams 1=1028, 2=2051, 3=1026
    INFO: ngram_model_arpa_legacy.c(136): Reading unigrams
    INFO: ngram_model_arpa_legacy.c(543):     1028 = #unigrams created
    INFO: ngram_model_arpa_legacy.c(196): Reading bigrams
    INFO: ngram_model_arpa_legacy.c(561):     2051 = #bigrams created
    INFO: ngram_model_arpa_legacy.c(562):        3 = #prob2 entries
    INFO: ngram_model_arpa_legacy.c(570):        3 = #bo_wt2 entries
    INFO: ngram_model_arpa_legacy.c(293): Reading trigrams
    INFO: ngram_model_arpa_legacy.c(583):     1026 = #trigrams created
    INFO: ngram_model_arpa_legacy.c(584):        2 = #prob3 entries
    INFO: ngram_model_dmp_legacy.c(521): Building DMP model...
    INFO: ngram_model_dmp_legacy.c(551):     1028 = #unigrams created
    INFO: ngram_model_dmp_legacy.c(652):     2051 = #bigrams created
    INFO: ngram_model_dmp_legacy.c(653):        3 = #prob2 entries
    INFO: ngram_model_dmp_legacy.c(660):        3 = #bo_wt2 entries
    INFO: ngram_model_dmp_legacy.c(664):     1026 = #trigrams created
    INFO: ngram_model_dmp_legacy.c(665):        2 = #prob3 entries
    2016-04-25 14:04:30.202 TalkApp[9714:5137214] Done creating language model with CMUCLMTK in 0.028684 seconds.
    2016-04-25 14:04:30.203 TalkApp[9714:5137214] Since there is no cached version, loading the language model lookup list for the acoustic model called AcousticModelCustom
    2016-04-25 14:04:30.204 TalkApp[9714:5137214] Since there is no cached version, loading the g2p model for the acoustic model called AcousticModelCustom
    2016-04-25 14:04:30.204 TalkApp[9714:5137214] Error: an attempt was made to load the g2p file for the acoustic model at the path /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/D106895C-2174-40BB-AABA-FF2145542035/TalkApp.app/AcousticModelCustom.bundle and it wasn't possible to complete.  This file does not appear to exist. Please ask for help in the forums and be sure to turn on all logging. An exception or unpredictable behavior should be expected now since this file is a requirement.
    2016-04-25 14:04:30.204 TalkApp[9714:5137214] Error: a g2p is missing in a case where one will be needed. Expect an exception shortly. If you need help getting a new acoustic model set up with a g2p please come by the forums and inquire.
    2016-04-25 14:04:30.205 TalkApp[9714:5137214] The word John-Smith was not found in the dictionary of the acoustic model /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/D106895C-2174-40BB-AABA-FF2145542035/TalkApp.app/AcousticModelCustom.bundle. Now using the fallback method to look it up. If this is happening more frequently than you would expect, likely causes can be that you are entering words in another language from the one you are recognizing, or that there are symbols (including numbers) that need to be spelled out or cleaned up, or you are using your own acoustic model and there is an issue with either its phonetic dictionary or it lacks a g2p file. Please get in touch at the forums for assistance with the last two possible issues.
    2016-04-25 14:04:30.205 TalkApp[9714:5137214] the graphemes "" were created for the word John-Smith using the fallback method.
    #1030193
    Halle Winkler
    Politepix

    Hi,

    The only modification I support is adding entries to an existing acoustic model lookup list in the alphabetically-correct location, but not altering the bundle contents or removing entries from the lookup list, sorry.
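
One way to follow that advice is to merge new entries into a copy of the existing list without removing or reordering anything. This is a sketch under stated assumptions, not OpenEars code: `add_entries` is a hypothetical helper, and the ordering rule (case-insensitive comparison of the word before the tab) is a guess at what "alphabetically-correct" means for the lookup list, so check it against the ordering of the file actually shipped in the bundle.

```python
import bisect


def add_entries(existing_lines, new_lines, key=str.lower):
    """Return a new list with new_lines merged into existing_lines.

    Assumptions (not confirmed by OpenEars documentation):
    - each line is 'word<TAB>phonemes'
    - existing_lines is already sorted by key(word)
    Nothing is removed; a new entry whose word already exists is skipped.
    """
    def word_key(line):
        return key(line.split("\t", 1)[0])

    merged = list(existing_lines)
    keys = [word_key(line) for line in merged]
    for line in new_lines:
        k = word_key(line)
        i = bisect.bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            continue  # word already present; leave the original entry alone
        keys.insert(i, k)
        merged.insert(i, line)
    return merged


# Illustrative data only; "jam" and "joke" stand in for existing entries.
existing = ["jam\tJH AE M", "joke\tJH OW K"]
names = ["John-Smith\tJH AA N S M IH TH", "Jane-Doe\tJH EY N D OW"]
result = add_entries(existing, names)
```

With the illustrative data, the two names land between the stand-in entries, and the original lines keep their relative order.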

    #1030195
    Powerkey
    Participant

    Okay. I think I can work within that process, but I have a few questions to make sure I understand the details.

    1. Will pocketsphinx only recognize full names if my corpus contains only full names?

    2. Does the fallback method utilize the lookup list in AcousticModelEnglish? i.e. Would an error in the English lookup list cause problems with the fallback method?

    3. If I duplicate (in the Finder) the AcousticModelEnglish.bundle, rename it to AcousticModelCustom.bundle, add it to my project and point the pathToModel method to the Custom bundle, would you expect that to work? Or, should I just modify the lookup list in the English bundle?

    #1030196
    Halle Winkler
    Politepix

    Hi,

    1. Will pocketsphinx only recognize full names if my corpus contains only full names?

    I think we already talked this one through in your previous question, but if we haven’t, clarify it a little more with reference to your previous questions so I can understand what differentiates it, thanks.

    2. Does the fallback method utilize the lookup list in AcousticModelEnglish? i.e. Would an error in the English lookup list cause problems with the fallback method?

    Sorry, the question is a bit outside of the scope of support here – make sure that you only add entries to the lookup list which are valid and in the alphabetically-correct position so there is no need to discuss acoustic model failure states. If your changes to the lookup list lead to functionality issues you should remove them.

    3. If I duplicate (in the Finder) the AcousticModelEnglish.bundle, rename it to AcousticModelCustom.bundle, add it to my project and point the pathToModel method to the Custom bundle, would you expect that to work? Or, should I just modify the lookup list in the English bundle?

    That should work fine.
