about using a custom dictionary

This topic has 3 replies, 2 voices, and was last updated 8 years ago by Halle Winkler.

Viewing 4 posts - 1 through 4 (of 4 total)


Author

Posts
April 1, 2016 at 2:20 am #1029893

xwang
Participant

Hi Halle,
I read the following thread:
https://www.politepix.com/forums/topic/question-about-using-a-custom-dictionary/
I want to use my own custom dictionary, so I created an acoustic model bundle with only LanguageModelGeneratorLookupList.text in it.

So in generateLanguageModelFromArray I use the path of my own acoustic model bundle (which is located in Caches), and in startListeningWithLanguageModelAtPath I use the original bundle path.

It worked perfectly before I upgraded to the newest version of OpenEars (along with RapidEars and Rejecto).
It still works after the upgrade, but it now prints the following error:

Error: an attempt was made to load the g2p file for the acoustic model at the path /var/mobile/Containers/Data/Application/3FFAA228-0764-48B7-BAF6-42CE5AAB2717/Library/Caches/AcousticModelEnglish1.bundle and it wasn’t possible to complete.

I don’t know if I need to pay attention to this error, since the engine still works as before.

BTW: actually, I just created a directory named AcousticModelEnglish1.bundle and copied LanguageModelGeneratorLookupList.text into it.

April 1, 2016 at 6:16 pm #1029895

Halle Winkler
Politepix

Hello,

What is the goal of using the custom dictionary file? This may affect my advice a little bit.

April 3, 2016 at 12:07 am #1029914

xwang
Participant

Our thinking was that we don’t need to recognize all the words, so we only need some of them in the lookup list, and different users have different words in their lists.
The list also contains words like
OPENWINDOW OW P AH N W IH N D OW
which don’t exist in the original one.

April 3, 2016 at 11:22 am #1029915

Halle Winkler
Politepix

OK, the reason I ask is that a lot of effort has been taken to make sure the use of the lookup list is extremely fast, so unless you have tested the timing and discovered that there is a big difference between the use of your list and the default list, it usually doesn’t accomplish very much to reduce the list, and of course it can potentially lead to less accuracy to remove words. Adding words (as mentioned in the blog post referenced) is the expected/supported usage, since it has the potential to increase accuracy. To add words, just use the same acoustic model that ships with OpenEars 2.5 and add your words to it in the alphabetically-correct location in its language model lookup list.
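For readers who want to script the "add your words in the alphabetically-correct location" step, here is a minimal Python sketch that works on a copy of the lookup list outside the app. It rests on assumptions not confirmed in this thread: that each entry is a word followed by whitespace and its phonemes (as in the OPENWINDOW example above), that a tab is an acceptable separator, and that "alphabetical" means plain string ordering of the word. The function name `insert_entry` and the sample entries are hypothetical, so check the shipped file's actual separator and ordering before relying on it.

```python
import bisect

def insert_entry(lines, word, phonemes):
    """Return a new list with 'WORD<TAB>PHONEMES' inserted in sorted position.

    Assumptions (not confirmed by the thread): every line is
    'WORD<whitespace>PHONEMES', there are no blank lines, and the list is
    already sorted by WORD using plain string comparison.
    """
    keys = [line.split(None, 1)[0] for line in lines]
    pos = bisect.bisect_left(keys, word)
    if pos < len(keys) and keys[pos] == word:
        return list(lines)  # word already present; leave the list unchanged
    entry = word + "\t" + " ".join(phonemes)
    return lines[:pos] + [entry] + lines[pos:]

# Illustrative entries only, not copied from the real lookup list.
lookup = [
    "OPEN\tOW P AH N",
    "OPENS\tOW P AH N Z",
    "WINDOW\tW IH N D OW",
]
updated = insert_entry(lookup, "OPENWINDOW", "OW P AH N W IH N D OW".split())
for line in updated:
    print(line)
```

Running the same insert twice leaves the list unchanged, which makes it safe to apply per-user word lists repeatedly to the same file.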

Can I ask the amount of speed improvement in model generation time you saw by using a lookup list with words removed?

You probably already know this, but to clarify for other readers who find this topic: the lookup lists aren’t the dictionary used during recognition and have no effect at all on recognition speed or accuracy once the model has been generated – the dictionary used during recognition is a newly generated dictionary which is already reduced only to the words needed by the vocabulary. The lookup list is only used very briefly during the generation of the language model.
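To make the distinction concrete, the sketch below shows the general idea of deriving a small recognition dictionary from a large lookup list. It is an illustration only and not OpenEars' actual implementation: the function name `build_recognition_dictionary`, the entry format (word, whitespace, phonemes) and the uppercase normalization are all assumptions.

```python
def build_recognition_dictionary(lookup_lines, vocabulary):
    """Pick out only the vocabulary words from a large lookup list.

    Returns (found, missing): 'found' maps each vocabulary word to its
    phoneme string, 'missing' lists words with no entry in the lookup list.
    Illustrative only; this is not how OpenEars itself is implemented.
    """
    lookup = {}
    for line in lookup_lines:
        parts = line.split(None, 1)
        if len(parts) == 2:  # skip blank or malformed lines
            lookup[parts[0]] = parts[1].strip()
    wanted = sorted({word.upper() for word in vocabulary})
    found = {word: lookup[word] for word in wanted if word in lookup}
    missing = [word for word in wanted if word not in lookup]
    return found, missing

# Illustrative entries only, not copied from the real lookup list.
lookup_lines = [
    "CLOSE\tK L OW Z",
    "OPEN\tOW P AH N",
    "OPENWINDOW\tOW P AH N W IH N D OW",
    "WINDOW\tW IH N D OW",
]
found, missing = build_recognition_dictionary(
    lookup_lines, ["openwindow", "close", "door"]
)
print(found)
print(missing)
```

However large the lookup list is, the resulting dictionary holds only the vocabulary words, which is why trimming the lookup list does not change recognition itself. Words reported as missing would need a pronunciation from some other source, and this thread does not describe how that case is handled.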