Reply To: OpenEars and the main thread

Halle Winkler

That is my suggestion. Since 1.2, LanguageModelGenerator has been able to take a custom or customized master phonetic dictionary in place of cmu07a.dic, and for this task I think the best approach is to add to cmu07a.dic and keep using it. To get the pronunciations to add, I would take the following steps (this involves a little hacking on OpenEars, and I’m leaving the implementation up to you):

1. Source some kind of comprehensive list of the most popular tracks with whatever cutoff for popularity and timeframe makes sense for you. Feed the whole thing into LanguageModelGenerator using the simulator and save everything that triggers the fallback method.

2. Pick the low-hanging fruit first: identify any compound nouns made up of words that are already in the master phonetic dictionary individually, just not as a combined entry.

3. Do the tedious step with the remaining words of transcribing their phonemes using the same phoneme system found in cmu07a.dic.

4. Add them all to cmu07a.dic.

5. Sort cmu07a.dic alphabetically (very important), then use it.
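Steps 2, 4 and 5 can be scripted offline. The following is only a rough Python sketch, not anything that ships with OpenEars: the function names and the `min_piece` parameter are made up for illustration, and it assumes each dictionary line is a word followed by whitespace-separated phonemes. The sort is a plain code-point sort, which is also an assumption, so compare the result against the ordering already used in cmu07a.dic before relying on it.

```python
def parse_dic(lines):
    """Parse 'WORD PH1 PH2 ...' lines into {word: [phonemes]} (assumed format)."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            entries[parts[0]] = parts[1:]
    return entries


def compound_pronunciation(word, entries, min_piece=2):
    """Step 2: split `word` into known dictionary words and join their phonemes.

    Returns a phoneme list, or None if the word cannot be fully covered.
    `min_piece` is a hypothetical guard against splitting into single letters.
    """
    # best[i] holds a phoneme list covering word[:i], or None if unreachable.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if (best[start] is not None
                    and len(piece) >= min_piece
                    and piece in entries):
                best[end] = best[start] + entries[piece]
                break
    return best[len(word)]


def merge_and_sort(entries, additions):
    """Steps 4 and 5: add new entries, then emit lines sorted by word."""
    merged = dict(entries)
    merged.update(additions)
    return [f"{word}\t{' '.join(merged[word])}" for word in sorted(merged)]


# Toy data standing in for the real master dictionary.
toy = parse_dic(["SUN S AH N", "FLOWER F L AW ER", "BED B EH D"])
new_entry = compound_pronunciation("SUNFLOWER", toy)
output_lines = merge_and_sort(toy, {"SUNFLOWER": new_entry})
```

Anything for which `compound_pronunciation` returns `None` is what remains for the manual transcription in step 3. Joined pronunciations are only a first guess, so it is worth listening to a few of them before trusting the whole batch.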

With some good choices about your source data you could make a big impact on the processing time. The rest of it might have to be handled via UX methods such as further restricting the number of songs parsed or only using songs which are popular with the user, and informing the user in cases where this could lead to confusion.

Remember that for Deichkind and similar, you aren’t just dealing with an unknown word but also a word that will be pronounced with phonemes which aren’t present in the default acoustic model, since it is made from North American speech. There’s nothing like that “ch” sound in English, at least as it is pronounced in Hochdeutsch. If your target market is German-speaking and you expect some German-named bands in there, it might be worthwhile to think about doing some acoustic model adaptation to German-accented speech. I just did a blog post about it.
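One practical consequence for hand transcription: every symbol you write has to be one the dictionary already uses, so a sound like that “ch” has to be approximated with existing phonemes. Here is a small Python sketch of a check for that, under the same assumption that each line is a word followed by whitespace-separated phonemes. The function names are hypothetical, and the `X` symbol in the example is invented purely to show a rejected phoneme.

```python
def phoneme_inventory(dic_lines):
    """Collect every phoneme symbol that appears in the dictionary lines."""
    inventory = set()
    for line in dic_lines:
        parts = line.split()
        inventory.update(parts[1:])  # parts[0] is the word itself
    return inventory


def unknown_phonemes(transcription, inventory):
    """Return the symbols in a candidate transcription that the dictionary never uses."""
    return [p for p in transcription.split() if p not in inventory]


# Toy lines standing in for the real master dictionary.
sample = ["KIND K AY N D", "DIKE D AY K", "KIT K IH T"]
inventory = phoneme_inventory(sample)

# "X" is a made-up symbol for the German "ch"; it is not in the inventory.
rejected = unknown_phonemes("D AY X K IH N T", inventory)
```

An empty result only means the symbols are valid, not that the transcription sounds right to a recognizer using an acoustic model built from North American speech.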