NeatSpeech British pronunciations

This topic has 5 replies, 2 voices, and was last updated 6 years, 9 months ago by Halle Winkler.

Viewing 6 posts - 1 through 6 (of 6 total)

Author

Posts
July 24, 2017 at 2:44 pm #1031975

zammer
Participant

I’m trying to use NeatSpeech for a UK-oriented product but I keep finding pronunciations that are miles away from the real word. My guess is that the CMU phonemes don’t translate brilliantly in some cases. I also noticed that the cmudict is version 0.4 rather than the current 0.7. Do you have any knowledge of how to update the dictionary or source a British-accented one?

I found https://github.com/rhdunn/cmudict-tools – would it be a Festival format?

Thanks

July 24, 2017 at 3:04 pm #1031976

Halle Winkler
Politepix

Hi,

There are two issues – one is how the phonemes are said (this should be correctly handled by the UK voices) and the other is which phonemes the local pronunciation contains and/or which syllables are accented (this can be quite different, for instance in the words aluminum or garage). The CMU dictionary is a US speech dictionary, so as far as I know there is no version of it which will prefer UK pronunciations over US ones. It sounds to me like your issue is with the latter case, is that correct?

July 24, 2017 at 3:09 pm #1031977

zammer
Participant

Hi,

Yes, I think you are correct that it is the second issue. I edited a couple of particularly bad words:

E.g. Awkward went from:
("awkward" nil (((ax) 0) ((k w er d) 0)))
to
("awkward" nil (((ao r) 1) ((k w er d) 0)))
and was much improved.
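
For anyone making several edits like this, here is a minimal Python sketch that builds an entry string in the same shape as the two lines quoted in this post. The function name `format_entry` and the (phones, stress) input layout are hypothetical, and the output format is inferred only from those two lines, so treat it as an assumption rather than a description of the lexicon format NeatSpeech expects.

```python
def format_entry(word, syllables):
    """Build one lexicon entry string shaped like the 'awkward' lines quoted above.

    syllables is a list of (phones, stress) pairs, e.g. (["ao", "r"], 1).
    The layout is inferred from the example in this post (an assumption).
    """
    parts = " ".join(
        "((%s) %d)" % (" ".join(phones), stress)
        for phones, stress in syllables
    )
    return '("%s" nil (%s))' % (word, parts)


# The hand-edited pronunciation from the post, expressed as data.
entry = format_entry("awkward", [(["ao", "r"], 1), (["k", "w", "er", "d"], 0)])
print(entry)  # ("awkward" nil (((ao r) 1) ((k w er d) 0)))
```

Keeping the corrections as data like this makes it easier to review them in one place before pasting the generated lines into the dictionary.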

Do you know of a version of the CMU dictionary with UK pronunciations at all?

Thanks,

Martin

July 24, 2017 at 3:11 pm #1031978

zammer
Participant

Sorry, you answered my question already…

My only clue is that apparently there is a conversion table to “en-GB-x-rp” for CMU but I couldn’t get any further than that.

Martin

July 24, 2017 at 8:38 pm #1031981

Halle Winkler
Politepix

My strong suspicion is that that table is for converting a voice that uses US phonemes to sound like received pronunciation, because that could be done tolerably by a table (e.g. “er” at the end of a word always sounds like a US “ah”), while converting words which actually have different pronunciations would have to be a long list of exceptional cases, including different accented syllables.

July 24, 2017 at 8:48 pm #1031982

Halle Winkler
Politepix

When I’ve had problems like these (things I wanted to fix by hand which were too numerous and too distributed across the language), this is how I’ve handled it: 1) searched for some canonical list of $WORDS, which in this case is the list of words pronounced differently in US and UK English at the word level, 2) got a list of the 5,000 most-used words in the language overall, and 3) taken the intersection of these two lists. At that point you may have a list that is short enough, but still relevant enough, that changing the entries manually is not too terrible a job. If it’s still too much you can reduce 5,000 to something smaller, or increase it if you discover the intersection contains fewer common words than you thought.
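
The intersection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `load_words` and `words_to_review` are made up for this example, the frequency list is assumed to be ordered from most to least used, and the sample words are placeholders rather than a real US/UK difference list or a real frequency list.

```python
def load_words(lines):
    """Normalise an iterable of words to a lowercase set, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def words_to_review(differing, frequent, top_n=5000):
    """Return words that appear in both lists, kept in frequency order.

    differing: words pronounced differently in US and UK English.
    frequent:  list of words ordered from most to least used (assumed).
    top_n:     how many of the most-used words to consider.
    """
    differing_set = load_words(differing)
    seen = set()
    shortlist = []
    for word in frequent[:top_n]:
        word = word.strip().lower()
        if word in differing_set and word not in seen:
            seen.add(word)
            shortlist.append(word)
    return shortlist


# Placeholder data for illustration only; real lists would be read from files.
differing = ["aluminum", "garage", "awkward", "schedule"]
frequent = ["the", "garage", "of", "awkward", "aluminum"]

shortlist = words_to_review(differing, frequent, top_n=4)
print(shortlist)  # ['garage', 'awkward']
```

Lowering `top_n` shrinks the list to edit by hand, and raising it widens it, which is the same adjustment as changing the 5,000 figure above.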