OOV words – training?

This topic has 5 replies, 2 voices, and was last updated 10 years, 1 month ago by Halle Winkler.

Viewing 6 posts - 1 through 6 (of 6 total)


Author

Posts
March 1, 2014 at 11:10 pm #1020399

ranavision
Participant

I’ve tried OpenEars successfully, generating custom vocabularies from arrays.
However, we need to recognize names from the address book, and here the fallback method fails very often.
Do I understand correctly that Flite is asked to synthesize the text and returns an utterance that in turn generates the phonemes? Is it possible to inject a user recording instead and get back the phonemes?
Or can I see the actual phonemes that OpenEars thinks it has heard? I could then simply add them to the dictionary as alternatives, using (2), (3), etc.
I’ve played around a bit with Rejecto, but it doesn’t seem to help.

I could write a parser and transcribe the texts (mainly French- and Dutch-sounding names) to phonemes myself, but that would be a lot of guesswork. I’m pretty sure the data I need is generated somewhere in the recognition process?

March 2, 2014 at 8:46 am #1020400

Halle Winkler
Politepix

Hello,

In my direct experience, name generation from the address book usually works quite well, with accuracy on the order of 90%. Are the names primarily coming from another language?

March 2, 2014 at 11:37 am #1020403
ranavision
Participant
Yes, like I said, Dutch and French. Below are a few examples of the differences: the fallback output first, then my manual transcription in parentheses.
I’m not denying the 90%; I had several ‘wow’ moments with OpenEars!
But with larger lists the accuracy drops, not only because of the missed names, but also because it starts ‘constructing’ hits from misinterpreted bits of these names.
So I was still hoping, as per my original question, that what I need is right there somewhere already :-)
- CLAES K L EY Z (K L HH AH Z)
- HANNOT HH AE N AA T (HH AE N OW)
- HEYRAUD HH EH R AO D (EY R OW)
- JACQUELINE(3) JH AE K AH L IH N (JH AH K L IY N)
- JEANFRANCOIS JH IY N F R AE N S W AA (JH AA N F R AA N Z W AH)
- MAILLEUX M EH L OW (M AA UW)
- PATRICK P AE T R IH K (P AA T R IH K)
- VILQUIN V IH L K W IH N (V IH L K AY N)
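
For anyone who wants to keep both transcriptions as alternatives, here is a minimal Python sketch of generating the numbered (2), (3) entries mentioned above. The function name `with_alternatives` and the tab separator are assumptions made for illustration, not part of OpenEars; check them against the dictionary file your setup actually produces.

```python
def with_alternatives(word, pronunciations):
    """Return dictionary lines for one word, numbering alternatives (2), (3), ...

    `word` is the dictionary headword; `pronunciations` is a list of
    space-separated phoneme strings. Duplicates are dropped, order is kept.
    """
    unique = []
    for pron in pronunciations:
        if pron not in unique:
            unique.append(pron)

    lines = []
    for index, pron in enumerate(unique, start=1):
        # The first entry is the bare word; later ones get a numeric suffix.
        label = word if index == 1 else f"{word}({index})"
        lines.append(f"{label}\t{pron}")
    return lines


# Fallback transcription first, manual transcription as the alternative.
for line in with_alternatives("CLAES", ["K L EY Z", "K L HH AH Z"]):
    print(line)
```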
March 2, 2014 at 12:05 pm #1020404

Halle Winkler
Politepix

No, there is really no way to get accurate phoneme transcriptions of words from other languages using English-language tools. What makes the tools work in English is that they follow grapheme-to-phoneme (g2p) rules that can be said to apply to English (to the extent that g2p ever works well with English), so by definition they give mixed results for other languages.

Here is how the two methods in OpenEars work in English. The first method tries to look up the word in the dictionary. If it isn’t found, the second method uses a set of rules to estimate the phonemes in the word from its graphemes. These rules only apply to English; other languages have very different rules. Pure phoneme-based recognition doesn’t work well, so IMO there is no way to reverse-engineer a pronunciation from a speech recognition utterance that will be more accurate than the fallback method. The problem is compounded by the fact that you will be recognizing people whose accents are very different from those of the speakers in the English acoustic model that corresponds to the phonemes you’re looking for.
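
The lookup-then-fallback order described above can be sketched roughly as follows in Python. This is only an illustration of the control flow: `LOOKUP`, `LETTER_RULES`, `fallback_g2p`, and `pronounce` are hypothetical names, and the one-letter-per-phoneme table is a crude stand-in for real English g2p rules, not what OpenEars actually does internally.

```python
# Hypothetical toy pronunciation dictionary (headword -> list of pronunciations).
LOOKUP = {
    "PATRICK": ["P AE T R IH K"],
}

# Hypothetical, deliberately naive English letter-to-phoneme table.
# Real g2p rules look at letter context; this only shows where they plug in.
LETTER_RULES = {
    "A": "AE", "B": "B", "C": "K", "D": "D", "E": "EH", "F": "F", "G": "G",
    "H": "HH", "I": "IH", "J": "JH", "K": "K", "L": "L", "M": "M", "N": "N",
    "O": "AA", "P": "P", "Q": "K", "R": "R", "S": "S", "T": "T", "U": "AH",
    "V": "V", "W": "W", "X": "K S", "Y": "Y", "Z": "Z",
}


def fallback_g2p(word):
    """Estimate phonemes letter by letter using English-only rules."""
    return " ".join(LETTER_RULES[ch] for ch in word.upper() if ch in LETTER_RULES)


def pronounce(word):
    """Return (pronunciations, source): dictionary lookup first, then fallback."""
    key = word.upper()
    if key in LOOKUP:
        return LOOKUP[key], "dictionary"
    return [fallback_g2p(key)], "fallback"


print(pronounce("Patrick"))
print(pronounce("Claes"))
```

Because the second step only knows English spelling conventions, any word that misses the dictionary gets an English-flavored guess, which is why foreign names come out wrong.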

If this were my task I would try to break it down as follows:

1. Get a big word list of French names and a big word list of Dutch names. Both languages have much more consistent grapheme to phoneme rules than English, so it should be possible to find their phonemes in their native language by finding a source for those rules, for instance a g2p software library.
2. Create a map by which you can convert the phonemes found in those two word lists into closest approximations of the ones in the English set that are used by the acoustic model.
3. Add this combined list to the lookup list used by the main method.
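
Steps 2 and 3 could look something like the Python sketch below. Everything in it is assumed for illustration: the X-SAMPA-style input tokens, the `NATIVE_TO_ARPABET` map, and the two hand-written transcriptions are placeholders, not verified data. A real version would take its native phonemes from a French or Dutch g2p source as in step 1.

```python
# Hypothetical map from X-SAMPA-style French phoneme tokens to the closest
# phonemes in the English set. One native phoneme may need two English ones.
NATIVE_TO_ARPABET = {
    "Z": ["ZH"],
    "A~": ["AA", "N"],  # nasal vowel approximated as vowel + N
    "f": ["F"],
    "R": ["R"],
    "s": ["S"],
    "w": ["W"],
    "a": ["AA"],
    "n": ["N"],
    "o": ["OW"],
}

# Hypothetical native transcriptions, already split into phoneme tokens.
NATIVE_NAMES = {
    "JEANFRANCOIS": ["Z", "A~", "f", "R", "A~", "s", "w", "a"],
    "HANNOT": ["a", "n", "o"],
}


def to_arpabet(native_phonemes):
    """Convert native phoneme tokens to a space-separated English phoneme string."""
    converted = []
    for phoneme in native_phonemes:
        # A KeyError here means the map has a gap that needs a decision.
        converted.extend(NATIVE_TO_ARPABET[phoneme])
    return " ".join(converted)


def dictionary_lines(names):
    """Build sorted 'WORD<TAB>PHONEMES' lines to merge into the lookup list."""
    return [f"{word}\t{to_arpabet(phonemes)}" for word, phonemes in sorted(names.items())]


for line in dictionary_lines(NATIVE_NAMES):
    print(line)
```

Letting an unmapped phoneme raise an error, rather than silently skipping it, keeps gaps in the map visible while the map is being built.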

But getting good results with the language model generator in a language other than English or Spanish is really not expected behavior, so this will definitely need some kind of strategy for obtaining more text data including pronunciation transcriptions.

March 2, 2014 at 12:38 pm #1020406

ranavision
Participant

There are a few more approaches. One is similar to step 2: map Dutch graphemes to English graphemes and submit the result to the fallback. I had some good results with that, and it works on the fly.
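
A minimal Python sketch of that respelling idea, assuming a small hand-made table: the `RESPELL` entries are rough guesses at English spellings that sound close to the Dutch graphemes, and both the table and the `respell` function are hypothetical. The respelled string would then be handed to the English fallback in place of the original name.

```python
# Hypothetical Dutch-grapheme -> English-looking respelling table (very incomplete).
RESPELL = {
    "oe": "oo",
    "ie": "ee",
    "ij": "ay",
    "aa": "ah",
    "ee": "ay",
    "j": "y",
}
MAX_LEN = max(len(grapheme) for grapheme in RESPELL)


def respell(name):
    """Rewrite a Dutch name with English-like spelling, longest grapheme first."""
    text = name.lower()
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in RESPELL:
                out.append(RESPELL[chunk])
                i += size
                break
        else:
            # No rule matched at this position: keep the letter unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(respell("Joep"))
print(respell("Dijkstra"))
```

Matching the longest grapheme first matters because "ij" must be handled as a unit before the single-letter rule for "j" gets a chance to fire.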
Thanks for the feedback, I now know better what way to go about it.
And, finally, never trust IH K S P EH K T AH D B AH HH EY V Y ER.

March 2, 2014 at 1:19 pm #1020408

Halle Winkler
Politepix

“There are a few more approaches. One is similar to step 2: map Dutch graphemes to English graphemes and submit the result to the fallback. I had some good results with that, and it works on the fly.”

Yes, this is also a very good idea, and it exploits the fact that Dutch spelling is far more regular than English spelling.

“And, finally, never trust IH K S P EH K T AH D B AH HH EY V Y ER.”

S AA R IY,
AY
K AH N
OW N L IY
SH EY V
W AH N
Y AE K
AH T
AH
T AY M.