- This topic has 2 replies, 2 voices, and was last updated 7 years, 4 months ago by tonyknight.
November 5, 2015 at 2:13 am #1027194 | tonyknight (Participant)
As I typed out the title, I realized it had a lot of potential to be perceived as a really dumb question, so let me explain.
I have a really small vocabulary of commands, and the user can also enter first and last names from their contacts or other places that can then be recognized. My app is used by people in about 30 different countries, and some commands and names are not recognized because the user’s utterance isn’t a close enough phonetic match to get recognition. There are also cases where the contact name’s phonetic match is far different from what the user expects it to be, so no match occurs.
I had thought that we could create an interface where the user could tap a command or a name and hear OpenEars say that word. Is this close enough to train the user how to say the word to get a better match, or would this complicate things too much?
Tony

November 5, 2015 at 9:41 am #1027199 | Halle Winkler (Politepix)
I honestly don’t know if this will help. You are correct that the phonetics that OpenEars will respond to in English approximately mirror the ones used in OpenEars’ English speech synthesis. What I can’t speak to is whether it would be a good user experience to hear what is probably going to be perceived as a name mispronunciation (in the cases where recognition isn’t good out of the box due to the name not being of English origin or relatively common) and be asked to use that mispronunciation by a software program for the benefit of the program. Names are sensitive subjects and the solution might feel kind of Procrustean to the user. It seems like a case for some user testing.

November 6, 2015 at 12:53 am #1027222 | tonyknight (Participant)
Thank you for responding so quickly. I think it is worth a try.
Most proper names seem to be recognized well, so this playback of names in the dictionary would be for the cases where users can’t get recognition of certain names. It might be useful for them to hear something that could lead them to better recognition. I agree that it is not ideal to ask the user to modify their pronunciation for the benefit of recognition, but in this case there might be a compelling motive for the user to do so. We use OpenEars to help users tag photos they are scanning with names either in their contacts or in their family tree.
Another aspect of this I have not mentioned is that we are also using OpenEars to support commands in languages other than English. The way we do this is to transliterate words from other languages into sounds approximating English. For example, we have a command that allows the user to scan the back of a photo that might have some important notations on it, and we automatically join the front with the back. The normal command is ‘Capture’, but if the user says ‘Backscan’, we will associate the newly scanned image with the previous one. If the user selects German as their language, this command is ‘umdrehen’. In our app, we place this command in the dictionary as ‘OOMDRAYEN’, and generally we get good recognition. As you can imagine, this isn’t perfect, so giving the user the ability to hear the transliteration may help with recognition. We also keep these language commands in the cloud, so we can change them dynamically if we get a better transliteration. In the next version of the software, we will let the user override our transliteration in favor of their own.
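To make the idea concrete, here is a minimal sketch of the transliteration-table approach described above. It is illustrative only: the app's real data structures and API are not shown in this thread, the command names other than ‘umdrehen’/‘OOMDRAYEN’ are made up, and the override mechanism mirrors the planned per-user override mentioned in the post.

```python
# Hypothetical sketch: map localized command words to English-like
# spellings that an English acoustic model can recognize. Only the
# 'umdrehen' -> 'OOMDRAYEN' pair comes from the post; the rest is invented.
BASE_TRANSLITERATIONS = {
    "en": {"capture": "CAPTURE", "backscan": "BACKSCAN"},
    "de": {"umdrehen": "OOMDRAYEN"},
}

def vocabulary_for(language, user_overrides=None):
    """Return the word table to hand to the recognizer.

    user_overrides models the planned feature where a user can replace
    our transliteration with their own; the user's entry wins.
    """
    table = dict(BASE_TRANSLITERATIONS.get(language, {}))
    if user_overrides:
        table.update(user_overrides)
    return table

# The base table is used as-is when there are no overrides...
print(vocabulary_for("de"))
# ...and a user-supplied transliteration replaces ours.
print(vocabulary_for("de", {"umdrehen": "UMDREHHEN"}))
```

Because the base table lives server-side in the real app, updating a transliteration in the cloud would change the dictionary entry without an app release; the per-user override is just a final merge on top of that.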
We have done some nifty things with OpenEars in our app. You can check it out at qroma.net.