tonyknight

Forum Replies Created


  • in reply to: Using Speech to Text to train the user to speak better? #1027222
    tonyknight
    Participant

    Hello Halle,

    Thank you for responding so quickly. I think it is worth a try.

    Most proper names seem to be recognized well, so this playback of dictionary pronunciations would be for the cases where recognition of certain names fails. It might be useful for users to hear something that could lead them to better recognition. I agree that it is not ideal to ask the user to modify their pronunciation for the benefit of recognition, but in this case there might be a compelling motive for the user to do so. We use OpenEars to help users tag photos they are scanning with names from either their contacts or their family tree.
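    To illustrate the idea, here is a minimal Swift sketch of such a playback, using the system speech synthesizer rather than OpenEars’ bundled Flite TTS (the function name and the example spelling are hypothetical, not our actual code):

    import AVFoundation

    // Speak a dictionary spelling aloud so the user can hear the
    // pronunciation the recognizer expects. The synthesizer is retained
    // at top level so speech isn't cut off by deallocation.
    let synthesizer = AVSpeechSynthesizer()

    func playBackExpectedPronunciation(_ spelling: String) {
        let utterance = AVSpeechUtterance(string: spelling)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate * 0.9 // slightly slower for clarity
        synthesizer.speak(utterance)
    }

    // e.g. playBackExpectedPronunciation("MAK CLOWD") // hypothetical spelling for "MacLeod"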

    Another aspect I have not mentioned is that we also use OpenEars to support commands in languages other than English. The way we do this is to transliterate words from other languages into sounds approximating English. For example, we have a command that allows the user to scan the back of a photo that might have some important notations on it, and we automatically join the front with the back. The normal command is ‘Capture’, but if the user says ‘Backscan’, we will associate the newly scanned image with the previous one. If the user selects German as their language, this command is ‘umdrehen’ (“turn over”). In our app, we place this command in the dictionary as ‘OOMDRAYEN’, and generally we get good recognition. As you can imagine, this isn’t perfect, so giving the user the ability to hear the transliteration may help with recognition. We also keep these language commands in the cloud, so we can change them dynamically if we find a better transliteration. In our next version of the software, we will let the user override our transliteration in favor of their own.
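    A minimal Swift sketch of how such a cloud-updatable transliteration table might be structured (the type and method names are illustrative, not our actual schema):

    // Maps a logical command to its English-approximated spelling per
    // language. In our app these tables come from the cloud, so they can
    // be updated without shipping a new release.
    struct CommandTable {
        // [languageCode: [logicalCommand: phoneticSpelling]]
        private let transliterations: [String: [String: String]] = [
            "en": ["capture": "CAPTURE", "backscan": "BACKSCAN"],
            "de": ["backscan": "OOMDRAYEN"] // spoken 'umdrehen'
        ]

        // The words to hand to the language-model generator for a language.
        func vocabulary(for language: String) -> [String] {
            return transliterations[language].map { Array($0.values) } ?? []
        }

        // Resolve a recognized spelling back to the logical command.
        func command(for hypothesis: String, language: String) -> String? {
            return transliterations[language]?.first { $0.value == hypothesis }?.key
        }
    }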

    We have done some nifty things with OpenEars in our app. You can check it out at qroma.net.

    Many thanks,
    Tony

    in reply to: Change in scoring with 2.04 #1026286
    tonyknight
    Participant

    Hi,

    Thank you for the thoughtful answer.

    We started using recognition scores about a year ago, when we noticed that some noises were being recognized as words in our relatively small vocabulary (about 15 words). A door would slam, and it would be recognized as a very short word in the vocabulary. We noticed that when a person actually said that word, the score would be somewhere between -800 and 0, so we ignored commands scoring lower than -2000, leaving some headroom below the worst genuine scores. This ended up working quite well on a large group of testers.

    Later, we added an ambiguous-detection algorithm for lower-confidence recognition scores that fell below -800. If a phrase was detected in this range (between -800 and -2000), the user would need to repeat it on the very next detection for it to be accepted. This cut down on a lot of false positives.
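    A minimal Swift sketch of the banded acceptance logic described above (the thresholds are the ones from our testing; the type and method names are illustrative):

    // Recognition-score bands: accept outright at -800 or better, ask for
    // a repeat between -800 and -2000, and reject anything below -2000.
    final class CommandFilter {
        private let acceptThreshold = -800
        private let rejectThreshold = -2000
        private var pendingAmbiguous: String? // phrase awaiting confirmation

        enum Decision { case accept, awaitRepeat, reject }

        func evaluate(hypothesis: String, score: Int) -> Decision {
            if score >= acceptThreshold {
                pendingAmbiguous = nil
                return .accept
            }
            if score <= rejectThreshold {
                pendingAmbiguous = nil
                return .reject
            }
            // Ambiguous band: accept only if the same phrase was heard on
            // the immediately preceding detection.
            if hypothesis == pendingAmbiguous {
                pendingAmbiguous = nil
                return .accept
            }
            pendingAmbiguous = hypothesis
            return .awaitRepeat
        }
    }

    Since OpenEars hands the recognition score back as a string in its hypothesis callback, it would need an Int() conversion before reaching a filter like this.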

    Another use for the ambiguous-detection routine is in detecting first and last names. Our app allows users to tag photos using contacts from their phone or from family tree software. One thing we don’t want is for a recognition to pair the first name from one contact with the last name from another contact to form the name of someone they are tagging. We enforce this by going back to our data model and comparing the recognized name against the database. We reject the name if it doesn’t match a single contact; if it matches but its score is within the ambiguous range, we accept it only if it is repeated on the very next detection.
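    A Swift sketch of how that check might be layered on the ambiguous-repeat rule (the contactExists lookup stands in for our data-model query and is hypothetical):

    // Accept a recognized "First Last" pair only if that exact pairing
    // exists as one contact; in the ambiguous score band, additionally
    // require the same name on the very next detection.
    final class NameValidator {
        private let contactExists: (String, String) -> Bool // injected data-model query
        private var pending: (first: String, last: String)?

        init(contactExists: @escaping (String, String) -> Bool) {
            self.contactExists = contactExists
        }

        func validate(first: String, last: String, score: Int) -> Bool {
            guard contactExists(first, last) else { return false } // names drawn from two different contacts
            if score >= -800 { return true }          // confident: accept immediately
            guard score > -2000 else { return false } // too low: reject outright
            // Ambiguous band: accept only on an immediate repeat.
            if let p = pending, p.first == first, p.last == last {
                pending = nil
                return true
            }
            pending = (first, last)
            return false
        }
    }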

    Since we started doing this, OpenEars has definitely gotten better, and perhaps our rationale for doing this no longer applies. We will do some testing with a larger group of people on an altered scale to see if it is still useful.

    Thanks,
    Tony
