Thank you for the thoughtful answer.
We started using recognition scores about a year ago, when we noticed that some noises were being recognized as words in our relatively small vocabulary (about 15 words). A door would slam, and it would be recognized as a very short word in the vocabulary. We noticed that when a person actually said that word, the score would be somewhere between -800 and 0, so we ignored commands scoring lower than -2000. This ended up working quite well on a large group of testers.
Later, we added an ambiguous detection algorithm for lower-confidence recognition scores that fell below -800. If a phrase was detected in this range (say -800 to -2000), the user would need to repeat it on the next detection for it to be accepted. This cut down on a lot of false positives.
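For anyone who wants to try something similar, here is a rough sketch of the gating described above. It is written in Python purely for illustration and is not our actual app code or any OpenEars API: the class name, method name, and constants are all made up, and the thresholds are just the approximate numbers mentioned above. It also assumes that any detection in between, even a rejected one, cancels a pending ambiguous phrase.

```python
# Illustrative thresholds taken from the approximate numbers in this post.
ACCEPT_THRESHOLD = -800    # at or above this: accept immediately
REJECT_THRESHOLD = -2000   # below this: ignore the detection


class ScoreGate:
    """Hypothetical helper: accepts, rejects, or asks for a repeat."""

    def __init__(self):
        # Ambiguous phrase from the previous detection, if any.
        self._pending = None

    def handle(self, phrase, score):
        """Return the phrase if it should be accepted, otherwise None."""
        # A pending phrase only survives for the very next detection.
        pending, self._pending = self._pending, None

        if score < REJECT_THRESHOLD:
            return None            # too unlikely, probably noise
        if score >= ACCEPT_THRESHOLD:
            return phrase          # confident enough on its own

        # Ambiguous range: accept only if it repeats the previous detection.
        if pending == phrase:
            return phrase
        self._pending = phrase     # remember it and wait for a repeat
        return None
```

Feeding every recognized phrase and its score through `handle` gives the behavior described: a door slam scoring -2500 is dropped, a clear command at -500 goes straight through, and a phrase at -1200 is only accepted if the next detection is the same phrase.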
Another use for the ambiguous detection routine was in detecting first and last names. Our app allows users to tag photos using contacts from their phone or from family tree software. One thing we don’t want users to do is take the first name from one contact and the last name from another to form the first and last name of someone they are tagging. We enforce this by going back to our data model to compare the recognized name against the database. We will reject it if they don’t match, but if it is within the ambiguous range we will accept it if it is repeated on the very next detection.
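The name check itself is simple. Here is a minimal sketch, again in Python for illustration only, assuming the contacts are available as (first name, last name) pairs and that the recognized phrase is exactly two words. The function name and the data layout are hypothetical, not our real data model.

```python
def is_known_full_name(recognized, contacts):
    """Return True only if first and last name belong to the same contact.

    `contacts` is assumed to be an iterable of (first, last) tuples.
    """
    parts = recognized.strip().lower().split()
    if len(parts) != 2:
        return False  # this sketch only handles "First Last"
    first, last = parts

    # Build the set of valid pairs so a mix of two contacts cannot match.
    known_pairs = {(f.lower(), l.lower()) for f, l in contacts}
    return (first, last) in known_pairs
```

With contacts "Ada Lovelace" and "Alan Turing", the phrase "Ada Lovelace" passes, while "Ada Turing" fails because no single contact has that combination.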
Since we started doing this, OpenEars has definitely gotten better, and perhaps our rationale for doing this doesn’t exist anymore. We will do some testing with a larger group of people on an altered scale to see if it is still useful.