How do you improve recognition?

This topic has 1 reply, 2 voices, and was last updated 10 years, 4 months ago by Halle Winkler.

Viewing 2 posts - 1 through 2 (of 2 total)


Author

Posts
November 30, 2013 at 5:24 am #1019043

SparkyNZ
Participant

Hi. I’m using OpenEars to recognise keyword phrases such as:

PLAYER STOP
PLAYER PLAY

What I have found with the tutorial code is that I can say “PLAYER GO” and it matches one of the above phrases (usually PLAYER PLAY).

The recognitionScore delegate gives me a recognitionScore and utteranceID. I am not seeing much difference between using PLAYER PLAY and PLAYER GO in the recognitionScore values.

I also can’t find any information on what recognitionScore actually represents. I’m seeing negative score values – but what is a good score and what is a bad score? Is closer to zero better? Should I ever expect a positive score?

I have my vocabulary defined as follows:

NSArray *words = [NSArray arrayWithObjects:@"PLAYER STOP", @"PLAYER PLAY", @"PLAYER SKIP", @"PLAYER BACK", nil];

Here’s the debug output for four attempts at saying “PLAYER GO”, which I would hope would not match any of the phrases.

2013-11-30 17:21:42.052 TestApp[43282:907] Pocketsphinx has detected speech.
2013-11-30 17:21:43.729 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
2013-11-30 17:21:44.005 TestApp[43282:907] The received hypothesis is PLAYER with a score of -219 and an ID of 000000000
2013-11-30 17:21:44.109 TestApp[43282:907] Pocketsphinx is now listening.
2013-11-30 17:21:46.782 TestApp[43282:907] Pocketsphinx has detected speech.
2013-11-30 17:21:48.464 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
2013-11-30 17:21:48.763 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -179 and an ID of 000000001
2013-11-30 17:21:48.847 TestApp[43282:907] Pocketsphinx is now listening.
2013-11-30 17:21:51.258 TestApp[43282:907] Pocketsphinx has detected speech.
2013-11-30 17:21:53.073 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
2013-11-30 17:21:53.377 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -1464 and an ID of 000000002
2013-11-30 17:21:53.457 TestApp[43282:907] Pocketsphinx is now listening.
2013-11-30 17:21:55.485 TestApp[43282:907] Pocketsphinx has detected speech.
2013-11-30 17:21:57.168 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
2013-11-30 17:21:57.457 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -2932 and an ID of 000000003
2013-11-30 17:21:57.551 TestApp[43282:907] Pocketsphinx is now listening.

December 2, 2013 at 10:24 am #1019055

Halle Winkler
Politepix

Hi,

The score is a negative number, and the probability it represents decreases as its distance from zero increases. It has very limited application in an app: you can compare it to another score from the same session, speaker and environment, but scores should never be compared with each other or with a constant across multiple sessions or speakers. The reason for this is that the score is very heavily influenced by the speaker, their accent and other speech characteristics, the mic used, the distance from the mic, and the background noise. So if you try to pick an arbitrary number that means “accurate score” for everyone, you will end up excluding all recognition for speakers who don’t match the profile of your test speaker plus their test environment. It’s better to ignore it in nearly all cases.

I regret having included the scores in the callbacks since the first versions of OpenEars, because now they would be hard to remove. They don’t bring a lot to the table, and they are often looked at as a way to refine accuracy confidence across multiple users. They aren’t particularly good for that, since the engine itself has already used the scoring for that purpose to the extent it is useful. You can use them to see if accuracy for a particular utterance has increased or decreased within the same session.
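As a rough illustration of that last point only, here is a minimal sketch of tracking whether the score for a given phrase has gone up or down within one session. It is written in plain C, which compiles unchanged inside an Objective-C source file. Everything in it (SessionScores, ScoreTrend, session_scores_record and the size limits) is a hypothetical helper invented for this example, not part of OpenEars. It assumes the score from the callback has already been converted to an int, and that the tracker is reset whenever the session, speaker or environment changes.

```c
#include <string.h>

#define MAX_PHRASES 8       /* assumed upper bound on tracked phrases */
#define MAX_PHRASE_LEN 64   /* assumed upper bound on phrase length   */

/* Hypothetical helper, not an OpenEars type: remembers the most recent
   score seen for each phrase during ONE listening session. */
typedef struct {
    char phrase[MAX_PHRASES][MAX_PHRASE_LEN];
    int  last_score[MAX_PHRASES];
    int  count;
} SessionScores;

typedef enum {
    SCORE_FIRST_SEEN,   /* no earlier score for this phrase in the session */
    SCORE_IMPROVED,     /* closer to zero than the previous score          */
    SCORE_WORSENED,     /* further from zero than the previous score       */
    SCORE_UNCHANGED,
    SCORE_NOT_TRACKED   /* table full or phrase too long to store          */
} ScoreTrend;

/* Call at the start of every new session; scores from different
   sessions, speakers or environments are not comparable. */
void session_scores_reset(SessionScores *s) {
    s->count = 0;
}

/* Record a score for a phrase and report how it compares with the
   previous score for the SAME phrase in the SAME session. There is
   deliberately no fixed threshold anywhere in this code. */
ScoreTrend session_scores_record(SessionScores *s, const char *phrase, int score) {
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->phrase[i], phrase) == 0) {
            int previous = s->last_score[i];
            s->last_score[i] = score;
            if (score > previous) return SCORE_IMPROVED;
            if (score < previous) return SCORE_WORSENED;
            return SCORE_UNCHANGED;
        }
    }
    if (s->count == MAX_PHRASES || strlen(phrase) >= MAX_PHRASE_LEN) {
        return SCORE_NOT_TRACKED;
    }
    strcpy(s->phrase[s->count], phrase);
    s->last_score[s->count] = score;
    s->count++;
    return SCORE_FIRST_SEEN;
}
```

Fed the three PLAYER PLAY scores from the log above (-179, -1464, -2932), this would report a first sighting followed by two worsening results. That says nothing about whether any of them is a correct match. It only shows the direction of change within that session.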

The task you are doing is sort of on the line between keyword spotting and command-and-control, which means that out-of-vocabulary recognition (the engine hearing a word that isn’t in the vocabulary and matching it to a word that is in the vocabulary) is a big issue, as you’ve discovered. This is the use case for which Rejecto was developed.