How do you improve recognition?

  • #1019043
    SparkyNZ
    Participant

    Hi. I’m using OpenEars to recognise keyword phrases such as:

    PLAYER STOP
    PLAYER PLAY

    What I have found with the tutorial code is that I can say “PLAYER GO” and it matches one of the above phrases (usually PLAYER PLAY).

    The recognitionScore delegate gives me a recognitionScore and utteranceID. I am not seeing much difference between using PLAYER PLAY and PLAYER GO in the recognitionScore values.

    I also can’t find any information on what recognitionScore actually represents. I’m seeing negative score values – but what is a good score and what is a bad score? Is closer to zero better? Should I ever expect a positive score?

    I have my vocabulary defined as follows:

    NSArray *words = [NSArray arrayWithObjects:@"PLAYER STOP", @"PLAYER PLAY", @"PLAYER SKIP", @"PLAYER BACK", nil];

    Here’s the debug output for saying “PLAYER GO” four times in a row, which I would hope would not match any of the phrases.

    2013-11-30 17:21:42.052 TestApp[43282:907] Pocketsphinx has detected speech.
    2013-11-30 17:21:43.729 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
    2013-11-30 17:21:44.005 TestApp[43282:907] The received hypothesis is PLAYER with a score of -219 and an ID of 000000000
    2013-11-30 17:21:44.109 TestApp[43282:907] Pocketsphinx is now listening.
    2013-11-30 17:21:46.782 TestApp[43282:907] Pocketsphinx has detected speech.
    2013-11-30 17:21:48.464 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
    2013-11-30 17:21:48.763 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -179 and an ID of 000000001
    2013-11-30 17:21:48.847 TestApp[43282:907] Pocketsphinx is now listening.
    2013-11-30 17:21:51.258 TestApp[43282:907] Pocketsphinx has detected speech.
    2013-11-30 17:21:53.073 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
    2013-11-30 17:21:53.377 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -1464 and an ID of 000000002
    2013-11-30 17:21:53.457 TestApp[43282:907] Pocketsphinx is now listening.
    2013-11-30 17:21:55.485 TestApp[43282:907] Pocketsphinx has detected speech.
    2013-11-30 17:21:57.168 TestApp[43282:907] Pocketsphinx has detected a period of silence, concluding an utterance.
    2013-11-30 17:21:57.457 TestApp[43282:907] The received hypothesis is PLAYER PLAY with a score of -2932 and an ID of 000000003
    2013-11-30 17:21:57.551 TestApp[43282:907] Pocketsphinx is now listening.

    #1019055
    Halle Winkler
    Politepix

    Hi,

    The score is a negative number, and the farther it is from zero, the lower the probability it represents. It has very limited application in an app: you can compare it to another score from the same session, speaker and environment, but scores should never be compared with each other, or with a constant, across multiple sessions or speakers. The reason for this is that the score is very heavily influenced by the speaker, their accent and other speech characteristics, the mic used, the distance from the mic, and the background noise. So if you try to pick an arbitrary number that means “accurate score” for everyone, you will end up excluding all recognition for speakers who don’t match the profile of your test speaker plus their test environment. It’s better to ignore it in nearly all cases.

    I regret having included the scores in the callbacks since the first versions of OpenEars, because now they would be hard to remove. They don’t bring a lot to the table, and they are often looked at as a way to refine accuracy confidence across multiple users, which they aren’t particularly good for, since the engine itself has already used the scoring for that to the extent it is useful. You can use them to see if accuracy for a particular utterance has increased or decreased within the same session.
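
    As a rough illustration of that within-session comparison, here is a minimal sketch. SessionScoreTracker is a hypothetical helper, not part of OpenEars, and it assumes the score reaches the delegate as a string holding an integer, as the log output above suggests. It only reports how a score changed relative to the previous score for the same hypothesis, and it should be reset whenever the session, speaker or environment changes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of OpenEars): remembers the last score seen
// for each hypothesis during a single listening session.
@interface SessionScoreTracker : NSObject
// Returns (new score - previous score) for this hypothesis, or nil if the
// hypothesis has not been seen since the last reset.
// A positive change means the score moved closer to zero.
- (NSNumber *)changeForScore:(NSString *)recognitionScore
                  hypothesis:(NSString *)hypothesis;
// Call when the session, speaker or environment changes.
- (void)reset;
@end

@implementation SessionScoreTracker {
    NSMutableDictionary *_lastScores; // hypothesis -> NSNumber score
}

- (instancetype)init {
    self = [super init];
    if (self) {
        _lastScores = [[NSMutableDictionary alloc] init];
    }
    return self;
}

- (NSNumber *)changeForScore:(NSString *)recognitionScore
                  hypothesis:(NSString *)hypothesis {
    // Assumption: the score string parses as an integer, e.g. @"-179".
    NSInteger current = [recognitionScore integerValue];
    NSNumber *previous = [_lastScores objectForKey:hypothesis];
    [_lastScores setObject:@(current) forKey:hypothesis];
    if (previous == nil) {
        return nil; // nothing to compare against yet
    }
    return @(current - [previous integerValue]);
}

- (void)reset {
    [_lastScores removeAllObjects];
}

@end
```

    Fed the PLAYER PLAY scores from the log above, -179 followed by -1464 gives a change of -1285, meaning the second utterance scored as less probable than the first. The change is only meaningful relative to that one session, so it should not be turned into a fixed threshold.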

    The task you are doing is sort of on the line between keyword spotting and command-and-control, which means that out-of-vocabulary recognition (the engine hearing a word that isn’t in the vocabulary and matching it to a word that is in the vocabulary) is a big issue, as you’ve discovered. This is the use case for which Rejecto was developed.
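
    Separately from Rejecto, partial hypotheses such as the bare “PLAYER” in the first log entry above can be filtered in app code by acting only on hypotheses that exactly match one of the command phrases. The sketch below is an assumption-laden illustration: IsKnownCommand is a hypothetical helper, not an OpenEars or Rejecto API. It does not address the out-of-vocabulary problem itself, since “PLAYER GO” that has already been recognised as “PLAYER PLAY” will still pass.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of OpenEars or Rejecto): returns YES only
// when the hypothesis is exactly one of the command phrases.
static BOOL IsKnownCommand(NSString *hypothesis, NSArray *commands) {
    // containsObject: compares with isEqual:, so strings match by content.
    return hypothesis != nil && [commands containsObject:hypothesis];
}
```

    With the words array from the question, IsKnownCommand(@"PLAYER", words) is NO and IsKnownCommand(@"PLAYER PLAY", words) is YES, so a check like this could sit at the top of the hypothesis callback before any command is dispatched.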
