Reply To: 'Key Phrase Spotting' Guidance

#1027501
Halle Winkler
Politepix

Hi Tim,

This might be a good occasion to set up a replication case for me:

https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/

The reason is that when you create a wake-up phrase, the goal is generally that it shouldn’t sound much like utterances that are likely to be spoken, in order to avoid false positives (although some are inevitable). Many of the utterances you’re reporting will never be spoken or overheard in thousands of hours of incidental speech (hero focus, yellow mocus), so they primarily demonstrate that your choice of wake-up phrase was very good, rather than representing a case of false positives that would reward troubleshooting work. If you do extensive testing to try to eliminate rhymes that should be expected to be low-score recognitions and that are very unlikely to be spoken, you will unintentionally shift your results towards false negatives. With speech applications it is therefore important to test against and debug relatively probable input.

The overheard video speech is more important, but as you said, it is ambiguous in combination with the specific reports of syllable-matching rhymes that initiated the forum topic. So consider giving me a replication case as shown above, so that we can both see exactly the same results from a usage case that is probable. Note that it is very unlikely I can review any replication cases until 2016.

I do have one suggestion for your wake-up phrase that might help you get a slightly more Rule-O-Rama-like result with Rejecto: add your phrase to the LanguageModelGeneratorLookupList.text in the acoustic model in the alphabetically correct place, i.e. right after these entries:

HELLO	HH AH L OW
HELLO(2)	HH EH L OW

add these entries:

HELLOFOCUS	HH AH L OW F OW K AH S
HELLOFOCUS(2)	HH EH L OW F OW K AH S

Please note that there is a tab between the word and the pronunciation and it has to remain there, with no spaces added to the beginning or end of the line.
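
If you would rather script the edit than do it by hand, the following is a minimal Python sketch of the idea, not part of OpenEars. The function names (base_word, check_entry, insert_after) are hypothetical, the HELP line is only an illustrative neighbouring entry, and the sketch works on a list of lines in memory, so reading and writing the actual lookup list file is left to you.

```python
import re

# Matches an alternate-pronunciation suffix such as "(2)" at the end of a word.
_VARIANT = re.compile(r"\(\d+\)$")


def base_word(line):
    """Return the entry's word with any "(n)" variant suffix removed."""
    return _VARIANT.sub("", line.split("\t", 1)[0])


def check_entry(line):
    """Raise ValueError unless the line is WORD<TAB>PHONEMES with no
    whitespace at the beginning or end of the line."""
    if line != line.strip() or line.count("\t") != 1:
        raise ValueError(f"malformed entry: {line!r}")
    word, phones = line.split("\t")
    if not word or " " in word or not phones or phones != phones.strip():
        raise ValueError(f"malformed entry: {line!r}")


def insert_after(lines, anchor, new_entries):
    """Return a new list with new_entries placed right after the last
    entry (including "(n)" variants) whose word is `anchor`."""
    for entry in new_entries:
        check_entry(entry)
    positions = [i for i, line in enumerate(lines) if base_word(line) == anchor]
    if not positions:
        raise ValueError(f"anchor word not found: {anchor!r}")
    cut = positions[-1] + 1
    return lines[:cut] + list(new_entries) + lines[cut:]


# Illustrative excerpt only; the real lookup list is much longer.
excerpt = [
    "HELLO\tHH AH L OW",
    "HELLO(2)\tHH EH L OW",
    "HELP\tHH EH L P",
]
updated = insert_after(
    excerpt,
    "HELLO",
    [
        "HELLOFOCUS\tHH AH L OW F OW K AH S",
        "HELLOFOCUS(2)\tHH EH L OW F OW K AH S",
    ],
)
print("\n".join(updated))
```

The check_entry step is there because a space in place of the tab, or stray whitespace at either end of the line, is the easiest mistake to make when editing this file.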

This will not prevent near-rhymes from being heard. It will, however, let you create a language model containing the single “word” HELLOFOCUS, which will not be recognized as the separate words hello and focus. Rejecto can then handle unrelated utterances before and after the phrase, but not within it. You don’t want rejection within the phrase anyway, since that would give you false negatives for intentional but imperfect utterances of it. If it works, you should see both of these pronunciations in the dynamically generated .dic file in your caches folder.
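
One way to check the generated .dic file is a small Python sketch like the one below, run against a copy of the file taken from the caches folder. It assumes the .dic has one entry per line, with the word followed by whitespace and then the phonemes; that layout is an assumption on my part, and the function name pronunciations is hypothetical.

```python
import re


def pronunciations(dic_text, word):
    """Collect the phoneme strings listed for `word`, including
    alternate entries written as WORD(2), WORD(3), and so on."""
    found = []
    for line in dic_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or word-only lines
        head, phones = parts
        if re.sub(r"\(\d+\)$", "", head) == word:
            found.append(" ".join(phones.split()))
    return found


# Stand-in for the contents of the generated .dic file.
sample = (
    "HELLOFOCUS\tHH AH L OW F OW K AH S\n"
    "HELLOFOCUS(2)\tHH EH L OW F OW K AH S\n"
)
expected = {"HH AH L OW F OW K AH S", "HH EH L OW F OW K AH S"}
missing = expected - set(pronunciations(sample, "HELLOFOCUS"))
print("missing pronunciations:", sorted(missing))
```

If either pronunciation is reported as missing for your real file, the lookup list edit most likely did not take effect.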