Sentences, Word Order and Pauses




  • #1017333
    lookbadgers
    Participant

    I understand that when working with sentences, OpenEars uses the previous word to help detect the next word.

    If I had the corpus “THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG”

    If it hears “THE QUICK”, it increases the probability that the next sound will be BROWN. I was wondering: how much more does it increase the probability that the next sound will be matched with BROWN?

    If there were a pause between hearing “THE QUICK” and BROWN, would this increased probability still apply?

    If there are likely to be pauses between words, is it better to have a corpus that contains both the sentence as a whole and the individual words, each on a new line?

    #1017334
    Halle Winkler
    Politepix

    There isn’t an answer in the form of “it’s 20% more likely” because it is dependent on the overall number of words as well as how many composed phrases were submitted to be turned into a language model. But I can tell you what is happening under the hood.

    Pocketsphinx as used by OpenEars takes into account unigrams (single words), bigrams (word pairs) and trigrams (word triplets) in language models. These are indicated in the language model as 1-gram, 2-gram and 3-gram entries and are collectively referred to as n-grams.

    When a set of individual words is submitted (meaning no composed phrases at all, just single words by themselves), the likelihood of each bigram and trigram is equal, meaning that all combinations of the words that can be expressed as a pair or a triplet are equally likely. When a phrase is submitted (taking yours as an example), every bigram and trigram that occurs within the phrase is more likely than bigrams and trigrams composed of word combinations which do not appear within the phrase.

    The <s> and </s> symbols, indicating the beginning and end of an utterance, are also taken into account for probabilities, so they appear in all of the n-gram sets. They are added automatically; you don’t do anything about them.
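    To make that concrete, here is a rough sketch of what the relevant entries look like in the generated ARPA-format language model file (the log10 probability values shown are placeholders, not actual output):

    \2-grams:
    -0.30 <s> THE
    -0.30 THE QUICK
    -0.30 QUICK BROWN
    …

    \3-grams:
    -0.30 <s> THE QUICK
    -0.30 THE QUICK BROWN
    -0.30 QUICK BROWN FOX
    …

    Word pairs and triplets that never occur in a submitted phrase get no explicit entry of their own and are only reachable through the backoff weights, which is what makes them less likely.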

    With this information in mind about how it is working, it should be possible for you to construct a test which answers your question for the specific app you are developing, which I can’t really advise you about from the info in the description.

    #1017335
    lookbadgers
    Participant

    Thank you for the reply. I now understand why it’s not an exact probability for all apps, but instead a formula based on the number of words and phrases submitted.

    With that in mind does a previous utterance have any impact on the next?

    For example, does the pause between “THE QUICK” and “BROWN” matter, or is the second utterance “BROWN” treated as a 1-gram and the previous 2-gram “THE QUICK” disregarded?

    I’ve noticed that users sometimes hesitate between words in expected phrases, and I am trying to find out whether this has been reducing the quality of the hypothesis after the pause.

    #1017336
    Halle Winkler
    Politepix

    You have to construct a test for your app in order to see if recognition accuracy is reduced as a result of pauses. It isn’t difficult: you can turn on verbose output and use this method to submit recordings with and without pauses and look at the results:

    - (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath
             usingLanguageModelAtPath:(NSString *)languageModelPath
             dictionaryAtPath:(NSString *)dictionaryPath
             languageModelIsJSGF:(BOOL)languageModelIsJSGF
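    For example, a minimal sketch of the call (the controller property name, file name, and the pathToLanguageModel/pathToDictionary variables are placeholders for whatever your app actually uses):

    NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"phrase_with_pause" ofType:@"wav"]; // a bundled test recording

    [self.pocketsphinxController runRecognitionOnWavFileAtPath:wavPath
        usingLanguageModelAtPath:pathToLanguageModel // path from your language model generation step
        dictionaryAtPath:pathToDictionary
        languageModelIsJSGF:FALSE]; // FALSE for an ARPA-style model rather than a JSGF grammar

    The results come back through the same OpenEarsEventsObserver callbacks as live listening, so you can compare the hypotheses for the recordings with and without pauses directly.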

    If you need to gather input recordings you can make them using the SaveThatWave demo.

    #1017337
    lookbadgers
    Participant

    Thank you, I will give that a try. I was just hoping the question might have already been answered. I will try to post my findings when I get time to test.

    #1017338
    Halle Winkler
    Politepix

    That’s great — please share at least a little information about your app vocabulary (you don’t have to tell the exact words, but give us info about vocabulary size, number of phrases, and a similar kind of phrase to the one you’re testing) and things like the mic used and the length of the pauses and environmental factors so a reader can get the big picture. I don’t think this is something that will behave the same with every app or under every circumstance (which is why I can’t answer it off the cuff) so it would be most helpful to hear about the context for the findings.

    #1017402
    lookbadgers
    Participant

    Just trying to get this up and running. I can get SaveThatWave to work in the sample app. However, in my test application I get “undefined symbols for architecture armv7”:

    “_OBJC_CLASS_$_SaevTheWaveController”, referenced from:
    objc-class-ref in…
    ld: symbol(s) not found for architecture armv7
    clang: error: linker command failed with exit code 1

    I have checked that “Other Linker Flags” contains the “-ObjC” linker flag.

    #1017403
    Halle Winkler
    Politepix

    Hi,

    You misspelled SaveThatWaveController in your project.

    #1017419
    lookbadgers
    Participant

    Sorry, that was a typo when writing out the error; the typo did not exist in the code. Anyway, that is now working and I can record WAV files.

    I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.

    I give it the language model and dictionary paths generated by Rejecto.

    In the end I recorded the WAV file outside of the app, and I was wondering if the problem is that it’s not in the right format:

    16-bit PCM
    16000Hz
    Stereo

    #1017420
    Halle Winkler
    Politepix

    I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.

    Do you mean that there is nothing ever returned in the OpenEarsEventsObserver pocketsphinxDidReceiveHypothesis: method?

    #1017421
    lookbadgers
    Participant

    That is correct, pocketsphinxDidReceiveHypothesis is never called.

    I assume you can’t use RapidEars with SaveThatWave? That might make the test I’m working on irrelevant.

    #1017422
    Halle Winkler
    Politepix

    It’s probably because the file is stereo. That’s correct, RapidEars doesn’t work on a single pause-bounded utterance in the sense that stock OpenEars does, so it doesn’t have a method for outputting that complete utterance as a WAV.
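    If you need to convert an existing recording rather than re-record it, a command-line tool such as SoX can downmix and resample it, for example:

    sox input.wav -r 16000 -b 16 -c 1 output.wav

    (The file names are placeholders; the goal is to end up with 16-bit, 16000 Hz, mono PCM.)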

    I construct replicable tests for RapidEars by making a recording and playing it out of a speaker and into the device, cued by a note in the console that says “press play now”. While it is not the most deterministic thing going, it is the most informative approach I can think of that actually replicates real-world behavior without interfering with RapidEars’ resource management at the same time I’m trying to test it.

    #1017424
    lookbadgers
    Participant

    I’ve created a new recording in mono, but I still have the same problem.

    Thank you for the suggestion about testing RapidEars I will have to try that at some point.

    #1017425
    Halle Winkler
    Politepix

    In that case the issue is going to be related to something else about the app setup. Output from SaveThatWave is known to work, so I would just get it from there, and if that doesn’t work I would look at your OpenEarsEventsObserver delegate setup.
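    For reference, a minimal sketch of that setup (assuming a property named openEarsEventsObserver on the class that adopts the OpenEarsEventsObserverDelegate protocol):

    // Somewhere in your setup code, before running recognition:
    self.openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
    [self.openEarsEventsObserver setDelegate:self];

    // The delegate method that should fire once the WAV run produces a hypothesis:
    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
        NSLog(@"Hypothesis: %@ (score %@, utterance %@)", hypothesis, recognitionScore, utteranceID);
    }

    If the observer is never allocated, its delegate is never set, or it is released before recognition finishes, none of the delegate methods will be called.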
