Reply To: Sentences, Word Order and Pauses


Halle Winkler

There isn’t an answer in the form of “it’s 20% more likely” because it depends on the overall number of words as well as on how many composed phrases were submitted to be turned into a language model. But I can tell you what is happening under the hood.

Pocketsphinx, as used by OpenEars, takes into account unigrams (single words), bigrams (word pairs) and trigrams (word triplets) in language models. These are indicated in the language model as 1-gram, 2-gram and 3-gram and are collectively referred to as n-grams. When a set of individual words is submitted (meaning no composed phrases at all, just single words by themselves), the likelihood of each bigram and trigram is equal, meaning that all combinations of the words that can be expressed as a pair or a triplet are equally likely. When a phrase is submitted (taking yours as an example), every bigram and trigram that occurs within the phrase is more likely than bigrams and trigrams composed of word combinations which do not appear within the phrase. The <s> and </s> symbols, indicating the beginning and end of an utterance, are also taken into account for probabilities, so they appear in all of the n-gram sets. They are added automatically; you don’t do anything about them.
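
To make the n-gram idea concrete, here is a rough sketch in Python that only counts the unigrams, bigrams and trigrams found in some submitted text, with the <s> and </s> markers added around each entry. It is not OpenEars or Pocketsphinx code and it does not reproduce how the real language model generator turns counts into probabilities; the function names and the sample phrase "turn left now" are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(entries, max_n=3):
    """Count 1-, 2- and 3-grams over the entries.

    Each entry is wrapped in <s> ... </s> first, mirroring the
    utterance markers that are added automatically.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for entry in entries:
        tokens = ["<s>"] + entry.split() + ["</s>"]
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


# Hypothetical input: one composed phrase plus the same words on their own.
entries = ["turn left now", "turn", "left", "now"]
counts = count_ngrams(entries)

# A word pair that occurs inside the phrase has been observed...
print(counts[2][("turn", "left")])
# ...while the reversed pair has never been observed.
print(counts[2][("left", "turn")])
# The utterance markers take part in the n-grams too.
print(counts[3][("<s>", "turn", "left")])
```

In a real model, unseen pairs such as "left turn" are still possible, just less likely than the pairs and triplets that were observed in a submitted phrase. How much less likely depends on the size of the vocabulary and the number of phrases, which is why there is no single percentage to quote.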

With this information about how it works, it should be possible for you to construct a test that answers your question for the specific app you are developing, which I can’t really advise you about based on the description alone.