giebler

Forum Replies Created

Viewing 18 posts - 1 through 18 (of 18 total)

  • in reply to: First long phrase missed #1022338
    giebler
    Participant

    I determined that the following condition occurs twice during the first long phrase:

    // Expand the backpointer tables if necessary.
    if (ngs->bpidx >= ngs->bp_table_size) {

        excessive_length_notification(); // HLW this has been added since no one ever wants this to go on for more than one round on an iPhone.

        ngs->bp_table_size *= 2;
        ngs->bp_table = ckd_realloc(ngs->bp_table,
                                    ngs->bp_table_size
                                    * sizeof(*ngs->bp_table));
        E_INFO("Resized backpointer table to %d entries\n", ngs->bp_table_size);
    }

    This one doesn’t happen at all:

    if (ngs->bss_head >= ngs->bscore_stack_size
        - bin_mdef_n_ciphone(ps_search_acmod(ngs)->mdef)) {

        excessive_length_notification(); // HLW this has been added since no one ever wants this to go on for more than one round on an iPhone.

        ngs->bscore_stack_size *= 2;
        ngs->bscore_stack = ckd_realloc(ngs->bscore_stack,
                                        ngs->bscore_stack_size
                                        * sizeof(*ngs->bscore_stack));
        E_INFO("Resized score stack to %d entries\n", ngs->bscore_stack_size);
    }

    Is there an easy way to start with ngs->bp_table_size four times the normal size so that we never hit that code?

    That would solve the problem for our application while leaving the LongRecognition and kExcessiveUtterancePeriod code intact.
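    To illustrate what I mean, here's a minimal sketch of the idea. The names `DEFAULT_BP_TABLE_SIZE` and `initial_bp_table_size` are hypothetical, not the actual pocketsphinx init code; the point is just to allocate with headroom so the doubling branch above is never reached:

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical default; the real initial size lives in the
     * pocketsphinx n-gram search initialization. */
    #define DEFAULT_BP_TABLE_SIZE 512

    /* Return an initial table size with extra headroom so the runtime
     * doubling branch (bpidx >= bp_table_size) never fires for
     * utterances of typical length. */
    static size_t initial_bp_table_size(size_t headroom) {
        return DEFAULT_BP_TABLE_SIZE * headroom;
    }

    int main(void) {
        /* Allocate at 4x the usual size up front. */
        size_t bp_table_size = initial_bp_table_size(4);
        int *bp_table = malloc(bp_table_size * sizeof(*bp_table));
        if (bp_table == NULL)
            return 1;

        printf("initial bp_table_size = %zu\n", bp_table_size);
        free(bp_table);
        return 0;
    }
    ```

    The trade-off is simply a larger allocation at startup in exchange for never taking the resize path (and its notification) mid-utterance.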

    Thanks again for the excellent support.

    in reply to: First long phrase missed #1022336
    giebler
    Participant

    Thanks, I’ll try that.

    By the way:

    2014-08-22 10:54:39.290 OpenEarsSampleApp[29757:3c03] ###### GLG Recognition Loop: 20.000000

    I am running my compiled version and it is set to 20.0.

    Now that I know I have the source code, I’ll look into the issue here as well.

    Thanks,

    Gary

    in reply to: First long phrase missed #1022333
    giebler
    Participant

    Thanks; I never noticed the source was included, since I just copied the framework and bundles.

    I changed the value to 20.0, recompiled the bundles and framework, cleaned and compiled the sample app, but it’s still bailing early, especially on the iPad 2nd gen which bails at 5 seconds. Do I need to set LongRecognition to TRUE?

    in reply to: First long phrase missed #1022331
    giebler
    Participant

    Thanks for your help!

    However, ContinuousModel.m isn’t available in the project.

    I tried placing the define in ContinuousModel.h, hoping it was undefined in the library, but it had no effect, so it is probably defined in the library after the include of the .h file.

    #define kExcessiveUtterancePeriod 15

    Is there a value I can set once the module is initialized?

    How can I implement this change?
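    In case it clarifies why the header edit had no effect: a `#define` is resolved by the preprocessor at compile time, so its value is baked into the already-compiled library and can't be changed from outside. A runtime setter (purely hypothetical names here, not an actual OpenEars API) would be the kind of thing I'm asking for:

    ```c
    #include <stdio.h>

    /* Hypothetical replacement for a compile-time constant: a runtime
     * setting the app can adjust after the library is initialized. */
    static int excessiveUtterancePeriod = 10; /* built-in default */

    void setExcessiveUtterancePeriod(int seconds) {
        excessiveUtterancePeriod = seconds;
    }

    int getExcessiveUtterancePeriod(void) {
        return excessiveUtterancePeriod;
    }

    int main(void) {
        setExcessiveUtterancePeriod(15); /* the value I'd like to use */
        printf("period = %d seconds\n", getExcessiveUtterancePeriod());
        return 0;
    }
    ```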

    Thanks.

    in reply to: First long phrase missed #1022329
    giebler
    Participant

    The problem does not show up if you use a WAV file. It only happens when the microphone is recording the audio. It detects finished speech before the end of the sentence. If you change the #ifdef switch in the sample app so that it is using the live mic, and play the audio file from another device, or simply say the sentence, you’ll see the problems.

    When I zipped the sample app, it was set to use the WAV file (sorry for the confusion).

    in reply to: First long phrase missed #1022310
    giebler
    Participant

    The recording was made on an iPhone 5s running iOS 7.1.2, but at 16 kHz and 16 bits I doubt it matters where I recorded it. The error seems to be device and iOS independent. I have verified it exists on the following devices:

    iPad 2nd Gen running iOS 6.1.3
    iPad 4th Gen running iOS 7.1.2
    iPad Air running iOS 7.1.2
    iPad Mini running iOS 7.1.2
    iPhone 5s running iOS 7.1.2

    The iPhone 5s almost gets the whole phrase, just missing the last few words (the first time).

    in reply to: First long phrase missed #1022302
    giebler
    Participant

    I’ve emailed the sample app with the audio file to your email address from your newsletters. Please let me know if you received it.

    Thanks.

    in reply to: First long phrase missed #1022299
    giebler
    Participant

    I tested on a second generation iPad running iOS 6.1.3 and on an iPad Mini running iOS 7.1.2.

    I made the recording as you suggested and wanted to try it before sending it.

    However, I noticed a very repeatable problem (besides the early termination of recognition). If I play the recording (from my iPhone) instead of speaking and regardless of volume, the sample app recognition loop won’t restart. I turned on logging, but it only gave me the following additional log entry:

    2014-08-19 12:02:49.143 OpenEarsSampleApp[26108:3a03] cont_ad_read failed, stopping.
    2014-08-19 12:02:49.246 OpenEarsSampleApp[26108:907] Setting up the continuous recognition loop has failed for some reason, please turn on [OpenEarsLogging startOpenEarsLogging] in OpenEarsConfig.h to learn more.

    The audio recording has some hiss (noise) on it since it was made with the iPhone 5 microphone, but even when I lower the playback volume to minimize the hiss, it still fails to resume, preventing me from testing the recording a second time.

    I’m going to go into my recording studio and make a decent voice recording of the phrase to make sure it fails properly and then works the second time, but I’ll send both recordings so that you can see the second problem.

    Thanks!

    in reply to: First long phrase missed #1022289
    giebler
    Participant

    I took the sample app and made this one modification to it. When I attempt to read the following sentence, it shows “has detected finished speech” before I finish reading the sentence. After the first failed attempt, it works perfectly. However, if I press “Stop Listening” and then press “Start Listening” again, it will once again miss the first long sentence. After the first miss, it works fine. Thanks for looking into this issue.

    // THIS IS A LONGER TEST OF A VERY LONG SENTENCE TO DETERMINE IF THE OPEN EARS LIBRARY HAS A PROBLEM DETECTING VERBOSE SPEECH

    NSArray *firstLanguageArray = [[NSArray alloc] initWithArray:[NSArray arrayWithObjects: // All capital letters.
        @"BACKWARD",
        @"CHANGE",
        @"FORWARD",
        @"THIS",
        @"IS A",
        @"LONGER",
        @"TEST",
        @"OF A",
        @"VERY",
        @"LONG",
        @"SENTENCE",
        @"TO",
        @"DETERMINE",
        @"IF",
        @"THE",
        @"OPEN",
        @"EARS",
        @"LIBRARY",
        @"HAS A",
        @"PROBLEM",
        @"DETECTING",
        @"VERBOSE",
        @"SPEECH",
        @"GO",
        @"LEFT",
        @"MODEL",
        @"RIGHT",
        @"TURN",
        nil]];

    in reply to: First long phrase missed #1022257
    giebler
    Participant

    Well, I disabled everything except OpenEars and it’s still happening. I’m going to try it in another app to see if I can pin down what’s going on. I’ll let you know what I find.

    in reply to: First long phrase missed #1022254
    giebler
    Participant

    Here’s what I think is happening:

    When it receives a long phrase, the average gain (AGC) is adjusting so that it starts to detect silence during the phrase. When I set the timeout to 3.6 seconds, it gives the app enough time to finish the phrase before it times out.

    From the debug log, I see that AGC is set to none with a threshold of 2.

    I’m using an iPad (2nd Gen) with its internal mic.

    Is there a way to change the averaging detection for silence so that it doesn’t change its value over 5 seconds of speech? (Hopefully I’m saying this correctly.) Just in case, let me put it another way: the level it uses to determine silence appears to change over a few seconds of speech, so that eventually the speech looks like silence. I need a slower response, so that speech still looks like speech after 5 seconds.
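    As far as I understand it, what I'm describing behaves like an exponential moving average of the signal level, where the smoothing factor controls how fast the silence estimate tracks the input. This is just a toy sketch of that behavior (the numbers and names are hypothetical, not the actual pocketsphinx VAD internals):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* One EMA update step: alpha near 0 tracks slowly,
     * alpha near 1 tracks quickly. */
    static double ema_update(double avg, double sample, double alpha) {
        return avg + alpha * (sample - avg);
    }

    int main(void) {
        double slow = 100.0, fast = 100.0; /* starting noise-floor estimate */

        /* Feed 5 seconds of loud speech (level 1000) at 100 frames/sec. */
        for (int i = 0; i < 500; i++) {
            slow = ema_update(slow, 1000.0, 0.001);
            fast = ema_update(fast, 1000.0, 0.05);
        }

        /* The fast average has nearly reached the speech level, so the
         * speech now looks like background; the slow average still
         * treats it as loud. */
        printf("fast estimate: %.1f, slow estimate: %.1f\n", fast, slow);
        assert(fast > 900.0);
        assert(slow < 500.0);
        return 0;
    }
    ```

    If something like the fast case is happening internally, that would explain why long phrases start registering as silence partway through.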

    Does that make any sense?

    in reply to: First long phrase missed #1022252
    giebler
    Participant

    I tried all three calibration times and even set the timeout to 1.6 seconds. Here are the results:

    2014-08-15 14:38:01.855 coach[24101:6923] Calibration has completed
    2014-08-15 14:38:01.857 coach[24101:6923] Listening.
    2014-08-15 14:38:11.276 coach[24101:6923] Speech detected…
    INFO: file_omitted(0): Resized backpointer table to 10000 entries
    INFO: file_omitted(0): Resized score stack to 200000 entries
    2014-08-15 14:38:14.734 coach[24101:6923] Stopping audio unit.
    2014-08-15 14:38:14.865 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
    2014-08-15 14:38:14.866 coach[24101:6923] Processing speech, please wait…
    INFO: file_omitted(0): cmn_prior_update: from < 47.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: file_omitted(0): cmn_prior_update: to < 58.97 -2.76 -0.05 -2.55 -2.07 -2.80 -0.86 0.27 -0.82 -0.19 -0.09 0.13 -0.02 >
    INFO: file_omitted(0): 8462 words recognized (24/fr)
    INFO: file_omitted(0): 477916 senones evaluated (1350/fr)
    INFO: file_omitted(0): 305552 channels searched (863/fr), 41300 1st, 165791 last
    INFO: file_omitted(0): 23343 words for which last channels evaluated (65/fr)
    INFO: file_omitted(0): 14266 candidate words for entering last phone (40/fr)
    INFO: file_omitted(0): fwdtree 1.95 CPU 0.551 xRT
    INFO: file_omitted(0): fwdtree 3.60 wall 1.016 xRT
    INFO: file_omitted(0): Utterance vocabulary contains 145 words
    INFO: file_omitted(0): 4616 words recognized (13/fr)
    INFO: file_omitted(0): 337969 senones evaluated (955/fr)
    INFO: file_omitted(0): 342829 channels searched (968/fr)
    INFO: file_omitted(0): 22321 words searched (63/fr)
    INFO: file_omitted(0): 18710 word transitions (52/fr)
    INFO: file_omitted(0): fwdflat 1.27 CPU 0.358 xRT
    INFO: file_omitted(0): fwdflat 1.27 wall 0.358 xRT
    2014-08-15 14:38:16.145 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE A HIGHEST NET SWITCHING FOR MY PRODUCT OVER ALL” with a score of (0) and an utterance ID of 000000000.
    2014-08-15 14:38:16.147 coach[24101:6923] Checking and resetting all audio session settings.
    2014-08-15 14:38:16.148 coach[24101:907] _isSpeaking: 0
    2014-08-15 14:38:16.149 coach[24101:6923] audioCategory is correct, we will leave it as it is.

    Here’s the second try in the same debug session:

    2014-08-15 14:38:16.270 coach[24101:6923] Listening.
    2014-08-15 14:38:39.491 coach[24101:6923] Speech detected…
    2014-08-15 14:38:45.964 coach[24101:6923] Stopping audio unit.
    2014-08-15 14:38:46.094 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
    2014-08-15 14:38:46.096 coach[24101:6923] Processing speech, please wait…
    INFO: file_omitted(0): cmn_prior_update: from < 59.07 -2.73 -0.26 -2.34 -2.00 -2.78 -0.94 0.16 -0.72 -0.18 0.02 0.04 -0.03 >
    INFO: file_omitted(0): cmn_prior_update: to < 57.70 -2.95 -0.09 -2.21 -1.92 -2.72 -0.87 0.11 -0.72 -0.14 0.03 0.01 -0.05 >
    INFO: file_omitted(0): 8648 words recognized (18/fr)
    INFO: file_omitted(0): 557984 senones evaluated (1136/fr)
    INFO: file_omitted(0): 315338 channels searched (642/fr), 57386 1st, 147718 last
    INFO: file_omitted(0): 28447 words for which last channels evaluated (57/fr)
    INFO: file_omitted(0): 14112 candidate words for entering last phone (28/fr)
    INFO: file_omitted(0): fwdtree 2.27 CPU 0.463 xRT
    INFO: file_omitted(0): fwdtree 6.62 wall 1.348 xRT
    INFO: file_omitted(0): Utterance vocabulary contains 127 words
    INFO: file_omitted(0): 3739 words recognized (8/fr)
    INFO: file_omitted(0): 328839 senones evaluated (670/fr)
    INFO: file_omitted(0): 276068 channels searched (562/fr)
    INFO: file_omitted(0): 21483 words searched (43/fr)
    INFO: file_omitted(0): 18104 word transitions (36/fr)
    INFO: file_omitted(0): fwdflat 1.17 CPU 0.238 xRT
    INFO: file_omitted(0): fwdflat 1.17 wall 0.238 xRT
    2014-08-15 14:38:47.281 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE THE HIGHEST NET SWITCHING FOR MY PRODUCT OVER THE LAST THIRTEEN WEEKS” with a score of (0) and an utterance ID of 000000001.
    2014-08-15 14:38:47.282 coach[24101:6923] Checking and resetting all audio session settings.
    2014-08-15 14:38:47.284 coach[24101:6923] audioCategory is correct, we will leave it as it is.

    Let me know if you see anything in this listing. As you can see, it can and does recognize the entire sentence perfectly after the first attempt.

    in reply to: First long phrase missed #1022247
    giebler
    Participant

    It works great after the first try, and there is no pause or hesitation and no intermittent noise during the first try. I’m working in a relatively quiet environment. How would I do a longer calibration and how would that help?

    I don’t have any choice with the length of the phrase – we’re asking questions and as I said, it works fine AFTER the first attempt.

    Here’s a sample question:

    Which doctors have the highest net switching for my product over the last thirteen weeks?

    First try after starting the app always fails while I’m still speaking, and after that it works fine.

    In 1.65 it would give the following message:

    2014-08-14 17:07:45.144 coach[23556:a107] There is reason to suspect the VAD of being out of sync with the current background noise levels in the environment so we will recalibrate.

    In 1.71 it no longer gives that message and as I said, it gets more words in 1.71 but still misses the last half.

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015717
    giebler
    Participant

    Even though I was generating a new .dic file, my app was failing to copy it to the proper folder and was still using the old one! Once I discovered that, your suggestions for Q1, Q2, Q3, Q4, and IMS are all working!

    Thanks!

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015714
    giebler
    Participant

    I’m adding these entries to the cmu07a.dic file and then generating my .dic file by adding Q1, Q2, Q3, Q4, and IMS to the language array. I’ll download my language file to make sure they ended up there…

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015713
    giebler
    Participant

    I also can’t get it to recognize our company name (IMS) which I also added to the .dic file as shown here:

    IMRIE IH M ER IY
    IMS AY EH M EH S
    IMUS AY M AH S

    Any suggestions for this one?

    Thanks!

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015711
    giebler
    Participant

    Here’s what (and where) I put in the .dic file:

    Q.S K Y UW Z
    Q1 K Y UW W AH N
    Q2 K Y UW T UW
    Q3 K Y UW TH R IY
    Q4 K Y UW F AO R
    QANA K AA N AH

    I had to edit the .dic file in a hex editor since at first Xcode inserted spaces instead of a tab.
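    To avoid the hex editing, a trivial generator that emits the separator as `"\t"` guarantees a literal tab between the word and its phonemes (this is just an illustration of the layout I needed, using the entries above; `format_dic_entry` is my own hypothetical helper, not an OpenEars function):

    ```c
    #include <stdio.h>

    /* Format one dictionary entry with a literal tab separator.
     * Returns the number of characters written. */
    static int format_dic_entry(char *buf, size_t n,
                                const char *word, const char *phones) {
        return snprintf(buf, n, "%s\t%s", word, phones);
    }

    int main(void) {
        /* Word / pronunciation pairs from the entries above. */
        const char *entries[][2] = {
            { "Q1", "K Y UW W AH N" },
            { "Q2", "K Y UW T UW" },
            { "Q3", "K Y UW TH R IY" },
            { "Q4", "K Y UW F AO R" },
        };
        char line[64];
        for (int i = 0; i < 4; i++) {
            format_dic_entry(line, sizeof line, entries[i][0], entries[i][1]);
            puts(line);
        }
        return 0;
    }
    ```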

    It still doesn’t recognize Q1, Q2, Q3 or Q4.

    It comes out “Two One” or “U One” no matter how clearly I speak.

    I need both “Two” and “U” (U.S.) in my recognition file.

    Any other thoughts? I don’t know what else to do. Would Rejecto help?

    in reply to: PocketSphinx stops listening after I change language file. #1015511
    giebler
    Participant

    The same thing happened to me when I updated to version 1.2.5. I have an app where I need to add words occasionally. When I generated the new files, it would go through the motions of listening but never recognize anything. Based on Geri’s solution, I deleted the old dynamic dictionary and grammar files before creating the new ones, and this eliminated the bug. Now I can use the same file names over and over. Hope this helps you track down the bug.
