Chinese recognition is too slow


Viewing 6 posts - 1 through 6 (of 6 total)

    #1030279
    blackwing
    Participant

    I’ve downloaded the AcousticModelChinese.bundle and run it successfully.
    But I found that CPU usage is 100% on an A5 device running iOS 8.1; it takes a long time and never generates a hypothesis. On an A8X iPad Air running iOS 8, CPU usage stays between 20% and 50%.

    So my question is: how can I lower the CPU usage?

    #1030280
    Halle Winkler
    Politepix

    Hello,

    Can you give me a little bit of information about what you are doing with it? How large is the vocabulary, what device is the slow device, is there noise that is leading to ongoing recognition attempts, is this about one of the plugins and if so which ones, etc?

    But I found that the cpu usage is 100% on A5 CPU

    What is the duration of this CPU peak?

    #1030283
    blackwing
    Participant

    Hello Halle,
    1. The A5 device is an iPad mini. The CPU usage peak seems to last forever, staying steadily between 98% and 101%.

    2. My vocabulary is small, only eight commands. It is as follows:
    NSArray *firstLanguageArray = @[@"小宝前进",
                                    @"小宝后退",
                                    @"小宝向左转",
                                    @"小宝向右转",
                                    @"小宝抬头",
                                    @"小宝低头",
                                    @"小宝去充电",
                                    @"小宝停止充电"];

    3. It is quite quiet around me, and when I try it on the A8X device it is fast under the same conditions.
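
    For reference, this is how I feed that array to the generator (a sketch based on the standard OpenEars 2.x calls; the output file name here is just an example):

    ```objc
    #import <OpenEars/OELanguageModelGenerator.h>
    #import <OpenEars/OEAcousticModel.h>

    // Generate a language model and dictionary from the eight commands.
    // "ChineseCommands" is an arbitrary example name for the output files.
    OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
    NSError *error = [generator generateLanguageModelFromArray:firstLanguageArray
                                                withFilesNamed:@"ChineseCommands"
                                        forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelChinese"]];
    if (error) NSLog(@"Language model generation error: %@", error);
    ```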

    #1030284
    Halle Winkler
    Politepix

    OK, thanks for the clarification. I actually wonder if the issue is related more to a mic difference than a CPU difference. Do you get better results if you increase vadThreshold up to a point that rejects most noise on the mini? The vadThreshold settings have to be evaluated and set for each acoustic model besides the English one; I think there is more info about that at the end of the other-languages acoustic model download page: https://www.politepix.com/otherlanguages. Let me know if this helps or if you’d like to troubleshoot it more (in that case I’ll ask for some logging output as seen here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/).
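
    If it helps, raising the threshold is a one-line change (a sketch; the value shown is only a starting point to experiment with, not a recommendation):

    ```objc
    #import <OpenEars/OEPocketsphinxController.h>

    // Set before starting listening; the right value has to be evaluated
    // per acoustic model, as described on the other-languages page.
    [OEPocketsphinxController sharedInstance].vadThreshold = 3.1;
    ```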

    #1030286
    blackwing
    Participant

    Hello Halle, thank you for your advice; it works.
    I raised vadThreshold to 3.6, and it can recognize my commands now. But there are still some questions.

    1. the logs said:

    The word 小宝向右转 was not found in the dictionary of the acoustic model /var/mobile/Applications/830BE127-C119-44E0-B2AA-48CB37746BD3/OpenEarsSampleApp.app/AcousticModelChinese.bundle. Now using the fallback method to look it up.

    Q: Does it matter? If I add this command to LanguageModelGeneratorLookupList.text, will it help speed up the recognition process?

    2. about the vad:

    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 3.600000e+00

    Q: How can I set the vad_postspeech and vad_prespeech parameters?

    3. My recognition log:

    2016-05-10 12:31:44.898 OpenEarsSampleApp[779:1903] Speech detected…
    2016-05-10 12:31:44.900 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected speech.
    2016-05-10 12:31:51.251 OpenEarsSampleApp[779:3807] End of speech detected…
    INFO: cmn_prior.c(131): cmn_prior_update: from < 11.49 0.22 -0.25 -0.06 -0.42 -0.11 -0.17 -0.21 -0.23 -0.07 -0.12 -0.16 -0.13 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 11.70 0.08 -0.15 -0.05 -0.42 -0.11 -0.19 -0.22 -0.21 -0.10 -0.13 -0.14 -0.14 >
    INFO: ngram_search_fwdtree.c(1553): 886 words recognized (4/fr)
    INFO: ngram_search_fwdtree.c(1555): 15516 senones evaluated (64/fr)
    INFO: ngram_search_fwdtree.c(1559): 3930 channels searched (16/fr), 158 1st, 2778 last
    INFO: ngram_search_fwdtree.c(1562): 1285 words for which last channels evaluated (5/fr)
    INFO: ngram_search_fwdtree.c(1564): 24 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 7.78 CPU 3.203 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 14.73 wall 6.062 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 6 words
    2016-05-10 12:31:51.253 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    INFO: ngram_search_fwdflat.c(948): 793 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 13900 senones evaluated (57/fr)
    INFO: ngram_search_fwdflat.c(952): 5411 channels searched (22/fr)
    INFO: ngram_search_fwdflat.c(954): 1759 words searched (7/fr)
    INFO: ngram_search_fwdflat.c(957): 276 word transitions (1/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 2.00 CPU 0.825 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 1.96 wall 0.808 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.183
    INFO: ngram_search.c(1306): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 241 nodes, 507 links
    INFO: ps_lattice.c(1380): Bestpath score: -3737
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:183:241) = -214253
    INFO: ps_lattice.c(1441): Joint P(O,S) = -235609 P(S|O) = -21356
    INFO: ngram_search.c(899): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(902): bestpath 0.00 wall 0.001 xRT
    2016-05-10 12:31:53.221 OpenEarsSampleApp[779:3807] Pocketsphinx heard “小宝去充电” with a score of (-21356) and an utterance ID of 14.
    2016-05-10 12:31:53.224 OpenEarsSampleApp[779:60b] Flite sending interrupt speech request.
    2016-05-10 12:31:53.225 OpenEarsSampleApp[779:60b] Local callback: The received hypothesis is [ 小宝去充电 ] with a score of -21356 and an ID of 14

    The interval is from 2016-05-10 12:31:44.898 to 2016-05-10 12:31:53.225.

    It took about 10 seconds to finish one recognition; how can I reduce the time?

    4. CPU usage is still high

    When it detects speech, the CPU usage rises to about 100% and the peak lasts for about 6 seconds.

    Q: Is there any way to lower the CPU usage?

    #1030290
    Halle Winkler
    Politepix

    I raised vadThreshold to 3.6, and it can recognize my commands now.

    It may help with speed to raise it to the highest value at which it can still recognize your speech.

    Q: Does it matter? If I add this command to LanguageModelGeneratorLookupList.text, will it help speed up the recognition process?

    It doesn’t matter in terms of speed, unless you are seeing that this word alone is causing slow results (which is very unlikely).

    Q: How can I set the vad_postspeech and vad_prespeech params ?

    It isn’t necessary to do anything with these parameters.

    It took about 10 seconds to finish one recognition; how can I reduce the time?

    It only took two seconds to do the recognition. This is where the end of speech happens in your log (OpenEars first waits for the speaker to complete their utterance and then it starts recognizing it):

    2016-05-10 12:31:51.253 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.

    This is where the hypothesis from the completed recognition is given:

    2016-05-10 12:31:53.221 OpenEarsSampleApp[779:3807] Pocketsphinx heard “小宝去充电” with a score of (-21356) and an utterance ID of 14.

    That is 12:31:51.253 – 12:31:53.221 or almost exactly two seconds.

    So, if something seems like it is 10 seconds, it is something else besides recognition time. This could be because Flite is speaking in between, or it could be because the end of user speech is not being recognized at the right time because vadThreshold is too low.
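
    One way to verify where the time is actually going is to timestamp the relevant OEEventsObserver callbacks yourself (a sketch; speechEndDate is a hypothetical NSDate property you would add to your delegate class):

    ```objc
    // In your OEEventsObserverDelegate class:
    - (void) pocketsphinxDidDetectFinishedSpeech {
        self.speechEndDate = [NSDate date]; // hypothetical property for timing
    }

    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
        // Interval between end of utterance and the hypothesis arriving:
        NSLog(@"Recognition took %.2f seconds", [[NSDate date] timeIntervalSinceDate:self.speechEndDate]);
    }
    ```

    If this interval is around two seconds but the total perceived time is much longer, the extra time is happening before the end of speech is detected, not during recognition.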

    When it detects speech, the cpu usage rises to about 100% and the peak lasts for about 6 seconds.

    That seems a little doubtful to me, since there are only two seconds in which the speech is being analyzed and the CPU doesn’t need to work much before the speech is being analyzed. Are you sure that isn’t Flite speech being generated that is using the CPU?
