Chinese recognition is too slow


Viewing 6 posts - 1 through 6 (of 6 total)

    #1030279
    blackwing
    Participant

    I’ve downloaded the AcousticModelChinese.bundle and run it successfully.
    But I found that CPU usage is 100% on an A5 device running iOS 8.1; it takes a long time and never generates a hypothesis. On an A8X iPad Air running iOS 8, CPU usage stays between 20% and 50%.

    So my question is: how can I lower the CPU usage?

    #1030280
    Halle Winkler
    Politepix

    Hello,

    Can you give me a little bit of information about what you are doing with it? How large is the vocabulary, what device is the slow device, is there noise that is leading to ongoing recognition attempts, is this about one of the plugins and if so which ones, etc?

    But I found that the cpu usage is 100% on A5 CPU

    What is the duration of this CPU peak?

    #1030283
    blackwing
    Participant

    Hello Halle,
    1. The A5 device is an iPad mini. The CPU usage peak seems to last forever, staying steadily between 98% and 101%.

    2. My vocabulary is small, only eight commands. It is as follows:
    NSArray *firstLanguageArray = @[@"小宝前进",
                                    @"小宝后退",
                                    @"小宝向左转",
                                    @"小宝向右转",
                                    @"小宝抬头",
                                    @"小宝低头",
                                    @"小宝去充电",
                                    @"小宝停止充电"];

    3. It is quite quiet around me, and when I try it on the A8X device it is fast under the same conditions.
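
    For reference, this is how I feed that array to the generator (a sketch based on the standard OpenEars 2.x calls; the output file name here is just an example):

    ```objc
    #import <OpenEars/OELanguageModelGenerator.h>
    #import <OpenEars/OEAcousticModel.h>

    // Generate a language model and dictionary from the eight commands.
    // "ChineseCommands" is an arbitrary example name for the output files.
    OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
    NSError *error = [generator generateLanguageModelFromArray:firstLanguageArray
                                                withFilesNamed:@"ChineseCommands"
                                        forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelChinese"]];
    if (error) NSLog(@"Language model generation error: %@", error);
    ```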

    #1030284
    Halle Winkler
    Politepix

    OK, thanks for the clarification. I actually wonder if the issue is related more to a mic difference than a CPU difference. Do you get better results if you increase vadThreshold up to a point that rejects most noise on the mini? The vadThreshold settings have to be evaluated and set for each acoustic model besides the English one; I think there is more info about that at the end of the other-languages acoustic model download page: https://www.politepix.com/otherlanguages. Let me know if this helps or if you’d like to troubleshoot it more (in that case I’ll ask for some logging output as seen here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/).
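
    If it helps, raising the threshold is a one-line change (a sketch; the value shown is only a starting point to experiment with, not a recommendation):

    ```objc
    #import <OpenEars/OEPocketsphinxController.h>

    // Set before starting listening; the right value has to be evaluated
    // per acoustic model, as described on the other-languages page.
    [OEPocketsphinxController sharedInstance].vadThreshold = 3.1;
    ```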

    #1030286
    blackwing
    Participant

    Hello Halle, thank you for your advice; it works.
    I raised vadThreshold to 3.6, and it can recognize my commands now. But there are still some questions.

    1. the logs said:

    The word 小宝向右转 was not found in the dictionary of the acoustic model /var/mobile/Applications/830BE127-C119-44E0-B2AA-48CB37746BD3/OpenEarsSampleApp.app/AcousticModelChinese.bundle. Now using the fallback method to look it up.

    Q: Does it matter? If I add this command to LanguageModelGeneratorLookupList.text, will it help speed up the recognition process?

    2. about the vad:

    -vad_postspeech 50 69
    -vad_prespeech 20 10
    -vad_startspeech 10 10
    -vad_threshold 2.0 3.600000e+00

    Q: How can I set the vad_postspeech and vad_prespeech parameters?

    3. My recognition log:

    2016-05-10 12:31:44.898 OpenEarsSampleApp[779:1903] Speech detected…
    2016-05-10 12:31:44.900 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected speech.
    2016-05-10 12:31:51.251 OpenEarsSampleApp[779:3807] End of speech detected…
    INFO: cmn_prior.c(131): cmn_prior_update: from < 11.49 0.22 -0.25 -0.06 -0.42 -0.11 -0.17 -0.21 -0.23 -0.07 -0.12 -0.16 -0.13 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 11.70 0.08 -0.15 -0.05 -0.42 -0.11 -0.19 -0.22 -0.21 -0.10 -0.13 -0.14 -0.14 >
    INFO: ngram_search_fwdtree.c(1553): 886 words recognized (4/fr)
    INFO: ngram_search_fwdtree.c(1555): 15516 senones evaluated (64/fr)
    INFO: ngram_search_fwdtree.c(1559): 3930 channels searched (16/fr), 158 1st, 2778 last
    INFO: ngram_search_fwdtree.c(1562): 1285 words for which last channels evaluated (5/fr)
    INFO: ngram_search_fwdtree.c(1564): 24 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 7.78 CPU 3.203 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 14.73 wall 6.062 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 6 words
    2016-05-10 12:31:51.253 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
    INFO: ngram_search_fwdflat.c(948): 793 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 13900 senones evaluated (57/fr)
    INFO: ngram_search_fwdflat.c(952): 5411 channels searched (22/fr)
    INFO: ngram_search_fwdflat.c(954): 1759 words searched (7/fr)
    INFO: ngram_search_fwdflat.c(957): 276 word transitions (1/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 2.00 CPU 0.825 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 1.96 wall 0.808 xRT
    INFO: ngram_search.c(1280): lattice start node <s>.0 end node </s>.183
    INFO: ngram_search.c(1306): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1411): Lattice has 241 nodes, 507 links
    INFO: ps_lattice.c(1380): Bestpath score: -3737
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:183:241) = -214253
    INFO: ps_lattice.c(1441): Joint P(O,S) = -235609 P(S|O) = -21356
    INFO: ngram_search.c(899): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(902): bestpath 0.00 wall 0.001 xRT
    2016-05-10 12:31:53.221 OpenEarsSampleApp[779:3807] Pocketsphinx heard “小宝去充电” with a score of (-21356) and an utterance ID of 14.
    2016-05-10 12:31:53.224 OpenEarsSampleApp[779:60b] Flite sending interrupt speech request.
    2016-05-10 12:31:53.225 OpenEarsSampleApp[779:60b] Local callback: The received hypothesis is [ 小宝去充电 ] with a score of -21356 and an ID of 14

    The interval is from 2016-05-10 12:31:44.898 to 2016-05-10 12:31:53.225.

    It took about 10 seconds to finish one recognition; how can I reduce the time?

    4. CPU usage is still high

    When it detects speech, the CPU usage rises to about 100% and the peak lasts for about 6 seconds.

    Q: Is there any way to lower the CPU usage?

    #1030290
    Halle Winkler
    Politepix

    I raised vadThreshold to 3.6, and it can recognize my commands now.

    It may help with speed to raise it to the highest value at which it can still recognize your speech.

    Q: Does it matter? If I add this command to LanguageModelGeneratorLookupList.text, will it help speed up the recognition process?

    It doesn’t matter in terms of speed, unless you are seeing that this word alone is causing slow results (which is very unlikely).

    Q: How can I set the vad_postspeech and vad_prespeech params ?

    It isn’t necessary to do anything with these parameters.

    It took about 10 seconds to finish one recognition; how can I reduce the time?

    It only took two seconds to do the recognition. This is where the end of speech happens in your log (OpenEars first waits for the speaker to complete their utterance and then it starts recognizing it):

    2016-05-10 12:31:51.253 OpenEarsSampleApp[779:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.

    This is where the hypothesis from the completed recognition is given:

    2016-05-10 12:31:53.221 OpenEarsSampleApp[779:3807] Pocketsphinx heard “小宝去充电” with a score of (-21356) and an utterance ID of 14.

    That is 12:31:51.253 – 12:31:53.221 or almost exactly two seconds.

    So, if something seems like it is 10 seconds, it is something else besides recognition time. This could be because Flite is speaking in between, or it could be because the end of user speech is not being recognized at the right time because vadThreshold is too low.
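
    One way to verify where the time is actually going is to timestamp the relevant OEEventsObserver callbacks yourself (a sketch; speechEndDate is a hypothetical NSDate property you would add to your delegate class):

    ```objc
    // In your OEEventsObserverDelegate class:
    - (void) pocketsphinxDidDetectFinishedSpeech {
        self.speechEndDate = [NSDate date]; // hypothetical property for timing
    }

    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
        // Interval between end of utterance and the hypothesis arriving:
        NSLog(@"Recognition took %.2f seconds", [[NSDate date] timeIntervalSinceDate:self.speechEndDate]);
    }
    ```

    If this interval is around two seconds but the total perceived time is much longer, the extra time is happening before the end of speech is detected, not during recognition.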

    When it detects speech, the cpu usage rises to about 100% and the peak lasts for about 6 seconds.

    That seems a little doubtful to me, since there are only two seconds in which the speech is being analyzed and the CPU doesn’t need to work much before the speech is being analyzed. Are you sure that isn’t Flite speech being generated that is using the CPU?
