Reply To: [Resolved] generateLanguageModelFromArray problem


#1024203
maxgarmar
Participant

Here is the full log, followed by the backtrace:
[749:161996] Starting dynamic language model generation
## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
## Includes 21 words ##
wfreq2vocab : Done.
text2idngram
Vocab : /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.vocab
Output idngram : /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.idngram
N-gram buffer size : 10
Hash table size : 5000
Temp directory : /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/cmuclmtk-Uoiyta
Max open files : 20
FOF size : 10
n : 3
Initialising hash table…
Reading vocabulary…
Allocating memory for the n-gram buffer…
Reading text into the n-gram buffer…
20,000 n-grams processed for each “.”, 1,000,000 for each line.

Sorting n-grams…
Writing sorted n-grams to temporary file /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/cmuclmtk-Uoiyta/1
Merging 1 temporary files…

2-grams occurring:  N times   > N times   Sug. -spec_num value
0                             35          45
1                   31        4           14
2                   3         1           11
3                   0         1           11
4                   0         1           11
5                   0         1           11
6                   0         1           11
7                   0         1           11
8                   0         1           11
9                   0         1           11
10                  0         1           11

3-grams occurring:  N times   > N times   Sug. -spec_num value
0                             50          60
1                   48        2           12
2                   2         0           10
3                   0         0           10
4                   0         0           10
5                   0         0           10
6                   0         0           10
7                   0         0           10
8                   0         0           10
9                   0         0           10
10                  0         0           10
text2idngram : Done.

read_wlist_into_siht: a list of 21 words was read from “/var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.vocab”.
read_wlist_into_array: a list of 21 words was read from “/var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.vocab”.
Unigram was renormalized to absorb a mass of 0.463415
prob[UNK] = 1e-99
ARPA-style 3-gram will be written to /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.arpa
idngram2lm : Done.
INFO: cmd_ln.c(702): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.arpa \
-o /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.DMP \
-debug 10

Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Containers/Data/Application/461A3153-5928-4041-832A-D3F5AD119C33/Library/Caches/NameIWantForMyLanguageModelFiles.DMP
-oenc utf8 utf8
-ofmt

INFO: ngram_model_arpa.c(504): ngrams 1=21, 2=34, 3=21
INFO: ngram_model_arpa.c(137): Reading unigrams
INFO: ngram_model_arpa.c(543): 21 = #unigrams created
INFO: ngram_model_arpa.c(197): Reading bigrams
INFO: ngram_model_arpa.c(561): 34 = #bigrams created
INFO: ngram_model_arpa.c(562): 6 = #prob2 entries
INFO: ngram_model_arpa.c(570): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(294): Reading trigrams
INFO: ngram_model_arpa.c(583): 21 = #trigrams created
INFO: ngram_model_arpa.c(584): 4 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 21 = #unigrams created
INFO: ngram_model_dmp.c(649): 34 = #bigrams created
INFO: ngram_model_dmp.c(650): 6 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 21 = #trigrams created
INFO: ngram_model_dmp.c(662): 4 = #prob3 entries
2015-01-13 13:52:16.863[749:161996] Done creating language model with CMUCLMTK in 0.106283 seconds.
2015-01-13 13:52:16.928[749:161996] The word CALAMARES was not found in the dictionary /private/var/mobile/Containers/Bundle/Application/057D1FCC-0E5A-42AE-B064-B8524A54A8A2//AcousticModelEnglish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-13 13:52:16.928[749:161996] Now using the fallback method to look up the word CALAMARES
2015-01-13 13:52:16.928[749:161996] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the English phonetic lookup dictionary is that your words are not in English or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase. This can also happen if you submit words with punctuation attached – consider removing punctuation from language models or grammars you create before submitting them.
2015-01-13 13:52:16.928[749:161996] Using convertGraphemes for the word or phrase CALAMARES which doesn’t appear in the dictionary
(lldb) bt
* thread #1: tid = 0x278cc, 0x000000010019e000`feat_copy_into + 24, queue = ‘com.apple.main-thread’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010019e000`feat_copy_into + 24
frame #1: 0x00000001001abeb8`utt_init + 32
frame #2: 0x000000010019a59c`flite_synth_text + 52
frame #3: 0x000000010019a3bc `___lldb_unnamed_function417$$ + 128
frame #4: 0x00000001001af090 `___lldb_unnamed_function551$$ + 1384
frame #5: 0x00000001001b1100 `___lldb_unnamed_function564$$ + 468
frame #6: 0x00000001001b098c `___lldb_unnamed_function559$$ + 560
* frame #7: 0x000000010011b194 `-[MRSCViewController openEarsRefreshProduct](self=0x000000015551cee0, _cmd=0x000000010042265a) + 1648 at MRSCViewController.m:1969
frame #8: 0x0000000100119ec4 `-[MRSCViewController refreshTableNotif:](self=0x000000015551cee0, _cmd=0x0000000100421e98, notification=0x0000000170056aa0) + 4616 at MRSCViewController.m:1850
frame #9: 0x0000000182f801e0 CoreFoundation`__CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 20
frame #10: 0x0000000182ebf370 CoreFoundation`_CFXNotificationPost + 2060
frame #11: 0x0000000183dbacc0 Foundation`-[NSNotificationCenter postNotificationName:object:userInfo:] + 72
frame #12: 0x0000000100123f98 `-[MRSCAppDelegate storeDidChange:](self=0x0000000170056bc0, _cmd=0x0000000100422d55, notification=0x0000000174245850) + 2000 at MRSCAppDelegate.m:265
frame #13: 0x0000000182f801e0 CoreFoundation`__CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 20
frame #14: 0x0000000182ebf370 CoreFoundation`_CFXNotificationPost + 2060
frame #15: 0x0000000183dbacc0 Foundation`-[NSNotificationCenter postNotificationName:object:userInfo:] + 72
frame #16: 0x0000000183f7a148 Foundation`-[NSUbiquitousKeyValueStore _postDidChangeNotificationExternalChanges:sourceChangeCount:] + 396
frame #17: 0x0000000183f7a538 Foundation`__53-[NSUbiquitousKeyValueStore _syncConcurrentlyForced:]_block_invoke_2 + 256
frame #18: 0x0000000100518e30 libdispatch.dylib`_dispatch_call_block_and_release + 24
frame #19: 0x0000000100518df0 libdispatch.dylib`_dispatch_client_callout + 16
frame #20: 0x000000010051d75c libdispatch.dylib`_dispatch_main_queue_callback_4CF + 1056
frame #21: 0x0000000182f916a0 CoreFoundation`__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 12
frame #22: 0x0000000182f8f748 CoreFoundation`__CFRunLoopRun + 1492
frame #23: 0x0000000182ebd1f4 CoreFoundation`CFRunLoopRunSpecific + 396
frame #24: 0x000000018c00f5a4 GraphicsServices`GSEventRunModal + 168
frame #25: 0x00000001877ee784 UIKit`UIApplicationMain + 1488
frame #26: 0x00000001001229e0 `main(argc=1, argv=0x000000016fd13a70) + 116 at main.m:16
frame #27: 0x000000019404aa08 libdyld.dylib`start + 4
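
The fallback warning in the log above recommends submitting words entirely in uppercase and without attached punctuation. As a rough illustration of that preprocessing step only, here is a minimal sketch in Python (chosen for brevity; the same logic would need porting to the app's own language). The function name `normalize_words` is hypothetical and not part of OpenEars, and keeping apostrophes is an assumption made so that contractions survive. This cleans the input array; it is not claimed to explain or fix the EXC_BAD_ACCESS shown in the backtrace.

```python
import string


def normalize_words(words):
    """Uppercase each entry and strip punctuation, dropping entries left empty.

    Hypothetical helper: sketches the cleanup the log message recommends
    before a word array is handed to language model generation.
    """
    # Assumption: apostrophes are kept so contractions such as DON'T survive.
    removable = string.punctuation.replace("'", "")
    # Build a translation table that deletes every removable character.
    strip_punct = str.maketrans("", "", removable)

    cleaned = []
    for word in words:
        # Delete punctuation, collapse runs of whitespace, then uppercase.
        candidate = " ".join(word.translate(strip_punct).split()).upper()
        if candidate:
            cleaned.append(candidate)
    return cleaned
```

Under these assumptions, an input such as `["calamares,", "Hello world!", "..."]` would come back as `["CALAMARES", "HELLO WORLD"]`. A word like CALAMARES would still miss the English lookup dictionary after this step, since it is not an English word, so the fallback path would still be taken for it.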