Convert .wav file to text


Viewing 4 posts - 1 through 4 (of 4 total)

  • #1030675
    juan14nob
    Participant

    Hi guys, I am facing some trouble when trying to convert a .wav file (that I already stored in memory) to text.

    The method I am trying to use is:

    AVAudioRecorder *recorder = [[AVAudioRecorder alloc] initWithURL:outputFileURL settings:recordSetting error:nil];

    [[OEPocketsphinxController sharedInstance] runRecognitionOnWavFileAtPath:recorder.url.absoluteString usingLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"] languageModelIsJSGF:FALSE];

    And the log says:

    INFO: cmd_ln.c(702): Parsing command line:
    \
    -lm /var/mobile/Containers/Data/Application/9CB2A80F-1DDB-4E5D-8B07-1AE10C2347FF/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP \
    -vad_prespeech 10 \
    -vad_postspeech 69 \
    -vad_threshold 2.000000 \
    -remove_noise yes \
    -remove_silence yes \
    -bestpath yes \
    -lw 6.500000 \
    -dict /var/mobile/Containers/Data/Application/9CB2A80F-1DDB-4E5D-8B07-1AE10C2347FF/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic \
    -hmm /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /var/mobile/Containers/Data/Application/9CB2A80F-1DDB-4E5D-8B07-1AE10C2347FF/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm /var/mobile/Containers/Data/Application/9CB2A80F-1DDB-4E5D-8B07-1AE10C2347FF/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 69
    -vad_prespeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(702): Parsing command line:
    \
    -nfilt 25 \
    -lowerf 130 \
    -upperf 6800 \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -agc none \
    -cmn current \
    -varnorm no \
    -transform dct \
    -lifter 22 \
    -cmninit 40

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 22
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -vad_postspeech 50 69
    -vad_prespeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02

    INFO: acmod.c(252): Parsed model-specific feature parameters from /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/feat.params
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/mdef
    INFO: bin_mdef.c(516): 46 CI-phone, 168344 CD-phone, 3 emitstate/phone, 138 CI-sen, 6138 Sen, 32881 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/transition_matrices
    INFO: acmod.c(124): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn't match number of ciphones, doesn't look like PTM: 1 != 46
    INFO: acmod.c(126): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(294): 512x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(904): Loading senones from dump file /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/sendump
    INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(991): Rows: 512, Columns: 6138
    INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(115): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4125 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /var/mobile/Containers/Data/Application/9CB2A80F-1DDB-4E5D-8B07-1AE10C2347FF/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 20 words read
    INFO: dict.c(342): Reading filler dictionary: /var/containers/Bundle/Application/9B4800F8-FBD3-4FE9-890B-1154FD0A6EA1/myApp.app/AcousticModelEnglish.bundle/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(345): 9 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 46^3 * 2 bytes (190 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 25576 bytes (24 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 25576 bytes (24 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(79): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(220): ngrams 1=19, 2=34, 3=18
    INFO: ngram_model_dmp.c(266): 19 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(312): 34 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(338): 18 = LM.trigrams read
    INFO: ngram_model_dmp.c(363): 4 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(383): 5 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(403): 2 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(487): 19 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 19 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 10 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 10 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 157
    INFO: ngram_search_fwdtree.c(339): after: 19 root, 29 non-root channels, 9 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    ERROR: "ngram_search.c", line 1161: Couldn't find <s> in first frame
    ERROR: "ngram_search.c", line 1161: Couldn't find <s> in first frame
    2016-07-13 16:24:39.382 myApp[2260:728503] Pocketsphinx heard "" with a score of (0) and an utterance ID of 2.
    2016-07-13 16:24:39.382 myApp[2260:728503] Hypothesis was null so we aren't returning it. If you want null hypotheses to also be returned, set OEPocketsphinxController's property returnNullHypotheses to TRUE before starting OEPocketsphinxController.
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree -0.00 CPU -inf xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.00 wall inf xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat -0.00 CPU -inf xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.00 wall inf xRT
    INFO: ngram_search.c(307): TOTAL bestpath 0.00 CPU nan xRT
    INFO: ngram_search.c(310): TOTAL bestpath 0.00 wall nan xRT

    #1030677
    Halle Winkler
    Politepix

    Welcome,

    A couple of things – the most important is that when you have a debug question, it’s also necessary to show the entire output from OELogging as explained here: https://www.politepix.com/forums/topic/install-issues-and-their-solutions/

But the approach of loading a recorded file into an AVAudioRecorder instance and then referencing it from that instance’s URL doesn’t really make sense to me – the method expects a string which is the path to an already-completed and closed file that is a 16-bit, 16k mono WAV. The code you’ve shown doesn’t suggest that this is what is being passed to the method. It just looks like it can’t open and read the contents of the path, which is what I’d expect in this case. Just go ahead and pass the method a path to a completed and closed WAV file in the right format, and skip the indirection or breakage that the AVAudioRecorder instance is causing.
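[Editor’s note: for readers landing on this thread, the advice above can be sketched as follows. This is an illustrative reconstruction, not code from the original posts: the settings dictionary, file name, and recording lifecycle are assumptions, and the surrounding app state (e.g. starting OEPocketsphinxController) is omitted. One further detail worth noting: NSURL’s absoluteString for a file URL includes the file:// scheme, so a plain filesystem path is the safer thing to hand to a method that expects a path string.]

```objc
// Illustrative sketch: record a 16-bit / 16 kHz / mono WAV, close it, then recognize it.
#import <AVFoundation/AVFoundation.h>
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEAcousticModel.h>

NSString *wavPath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"utterance.wav"];

// Recording settings matching what the method expects: uncompressed 16-bit, 16 kHz, mono.
NSDictionary *recordSetting = @{
    AVFormatIDKey          : @(kAudioFormatLinearPCM), // uncompressed WAV
    AVSampleRateKey        : @16000.0,                 // 16 kHz
    AVNumberOfChannelsKey  : @1,                       // mono
    AVLinearPCMBitDepthKey : @16                       // 16-bit samples
};

AVAudioRecorder *recorder = [[AVAudioRecorder alloc] initWithURL:[NSURL fileURLWithPath:wavPath]
                                                        settings:recordSetting
                                                           error:nil];
[recorder record];
// ... later, once the utterance is finished:
[recorder stop]; // the file must be completed and closed before recognition

// Pass the plain filesystem path, not a file:// URL string.
[[OEPocketsphinxController sharedInstance]
    runRecognitionOnWavFileAtPath:wavPath
         usingLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel
                 dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary
              acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
              languageModelIsJSGF:FALSE];
```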

    #1030689
    juan14nob
    Participant

    Hello, thank you for your answer!
    I could send the WAV file (16-bit, 16,000 Hz sample rate, mono), but when I call “runRecognitionOnWavFileAtPath …” I am still facing some problems when translating from speech to text: the words do not match well.
    I would like to know if I should apply an audio filter to help the recognition process or use a better-quality recording.

    Thanks,
    Regards

    #1030690
    Halle Winkler
    Politepix

    Hello,

    Sorry, I don’t know the reason for that. I recommend taking a look at the FAQ and other forum posts on the topic of accuracy.
