 matth
|
I am using OpenEars in an app that (hopefully) will have some videos in it (intro, segues, etc.). However, I am unable to get pocketsphinx (on the device, in this case an iPhone 3GS running iOS 4.2.1) to detect audio after having played the video (using MPMoviePlayerController).
I have read the post about trying to simultaneously detect speech while playing a video, and from that learned that calling [audioSessionManager startAudioSession] and stopping and re-starting pocketsphinx after the video has completed should allow audio detection again. However, it’s not working for me.
When I don’t play the intro video, the speech is detected. When I do play the intro video, the speech isn’t detected. (All of this refers to running the app on the device, not the simulator). I have searched for ‘audiosession’ and the only instances are the ones that are supposed to be there.
It’s possible that the MPMoviePlayerController isn’t being released by the time I re-start the audio session, but I haven’t found a way to definitively release the video player that doesn’t cause the app to crash.
Below is a copy of the (relevant parts of the) log. The “audioSessionManager (lazy allocator)” line is where the audio session is initially being started. The “restartAudioSession” line is where I call startAudioSession a second time. The basic sequence is (1) show intro video, (2) show main menu, (3) initialize pocketsphinx but suspend recognition till later, (4) show instructions to user, (5) play audio clip, (6) resume recognition. The audio clips don’t seem to interfere at all (I’m using AVAudioPlayer), it’s just the video. The final line (“pocketsphinxDidResumeRecognition”) is where I am speaking, but the app doesn’t seem to be getting any audio input at all.
Thoughts? Thanks,
- Matt
2011-07-15 13:56:17.149 SeeingNumbers[1216:307] application:didFinishLaunchingWithOptions:
2011-07-15 13:56:17.166 SeeingNumbers[1216:307] audioSessionManager (lazy allocator)
2011-07-15 13:56:17.410 SeeingNumbers[1216:307] SeeingNumbersVieController viewDidLoad
2011-07-15 13:56:17.418 SeeingNumbers[1216:307] SeeingNumbersViewController viewDidAppear:animated
2011-07-15 13:56:17.435 SeeingNumbers[1216:307] applicationDidBecomeActive
2011-07-15 13:56:17.439 SeeingNumbers[1216:307] showIntroVideo
[Switching to process 13059 thread 0x0]
2011-07-15 13:56:31.171 SeeingNumbers[1216:307] introVideoFinished
2011-07-15 13:56:31.197 SeeingNumbers[1216:307] restartAudioSession
2011-07-15 13:56:31.202 SeeingNumbers[1216:307] OPENEARSLOGGING: The audio session has already been initialized, continuing to set its properties.
2011-07-15 13:56:31.205 SeeingNumbers[1216:307] showMainMenu
2011-07-15 13:56:36.161 SeeingNumbers[1216:307] showModuleFive
2011-07-15 13:56:36.167 SeeingNumbers[1216:307] mainMenuButtonClicked: 5
2011-07-15 13:56:36.171 SeeingNumbers[1216:307] initWithNibName…
2011-07-15 13:56:36.192 SeeingNumbers[1216:307] moduleVC:
2011-07-15 13:56:36.247 SeeingNumbers[1216:307] ModuleViewController viewDidLoad
2011-07-15 13:56:36.255 SeeingNumbers[1216:307] ModuleViewController completed [super viewDidLoad]
2011-07-15 13:56:36.261 SeeingNumbers[1216:307] ModuleViewController completed setting OEEO delegate
2011-07-15 13:56:36.266 SeeingNumbers[1216:307] showInstructions:0
2011-07-15 13:56:36.804 SeeingNumbers[1216:307] ModuleViewController viewDidAppear
2011-07-15 13:56:36.816 SeeingNumbers[1216:730f] OPENEARSLOGGING: Recognition loop has started
INFO: cmd_ln.c(512): Parsing command line:
\
-lm /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/ZeroThroughTen.languagemodel \
-dict /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/ZeroThroughTen.dic \
-fdict /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/noisedict \
-hmm /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app \
-maxhmmpf 3000 \
-maxwpf 5
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/ZeroThroughTen.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/noisedict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/ZeroThroughTen.languagemodel
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 3000
-maxnewoov 20 20
-maxwpf -1 5
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
2011-07-15 13:56:36.817 SeeingNumbers[1216:307] pocketsphinxRecognitionLoopDidStart
INFO: cmd_ln.c(512): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 39 \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 39
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/feat.params
INFO: feat.c(848): Initializing feature stream to type: ’1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/mdef
INFO: bin_mdef.c(508): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
INFO: ms_gauden.c(358): 0 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(294): Allocating 4120 * 20 bytes (80 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/ZeroThroughTen.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 13 words read
INFO: dict.c(314): Reading filler dictionary: /var/mobile/Applications/ADECE6AE-B3D0-4FE0-9971-4C7EFB9B840A/SeeingNumbers.app/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(476): ngrams 1=13, 2=22, 3=11
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515): 13 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531): 22 = #bigrams created
INFO: ngram_model_arpa.c(532): 3 = #prob2 entries
INFO: ngram_model_arpa.c(539): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(291): Reading trigrams
INFO: ngram_model_arpa.c(552): 11 = #trigrams created
INFO: ngram_model_arpa.c(553): 2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 13 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 145
INFO: ngram_search_fwdtree.c(333): after: 13 root, 17 non-root channels, 11 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
2011-07-15 13:56:37.653 SeeingNumbers[1216:730f] OPENEARSLOGGING: Starting openAudioDevice on the device.
2011-07-15 13:56:37.659 SeeingNumbers[1216:730f] OPENEARSLOGGING: Audio unit wrapper successfully created.
2011-07-15 13:56:37.685 SeeingNumbers[1216:730f] OPENEARSLOGGING: Set audio route to SpeakerAndMicrophone
2011-07-15 13:56:37.689 SeeingNumbers[1216:730f] OPENEARSLOGGING: Setting the variables for the device and starting it.
2011-07-15 13:56:37.692 SeeingNumbers[1216:730f] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
2011-07-15 13:56:38.182 SeeingNumbers[1216:730f] OPENEARSLOGGING: Started audio output unit.
2011-07-15 13:56:38.191 SeeingNumbers[1216:730f] OPENEARSLOGGING: Calibration has started
2011-07-15 13:56:38.191 SeeingNumbers[1216:307] pocketsphinxDidStartCalibration
2011-07-15 13:56:42.417 SeeingNumbers[1216:730f] OPENEARSLOGGING: Calibration has completed
2011-07-15 13:56:42.424 SeeingNumbers[1216:730f] OPENEARSLOGGING: Project has these words in its dictionary:
EIGHT
FIVE
FOUR
NINE
ONE
ONE(2)
SEVEN
SIX
TEN
THREE
TWO
ZERO
ZERO(2)
2011-07-15 13:56:42.426 SeeingNumbers[1216:730f] OPENEARSLOGGING: Listening.
2011-07-15 13:56:42.419 SeeingNumbers[1216:307] pocketsphinxDidCompleteCalibration
2011-07-15 13:56:42.429 SeeingNumbers[1216:307] showInstructions:1
2011-07-15 13:56:42.432 SeeingNumbers[1216:307] pocketsphinxDidStartListening
2011-07-15 13:56:42.435 SeeingNumbers[1216:307] pocketsphinxDidSuspendRecognition
2011-07-15 13:56:44.732 SeeingNumbers[1216:307] showInstructions:0
2011-07-15 13:56:44.740 SeeingNumbers[1216:307] showNextCard
2011-07-15 13:56:45.723 SeeingNumbers[1216:307] audioPlayerDidFinishPlaying (completed = YES)
2011-07-15 13:56:45.726 SeeingNumbers[1216:307] (audio was How Many)
2011-07-15 13:56:45.730 SeeingNumbers[1216:307] pocketsphinxDidResumeRecognition
|
 Halle
|
EDIT: there is a new OpenEars version .912 out now which will reset the audio session successfully when [audioSessionManager startAudioSession] is run a second time after an AVPlayer or media player has completed playback. This should remove the need to release the player (which might not be possible for every player type).
In some cases (specifically, when the audio session change caused an interruption message to be sent to PocketsphinxController, so that it exited its loop) it is then also necessary to restart the listening loop; experiment to find out whether your app needs this.
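For illustration, here is a rough sketch of that sequence: play the intro video, and when playback finishes, call startAudioSession a second time and restart the listening loop only if it had been running. Treat it as a sketch under assumptions rather than the definitive way to do it. The class name IntroViewController, the property names, and the listeningWasRunning flag are hypothetical. The header names and the stopListening/startListening calls are assumed from the OpenEars 0.9.x API and should be checked against your copy. It uses manual reference counting, as was usual on iOS 4, and only builds inside an iOS project that links MediaPlayer.framework and OpenEars.

```objc
#import <UIKit/UIKit.h>
#import <MediaPlayer/MediaPlayer.h>
#import "AudioSessionManager.h"     // assumed OpenEars header name
#import "PocketsphinxController.h"  // assumed OpenEars header name

// Hypothetical view controller that owns the intro video and the OpenEars objects.
@interface IntroViewController : UIViewController
@property (nonatomic, retain) MPMoviePlayerController *moviePlayer;
@property (nonatomic, retain) AudioSessionManager *audioSessionManager;
@property (nonatomic, retain) PocketsphinxController *pocketsphinxController;
// Hypothetical flag: set it to YES yourself if listening was started before the video.
@property (nonatomic, assign) BOOL listeningWasRunning;
@end

@implementation IntroViewController

@synthesize moviePlayer, audioSessionManager, pocketsphinxController, listeningWasRunning;

- (void)showIntroVideoAtURL:(NSURL *)url {
    self.moviePlayer = [[[MPMoviePlayerController alloc] initWithContentURL:url] autorelease];

    // Ask to be told when playback ends, so the audio session can be reset afterwards.
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(introVideoFinished:)
                                                 name:MPMoviePlayerPlaybackDidFinishNotification
                                               object:self.moviePlayer];

    self.moviePlayer.view.frame = self.view.bounds;
    [self.view addSubview:self.moviePlayer.view];
    [self.moviePlayer play];
}

- (void)introVideoFinished:(NSNotification *)notification {
    [[NSNotificationCenter defaultCenter] removeObserver:self
                                                    name:MPMoviePlayerPlaybackDidFinishNotification
                                                  object:self.moviePlayer];
    [self.moviePlayer.view removeFromSuperview];

    // Second call to startAudioSession: with OpenEars .912 this should reset the
    // session after playback, without having to release the player first.
    [self.audioSessionManager startAudioSession];

    // Only needed if the listening loop was running and exited because of the interruption.
    if (self.listeningWasRunning) {
        [self.pocketsphinxController stopListening];   // assumed 0.9.x method
        [self.pocketsphinxController startListening];  // assumed 0.9.x method
    }
}

- (void)dealloc {
    [[NSNotificationCenter defaultCenter] removeObserver:self];
    [moviePlayer release];
    [audioSessionManager release];
    [pocketsphinxController release];
    [super dealloc];
}

@end
```

If, as in the log above, pocketsphinx is not started until after the video has finished, the restart branch never runs and the second startAudioSession call is the only extra step.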
|
 matth
|
Excellent. I’ll try the new version of OpenEars and see if I can get things working with that. Thanks again for your help (and for creating OpenEars in the first place).
|
 matth
|
That did it! I am now able to show my intro video, and (after I restart the audio session) speech recognition still works. Thanks again.
|
 Halle
|
Awesome, glad it’s working now.
|