Trouble with Spanish Recognition with Test File

Tagged: file, Rejecto, Spanish, test, VAD

This topic has 9 replies, 2 voices, and was last updated 8 years, 8 months ago by Halle Winkler.

August 13, 2015 at 11:16 pm #1026550

piggins
Participant

I’m using a test file for recognition in Spanish, and recognition has been unpredictable, even in complete silence. I was initially using the Rejecto plugin as well, but recognition was still inaccurate after I disabled it. More often than not, it wouldn’t recognize any words, and when it did, it would only recognize short words such as “la” or “una” instead of all the individual words in a longer phrase. I experimented with the VAD threshold, trying values from 0.5 to 5, but got similar results. When I tried the continuous recognition loop, however, it worked great and recognized everything well; the problem only occurs with recognition on a test file. Could you possibly help with this? Thanks!

August 14, 2015 at 9:54 am #1026554

Halle Winkler
Politepix

Hello,

Probably the audio file isn’t 16k/16-bit/mono/WAV. Which API are you using to run recognition on your test file?

August 14, 2015 at 6:17 pm #1026562

piggins
Participant

Hi Halle,

I don’t believe the file format is incorrect because recognition in English on the same type of test file is working properly; only Spanish is causing problems. I’m currently using standard OpenEars recognition with PocketSphinx and no plugins, although I got similar results with the Rejecto plugin as well.

August 14, 2015 at 6:23 pm #1026563

Halle Winkler
Politepix

Which OpenEars method or property are you using to run recognition on your test file? There are two. Keep in mind that the VAD setting for Spanish needs to be much higher, something like 4.3 if I recall correctly, and please make sure you are using v2.041 of OpenEars and 2.04 of any plugins.

“I don’t believe the file format is incorrect”

You can just check what the file format is and see. I think QuickTime on OS X has an info pane to check the format.
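For anyone who would rather check the format programmatically than in an info pane, here is a minimal sketch of a WAV header check. It is plain C, so it should also compile inside an Objective-C project. It is not part of OpenEars; the names `WavFormat`, `wav_read_format` and `wav_is_16k_16bit_mono_pcm` are hypothetical helpers made up for this illustration. It assumes the start of the file has already been read into a byte buffer, and it walks the RIFF chunks instead of assuming a fixed-size header, since a writer may place other chunks before "fmt ".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The fields of the WAV "fmt " chunk that matter for this check. */
typedef struct {
    uint16_t audio_format;    /* 1 means uncompressed linear PCM */
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} WavFormat;

/* WAV integers are little-endian; read them byte by byte so the
   host's own byte order does not matter. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: walk the RIFF chunks in buf until "fmt " is found.
   Returns 0 and fills *out on success, -1 if the data is not a usable WAV. */
int wav_read_format(const uint8_t *buf, size_t len, WavFormat *out) {
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                      /* first chunk follows "RIFF....WAVE" */
    while (len - pos >= 8) {              /* need a chunk id and a chunk size */
        uint32_t size = read_le32(buf + pos + 4);
        size_t body = pos + 8;

        if (memcmp(buf + pos, "fmt ", 4) == 0) {
            if (size < 16 || len - body < 16)
                return -1;
            out->audio_format    = read_le16(buf + body);
            out->channels        = read_le16(buf + body + 2);
            out->sample_rate     = read_le32(buf + body + 4);
            out->bits_per_sample = read_le16(buf + body + 14);
            return 0;
        }

        /* Skip this chunk; chunk bodies are padded to an even length. */
        size_t advance = (size_t)size + (size & 1u);
        if (advance > len - body)
            return -1;
        pos = body + advance;
    }
    return -1;
}

/* Hypothetical helper: true only for 16 kHz, 16-bit, mono, linear PCM. */
int wav_is_16k_16bit_mono_pcm(const WavFormat *f) {
    return f->audio_format == 1 && f->channels == 1 &&
           f->sample_rate == 16000 && f->bits_per_sample == 16;
}
```

To use it, read the first few kilobytes of the recording into a buffer (for example with `fread`), pass the buffer to `wav_read_format`, and then inspect the fields or call `wav_is_16k_16bit_mono_pcm`. If the "fmt " chunk lies beyond the bytes you read, the function returns -1, so a failure on a partial buffer does not by itself prove the file is malformed.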

August 14, 2015 at 8:10 pm #1026564

piggins
Participant

I’m using setPathToTestFile to run recognition on the test file, not runRecognitionOnWavFileAtPath. I tested with the VAD setting at 4.3 but didn’t get any improvements, and I am using the newest versions of OpenEars and the plugins. I also confirmed that the file format is correct. Thanks again for the help and prompt reply.

August 15, 2015 at 11:18 am #1026571

Halle Winkler
Politepix

OK, sorry, I don’t know what the issue is here – Spanish with setPathToTestFile is part of my testbed and I haven’t seen any issues like this in recent versions of OpenEars. Are you using the most recent version? Are there any errors or warnings in the verbosePocketSphinx/OELogging logs?

https://www.politepix.com/forums/topic/install-issues-and-their-solutions/

August 23, 2015 at 11:43 pm #1026640

piggins
Participant

Hi Halle,

I’m not sure if this would affect anything, but I needed functionality such that the user could end speech recognition at will, instead of the normal OpenEars functionality where it will detect when the user has finished speaking. To do this, I used an instance of AVAudioRecorder to record the file and save it to a local directory. When initializing the instance of AVAudioRecorder, I used the following settings:

// Recorder settings: linear PCM, 16 kHz sample rate, mono, 16-bit samples.
NSError *error = nil;
NSMutableDictionary *settings = [NSMutableDictionary dictionary];
[settings setValue:[NSNumber numberWithInt:kAudioFormatLinearPCM] forKey:AVFormatIDKey];
[settings setValue:[NSNumber numberWithFloat:16000.0] forKey:AVSampleRateKey];
[settings setValue:[NSNumber numberWithInt:1] forKey:AVNumberOfChannelsKey];
[settings setValue:[NSNumber numberWithInt:16] forKey:AVLinearPCMBitDepthKey];
[settings setValue:[NSNumber numberWithInt:AVAudioQualityMax] forKey:AVEncoderAudioQualityKey];
// Create the recorder that writes to filePath; any failure is reported through error.
self.recorder = [[AVAudioRecorder alloc] initWithURL:[NSURL fileURLWithPath:filePath] settings:settings error:&error];

After recording, I would use setPathToTestFile to test the file I’d just recorded, and then call startListening with the Spanish acoustic model. The majority of the time, no words were recognized, but sometimes very short words would be recognized. Could you tell me if the settings I used are correct to work with setPathToTestFile or if there are any other red flags? I looked through the verbosePocketSphinx/OELogging logs and there weren’t any errors or warnings for either English or Spanish recognition, even though recognition in English worked perfectly and Spanish was less accurate. Would there be any way, other than recording the file and then testing it, to achieve the desired functionality? Thanks so much for your help, Halle.

August 24, 2015 at 10:05 am #1026641

Halle Winkler
Politepix

Hi,

Sorry, this usage approach is a little outside what I can support since OpenEars is actively designed not to be a push-to-talk tool and pathToTestFile is for automated testing of the entire audio and recognition harness rather than a WAV recognition method.

For your own further troubleshooting, I would first check whether you get better results with recordings made by the SaveThatWave demo. If a SaveThatWave audio file behaves differently from the ones you are creating, that should give you some hints about what to look into. I would also check whether you get better recognition results when submitting the WAV to runRecognitionOnWavFileAtPath: etc., which is the method designed for file-based recognition, whereas pathToTestFile is intended as an automated testing tool.

August 25, 2015 at 8:15 am #1026644

piggins
Participant

Hi,

I’ve tested with runRecognitionOnWavFileAtPath and it works much better, but I’m not sure if the Rejecto plugin is working with that method. Do the OpenEars plugins work when using runRecognitionOnWavFileAtPath? In addition, can the VAD threshold be set manually when using that method? Thanks.

August 25, 2015 at 12:15 pm #1026646

Halle Winkler
Politepix

Rejecto and RuleORama should work fine with that method. Setting the VAD threshold should not be necessary with it, since VAD is for continuous listening, which you are not performing. Give it a try and see what the results are.