OpenEars Platform 1.5 is alive! Er, I mean live!
I am extraordinarily happy to introduce today the 1.5 release of the entire OpenEars Platform. This is an update of every part of the platform: OpenEars, RapidEars, Rejecto, NeatSpeech and SaveThatWave, and it brings iOS7 compatibility, improved testing tools, and Spanish! That’s right, as of version 1.5, the entire platform now works with both English and Spanish (the only exception to this is OpenEars’ FliteController for reasons explained below, but Spanish TTS is available in NeatSpeech 1.5). The 1.5 versions of the paid frameworks are free upgrades for existing customers and, of course, OpenEars itself remains free, so I hope this will be the start of some great development for Spanish-language speech-enabled apps.
Here is a rundown of the big changes:
Everything should now work fine with iOS7 and if it doesn’t, please let me know ASAP in the forums since this is of course top priority this week. OpenEars has been iOS7 compatible the whole time, so this is just an improvement to the paid frameworks and their demos.
OpenEars is now bilingual. I hope there will be more languages to come, now that all of the infrastructure has been created to support running with multiple languages. This means that OpenEars can create dynamic language models in Spanish and perform speech recognition on Spanish speech using those language models, RapidEars can do live recognition with Spanish speech, Rejecto can create Spanish-language rejection models, and NeatSpeech can do Spanish TTS/synthesized speech. The only exception is that because Flite is English TTS, OpenEars’ free FliteController does not have a Spanish synthesized voice for TTS, so Spanish TTS is supported through NeatSpeech. Adding support for multiple languages has required a couple of small API changes (the path to the language to use needs to be given to PocketsphinxController and LanguageModelGenerator) so please check out the docs, tutorial, or feel absolutely free to ask in the forums for help if you encounter any issues with updating your code.
I think the new testing tool is going to be a big deal. Up until now it has been challenging to make replicable tests with PocketsphinxController that really tested the full codebase in use during recognition. There has been a method with which you could submit a recorded WAV, but it didn’t use the audio driver, calibration, voice activity detection, language model switching, or the real callback system, so it didn’t have that much to tell you or me about the real-world behavior of the app under development. Starting with version 1.5, it is possible to tell PocketsphinxController to run over a pre-recorded audio file instead of live mic input using the primary listening method. The buffers from the audio file are injected directly into the buffer callback in OpenEars’ audio driver, so it is identical behaviorally to live input, which should make troubleshooting and evaluation much easier for you, as well as providing me with a reliable way to regression test and replicate issue reports (since you can now just give me a recording to run through the system when you want to show me an unexpected behavior). Having this testing tool has been indispensable for producing this complex interdependent release and I think it is going to make it easier for everyone to bring their ideas to fruition with less effort and more confidence. The testing tool also works identically in RapidEars.
In greater detail:
• In order to conform better to Apple’s guidelines, all saved files are now saved to Library/Caches rather than Documents, since they are not created by the user. If you have any code which makes hardcoded calls to OpenEars items saved to Documents, just change your NSDocumentDirectory references in them to NSCachesDirectory and they should work identically.
• Many areas of code have been improved/simplified and some tiny leaks have been removed.
• NeatSpeech has a couple more US voices and now supports Spanish!
• NeatSpeech also has some new OpenEarsEventsObserver callbacks which were requested such as knowing which phrase is currently speaking.
• NeatSpeech has received several small incremental improvements to the naturalness of its speech.
• You can now prime a voice before using it so that all of its initialization is complete at the time you want to start speaking.
• All voices have had their volume increased to the maximum possible without clipping. In some cases this is a lot, in some cases it is only a little, but in every case, it’s as much as is possible without resulting in distortion.
• RapidEars now works with Spanish!
• RapidEars is now compatible with SaveThatWave.
• Like OpenEars’ PocketsphinxController, RapidEars can be run over an audio recording instead of live recognition for testing purposes.
• You now specify an acoustic model with RapidEars. This is an API change so take a look at the header, documentation or tutorial to see how to update your code, or feel absolutely free to ask a question in the forums if you encounter any issues updating your code.
• Rejecto now works with Spanish!
• OpenEars now supports creating language models and performing recognition on both English and Spanish.
• OpenEars now has an AcousticModel class which lets you conveniently specify the path to the acoustic model you are using. Examples of its use are [AcousticModel pathToModel:@"AcousticModelEnglish"] or [AcousticModel pathToModel:@"AcousticModelSpanish"]. Since it checks the availability of the files within the bundle, it should give much more informative behavior in cases in which acoustic models are not successfully added to the app.
• There is a new acoustic model bundle system so you can have an app with multiple acoustic model bundles, or create your own acoustic model bundles, without having excessive loose files in your app’s mainBundle.
• Flite now has a property noAudioSessionOverrides which, when set to TRUE, will run Flite without AudioSessionManager. This is only for apps which don’t use PocketsphinxController, since PocketsphinxController requires the audio session overrides.
• PocketsphinxController now takes an acoustic model path as an argument to startListening: so you can easily begin and end recognition for specific acoustic models (including the currently-shipping English and Spanish acoustic models). This is an API change so take a look at the header, documentation or tutorial to see how to update your code, or feel absolutely free to ask a question in the forums if you encounter any issues updating your code.
• You can now set audioSessionMixing as a property of PocketsphinxController.
• You can now select different AudioSessionMode settings by setting the audioMode property of PocketsphinxController, letting you experiment with different levels of noise reduction and AGC in your app when using iOS 5.0 and higher (thanks to forums participant hartsteins for this idea and implementation!).
• OpenEars’ audio session re-routing only re-routes to the speaker for the single case in which the device is being used with the built-in mic and speaker (thanks to forums participant hartsteins for this idea and implementation!).
• LanguageModelGenerator can now generate language models for Spanish recognition, and to support this it now takes a path to the acoustic model that is the target for the language model as an argument. This is an API change so take a look at the header, documentation or tutorial to see how to update your code, or feel absolutely free to ask a question in the forums if you encounter any issues updating your code. The property “dictionaryPathAsString” was removed from LanguageModelGenerator because all dictionaries now need to reside within an acoustic model bundle, but it is still possible to use a custom dictionary by placing it inside of an acoustic model bundle. More documentation of how to do this will be coming shortly or feel free to ask directly in the forums for help if the removal of the property affects your implementation.
• PocketsphinxController now supports running recognition directly over a WAV file that is directly injected into the audio driver buffer callback so you can repeatedly test recognition on the same input. You can read more about this in the PocketsphinxController documentation or its header.
• SmartCMN. You may have noticed that the very first recognition in a session is sometimes less accurate than the rest of the session. OpenEars has a new system for preventing this called SmartCMN, which adapts to your users’ microphone usage habits and remembers the resulting settings, so that later recognition rounds can start off with better recognition.
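For the move from Documents to Library/Caches mentioned in the list above, here is a minimal sketch of what the updated lookup might look like in your own code. It uses only Foundation; the helper name PathInCachesDirectory is hypothetical and is not part of OpenEars, and any file name you pass in is whatever name your app actually uses.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of OpenEars): returns the full path of a file
// inside Library/Caches, which is where OpenEars 1.5 saves its generated files.
// Code that previously passed NSDocumentDirectory here only needs to pass
// NSCachesDirectory instead.
static NSString *PathInCachesDirectory(NSString *fileName) {
    NSArray *paths = NSSearchPathForDirectoriesInDomains(NSCachesDirectory,
                                                         NSUserDomainMask,
                                                         YES);
    // Guard against an empty result rather than indexing blindly.
    NSString *cachesDirectory = ([paths count] > 0) ? [paths objectAtIndex:0] : nil;

    // Messaging nil returns nil, so a missing directory yields a nil path.
    return [cachesDirectory stringByAppendingPathComponent:fileName];
}
```

You would then call something like PathInCachesDirectory(@"MyLanguageModel.dic"), where the file name is only an illustration, anywhere you used to build the same path under Documents.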
With any update of this size, there are bound to be a couple of bumps, so please just let me know if you experience any issues and I will help. Otherwise, I hope that this update provides new tools and improvements on existing tools for your speech app toolkit!