Today I am very pleased to be able to announce the shipping of OpenEars 2.0, and the entire OpenEars plugin platform 2.0. If you are a licensed user, this is a free upgrade for you. Links and the upgrade guide can be found at the bottom of this post if you want to jump down and get started.
This year, the CMU Sphinx project released a major revision of its Sphinxbase and Pocketsphinx libraries in order to add noise robustness and change their voice activity detection algorithm, among many other cool new features. This was brilliant news, and the work on those features has been deeply impressive to see ship. Many OpenEars-using developers wanted those new noise/VAD features ASAP, and with good reason.
It had an interesting effect on OpenEars’ own development. The very first problem I solved when originally designing this framework was making the calibration and voice activity detection system work with Cocoa Touch audio, which wasn’t an altogether simple thing on the more resource-constrained devices of the time, given the somewhat sparsely-documented state of the low-latency audio APIs for iOS development back then. All the more so since the fundamental design of iOS audio was quite different from that of the Linux audio stack the CMU libraries were largely developed with and used on. The other capabilities of OpenEars came after these design decisions about calibration and voice activity detection, so most code related to audio or engine usage was designed in deference to those features.
When I sat down this summer to update OpenEars to use the new voice activity detection and noise robustness code, I saw that it was different structurally and didn’t mesh with any of those early decisions anymore, or the ripple-effect decisions that followed them, and that there were two ways I could look at the implications of that: either a) an obligation to add a lot of new scaffolding to attempt to force a similar result from a different design, or b) an opportunity to revisit my original decisions in light of four more years of programming experience.
No difficult choice there – this is our craft, and I’m not interested in adding any more detritus to our collective workspace.
So, the first thing this update brings is functional improvements across the board: noise robustness, better voice activity detection, and higher accuracy from moving to 16k acoustic models, with no loss in performance thanks to greater optimization. There is also now no calibration time needed at all (really, calibration is gone and recognition just starts immediately – this is a fantastic new capability from CMU). The new VAD required a complete rewrite of RapidEars as well, so I applied the same process there: RapidEars is now more accurate, uses far less CPU under heavy usage, and benefits from a simplified API. As far as I have been able to verify, there have also been improvements in speech recognition coexistence with video objects, but that is a work in progress and I’ll be interested in hearing your real-world results with video plus speech recognition.
The second benefit of being able to revisit the library design decisions has been in code quality:
• OpenEars code now uses the most modern Cocoa Touch APIs and the most modern Objective-C possible for the iOS versions it supports.
• OpenEars now has 6947 lines of code versus the previous 10917, which is 63% of the previous amount of code despite having multiple new features.
• The majority of code was removed from the areas most likely to be implicated in an issue, with the predictable improvement to the debugging process.
• The OpenEars classes now produce no general or static analysis warnings, and contain no warning suppression. The dependencies, quite unavoidably and normally for a codebase representing multiple decades of development, raise 32-bit implicit conversion warnings and coding-style static analysis warnings; consequently, 32-bit implicit conversion warnings are turned off in build settings (after verifying that the OpenEars classes do not raise them), and three dependency source files carry the -w flag. If you notice any other warning suppression in OpenEars, let me know, since it would be an oversight.
• Consequently, OpenEars is now able to ship with the “Treat warnings as errors” build setting selected.
• OpenEars now uses ARC rather than manual memory management, to allow the best possible optimization in LLVM.
• OpenEars continues to support 3 current operating systems (iOS 6.1 through 8.x) so it continues to support more than 98% of installs.
• OpenEars now ships with several of my asynchronous XCTests which you can also use as examples for creating your own asynchronous XCTests for speech recognition (my actual XCTest testbed is a lot larger but it depends on some audio files and code I don’t have permission to ship, as well as the plugin tests). There are also some fuzzing tests using my cross-thread fuzzing tool HWHorrorShow if you’re into that sort of thing.
• OpenEars now follows platform guidelines and uses a class prefix (OE), since occasional conflicts were already coming up with class names such as AudioSessionManager as the framework became more widely adopted (and it’s just good citizenship). This means you need to do some renaming when integrating 2.0, but I figured that a major version release was the only time it would be acceptable to address this, so I bit the bullet and did it with this release. To reduce the pain of the class naming changes I have made a step-by-step upgrade guide which also covers some API changes in 2.0.
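The asynchronous test pattern mentioned above can be sketched with XCTest’s expectation API, which shipped with Xcode 6. This is a minimal illustrative example, not OpenEars’ actual test code: the `onHypothesis` block property and `startListeningWithTestAudio` helper are hypothetical stand-ins for whatever callback your recognition setup provides.

```objectivec
// A minimal sketch of an asynchronous speech recognition test.
// Assumes a hypothetical onHypothesis callback and a hypothetical
// startListeningWithTestAudio helper that feeds a known audio file.
- (void)testRecognitionProducesAHypothesis {
    XCTestExpectation *expectation = [self expectationWithDescription:@"speech recognized"];

    // Fulfill the expectation from the callback that fires when a
    // hypothesis arrives, so the test waits for real asynchronous work.
    self.onHypothesis = ^(NSString *hypothesis) {
        XCTAssertNotNil(hypothesis);
        [expectation fulfill];
    };

    [self startListeningWithTestAudio]; // hypothetical helper

    // Fail the test if no hypothesis arrives within the timeout.
    [self waitForExpectationsWithTimeout:10.0 handler:nil];
}
```

The same expectation-based structure works for any delegate- or block-driven speech event, which is what makes it a useful template for writing your own recognition tests.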
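The renaming described above is mechanical: a 1.x class name simply gains the OE prefix in 2.0. The class names below are illustrative only – consult the upgrade guide for the actual mapping in your project:

```objectivec
// OpenEars 1.x (illustrative class name)
#import <OpenEars/AudioSessionManager.h>
AudioSessionManager *manager = [[AudioSessionManager alloc] init];

// OpenEars 2.0 – the same class, now carrying the OE prefix
#import <OpenEars/OEAudioSessionManager.h>
OEAudioSessionManager *manager = [[OEAudioSessionManager alloc] init];
```

A project-wide find-and-replace per class, guided by the upgrade guide, is usually all the migration requires.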
As you can see, this is a big update – it touches many visible and invisible things, which means some issues are inevitable. So please upgrade using the upgrade guide, or do a new install from the updated tutorial tool, and let me know in the forums about any problems, questions, or upgrade/installation troubles so I can help you.
Thank you for choosing OpenEars, apologies in advance for any bumps getting into the new version, and I hope that the results of all the work that has gone into this huge update will delight you and the users of your apps.
All the best,