RapidEars: Real Time Speech Recognition for OpenEars from Politepix

Introducing RapidEars!

RapidEars is a plugin for OpenEars which lets it do live speech recognition in real time. It is so responsive that it can even be used as a gaming input, and it doesn't use a network connection.

Introduction, Installation and Support

Introduction

RapidEars is a plugin for OpenEars® which adds the ability to do real-time speech recognition on in-progress speech, for live voice recognition on the iPhone, iPod touch and iPad, using the English acoustic model AcousticModelEnglish.bundle and the other compatible acoustic models found on the acoustic model download page. If your application has a need for speed and you are shipping for devices that can support more CPU overhead, all you have to do is install the RapidEars plugin in your working OpenEars project, and OEPocketsphinxController and OEEventsObserver will gain new methods that let you stop waiting for pauses in speech and start evaluating immediately.

It's great for games or any application that needs real-time speech recognition and fast feedback. Because of the optimizations needed to do instant recognition on the iPhone, live recognition does not quite match the accuracy of OEPocketsphinxController in the stock version of OpenEars (final recognition accuracy is the same, however), so RapidEars should be used with smaller vocabularies.

RapidEars can be purchased at the Politepix shop, and it's very important to thoroughly evaluate the demo version, which can be downloaded from the shop page, before purchasing. The installation and usage process is the same for both the demo and the licensed version, but the demo times out after 3 minutes of use and can't be submitted to the App Store.

The best way to get started using RapidEars is to get a tutorial from the Politepix interactive tutorial tool. Steps for getting started and more in-depth documentation are also provided on this page.

Installation

How to install and use RapidEars:

RapidEars is a plugin for OpenEars, so it is added to an already-working OpenEars project in order to enable new OpenEars features. In these instructions we are using the OpenEars sample app as an example for adding the plugin and new features, but the steps are basically the same for any app that already has a working OpenEars installation. Please note that RapidEars requires OpenEars 2.5 or greater.

Download the OpenEars distribution and try out the OpenEars sample app. RapidEars is a plug-in that is added to an OpenEars project, so you first need a known-working OpenEars app to work with; the OpenEars sample app is fine for getting started. You can also get a complete tutorial on both creating an OpenEars app and adding RapidEars to it using the automatic customized tutorial.

Open up the OpenEars Sample App in Xcode. Drag your downloaded RapidEarsDemo.framework into the OpenEars sample app project file navigator.

Open up the Build Settings tab of your app target (OpenEarsSampleApp in this example), find the entry "Other Linker Flags", and add the linker flag "-ObjC". Do this for both debug and release builds. More explanation of this step can be found by selecting the live recognition tutorial in the tutorial tool, which will also show exactly how to use the new methods added by RapidEars. Next, still in your target's Build Settings, find the setting "Framework Search Paths".

If adding the framework in the previous step did not automatically add it to "Framework Search Paths", add it manually. You can find the path by going into the Project Navigator (the main Xcode project file view), selecting your just-added RapidEarsDemo.framework, and typing ⌘⌥1 to open the File Inspector for it (it may already be open – it is the narrow pane on the far right of the main Xcode interface). The full path to your added framework is shown under "Identity and Type"->"Full Path". The entry for "Framework Search Paths" is this path minus the last path element, so if it says /Users/yourname/Documents/YourApp/Resources/Framework/RapidEarsDemo.framework, the path to add to "Framework Search Paths" is /Users/yourname/Documents/YourApp/Resources/Framework/, and you should keep the "Recursive" checkbox unchecked.

While we're here, take a moment to look at your Framework Search Paths build setting and verify that it doesn't contain any peculiar entries (for instance, entries with many extra quotes and/or backslashed quote marks), that each search path is on its own line and hasn't been concatenated to another entry, and that the setting isn't pointing to old versions of the frameworks you're installing that live in other locations.

Support

With your demo download you can receive support via the forums, subject to their rules of conduct. To receive forum support, you must have provided accurate information with your initial demo download, such as a valid email address and your name.

Once you have licensed the framework for your app, forum support will continue to be available to you, and if you need private support via email you can purchase a support contract or individual support incidents at the shop.

Licensing the framework requires giving the exact application name that the framework will be linked to, so don't purchase the license until you know the app name, and again, please try the demo first. It is not possible to change bundle IDs after a purchase, and there are no refunds after purchase, since the comprehensive demo can be fully tested over a complete development period.

Please read on for the RapidEars documentation.


OEEventsObserver+RapidEars Category Reference

Detailed Description

This plugin returns the results of your speech recognition by adding some new callbacks to the OEEventsObserver.

Usage examples

What to add to your implementation:

At the top of your header after the line
#import <OpenEars/OEEventsObserver.h>
Add the line
#import <RapidEarsDemo/OEEventsObserver+RapidEars.h>
And after this OEEventsObserver delegate method you added to your implementation when setting up your OpenEars app:
- (void) testRecognitionCompleted {
	NSLog(@"A test file that was submitted for recognition is now complete.");
}
Just add the following extended delegate methods:
- (void) rapidEarsDidReceiveLiveSpeechHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore {
    NSLog(@"rapidEarsDidReceiveLiveSpeechHypothesis: %@",hypothesis);
}

- (void) rapidEarsDidReceiveFinishedSpeechHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore {
    NSLog(@"rapidEarsDidReceiveFinishedSpeechHypothesis: %@",hypothesis);
}

Warning
Any OEEventsObserver you use in a view controller or other object must be a property of that object, or it won't work.
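
For example, a minimal sketch of meeting that requirement in a view controller, assuming your class conforms to OEEventsObserverDelegate (the property name openEarsEventsObserver is just illustrative), might look like this:

// In your class extension or interface:
@property (strong, nonatomic) OEEventsObserver *openEarsEventsObserver; // kept as a property so it isn't deallocated

// Early in the object's lifecycle, for example in viewDidLoad:
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
[self.openEarsEventsObserver setDelegate:self]; // self implements the delegate methods shown above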

Method Documentation

- (void)rapidEarsDidReceiveLiveSpeechHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore

The engine has detected in-progress speech. This is the simple delegate method that should be used in most cases, which just returns the hypothesis string and its score. Swift 3: rapidEarsDidReceiveLiveSpeechHypothesis(_ hypothesis: String!, recognitionScore: String!)

- (void)rapidEarsDidReceiveFinishedSpeechHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore

A final speech hypothesis was detected after the user paused. This is the simple delegate method that should be used in most cases, which just returns the hypothesis string and its score. Swift 3: rapidEarsDidReceiveFinishedSpeechHypothesis(_ hypothesis: String!, recognitionScore: String!)

- (void)rapidEarsDidDetectLiveSpeechAsWordArray:(NSArray *)words andScoreArray:(NSArray *)scores

The engine has detected in-progress speech. Words and respective scores are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectLiveSpeech(asWordArray words: [Any]!, andScoreArray scores: [Any]!)

- (void)rapidEarsDidDetectFinishedSpeechAsWordArray:(NSArray *)words andScoreArray:(NSArray *)scores

A final speech hypothesis was detected after the user paused. Words and respective scores are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectFinishedSpeech(asWordArray words: [Any]!, andScoreArray scores: [Any]!)
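
Because both of these word-array callbacks deliver words and scores at corresponding indexes, a handler can walk the arrays in parallel. Here is a minimal sketch of the live variant (the logging is just illustrative; note that these per-word callbacks require setReturnSegments: to be set to TRUE, as described in the OEPocketsphinxController+RapidEars section below):

- (void) rapidEarsDidDetectLiveSpeechAsWordArray:(NSArray *)words andScoreArray:(NSArray *)scores {
    // words and scores use corresponding indexes, so they can be iterated in parallel.
    [words enumerateObjectsUsingBlock:^(NSString *word, NSUInteger index, BOOL *stop) {
        NSLog(@"Live word \"%@\" has score %@", word, scores[index]);
    }];
}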

- (void)rapidEarsDidDetectLiveSpeechAsWordArray:(NSArray *)words scoreArray:(NSArray *)scores startTimeArray:(NSArray *)startTimes endTimeArray:(NSArray *)endTimes

The engine has detected in-progress speech. Words and respective scores and timing are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectLiveSpeech(asWordArray words: [Any]!, scoreArray scores: [Any]!, startTime startTimes: [Any]!, endTime endTimes: [Any]!)

- (void)rapidEarsDidDetectFinishedSpeechAsWordArray:(NSArray *)words scoreArray:(NSArray *)scores startTimeArray:(NSArray *)startTimes endTimeArray:(NSArray *)endTimes

A final speech hypothesis was detected after the user paused. Words and respective scores and timing are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectFinishedSpeech(asWordArray words: [Any]!, scoreArray scores: [Any]!, startTime startTimes: [Any]!, endTime endTimes: [Any]!)

- (void)rapidEarsDidDetectLiveSpeechAsNBestArray:(NSArray *)words andScoreArray:(NSArray *)scores

The engine has detected in-progress speech. N-Best words and respective scores are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectLiveSpeech(asNBestArray words: [Any]!, andScoreArray scores: [Any]!)

- (void)rapidEarsDidDetectFinishedSpeechAsNBestArray:(NSArray *)words andScoreArray:(NSArray *)scores

A final speech hypothesis was detected after the user paused. N-Best words and respective scores are delivered in separate arrays with corresponding indexes. Swift 3: rapidEarsDidDetectFinishedSpeech(asNBestArray words: [Any]!, andScoreArray scores: [Any]!)


OEPocketsphinxController+RapidEars Category Reference

Detailed Description

A plugin which adds the ability to do live speech recognition to OEPocketsphinxController.

Usage examples

Preparing to use the class:

Like OEPocketsphinxController, which it extends, OEPocketsphinxController+RapidEars needs a language model created with OELanguageModelGenerator before use. We have already completed that step above.
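
If you are starting from an app that doesn't have this step yet, here is a minimal sketch of it, assuming the stock OpenEars OELanguageModelGenerator API and an illustrative vocabulary and file name; the lmPath and dicPath variables are reused in the listening example below:

#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OEAcousticModel.h>

OELanguageModelGenerator *languageModelGenerator = [[OELanguageModelGenerator alloc] init];
NSArray *words = @[@"GO", @"STOP", @"LEFT", @"RIGHT"]; // example vocabulary; keep it small for RapidEars
NSError *error = [languageModelGenerator generateLanguageModelFromArray:words
                                                         withFilesNamed:@"RapidEarsModel"
                                                 forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];

NSString *lmPath = nil;
NSString *dicPath = nil;
if(error == nil) {
    lmPath = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"RapidEarsModel"];
    dicPath = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"RapidEarsModel"];
} else {
    NSLog(@"Error generating language model: %@", error);
}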

What to add to your implementation:

Add the following to the top of your implementation (the .m file), after the line #import <OpenEars/OEPocketsphinxController.h>:
#import <RapidEarsDemo/OEPocketsphinxController+RapidEars.h>
Next, comment out all calls in your app to the method
startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF:
and in the same part of your app where you were formerly using this method, place the following:

[[OEPocketsphinxController sharedInstance] startRealtimeListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]]; // Starts the rapid recognition loop. Change "AcousticModelEnglish" to "AcousticModelSpanish" in order to perform Spanish language recognition.

If you find that sometimes you are getting live recognition and other times not, make sure that you have definitely replaced all instances of startListeningWithLanguageModelAtPath: with startRealtimeListeningWithLanguageModelAtPath:.
Warning
Please read the OpenEars documentation for OEPocketsphinxController for information about instantiating this object.
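
To illustrate, here is a minimal sketch of the start-up sequence. It assumes the imports added above, the lmPath and dicPath variables produced by OELanguageModelGenerator, and the stock OpenEars activation call setActive:error:; consult the OEPocketsphinxController documentation to confirm the activation details for your OpenEars version:

// Activate the shared instance first, as in stock OpenEars.
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];

// Start the rapid recognition loop with the paths generated by OELanguageModelGenerator.
[[OEPocketsphinxController sharedInstance] startRealtimeListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];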

Method Documentation

- (void)startRealtimeListeningWithLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath

Start the listening loop. You will call this instead of the old OEPocketsphinxController method. Swift 3: startRealtimeListeningWithLanguageModel(atPath: String!, dictionaryAtPath: String!, acousticModelAtPath: String!)

- (void)setRapidEarsToVerbose:(BOOL)verbose

Turn logging on or off. Swift 3: setRapidEarsToVerbose(verbose: Bool)

- (void)setFinalizeHypothesis:(BOOL)finalizeHypothesis

You can decide not to have the final hypothesis delivered if you are only interested in live hypotheses. This will save some CPU work. Swift 3: setFinalizeHypothesis(finalizeHypothesis: Bool)

- (void)setReturnDuplicatePartials:(BOOL)duplicatePartials

Setting this to true will cause you to receive partial hypotheses even when they match the last one you received. This defaults to FALSE, so if you only want to receive new hypotheses you don't need to use this. Swift 3: setReturnDuplicatePartials(duplicatePartials: Bool)

- (void)setRapidEarsReturnNBest:(BOOL)nBest

EXPERIMENTAL. This will give you N-best results, most likely at the expense of performance. It can't be used with setReturnSegments: or setReturnSegmentTimes:. This is a wholly experimental feature and can't be used with overly-large language models or on very slow devices. It is the sole responsibility of the developer to test whether performance is acceptable with this feature and to reduce language model size and latencyTuning in order to get a good UX; there is absolutely no guarantee that this feature will not result in searches which are too slow to return if it has too much to do. If you want to turn N-best off, first set this and OEPocketsphinxController's returnNBest property to FALSE, and only after that set the other setters you are switching to, such as setReturnSegments: and setReturnSegmentTimes:. If this experimental feature results in an excess of support requests, with expected issues such as slowness of realtime N-best being reported as high-priority bugs, it could end up being removed in a future version. Swift 3: setRapidEarsReturnNBest(nBest: Bool)

- (void)setRapidEarsNBestNumber:(NSUInteger)rapidEarsNBestNumber

This is the maximum number of nbest results you want to receive. You may receive fewer than this number but will not receive more. This defaults to 3 for RapidEars; larger numbers are likely to use more CPU and smaller numbers less. Settings below 2 are invalid and will be set to 2. It is not recommended to ever set this above 3 for realtime processing. Swift 3: setRapidEarsNBestNumber(rapidEarsNBestNumber: UInt)
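
As a sketch of how these two setters fit together, the experimental N-best mode would be configured before starting the realtime loop, something like the following (remember that it can't be combined with setReturnSegments: or setReturnSegmentTimes:):

// Before starting the realtime listening loop, and only after testing performance on your slowest supported device:
[[OEPocketsphinxController sharedInstance] setRapidEarsReturnNBest:TRUE]; // experimental; incompatible with segments/segment times
[[OEPocketsphinxController sharedInstance] setRapidEarsNBestNumber:3];    // 3 is the default; values below 2 are coerced to 2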

- (void)setReturnSegments:(BOOL)returnSegments

Setting this to true will cause you to receive your hypotheses as separate words rather than a single NSString. This is a requirement for using OEEventsObserver delegate methods that contain timing or per-word scoring. This can't be used with N-best. Swift 3: setReturnSegments(returnSegments: Bool)

- (void)setReturnSegmentTimes:(BOOL)returnSegmentTimes

Setting this to true will cause you to receive segment hypotheses with timing attached. This is a requirement for using OEEventsObserver delegate methods that contain word timing information. It only works if you have setReturnSegments set to TRUE. This can't be used with N-best. Swift 3: setReturnSegmentTimes(returnSegmentTimes: Bool)
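
For example, to receive the per-word and timing delegate callbacks documented in the OEEventsObserver+RapidEars section above, both setters need to be enabled before listening starts (with N-best left off); a minimal sketch:

// Before starting the realtime listening loop:
[[OEPocketsphinxController sharedInstance] setReturnSegments:TRUE];     // deliver hypotheses as word and score arrays
[[OEPocketsphinxController sharedInstance] setReturnSegmentTimes:TRUE]; // also deliver start/end times; requires setReturnSegments:TRUE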

- (void)setLatencyTuning:(NSInteger)latencyTuning

This can take a value between 1 and 4. 4 means the lowest latency for partial hypotheses and 1 means the highest. The lower the latency, the higher the CPU overhead, and vice versa. This defaults to 4 (the lowest latency and highest CPU overhead), so reduce it if you need less CPU overhead, until you find the ideal balance between CPU overhead and speed of hypothesis. Swift 3: setLatencyTuning(latencyTuning: Int)
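
For instance, if the default setting uses too much CPU on your target devices, you could step the value down until you find the right tradeoff; a sketch with an illustrative value:

// Example: trade some partial-hypothesis latency for lower CPU overhead.
[[OEPocketsphinxController sharedInstance] setLatencyTuning:2]; // illustrative value; the default is 4 (lowest latency, highest CPU overhead)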
