Table of Contents:


Using OpenEars In Your App

What we’ll be doing:

For a speech recognition project, learning how to define a custom language model for your app if you need to create one to ship with (this is not required if you would prefer to either link to a large general-purpose vocabulary or to create your language model at runtime using OpenEars methods), and learning how to use OpenEars classes and methods inside your own voice recognition/text-to-speech project.

 

Topics:

1. Adding a language model to your app
2. Using the four important classes of OpenEars
3. Configuration options
4. Handling interruptions

 

1. Adding a language model to your app

Why a language model?
Speech recognition works by comparing the sounds heard over the mic with what is called a language model or a grammar, consisting of the words that the app “knows”, their pronunciations as understood by the acoustic model, and some rules about how probable those words are to be said or about what grammatical contexts the words can be understood in. This is a deep topic but it isn’t necessary to delve into it in order to get OpenEars working to understand your users.

You aren’t required to add your language model at this step: you can also generate it dynamically in-app using the method:

- (NSError *) generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName;

that you can read about later on this page. That method lets you very easily pass an NSArray of your words in plain English that OpenEars will turn into a probability-calculated ARPA language model, e.g. [NSArray arrayWithObjects:@"WORD", @"A PHRASE", nil] will be converted for you into a complete language model that recognizes the NSStrings in the NSArray in speech. There are a lot of requests for a big starter vocabulary that will do generalized speech recognition, so I am providing one that you can read more about here. However, there are many reasons that you will see poor recognition and compromised user experience when using such a pre-rolled model and I would recommend reading this forum discussion in order to fully understand why. It is a much better idea to ship your app with a custom built-in language model that corresponds closely to the real tasks that your users will be undertaking, and OpenEars has tools for doing that. This step explains how to create that first language model before runtime by making use of the CMU Language Tool, and later on in the section about LanguageModelGenerator you can read about how to use the built-in language model generation tool for dynamic creation of language models.

JSGF or ARPA?
Assuming that you want to ship your app with a pre-created language model, there are two options for doing this: either creating an ARPA language model or a JSGF grammar. Going with an ARPA model will let you use the dynamic language model creation and on-the-fly language model switching features of the most current verion of OpenEars. If you don’t know which to use, skip down past the JSGF section and read the section which follows starting with the header “Using an ARPA language model”, which will explain how to create a new language model or link an existing large language model.

Using a JSGF grammar
Explaining the intricacies of creating JSGF grammars are beyond the scope of these instructions, so I will just say that if you have a JSGF grammar and matching .dic file already and you know what to expect of them, you can use JSGF by adding your JSGF (.gram) file and .dic file to your project (making sure that you have selected adding them to your target in the add dialog) and then jumping down to the PocketsphinxController section of “Using the four important classes of OpenEars” below to learn how to start recognition using that grammar. Otherwise, continue on to learn how to create a custom ARPA language model for use with OpenEars by using the CMU Language Tool.

Using an ARPA language model and using the CMU online tool to generate it

First of all, you will create a text file called a corpus which is just an all-caps list of the words you want Pocketsphinx to be able to recognize.  An example corpus could be a file called corpus.txt consisting of nothing but the following text:

GO
FORWARD
BACKWARD
LEFT
RIGHT
START
STOP
TURN

If you have phrases or sentences that you would like to have recognized in their entirety, you can put them all on a line, for example:

GO
FORWARD
BACKWARD
LEFT
RIGHT
START
STOP IN THE NAME OF LOVE
TURN

This will make it more likely that the entire sentence will be recognized, so if you just need the components of the sentence to be recognized in different combinations, put the individual words in the sentence on their own lines.

Now go to the CMU language tool and upload your corpus by pressing “Choose file” and then “COMPILE KNOWLEDGE BASE”.  The language tool will process your corpus and give you a downloadable archive linked in the sentence “For your convenience there is also a gzip’d tar file version of the above set.”.

Download and unzip the archive.  In this archive will be a file with the suffix “.lm” which you should name to whatever you like with the suffix .languagemodel (for instance, myvoiceproject.languagemodel). Do this renaming now in the Finder (not later after you have imported the file into Xcode), making sure that you don’t accidentally remove the suffix while renaming. You can verify this by dragging the file into Terminal.app and seeing what Terminal.app thinks the file is called, which will match what Xcode will think it is called. There is also a file with the suffix .dic which you should name to whatever you like with the suffix .dic (for instance, myvoiceproject.dic).  Add both the .languagemodel and .dic files to your project (making sure you have selected adding them to your app target in the add dialog).

Please note: the CMU language model tool is fine to use in order to convert a corpus into a language model manually in the course of configuring your own project, but it is not a public API and it is not to be used in an automated fashion by your app. For in-app language model generation, use the LanguageModelGenerator class of OpenEars.

If you want to use a pre-rolled large-vocabulary-sized ARPA language model for Pocketsphinx, and you have read this forum discussion to understand the tradeoffs in accuracy involved in doing so, you can add the following .languagemodel and .dic files to your project:

For the .languagemodel use this file from your install (note: files that end with .DMP are fine to use as ARPA language model files):

[OpenEarsDistribution]/OpenEars/Dependencies/pocketsphinx/model/lm/en_US/hub4.5000.DMP

For the .dic use this file from the Pocketsphinx repository:

https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx/model/lm/en_US/hub4.5000.dic

This will set you up with a matching 5000 word vocabulary for the default acoustic model which you can then tell PocketsphinxController to start with.

[BACK TO TOPIC LIST]

2. Using the four classes of OpenEars

There are four classes that you can create objects for in your OpenEars-enabled app, the use of all of which can be seen in the example app that ships with the distribution. The classes are:
PocketsphinxController
FliteController
LanguageModelGenerator
OpenEarsEventsObserver

How to use PocketsphinxController:

PocketsphinxController is the class which recognizes user speech.  Add PocketsphinxController to any class by adding the line:

#import <OpenEars/PocketsphinxController.h>

and then instantiating a PocketsphinxController object. That means adding this (or the ARC equivalent) to the header:

#import <OpenEars/PocketsphinxController.h>
@interface ClassIWantToUseAnOpenEarsObjectIn : UIViewController {
     PocketsphinxController *pocketsphinxController;
}

@property (nonatomic, retain) PocketsphinxController *pocketsphinxController;
@end

and adding this (or the ARC equivalent) to the implementation:

@implementation ClassIWantToUseAnOpenEarsObjectIn
@synthesize pocketsphinxController;

- (PocketsphinxController *)pocketsphinxController {
    if (pocketsphinxController == nil) {
        pocketsphinxController = [[PocketsphinxController alloc] init]; // Please use this style of lazy accessor to instantiate the object
    }
    return pocketsphinxController;
}

- (void)dealloc {
    [pocketsphinxController release];
    [super dealloc];
}
@end

You can send a PocketsphinxController the following five messages:

[self.myPocketSphinxController startListeningWithLanguageModelAtPath:pathToALanguageModelFileAsString dictionaryAtPath:pathToADictionaryFileAsString languageModelIsJSGF:YESorNO];
[self.myPocketSphinxController stopListening];
[self.myPocketSphinxController suspendRecognition];
[self.myPocketSphinxController resumeRecognition];
[self.myPocketSphinxController changeLanguageModelToFile:pathToALanguageModelFileAsString withDictionary:pathToADictionaryFileAsString];

startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF begins the recognition loop with the requested language model files that you either added up above (those should be in mainBundle), or created using the LanguageModelGenerator (those should be in the Documents folder of the app), which entails some startup time since the audio device needs to be calibrated against background levels for about 4 seconds. After this calibration, you do not need to force another calibration later — the calibration will update itself against changing background noise levels. If you are using a JSGF grammar (a .gram file), run with languageModelIsJSGF:TRUE, and if you are using an ARPA grammar (a .languagemodel or .DMP file), run with languageModelIsJSGF:FALSE.

So, if you were using a JSGF grammar called OpenEars.gram and a dictionary called OpenEars.dic you might send the following message:

NSString *gramPath = [NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEars.gram"];
NSString *dictionaryPath = [NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEars.dic"];
[self.pocketsphinxController startListeningWithLanguageModelAtPath:gramPath dictionaryAtPath:dictionaryPath languageModelIsJSGF:YES];

and if you were using an ARPA language model called OpenEars.languagemodel you might do it this way:

NSString *lmPath = [NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEars.languagemodel"];
NSString *dictionaryPath = [NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEars.dic"];
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dictionaryPath languageModelIsJSGF:NO];

and if you were using a language model that you created dynamically, you would substitute [[NSBundle mainBundle] resourcePath] with the path to the Documents folder, which you can read about obtaining here.

stopListening ends the recognition loop and it is necessary to do this before releasing a PocketSphinxController object.

suspendRecognition does not stop the loop, but it halts speech processing of sound going in the audio input. It is usually instantaneous.

resumeRecognition begins speech processing of sound going in the audio input again. It is usually instantaneous.

changeLanguageModelToFile:withDictionary: switches out the language model for the speech recognizer to the new one you have specified without stopping the recognition loop. After calling this method, the next speech perceived will be compared to your new model instead of the old one. This method only works with ARPA models. Make sure you pass the complete path to your second model as an NSString.

You can also get the input level metering returned as a float by calling:

[self.myPocketSphinxController pocketsphinxInputLevel];

Although you can’t call this frequently enough to animate UI from it without calling it on a background thread.  There is an example of an approach for multithreading this kind of level meter reading in the example app.

Only have one living instance of PocketsphinxController at any given moment.

How to use FliteController:

FliteController is the class that converts text to speech.

Add FliteController to any class by adding the line:

 #import <OpenEars/FliteController.h>

and then instantiating a FliteController object. That means adding this (or the ARC equivalent) to the header:

#import <OpenEars/FliteController.h>
 @interface ClassIWantToUseAnOpenEarsObjectIn : UIViewController { FliteController *fliteController; } @property (nonatomic, retain) FliteController *fliteController; @end 

and adding this (or the ARC equivalent) to the implementation:

@implementation ClassIWantToUseAnOpenEarsObjectIn
@synthesize fliteController;

- (FliteController *)fliteController {
    if (fliteController == nil) {
        fliteController = [[FliteController alloc] init]; // Please use this style of lazy accessor to instantiate the object
    }
    return fliteController;
}

- (void)dealloc {
    [fliteController release];
    [super dealloc];
}
@end

You can send a FliteController the following message:

[self.fliteController say:@"A phrase I'd like my app to speak out loud." withVoice:@"cmu_us_awb8k"];

Where the options for arguments you can enter after withVoice: are as follows:

cmu_us_awb8k
cmu_us_rms8k
cmu_us_slt8k
cmu_time_awb
cmu_us_awb
cmu_us_kal
cmu_us_kal16
cmu_us_rms
cmu_us_slt

You can read descriptions of what these voices are like in the file OpenEarsConfig.h of the OpenEars.xcodeproj. You can time voice processing in order to figure out which voice is optimal to use for a given device by turning on OPENEARSLOGGING in OpenEarsConfig.h. After making any changes to OpenEarsConfig.h it is always necessary to recompile the framework in OpenEars.xcodeproj.

You can change the speed, pitch and variability of a voice by using the following properties of FliteController:

self.fliteController.duration_stretch = .9; // Change the speed
self.fliteController.target_mean = 1.2; // Change the pitch
self.fliteController.target_stddev = 1.5; // Change the variance

1.0 is the default, 0.01 is the minimum value and 2.0 is a probably maximum value (although you can go higher, it’s just unlikely to be useful). a duration_stretch of .5 will be twice as fast as 1.0, a duration_stretch of 2.0 will be half the speed of 1.0. A target_mean or target_stddev of 2.0 will double the pitch or the variance, while 0.5 will reduce them by half.

You do not have to set or override these settings — if you don’t use them, they will be set to the defaults for the voice. If you want to return them to the defaults after overriding them, simple set them all to 1.0:

self.fliteController.duration_stretch = 1.0; // Reset the speed
self.fliteController.target_mean = 1.0; // Reset the pitch
self.fliteController.target_stddev = 1.0; // Reset the variance

You can also get the Flite speech level metering returned as a float by calling:

[self.fliteController fliteOutputLevel];

Although you can’t call this frequently enough to animate UI from it without calling it on a background thread.  There is an example of an approach for multithreading this kind of level meter reading in the example app.

I believe it should be OK to have multiple instances of FliteController but I’d imagine that things will get weird if they try to speak simultaneously.

How to use OpenEarsEventsObserver:

OpenEarsEventsObserver is a class that delivers granular information about what is going on with the AudioSession, Pocketsphinx, and Flite in the form of delegate methods.  You can instantiate as many instances of it in as many different parts of your app as you need in order to be able to respond to speech, Audio Session changes, and monitor whether Flite is speaking. It should be extremely lightweight and all of its methods are optional so you can just implement the delegate methods which relate to your goal in a particular class.  To use OpenEarsEventsObserver you have to make the current class the delegate of OpenEarsEventsObserver in the following way:

a. In the header of the class you want to add an OpenEarsEventsObserver to, add:

#import <OpenEars/OpenEarsEventsObserver.h>

at the top and then add the protocol <OpenEarsEventsObserverDelegate> after the class inheritance declaration, just like this line from the sample app:

@interface OpenEarsSampleProjectViewController : UIViewController <OpenEarsEventsObserverDelegate> {

Then, in the class implementation where you’d like to use the delegate methods of OpenEarsEventsObserver, after instantiating your OpenEarsEventsObserver object, you will make the class its delegate as follows:

[self.myOpenEarsEventsObserver setDelegate:self];

An example of how OpenEarsEventsObserver would be added to the header file is as follows (or its ARC equivalent):

#import <OpenEars/OpenEarsEventsObserver.h>

@interface ClassIWantToUseAnOpenEarsObjectIn : UIViewController {
     OpenEarsEventsObserver *openEarsEventsObserver;
}

@property (nonatomic, retain) OpenEarsEventsObserver *openEarsEventsObserver;
@end

and an example of how it would be added to the implementation is as follows (or its ARC equivalent):

@implementation ClassIWantToUseAnOpenEarsObjectIn
@synthesize openEarsEventsObserver;

- (OpenEarsEventsObserver *)openEarsEventsObserver {
    if (openEarsEventsObserver == nil) {
        openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init]; // Please use this style of lazy accessor to instantiate the object
    }
    return openEarsEventsObserver;
}

- (void)dealloc {
    [openEarsEventsObserver release];
    [super dealloc];
}
@end

Now you are able to implement any of the delegate methods of OpenEarsEventsObserver in that class.  Here is a list of the available methods with their use described afterwards in a comment:

// Audio Session Status Methods.
- (void) audioSessionInterruptionDidBegin; // There was an interruption.
- (void) audioSessionInterruptionDidEnd; // The interruption ended.
- (void) audioInputDidBecomeUnavailable; // The input became unavailable.
- (void) audioInputDidBecomeAvailable; // The input became available again.
- (void) audioRouteDidChangeToRoute:(NSString *)newRoute; // The audio route changed.
// Pocketsphinx Status Methods.
- (void) pocketsphinxDidStartCalibration; // Pocketsphinx isn't listening yet but it started calibration.
- (void) pocketsphinxDidCompleteCalibration; // Pocketsphinx isn't listening yet but calibration completed.
- (void) pocketsphinxRecognitionLoopDidStart; // Pocketsphinx isn't listening yet but it has entered the main recognition loop.
- (void) pocketsphinxDidStartListening; // Pocketsphinx is now listening.
- (void) pocketsphinxDidDetectSpeech; // Pocketsphinx heard speech and is about to process it.
- (void) pocketsphinxDidDetectFinishedSpeech; // Pocketsphinx detected a second of silence indicating the end of an utterance
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID; // Pocketsphinx has a hypothesis.
- (void) pocketsphinxDidStopListening; // Pocketsphinx has exited the continuous listening loop.
- (void) pocketsphinxDidSuspendRecognition; // Pocketsphinx has not exited the continuous listening loop but it will not attempt recognition.
- (void) pocketsphinxDidResumeRecognition; // Pocketsphinx has not exited the continuous listening loop and it will now start attempting recognition again.
- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString; // Pocketsphinx switched language models inline.
- (void) pocketSphinxContinuousSetupDidFail; // Some aspect of setting up the continuous loop failed, turn on OPENEARSLOGGING for more info.

// Flite Status Methods.

- (void) fliteDidStartSpeaking; // Flite started speaking. You probably don’t have to do anything about this.

- (void) fliteDidFinishSpeaking; // Flite finished speaking. You probably don’t have to do anything about this.

You can implement as many or as few of these methods as are useful to you in a class.  So, for instance, to see what speech Pocketsphinx recognized if you have a PocketsphinxController instantiated and started up, you can implement:

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID;

in your class like this:

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
    NSLog(@"Pocketsphinx recognized the following phrase:%@", hypothesis);
}

There are thorough examples of how to use all the methods in the example app.

When you are ready to release an OpenEarsEventsObserver object in a class, make sure that you first set its delegate to nil in a line such as:

[self.myOpenEarsEventsObserver setDelegate:nil];

Then release it afterwards.  We are just making sure that nothing tries to message it after it’s gone.
[BACK TO TOPIC LIST]

How to use LanguageModelGenerator

To use LanguageModelGenerator, instantiate it (in any way which works for you; it isn’t as important that LanguageModelGenerator uses the usual class property/lazy accessor style) and send it the message:

[languageModelGenerator generateLanguageModelFromArray:myWordArray withFilesNamed:myPreferredFilename]

myWordArray is an NSArray filled with words you like to be able to recognize written in all capital letters, e.g.:

NSArray *myWordArray = [NSArray arrayWithObjects:@"WORD", @"OTHER", @"PHRASE TO RECOGNIZE",nil];

If you put a whole phrase as one of the array objects, that phrase will have a higher probability of being recognized in its entirety.

myPreferredFilename is an NSString stating how you’d like the .languagemodel and .dic files that LanguageModelGenerator creates to be named. So, if you pass in @”OpenEarsDynamicModel” here, you’ll get a language model saved to your Documents folder with the filenames OpenEarsDynamicModel.languagemodel and OpenEarsDynamicModel.dic. These files are always saved to the Documents folder. If there is already a file there with the same name, it will be overwritten without any warning, so it is your responsibility to make sure to use unique names if this is not the behavior you want.

generateLanguageModelFromArray:withFilesNamed: returns an NSError, and when its NSError has a code of zero, i.e. there was no error, it includes a userInfo dictionary that has the full paths to your new files in it. There is an example of how to read this dictionary in order to get the paths in the sample app and I recommend checking it out. These paths are important because you will want to pass them to PocketsphinxController in order to have it switch to using your new language model, and getting them out of this dictionary can let you do that entire process programmatically. It is also useful for troubleshooting. You can read about how to obtain the path to your documents folder here.
3. Configuration options

There are a few available configuration options for OpenEars in the config file OpenEarsConfig.h.  The options in OpenEarsConfig.h can only be changed from within OpenEars.xcodeproj (not your app or the sample app!). After they are changed, OpenEars.xcodeproj needs to be rebuilt. The options can be set by commenting or uncommenting the #define lines and are as follows:

#define OPENEARSLOGGING // Turn this on to get logging output from audio driver initialization, etc. Please turn on and submit output when reporting a problem.
#define VERBOSEPOCKETSPHINX // Leave this uncommented to get verbose output from Pocketsphinx, comment it out for quiet operation. I'd recommend starting with it uncommented and then commenting when you're pretty sure things are working.
//#define VERBOSECMUCLMTK // Leave this uncommented to get more output from language model generation
//#define USERCANINTERRUPTSPEECH // Turn this on if you wish to let users cut off Flite speech by talking (only possible when headphones are plugged in).
#define kCalibrationTime 1 // This is how many seconds of background sound will be analyzed to calibrate the decoder. You must enter one of the following three options here: 1, 2 or 3. There are no other supported options for what to enter here currently. If you are seeing poor discernment between background and speech, increase this to 2 or 3. 3 is equal to the amount of calibration that happens by default in the shipping version of Pocketsphinx.

// Comment out the voices here that you are not using in your app:
#define cmu_us_awb8k // Uncomment this in order to use the (faster) 8k version of the us_awb voice
#define cmu_us_rms8k // Uncomment this in order to use the (faster) 8k version of the us_rms voice
#define cmu_us_slt8k // Uncomment this in order to use the (faster) 8k version of the us_slt voice
#define cmu_time_awb // Uncomment this in order to use the 16k awb time voice
#define cmu_us_awb //  Uncomment this in order to use the 16k us_awb voice
#define cmu_us_kal //  Uncomment this in order to use the 8k us_kal voice
#define cmu_us_kal16 // Uncomment this in order to use the 16k us_kal voice
#define cmu_us_rms // Uncomment this in order to use the 16k us_rms voice
#define cmu_us_slt // Uncomment this in order to use the 16k us_slt voice

Keep in mind that any time you change anything in OpenEarsConfig.h, you need to rebuild OpenEars.xcodeproj. It is necessary to comment out the unused voices here before shipping so that your app binary isn’t too huge.

[BACK TO TOPIC LIST]
If you have any questions about how to use the things described in this page, I would recommend taking some time to play with the example app, which shows how everything here works.

4. Handling interruptions

Just like with every other audio app, you need to handle interruptions (e.g. an incoming phone call) and route changes (e.g. a plugged or unplugged headset) correctly when using OpenEars. In the case of OpenEars, that means stopping audio processes that are in progress at the time of interruption or route change and restarting them. There is a very clear example of how to handle this in the sample app by using the OpenEarsEventsObserver class to report a route change or interruption and stop recognition, and then restarting it when the interruption or route change is complete. If you don’t react to these changes of device state at all, the odds are good that you can get a crash as a result since OpenEars will be asked to resume right in the middle of a fast loop and that behavior is undocumented. I believe that it is necessary for Pocketsphinx to be recalibrated when there is a route change or interruption, but that will happen automatically when you restart the stopped recognition loop after a route change or interruption.

[BACK TO TOPIC LIST]

If you have any issues you can check out the FAQ/Support page, ask a question in the forums, or if you’d prefer private email support you can purchase a support incident at the Politepix Shop.  Have fun!