Introduction

OpenEars® is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement local, offline speech recognition in English and six other languages, and English text-to-speech (synthesized speech). OpenEars works on the iPhone, iPod and iPad and uses the open source CMU Sphinx project. OpenEars is free to use in an iPhone, iPad or iPod app. It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba, among many other places.

The OpenEars Platform is also a complete development platform for creating your speech recognition and text-to-speech apps, including both the free OpenEars SDK documented on this page and a diverse set of plugins that can be added to OpenEars in order to extend and refine its default features. You can read more about the OpenEars Platform here. This page is all about the free and shared-source OpenEars SDK, so please read on to learn more about it.

Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on a small handheld device given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars uses) is capable of local recognition of vocabularies with hundreds or even thousands of words depending on the environment and other factors, and performs very well with medium-sized language models (vocabularies). The best part is that it uses no network connectivity because all processing occurs locally on the device.

The current version of OpenEars is 2.509. Download OpenEars or read its changelog. If you are upgrading to OpenEars 2.x from a 1.x version, you need to follow the upgrade guide once in order to upgrade successfully. If you are upgrading from OpenEars 2.0x to OpenEars 2.5x, the process is very easy, and the brief instructions in the upgrade guide will give you a smooth transition.

Features of OpenEars

OpenEars can:

Perform speech recognition in English and in six other languages found on the languages download page, including Chinese, German, French, Spanish, Italian, and Dutch,
Perform text-to-speech (synthesized speech) in English and, with the NeatSpeech plugin, in Spanish,
Listen continuously for speech on a background thread, while suspending or resuming speech processing on demand, all while using less than 2% CPU on average on current devices (decoding speech, text-to-speech, updating the UI and other intermittent functions use more CPU),
Change the pitch, speed and variance of any text-to-speech voice,
Know whether headphones are plugged in and continue voice recognition during text-to-speech only when they are plugged in,
Support Bluetooth audio devices (experimental),
Dispatch information to any part of your app about the results of speech recognition and speech synthesis, or changes in the state of the audio session (such as an incoming phone call or headphones being plugged in),
Deliver level metering for both speech input and speech output so you can design visual feedback for both states,
Support JSGF grammars with an easy-to-use human-readable grammar specification language, only from Politepix,
Dynamically generate probability-based language models and rule-based grammars using simple object-oriented syntax,
Switch between ARPA language models or JSGF grammars on the fly,
Get n-best lists with scoring,
Test existing recordings,
Be easily interacted with via standard and simple Objective-C methods,
Control all audio functions with text-to-speech and speech recognition in memory instead of writing audio files to disk and then reading them,
Protect user privacy by performing all recognition offline and not storing speech audio,
Drive speech recognition with a low-latency Audio Unit driver for highest responsiveness,
Be installed in a Cocoa-standard fashion using an easy-peasy already-compiled framework.

In addition to its various new features and faster recognition/text-to-speech responsiveness, OpenEars now has improved recognition accuracy. OpenEars is free to use in an App Store app.

Warning: Before using OpenEars, please note it has to use a different audio driver on the Simulator that is less accurate, so it is always necessary to evaluate accuracy on a real device. Please don't submit support requests for accuracy issues with the Simulator.

Installation

To use OpenEars:

Download the distribution and unpack it.

Create your own app, and then add the iOS frameworks AudioToolbox and AVFoundation to it.

Inside your downloaded distribution there is a folder called "Framework". Drag the "Framework" folder into your app project in Xcode.

OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars. Give the sample app a spin to try out the features and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars.

If the steps on this page didn't work for you, you can get free support at the forums, read the FAQ, brush up on the documentation, or open a private email support incident at the Politepix shop. If you'd like to read the documentation, simply read onward.

Basic concepts

There are a few basic concepts to understand about voice recognition and OpenEars that will make it easiest to create an app.

Local or offline speech recognition versus server-based or online speech recognition: most speech recognition on the iPhone, iPod and iPad is done by streaming the speech audio to servers. OpenEars works by doing the recognition inside the device, entirely offline without using the network. This saves bandwidth and results in faster response, but since a server is much more powerful than a phone it means that we have to work with much smaller vocabularies to get accurate recognition.

Language Models. The language model is the vocabulary that you want OpenEars to understand, in a format that its speech recognition engine can understand. The smaller the language model is, and the better adapted it is to your users' real usage cases, the better the accuracy. A good language model for OEPocketsphinxController has fewer than 1000 words. You define the words that your app uses; it will not know about any vocabulary other than the vocabulary that you define.
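As a rough sketch of what defining a vocabulary can look like in code, the snippet below hands a small array of words and phrases to OELanguageModelGenerator through the generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: method listed on this page. The word list, the file name "MyLanguageModel" and the helper function name are invented for illustration. The [OEAcousticModel pathToModel:] call, the "AcousticModelEnglish" bundle name and the convention that a nil NSError means success are recalled from the OpenEars 2.x headers rather than stated on this page, so treat them as assumptions and check them against the class reference.

```objc
#import <Foundation/Foundation.h>
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OEAcousticModel.h>

// Hypothetical helper: builds a small language model and reports success.
static BOOL GenerateDemoLanguageModel(OELanguageModelGenerator *generator) {
    // The only vocabulary the recognizer will know about (illustrative words).
    NSArray *words = @[@"HELLO", @"GOODBYE", @"OPEN THE DOOR", @"CLOSE THE DOOR"];

    // Assumption: pathToModel: and this bundle name come from the OpenEars 2.x headers.
    NSString *acousticModelPath = [OEAcousticModel pathToModel:@"AcousticModelEnglish"];

    NSError *error = [generator generateLanguageModelFromArray:words
                                                withFilesNamed:@"MyLanguageModel"
                                        forAcousticModelAtPath:acousticModelPath];

    // Assumption: a nil error indicates that generation succeeded.
    if (error != nil) {
        NSLog(@"Language model generation failed: %@", [error localizedDescription]);
        return NO;
    }
    return YES;
}
```

Keeping the array short and close to what users will really say is the point of the exercise: every extra phrase is one more thing the recognizer can confuse with the phrases you care about.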

The parts of OpenEars. OpenEars has a simple, flexible and very powerful architecture.

OEPocketsphinxController recognizes speech using a language model that was dynamically created by OELanguageModelGenerator. OEFliteController creates synthesized speech (TTS). And OEEventsObserver dispatches messages about every feature of OpenEars (what speech was understood by the engine, whether synthesized speech is in progress, if there was an audio interruption) to any part of your app.
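One way these parts might fit together is sketched below: an object registers as the OEEventsObserver delegate, starts OEPocketsphinxController with an already-generated language model and dictionary, and speaks each recognized phrase back through OEFliteController. The SpeechDemo class and its method name are hypothetical. The sharedInstance accessor, the Slt voice, the [OEAcousticModel pathToModel:] call and the pocketsphinxDidReceiveHypothesis:recognitionScore:utteranceID: delegate method are recalled from the OpenEars 2.x headers rather than taken from this page, so please verify them against the class reference before relying on them.

```objc
#import <Foundation/Foundation.h>
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEEventsObserver.h>
#import <OpenEars/OEFliteController.h>
#import <OpenEars/OEAcousticModel.h>
#import <Slt/Slt.h> // Assumption: the Slt voice framework from the distribution is linked.

// Hypothetical class that wires recognition, events and speech together.
@interface SpeechDemo : NSObject <OEEventsObserverDelegate>
@property (nonatomic, strong) OEEventsObserver *eventsObserver;
@property (nonatomic, strong) OEFliteController *fliteController;
@property (nonatomic, strong) Slt *voice;
- (void)startWithLanguageModelAtPath:(NSString *)lmPath dictionaryAtPath:(NSString *)dicPath;
@end

@implementation SpeechDemo

- (void)startWithLanguageModelAtPath:(NSString *)lmPath dictionaryAtPath:(NSString *)dicPath {
    // Receive OpenEars events (hypotheses, interruptions and so on) in this object.
    self.eventsObserver = [[OEEventsObserver alloc] init];
    self.eventsObserver.delegate = self;

    self.fliteController = [[OEFliteController alloc] init];
    self.voice = [[Slt alloc] init];

    // Activate the controller before starting to listen.
    NSError *activationError = nil;
    if (![[OEPocketsphinxController sharedInstance] setActive:YES error:&activationError]) {
        NSLog(@"Could not activate OEPocketsphinxController: %@", activationError);
        return;
    }

    // NO here means an ARPA language model rather than a JSGF grammar.
    [[OEPocketsphinxController sharedInstance]
        startListeningWithLanguageModelAtPath:lmPath
                             dictionaryAtPath:dicPath
                          acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                          languageModelIsJSGF:NO];
}

// Assumed delegate callback: called when the engine has understood some speech.
- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    NSString *reply = [NSString stringWithFormat:@"You said %@", hypothesis];
    [self.fliteController say:reply withVoice:self.voice];
}

@end
```

Keeping the observer, the speech controller and the voice in strong properties is deliberate, so that they are not deallocated while recognition is still running.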

- (NSError *)generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

- (NSError *)generateGrammarFromDictionary:(NSDictionary *)grammarDictionary withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

- (NSError *)generateLanguageModelFromTextFile:(NSString *)pathToTextFile withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

- (void)say:(NSString *)statement withVoice:(OEFliteVoice *)voiceToUse

- (void)startListeningWithLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

- (void)changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString

- (void)runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

- (BOOL)setActive:(BOOL)active error:(NSError **)outError
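
The two features "switch between ARPA language models or JSGF grammars on the fly" and "test existing recordings" could be exercised with the methods above roughly as follows. This is only a sketch: both helper functions are hypothetical, every path is supplied by the caller, and the sharedInstance accessor on OEPocketsphinxController is recalled from the OpenEars 2.x headers rather than shown on this page. It is also an assumption that the language model is switched while listening is already in progress.

```objc
#import <Foundation/Foundation.h>
#import <OpenEars/OEPocketsphinxController.h>

// Hypothetical helper: swap the active vocabulary for another generated model.
// Assumption: listening has already been started when this is called.
static void SwitchVocabulary(NSString *languageModelPath, NSString *dictionaryPath) {
    [[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:languageModelPath
                                                          withDictionary:dictionaryPath];
}

// Hypothetical helper: run recognition over an existing WAV recording
// instead of live microphone input, using an ARPA language model.
static void RecognizeRecording(NSString *wavPath,
                               NSString *languageModelPath,
                               NSString *dictionaryPath,
                               NSString *acousticModelPath) {
    [[OEPocketsphinxController sharedInstance] runRecognitionOnWavFileAtPath:wavPath
                                                    usingLanguageModelAtPath:languageModelPath
                                                            dictionaryAtPath:dictionaryPath
                                                         acousticModelAtPath:acousticModelPath
                                                         languageModelIsJSGF:NO];
}
```

Neither method returns a result directly, so the recognized text would presumably arrive through OEEventsObserver like any other hypothesis.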