SpeexKit: an iOS Speex audio compression codec framework for your app, from Politepix

SpeexKit is an iOS framework that makes it easy to add Speex audio compression codec encoding and decoding to your app. Download the SpeexKit demo framework, or buy SpeexKit, to convert Speex audio on iOS. Are you looking for speech recognition? Check out OpenEars!


Some info about SpeexKit

What is Speex?
How much does SpeexKit cost?
How do I use SpeexKit?
SpeexKit Classes: SpeexFileEncodingController
SpeexKit Classes: SpeexFileDecodingController
SpeexKit Classes: SpeexNSDataEncodingController
SpeexKit Classes: SpeexNSDataDecodingController
SpeexKit Classes: AudioFileWrapperController
I have some questions, how do I get support?
I’m ready to license the framework, where do I go?

What is Speex?

Speex is one of the best codecs for voice audio transmission available, with clear quality and excellent compression characteristics. It is a great match for mobile speech applications given the need for top-notch clarity and careful use of bandwidth. The Speex project is found here and its license will be found at the end of this page. SpeexKit is not a speech recognition framework – if you are looking for a speech recognition framework you probably want OpenEars! SpeexKit is an independent framework and not part of the OpenEars plugin system, since OpenEars works entirely offline and doesn’t require the use of a high-compression audio codec.

What is SpeexKit?

SpeexKit is an iOS framework from Politepix that makes it easy to use every feature of Speex in your iPhone or iPad app without writing your own C or C++ implementation or dealing with threading or queueing code. It is not the only way to use Speex on the iPhone, but it is designed to be an easy, object-oriented way to implement it quickly in your app, even for the tricky stuff. You can download the SpeexKit demo framework here. You have to sign up, but downloading the demo is 100% free: just add it to your cart and check out. You will not have to provide any payment information. The demo is time-limited and cannot be submitted to the App Store; when you are ready to license the framework, it can be purchased at the Politepix shop.

What can you do with SpeexKit?

• Convert entire WAV or raw audio files into Speex files
• Convert entire Speex files into WAV files
• Convert buffers of Speex frames into buffers of raw audio samples, synchronously or asynchronously. In asynchronous mode, threading and queuing (since Speex decoding is not realtime on all devices) are handled for you.
• Convert buffers of raw PCM audio samples into buffers of Speex frames, synchronously or asynchronously. In asynchronous mode, threading and buffer queuing (since Speex encoding is not realtime on all devices) are handled for you.
• Wrap existing buffers of Speex frames into a written-out Speex file
• Wrap existing buffers of raw PCM audio samples into a written-out WAV file
• Use advanced Speex features such as Voice Activity Detection (VAD) and echo reduction

With these features, SpeexKit is intended to be usable for any application of Speex, whether it is conversion and upload of streaming raw audio samples as Speex frames without blocking, or local saving or remote upload of Speex files, or conversion and then playback of incoming Speex frames, or something new.

SpeexKit does not make any assumptions about how you record or play back; it is intended to encode and decode using standard Objective-C objects so that you can easily integrate Speex into any audio implementation that you require. Your own choice of audio implementation aside, the only objects you will need to interact with for implementing SpeexKit’s features are NSString, NSNumber, NSData, NSArray and NSDictionary. SpeexKit does assume that you are working with 8-bit or 16-bit mono audio, since it would be unusual to do mobile speech applications with greater bit depths or with stereo audio.

If it’s object-oriented and easy, does that mean it’s slow?

No, every performance-intensive area of the framework is written in C, just encapsulated in an easy-to-swallow Objective-C coating. We think it’s pretty fast.

How much does SpeexKit cost?

SpeexKit’s price is listed in the Politepix Shop and you can try out a demo before licensing. Politepix will donate 5% of every sale of SpeexKit to the Speex project here. Donations will happen after a 6-month grace period due to requirements of our payment gateway.

SpeexKit doesn’t have any complicated licensing terms; the licensing cost is for exactly one listing on the App Store regardless of number of sales of the app, the size of the purchasing company or entity type, or other factors.

The demo framework only runs for four-minute sessions and can’t be submitted to the App Store since it is unlicensed. Otherwise its features are identical to those of the version you receive after completing licensing, so you can confidently find out whether it integrates into your app first.

How do I use SpeexKit?

To install SpeexKit, download the demo and drag the framework (found in the folder “framework”) into your app. That’s it.

At this point you can stop and try out the sample app. Please keep in mind that the sample app is not the same as the framework — it requires other audio frameworks which assist its audio implementation, as your app will as well, but these are not requirements of using the framework itself. The sample app has implementation example code for every SpeexKit class as well as a simplified version of the OpenEars RemoteIO/AudioQueue audio driver that will do recording for the test app. You can feel free to base your own audio driver on the driver included and its sample app implementation if you need to do a RemoteIO audio implementation with low latency.

Once you’ve dragged the framework into your own app, you can make use of the classes of SpeexKit in your app. Here is a description of the classes, how to add them, and their properties.

SpeexFileEncodingController

Imported into your class like so:

#import <SpeexKitDemo/SpeexFileEncodingController.h>

SpeexFileEncodingController converts entire WAV or raw audio files into a complete Speex file, synchronously. To use it, you first instantiate it in the usual way with allocation and initialization. Then you can give it an audio file using the following method:

- (void) encodeLocalRawOrWavFileAtPath:(NSString *)localFile intoSpeexFileAtPath:(NSString *)speexFile;

As an example:

[mySpeexFileEncodingController encodeLocalRawOrWavFileAtPath: myNSStringRepresentationOfTheLocalInputFilePath intoSpeexFileAtPath: myNSStringRepresentationOfTheLocalDestinationFilePath];

A complete implementation example is in the sample app. If you want to test out the spx file that is created, VLC can play back spx (you’ll need to get the file off your device using the Xcode Organizer). VLC sometimes skips at both the beginning and the end of local audio file playback, so this is not in and of itself a sign of an issue.

In many cases it isn’t necessary to set any properties of SpeexFileEncodingController since the Speex encoder will auto-detect most settings, but in some cases you might want to set specific encoding settings.

The following properties of SpeexFileEncodingController affect the final file. Set them after initializing the object and before calling encodeLocalRawOrWavFileAtPath:intoSpeexFileAtPath:

NSString *mode; // options are "Narrowband", "Wideband" and "UltraWideband". In Speex these modes correspond to 8 kHz, 16 kHz and 32 kHz sample rates respectively. This is necessary to set.
int quality; // options are 0-10, default is 8. Not necessary to set.
int bitrate; // The maximum bitrate to use. Not necessary to set.
BOOL variableBitrate; // Use VBR encoding. Optional.
int vbrBitrate; // maximum bitrate for vbr. Optional.
int averageBitRate; // If set to a number, enable average bitrate at the described rate, defaults to -1. Optional.
BOOL vad; // Use voice activity detection, defaults to NO, optional.
BOOL dtx; // File-based discontinuous transmission, defaults to NO, optional.
int complexity; // encoding complexity from 0-10, default is 3, not necessary to set.
BOOL denoiseInput; // If TRUE, denoise input first. Defaults to FALSE. Optional.
BOOL useAGC; // Turn on AGC. Optional.
BOOL verbose; // Verbose mode, reports used bitrates, defaults to FALSE, optional.
BOOL verboseSpeexKit; // Quiet mode, suppresses all output, defaults to FALSE, optional.
int sampleRate; // Sample rate for input, not necessary to set unless the encoder has some problem detecting this.
BOOL inputIsStereo; // Force input to be considered stereo, not necessary to set unless the encoder has some problem detecting this.
NSString *endianness; // Endianness of input, options are "LE" and "BE", not necessary to set unless the encoder has some problem detecting this. Unlikely to be needed for an iOS implementation.
int inputBits; // Defaults to -1 which lets the encoder decide. If set to 8, input is taken as 8-bit, if set to 16, input is taken as 16-bit. Not necessary to set unless the encoder has some problem detecting this.
BOOL timeConversion; // Set this to TRUE to get a timing for your conversion. Useful for discovering how to optimize your operation on the device using the different encoding options available.
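
As a hedged illustration of how these properties fit together with the encoding method, here is a minimal sketch. It assumes ARC, that the demo framework is linked, and that the properties listed above are declared as Objective-C properties so that dot syntax works. PathInDocuments, EncodeRecordingToSpeex and the file names are hypothetical and not part of SpeexKit.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexFileEncodingController.h>

// Hypothetical helper (not part of SpeexKit): a path inside the app's Documents directory.
static NSString *PathInDocuments(NSString *fileName) {
    NSArray *directories = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    return [[directories objectAtIndex:0] stringByAppendingPathComponent:fileName];
}

// Hypothetical function: encodes recording.wav into recording.spx, blocking until done.
static void EncodeRecordingToSpeex(void) {
    SpeexFileEncodingController *encoder = [[SpeexFileEncodingController alloc] init];

    encoder.mode = @"Wideband";    // necessary to set; choose the mode that suits your input
    encoder.quality = 6;           // optional, 0-10 (documented default is 8)
    encoder.vad = YES;             // optional voice activity detection
    encoder.timeConversion = YES;  // optional, reports a timing for the conversion

    [encoder encodeLocalRawOrWavFileAtPath:PathInDocuments(@"recording.wav")
                       intoSpeexFileAtPath:PathInDocuments(@"recording.spx")];
}
```

Because the method is synchronous, a long input file will block the calling thread until the conversion has finished.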

SpeexFileDecodingController

Imported into your class like so:

#import <SpeexKitDemo/SpeexFileDecodingController.h>

SpeexFileDecodingController converts an entire Speex file into a complete WAV file, synchronously. To use it, you first instantiate it in the usual way with allocation and initialization. Then you can give it a Speex file to convert using the following method:

- (void) decodeLocalSpeexFileAtPath:(NSString *)localSpeexFile intoLocalRawOrWavFileAtPath:(NSString *)decodedFile; // This takes one entire spx file and converts it to a wav or raw file with a header.

As an example:

[mySpeexFileDecodingController decodeLocalSpeexFileAtPath: myNSStringRepresentationOfTheLocalInputFilePath intoLocalRawOrWavFileAtPath: myNSStringRepresentationOfTheLocalDestinationFilePath];

A complete implementation example is in the sample app. If you want to test out the WAV file that is created, AVAudioPlayer can play back WAVs.

SpeexFileDecodingController has no fine-tuning properties since the WAV file will always be composed of PCM samples reflecting the input.

Properties of SpeexFileDecodingController:

BOOL verboseSpeexFileDecodingController;    // Set this TRUE in order to see error output
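
The decoding and playback steps described above could be combined roughly as follows. This is a sketch under assumptions, not the sample app's implementation: it assumes ARC and that AVFoundation is linked, and DecodeAndPreparePlayer is a hypothetical function name. The AVAudioPlayer calls are standard AVFoundation API.

```objc
#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>
#import <SpeexKitDemo/SpeexFileDecodingController.h>

// Hypothetical function: decodes a .spx file to WAV and returns a player for the result.
// Returns nil if the decoded file could not be opened.
static AVAudioPlayer *DecodeAndPreparePlayer(NSString *speexPath, NSString *wavPath) {
    SpeexFileDecodingController *decoder = [[SpeexFileDecodingController alloc] init];
    decoder.verboseSpeexFileDecodingController = YES; // show error output while testing

    // Synchronous: returns once the whole file has been converted.
    [decoder decodeLocalSpeexFileAtPath:speexPath intoLocalRawOrWavFileAtPath:wavPath];

    NSError *error = nil;
    AVAudioPlayer *player = [[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:wavPath]
                                                                   error:&error];
    if (player == nil) {
        NSLog(@"Could not open decoded WAV at %@: %@", wavPath, error);
        return nil;
    }
    [player prepareToPlay];
    return player; // the caller must keep a strong reference and call -play
}
```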

SpeexNSDataEncodingController

SpeexNSDataEncodingController is for converting lots and lots of buffers of PCM samples into Speex frames on the fly.

Imported into your class like so:

#import <SpeexKitDemo/SpeexNSDataEncodingController.h>

SpeexNSDataEncodingController can be used synchronously (blocking) without using its delegate methods, or asynchronously (multithreaded and internally queued — you do not need to implement these characteristics) via use of its delegate methods. The second approach is ideal for a streaming application.

Before using SpeexNSDataEncodingController it is necessary to set its runtime options after initialization by first setting the properties which will control the encoding, and then using the method:

- (void) setSpeexEncodingOptions;

This registers the options and initializes the encoder, which makes the encoding process itself as time-efficient as possible. The options you can set before calling setSpeexEncodingOptions are as follows:

NSString *mode; // options are "Narrowband" and "Wideband". In Speex these modes correspond to 8 kHz and 16 kHz PCM input respectively. This is necessary to set.
BOOL denoise; // Attempt to remove noise from the buffer, optional
BOOL dereverb; // Attempt to cancel echoes from the buffer, optional
int quality; // options are 0-10, default is 8. Not necessary to set.
BOOL variableBitrate; // Use VBR encoding. Optional.
int vbrBitrate; // If set to a number, enable variable bitrate at the described rate, defaults to -1 meaning do not set. Optional.
int averageBitRate; // If set to a number, enable average bitrate at the described rate, defaults to -1 meaning do not use. Optional.
BOOL vad; // Use voice activity detection, defaults to NO, optional. When enabled, each Speex frame NSDictionary returned will have an extra NSNumber object under the key @"SpeexVADDetectedSpeech" holding a Boolean (TRUE or FALSE).
int complexity; // encoding complexity from 0-10, default is 3, not necessary to set.
int vbrQuality; // Optional, can be set from 0-10.
int sampleRate; // Optional, only set if the encoder doesn't handle your sample rate automatically
BOOL verboseSpeexKit; // Show all error output
BOOL timeEncoding; // Show the timing of the buffer encoding to help you optimize your Speex setting choices for your application.

Having set these options, you can then use one of the following methods to encode your NSData buffers of PCM audio, either synchronously or asynchronously:

- (NSArray *) convertNSDataToSpeex:(NSData*)data;

- (void) asynchronouslyConvertNSDataToSpeex:(NSData*)data;

If you convert an NSData of PCM audio using convertNSDataToSpeex: it will return an NSArray of NSDictionaries, in which each NSDictionary has the following objects:

@"SpeexFrameNSData",
@"SpeexFrameSizeNSNumber",
@"SpeexLookahead"

and optionally, if you have chosen to use voice activity detection,

@"SpeexVADDetectedSpeech"

What these dictionary objects represent:

@"SpeexFrameNSData" is an NSData containing exactly one frame of Speex-encoded audio. In Speex terms, a frame has an exact size which we want to maintain in the event that we need to do anything else with it (for instance, upload it to a service that expects distinct frames of Speex, or wrap it later on into a Speex file, or decode it buffer for buffer). A narrowband Speex frame is 320 bytes and a wideband frame is 640 bytes. Therefore, when you send an NSData of speech samples which contains multiple frames’ worth of Speex, you will get multiple NSDictionaries back in the returned array, each with a single Speex frame in it of the correct size under @"SpeexFrameNSData".

@"SpeexFrameSizeNSNumber" will be an NSNumber in integer format which will tell you the size of the Speex frame that was created in bytes (not samples or seconds). A narrowband Speex frame is 320 bytes and a wideband frame is 640 bytes. This information is included so that you can do other operations which need this frame size programmatically without otherwise keeping track of it. If you have VAD (voice activity detection) on, this will return very short frames for buffers in which no speech was detected.

@"SpeexLookahead" is not something you need to do anything with unless you are later wrapping all of your Speex frames into a single Speex audio file. That process needs to know what the lookahead was (an internal Speex property that depends on the encoding settings selected) so this NSNumber in integer format is also returned to you, although you can ignore it if you aren’t using your Speex data for anything that requests that you deliver the lookahead value.

@"SpeexVADDetectedSpeech" is an NSNumber in BOOL format which is only present when you are using VAD (voice activity detection). It will be TRUE if speech was detected in the submitted PCM buffer and FALSE if speech was not detected. Please keep in mind that Speex VAD is hacky by the Speex project’s own estimation, so it might not suit your requirements in every application.
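
To make the synchronous path concrete, the following sketch configures an encoder once, encodes one PCM buffer and reads the dictionary keys described above. It assumes ARC and that the options are settable with dot syntax; MakeConfiguredEncoder and LogEncodedFrames are hypothetical names, not SpeexKit API.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexNSDataEncodingController.h>

// Hypothetical helper: creates an encoder and registers its options once, up front.
static SpeexNSDataEncodingController *MakeConfiguredEncoder(void) {
    SpeexNSDataEncodingController *encoder = [[SpeexNSDataEncodingController alloc] init];
    encoder.mode = @"Narrowband"; // necessary to set
    encoder.vad = YES;            // adds @"SpeexVADDetectedSpeech" to each dictionary
    [encoder setSpeexEncodingOptions];
    return encoder;
}

// Hypothetical helper: encodes one PCM buffer (blocking) and logs each returned frame.
static void LogEncodedFrames(SpeexNSDataEncodingController *encoder, NSData *pcmBuffer) {
    NSArray *frames = [encoder convertNSDataToSpeex:pcmBuffer];
    for (NSDictionary *frame in frames) {
        NSData *speexFrame = [frame objectForKey:@"SpeexFrameNSData"];
        int frameSize = [[frame objectForKey:@"SpeexFrameSizeNSNumber"] intValue];
        // Only present when VAD is on, so a nil value is treated as "not reported".
        NSNumber *detectedSpeech = [frame objectForKey:@"SpeexVADDetectedSpeech"];
        NSLog(@"Frame of %d bytes (NSData length %lu), speech detected: %@",
              frameSize, (unsigned long)[speexFrame length],
              detectedSpeech ? ([detectedSpeech boolValue] ? @"yes" : @"no") : @"not reported");
    }
}
```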

Since Speex encoding can be CPU-intensive and will perform differently on different devices, you might not want to block, and might prefer to send your NSDatas of PCM samples to be queued and encoded in the background. In this case you will use the asynchronous method:

- (void) asynchronouslyConvertNSDataToSpeex:(NSData*)data;

After adding SpeexNSDataEncodingControllerDelegate to your class header and assigning your class object as the delegate of your SpeexNSDataEncodingController object once you have instantiated it, you can use the following delegate method of SpeexNSDataEncodingControllerDelegate to receive the same NSArray of NSDictionaries, containing the objects above, as they are created:

- (void) asynchronousEncoderCreatedSpeexArray:(NSArray *)speexArray;

The sample app has a preprocessor #define that lets you switch between synchronous encoding and asynchronous encoding, so you can look at how those two possibilities work and perform, as well as see the details of setting the delegate, etc. Keep in mind that for asynchronous encoding, you will need to make sure that the SpeexNSDataEncodingController object stays instantiated (for example, held in an instance variable of the class you call it from) for the entire time that you are making use of its delegate methods.
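
A class that uses the asynchronous encoder might be laid out roughly like this. It is only a sketch: it assumes ARC and, importantly, that the delegate is assigned through a property named delegate, which this page does not spell out (check the framework header or the sample app). StreamingEncoder and its methods start and encodePCMBuffer: are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexNSDataEncodingController.h>

// Hypothetical class that owns the encoder for as long as frames are expected.
@interface StreamingEncoder : NSObject <SpeexNSDataEncodingControllerDelegate>
@property (nonatomic, strong) SpeexNSDataEncodingController *encoder;
- (void)start;
- (void)encodePCMBuffer:(NSData *)pcmBuffer;
@end

@implementation StreamingEncoder

- (void)start {
    // A strong property keeps the encoder alive while its delegate methods are in use.
    self.encoder = [[SpeexNSDataEncodingController alloc] init];
    self.encoder.delegate = self; // ASSUMPTION: the delegate property is named "delegate"
    self.encoder.mode = @"Wideband";
    [self.encoder setSpeexEncodingOptions];
}

- (void)encodePCMBuffer:(NSData *)pcmBuffer {
    // Returns without waiting; queuing and threading are handled by the framework.
    [self.encoder asynchronouslyConvertNSDataToSpeex:pcmBuffer];
}

// Delegate method from SpeexNSDataEncodingControllerDelegate.
- (void)asynchronousEncoderCreatedSpeexArray:(NSArray *)speexArray {
    for (NSDictionary *frame in speexArray) {
        NSData *speexFrame = [frame objectForKey:@"SpeexFrameNSData"];
        NSLog(@"Received an encoded frame of %lu bytes", (unsigned long)[speexFrame length]);
    }
}

@end
```

This page does not say which thread the delegate method is called on, so it is safest not to assume the main thread when touching UI or shared state from it.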

SpeexNSDataDecodingController

SpeexNSDataDecodingController is for decoding lots and lots of buffers of Speex frames into PCM samples on the fly.

Imported into your class like so:

#import <SpeexKitDemo/SpeexNSDataDecodingController.h>

SpeexNSDataDecodingController is very similar to SpeexNSDataEncodingController, with a synchronous and an asynchronous option. The second approach is ideal for a streaming application. As with SpeexNSDataEncodingController, after alloc/init you will need to set your decoder options and then run:

- (void) setSpeexDecodingOptions;

so that the decoder can run with maximum efficiency once you are submitting multiple rounds of Speex frame NSDatas to it. The options you can use with SpeexNSDataDecodingController are as follows:

NSString *mode; // options are "Narrowband" and "Wideband". You should know what mode the Speex frames were encoded in. If you encoded them yourself, use the same mode setting you encoded them with. If they are being provided to you, the provider will document whether they are narrowband or wideband frames.
BOOL verboseSpeexKit; // Turn on error reporting.

To decode an NSData that contains a Speex frame synchronously, you use:

- (NSData *)decodeSpeexNSData:(NSData*)speexData withSpeexFrameSize:(int)speexFrameSize;

Note that it is necessary to know the frame size of the Speex frame, which you should have either from the SpeexKit encoding process or which should be given to you from the remote provider. Only one frame of Speex data can be submitted to this method per call; sending multiple frames at once in an NSData will result in an error.

This method will block and return an NSData which is the raw PCM sample buffer that was decoded from the submitted Speex frame.
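
Since only one frame may be submitted per call, decoding a sequence of frames means looping over them. The sketch below does that for an array of frame dictionaries in the format produced by SpeexNSDataEncodingController. It assumes ARC and that a failed decode returns nil, which this page does not state; DecodeFramesToPCM is a hypothetical name.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexNSDataDecodingController.h>

// Hypothetical helper: decodes frames one at a time and concatenates the PCM output.
static NSData *DecodeFramesToPCM(NSArray *frames, NSString *mode) {
    SpeexNSDataDecodingController *decoder = [[SpeexNSDataDecodingController alloc] init];
    decoder.mode = mode; // must match the mode the frames were encoded with
    [decoder setSpeexDecodingOptions];

    NSMutableData *pcm = [NSMutableData data];
    for (NSDictionary *frame in frames) {
        NSData *speexFrame = [frame objectForKey:@"SpeexFrameNSData"];
        int frameSize = [[frame objectForKey:@"SpeexFrameSizeNSNumber"] intValue];

        // Exactly one Speex frame per call; this blocks until the frame is decoded.
        NSData *decoded = [decoder decodeSpeexNSData:speexFrame withSpeexFrameSize:frameSize];
        if (decoded != nil) { // ASSUMPTION: nil signals a frame that could not be decoded
            [pcm appendData:decoded];
        }
    }
    return pcm;
}
```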

Since Speex decoding can be CPU-intensive and will perform differently on different devices, you might not want to block, and might prefer to send your NSDatas of Speex frames to be automatically queued and decoded in the background. In this case you will use the asynchronous method:

- (void) asynchronouslyDecodeSpeexNSData:(NSData*)speexData withSpeexFrameSize:(int)speexFrameSize;

After adding SpeexNSDataDecodingControllerDelegate to your class header and assigning your class object as the delegate of your SpeexNSDataDecodingController object once you have instantiated it, you can use the following delegate method of SpeexNSDataDecodingControllerDelegate to receive each decoded buffer of raw PCM samples as an NSData as it is created:

- (void) asynchronousDecoderCreatedPCMData:(NSData *)pcmData;

The sample app has a preprocessor #define that lets you switch between synchronous decoding and asynchronous decoding, so you can look at how those two possibilities work and perform, as well as see the details of setting the delegate, etc. Keep in mind that for asynchronous decoding, you will need to make sure that the SpeexNSDataDecodingController object stays instantiated (for example, held in an instance variable of the class you call it from) for the entire time that you are making use of its delegate methods.
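
The asynchronous decoding flow could be sketched as below. As with the encoder, the property name delegate is an assumption, and so is the idea that the callback may arrive off the main thread (hence the hop to the main queue before touching shared state). StreamingDecoder, its methods and decodedPCM are hypothetical; ARC is assumed.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexNSDataDecodingController.h>

// Hypothetical class that decodes incoming frames and accumulates the PCM result.
@interface StreamingDecoder : NSObject <SpeexNSDataDecodingControllerDelegate>
@property (nonatomic, strong) SpeexNSDataDecodingController *decoder;
@property (nonatomic, strong) NSMutableData *decodedPCM;
- (void)startWithMode:(NSString *)mode;
- (void)receivedSpeexFrame:(NSData *)speexFrame ofSize:(int)frameSize;
@end

@implementation StreamingDecoder

- (void)startWithMode:(NSString *)mode {
    self.decodedPCM = [NSMutableData data];
    self.decoder = [[SpeexNSDataDecodingController alloc] init];
    self.decoder.delegate = self; // ASSUMPTION: the delegate property is named "delegate"
    self.decoder.mode = mode;
    [self.decoder setSpeexDecodingOptions];
}

- (void)receivedSpeexFrame:(NSData *)speexFrame ofSize:(int)frameSize {
    // One frame per call; the framework queues and decodes it in the background.
    [self.decoder asynchronouslyDecodeSpeexNSData:speexFrame withSpeexFrameSize:frameSize];
}

// Delegate method from SpeexNSDataDecodingControllerDelegate.
- (void)asynchronousDecoderCreatedPCMData:(NSData *)pcmData {
    // Serialize access to decodedPCM on the main queue in case this is a background thread.
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.decodedPCM appendData:pcmData];
    });
}

@end
```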

AudioFileWrapperController

Imported into your class like so:

#import <SpeexKitDemo/AudioFileWrapperController.h>

AudioFileWrapperController writes out audio files from either a single NSData containing PCM samples (as WAV) or an NSArray of NSDictionaries containing Speex frames (as .spx, or Speex file format).

To write out a WAV, you use the method:

- (NSError *) writeWavFileFromMonoPCMData:(NSData *)data withSampleRate:(int)sampleRate andBitsPerChannel:(int)bitRate toFileLocation:(NSString *)fileLocation;

Give it an NSData that consists of all the PCM audio samples to write out. You must also supply the sample rate, the bits per channel (the parameter named bitRate in the signature above) and a destination file location as an NSString path. It only expects mono audio.
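
A call to this method might look like the following sketch. It assumes ARC and that a nil NSError return value means the file was written successfully, which this page does not state explicitly. WritePCMAsWav and the 16000 Hz, 16-bit values are hypothetical examples.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/AudioFileWrapperController.h>

// Hypothetical helper: writes mono PCM samples to a WAV file and reports success.
static BOOL WritePCMAsWav(NSData *pcmSamples, NSString *wavPath) {
    AudioFileWrapperController *wrapper = [[AudioFileWrapperController alloc] init];

    // Example values only: use the sample rate and bit depth your audio was recorded with.
    NSError *error = [wrapper writeWavFileFromMonoPCMData:pcmSamples
                                           withSampleRate:16000
                                        andBitsPerChannel:16
                                           toFileLocation:wavPath];
    if (error != nil) { // ASSUMPTION: nil means the write succeeded
        NSLog(@"Writing WAV to %@ failed: %@", wavPath, error);
        return NO;
    }
    return YES;
}
```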

To write out a Speex file from an NSArray of encoded Speex frames, you must first have an NSArray that is full of NSDictionaries with the objects and object keys that are created by SpeexNSDataEncodingController. They should have all been originally encoded with the same settings. That means one NSArray containing NSDictionaries (only), where each NSDictionary has the following objects and keys:

@"SpeexFrameNSData" // One frame of encoded Speex as NSData

@"SpeexFrameSizeNSNumber" // The size of that frame in bytes

You submit it for wrapping as a file using the following method:

- (NSError *) writeSpeexFileFromArrayOfSpeexDictionaries:(NSArray *)speexArray inSpeexMode:(NSString *)speexMode toFileLocation:(NSString *)fileLocation;

You should know the speexMode from either the provider of the Speex frame or your own chosen encoding options.

There are complete examples of wrapping PCM samples into a WAV and wrapping Speex frames into a .spx file in the sample app.
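
One way to arrive at the single NSArray that the Speex wrapping method expects is to accumulate the arrays returned by successive synchronous encoding calls and then write them out in one go. The sketch below assumes ARC, an encoder whose options have already been registered with setSpeexEncodingOptions, and that a nil NSError means success. EncodeBuffersIntoSpeexFile is a hypothetical name.

```objc
#import <Foundation/Foundation.h>
#import <SpeexKitDemo/SpeexNSDataEncodingController.h>
#import <SpeexKitDemo/AudioFileWrapperController.h>

// Hypothetical helper: encodes several PCM buffers with one encoder (so every frame
// shares the same settings) and wraps all resulting frames into a single .spx file.
static BOOL EncodeBuffersIntoSpeexFile(SpeexNSDataEncodingController *encoder,
                                       NSArray *pcmBuffers,
                                       NSString *speexMode,
                                       NSString *speexPath) {
    NSMutableArray *allFrames = [NSMutableArray array];
    for (NSData *pcmBuffer in pcmBuffers) {
        NSArray *frames = [encoder convertNSDataToSpeex:pcmBuffer];
        if (frames != nil) {
            [allFrames addObjectsFromArray:frames]; // keep the dictionaries in encoding order
        }
    }

    AudioFileWrapperController *wrapper = [[AudioFileWrapperController alloc] init];
    NSError *error = [wrapper writeSpeexFileFromArrayOfSpeexDictionaries:allFrames
                                                             inSpeexMode:speexMode
                                                          toFileLocation:speexPath];
    if (error != nil) { // ASSUMPTION: nil means the write succeeded
        NSLog(@"Writing Speex file to %@ failed: %@", speexPath, error);
        return NO;
    }
    return YES;
}
```

Pass the same mode string for speexMode that the encoder was configured with.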

I have some questions, how do I get support?

You can have one free email support incident with the demo version of SpeexKit, and as many questions on the forums as you like. To use your free email support incident, you must register your app and a verifiable name (company name or personal name) here.

You can also send as many sales inquiries as you like through the contact form, and you don’t need to register in order to do so, although a sales inquiry with a tech support question will be considered a support incident and we’ll ask you to register in order to have it engaged.

Once you have completed licensing of the framework for your app, you get two more email support incidents and continued forum support. Extra email support incidents for demo and licensed versions can always be purchased at the Politepix shop. Support contracts for multiple email support incidents with Politepix can also be purchased. Licensing the framework requires giving the exact application name that the framework will be linked to, so don’t purchase the license until you know the app name.

I’m ready to license the framework, where do I go?

That’s great! Thanks. If you already know the name of the app you are going to link the framework to, you can complete licensing at the shop.

Speex License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the Xiph.org Foundation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SpeexKit makes use of the Ogg wrapper code.

Ogg License

© 2012, Xiph.Org Foundation

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the Xiph.org Foundation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
