AirPlay problem, sample rate?

Viewing 10 posts - 1 through 10 (of 10 total)
  • #1017012
    ransomweaver
    Participant

    Hello,

    My app uses OpenEars to create WAV files that are stored and played back later. My playback system uses MPMusicPlayerController to play iPod music and an OpenAL sound engine to play my custom WAV files: the app ducks the volume of the music, pauses the player, plays the WAV files, then restarts the music. The OpenAL sound engine uses kAudioSessionCategory_MediaPlayback in order to play sounds in the background.

    All this works well, except when sending the audio over AirPlay. The music sounds fine, but the OpenEars-generated WAV files are horribly degraded.

    My feeling here is that the problem is the sample rate mismatch between the 44.1 kHz music and the 16 kHz TTS file. Even more damning: my app used to use WAV files generated by Festival's text2wave and downloaded from a server, and the files I made at 44.1 kHz worked fine with AirPlay.

    So my question is: can I change Flite to create 44.1 kHz speech? I see in RuntimeValues.m there is input_sample_rate = 16000. I haven't tried changing it to 44100, but I suspect it wouldn't work and that only 8 kHz and 16 kHz are supported in Flite.

    Alternatively, does anyone know of a method to upsample a 16 kHz WAV to 44.1 kHz? If it can at least avoid degrading the sample, I'd be satisfied with that.
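    [As a starting point, here's a minimal plain-C sketch of what naive linear-interpolation upsampling of a mono float buffer would look like. The function name is hypothetical, and this does no low-pass filtering, so it is not production-quality resampling -- but it should at least avoid the gross degradation of a raw rate mismatch.]

    <code>
    #include <math.h>
    #include <stdlib.h>

    /* Naive linear-interpolation resampler: converts a mono float buffer
       from inRate (e.g. 16000) to outRate (e.g. 44100). Caller frees the
       returned buffer. Sketch only: no anti-aliasing/low-pass filter. */
    float *ResampleLinear(const float *in, long inFrames,
                          double inRate, double outRate, long *outFrames) {
        long n = (long)floor((inFrames - 1) * outRate / inRate) + 1;
        float *out = malloc(sizeof(float) * n);
        if (!out) return NULL;
        double step = inRate / outRate;   /* input frames per output frame */
        for (long i = 0; i < n; ++i) {
            double pos = i * step;
            long j = (long)pos;           /* index of the left-hand sample */
            double frac = pos - j;
            float a = in[j];
            float b = (j + 1 < inFrames) ? in[j + 1] : in[j];
            out[i] = (float)(a + (b - a) * frac);
        }
        *outFrames = n;
        return out;
    }
    </code>

    [For 16000 -> 44100 the ratio is 2.75625, so a one-second file comes back with roughly 44,098 frames under this scheme.]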

    #1017013
    Halle Winkler
    Politepix

    Welcome,

    You definitely can’t change the output rate of the Flite speech. The runtime value you’re referencing is for Pocketsphinx, unfortunately, and it is for yet-unreleased features.

    #1017014
    ransomweaver
    Participant

    Hi,

    Thought as much. Actually I am already doing an amplification pass on the file (using a method from a Stack Overflow thread you directed me to some time ago). That method (which uses the ExtAudioFile API) reads a WAV file, including its sample rate, then sets up an AudioStreamBasicDescription to indicate the format for the returned samples. If I set that to 44100, the audio actually sounds fine over AirPlay, BUT the file is not complete; less than half the audio is there.

    I wonder if I could add interpolation into this method to raise the sample count to the number that should be in a 44.1 kHz file.

    Here's the full method. Presumably that would happen in the loop over the buffers. Any thoughts?

    <code>
    void ScaleAudioFileAmplitude(NSURL *theURL, float ampScale) {
        OSStatus err = noErr;

        ExtAudioFileRef audiofile;
        ExtAudioFileOpenURL((CFURLRef)theURL, &audiofile);
        assert(audiofile);

        // get some info about the file's format
        AudioStreamBasicDescription fileFormat;
        UInt32 size = sizeof(fileFormat);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileDataFormat, &size, &fileFormat);

        // we'll need to know what type of file it is later when we write
        AudioFileID aFile;
        size = sizeof(aFile);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_AudioFile, &size, &aFile);

        AudioFileTypeID fileType;
        size = sizeof(fileType);
        err = AudioFileGetProperty(aFile, kAudioFilePropertyFileFormat, &size, &fileType);

        // tell the ExtAudioFile API what format we want samples back in
        AudioStreamBasicDescription clientFormat;
        bzero(&clientFormat, sizeof(clientFormat));
        clientFormat.mChannelsPerFrame = fileFormat.mChannelsPerFrame;
        clientFormat.mBytesPerFrame = 4;
        clientFormat.mBytesPerPacket = clientFormat.mBytesPerFrame;
        clientFormat.mFramesPerPacket = 1;
        clientFormat.mBitsPerChannel = 32;
        clientFormat.mFormatID = kAudioFormatLinearPCM;
        clientFormat.mSampleRate = fileFormat.mSampleRate;
        NSLog(@"Sample Rate is %1.2f", clientFormat.mSampleRate);
        clientFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kAudioFormatFlagIsNonInterleaved;
        err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);

        // find out how many frames we need to read
        SInt64 numFrames = 0;
        size = sizeof(numFrames);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileLengthFrames, &size, &numFrames);

        // create the buffers for reading in data
        AudioBufferList *bufferList = malloc(sizeof(AudioBufferList) + sizeof(AudioBuffer) * (clientFormat.mChannelsPerFrame - 1));
        bufferList->mNumberBuffers = clientFormat.mChannelsPerFrame;
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            bufferList->mBuffers[ii].mDataByteSize = sizeof(float) * numFrames;
            bufferList->mBuffers[ii].mNumberChannels = 1;
            bufferList->mBuffers[ii].mData = malloc(bufferList->mBuffers[ii].mDataByteSize);
        }

        // read in the data
        UInt32 rFrames = (UInt32)numFrames;
        err = ExtAudioFileRead(audiofile, &rFrames, bufferList);

        // close the file
        err = ExtAudioFileDispose(audiofile);

        // process the audio
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            float *fBuf = (float *)bufferList->mBuffers[ii].mData;
            for (int jj = 0; jj < rFrames; ++jj) {
                *fBuf = *fBuf * ampScale;
                fBuf++;
            }
        }

        // open the file for writing
        err = ExtAudioFileCreateWithURL((CFURLRef)theURL, fileType, &fileFormat, NULL, kAudioFileFlags_EraseFile, &audiofile);

        // tell the ExtAudioFile API what format we'll be sending samples in
        err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);

        // write the data
        err = ExtAudioFileWrite(audiofile, rFrames, bufferList);

        // close the file
        ExtAudioFileDispose(audiofile);

        // destroy the buffers
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            free(bufferList->mBuffers[ii].mData);
        }
        free(bufferList);
        bufferList = NULL;
    }
    </code>
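    [One possible explanation for the missing audio -- an assumption, not something I've verified: when the client format's mSampleRate differs from the file's, ExtAudioFile converts on read, but kExtAudioFileProperty_FileLengthFrames reports frames at the *file's* rate (16 kHz), while ExtAudioFileRead delivers frames at the *client* rate (44.1 kHz), i.e. about 2.76x as many. Buffers sized from numFrames would then truncate the converted audio. A plain-C sketch of the sizing arithmetic, with a hypothetical helper name:]

    <code>
    #include <math.h>

    /* Frames the client-rate buffers would need to hold, given a frame
       count reported at the file's native rate. Round up so the last
       partial frame is not dropped. */
    long ClientFramesNeeded(long fileFrames, double fileRate, double clientRate) {
        return (long)ceil((double)fileFrames * clientRate / fileRate);
    }
    </code>

    [If that is the cause, sizing the buffers with this count, reading in a loop until ExtAudioFileRead returns 0 frames, and writing the output with a 44.1 kHz fileFormat might produce the complete converted file; I haven't tested this against AirPlay.]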

     

    #1017015
    Halle Winkler
    Politepix

    I don’t have advice on this off the top of my head, but interpolating 16 kHz to 44.1 kHz is the kind of requirement that would probably make me wonder if the situation had become overly complicated in general. Maybe there’s something simpler than combining all of those different technologies?

    #1017016
    ransomweaver
    Participant
    #1017017
    ransomweaver
    Participant

    Well, I have two basic requirements:

    1) the app can play iPod music

    2) the app can play iPod music and the app’s own wav files in the background.

    And the only problem I have right now is that AirPlay doesn’t like an audio stream with 44.1 kHz and 16 kHz audio in it at the same time.

    I’m not sure how I would go about fixing that, even with a completely different way of playing audio files, without changing the sample rate of one or the other kind of file I’m playing.

    #1017018
    Halle Winkler
    Politepix

    I guess I’d be curious about what AirPlay does when it gets songs with different sampling rates, since it’s normal for both 44.1k and 48k to be found in audio libraries.

    #1017019
    ransomweaver
    Participant

    I will look into that. Maybe the problem is my kind of audio session (MediaPlayback), but I need that to keep the app alive in the background.

    #1017020
    Halle Winkler
    Politepix

    My instinct is that nothing should really object to sample rate changes per se: the whole point of playing back a formatted file rather than a raw buffer stream is that the required data is encapsulated in the header, so a qualified player can deal with differences in file details such as sample rate, bitrate, endianness, and codec. This will also be true of other media-player-type objects such as videos. So I might be suspicious of implementation details other than the sample rates of the files being played.

    I don’t really have insight into what this particular issue is, and you might be 100% right that the most direct fix is to change the sample rate; I’m just sharing what my thought process would be if it were my implementation to debug.

    #1017021
    Halle Winkler
    Politepix

    Oh, and here’s a hint that just occurred to me: remember that you have control of the voice’s speed and pitch, so you can use naive sample-rate-changing methods that alter perceived speed and pitch, and then compensate for that in the original voice settings, at least to a certain extent. That might get you as far as successfully resampling to a rate that fits into 44.1 kHz better than 16 kHz does.
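    [To put rough numbers on that idea: if a 16 kHz file is naively relabeled to play at, say, 22.05 kHz, perceived speed and pitch both rise by 22050/16000 ≈ 1.378, so the voice settings would be pre-adjusted by the same factor (I believe OpenEars exposes a duration_stretch setting for speech speed, but treat that and the helper names below as assumptions). A trivial C sketch of the arithmetic:]

    <code>
    /* Naively relabeling a file's header from oldRate to newRate speeds it
       up and raises pitch by newRate/oldRate. To keep perceived duration
       the same, pre-stretch the synthesized speech by that same ratio. */
    double RateRatio(double oldRate, double newRate) {
        return newRate / oldRate;
    }

    double CompensatedDurationStretch(double currentStretch, double ratio) {
        return currentStretch * ratio;
    }
    </code>

    [Pitch compensation would work the same way, scaling the voice's target pitch down by the ratio -- within whatever range the voice still sounds acceptable.]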
