AirPlay problem, sample rate?

Viewing 10 posts - 1 through 10 (of 10 total)
  • #1017012
    ransomweaver
    Participant

    Hello,

    My app uses OpenEars to create WAV files that are stored and played back later. My playback system uses MPMusicPlayerController to play iPod music and an OpenAL sound engine to play my custom WAV files: the app ducks the volume of the music, pauses the player, plays the WAV files, then restarts the music. The OpenAL sound engine uses kAudioSessionCategory_MediaPlayback in order to play sounds in the background.

    All this works well, except when sending the audio over AirPlay. The music sounds fine, but the OpenEars-generated WAV files are horribly degraded.

    My feeling here is that the problem is the sample rate mismatch between the 44.1 kHz music and the 16 kHz TTS file. Even more damning: my app used to use WAV files generated by Festival's text2wave and downloaded from a server, and the files I made at 44.1 kHz worked fine with AirPlay.

    So my question is: can I change Flite to create 44.1 kHz speech? I see in RuntimeValues.m there is input_sample_rate = 16000. I haven't tried changing it to 44100, but I suspect it wouldn't work and that only 8 kHz and 16 kHz are supported in Flite.

    Alternatively, does anyone know of a method to upsample a 16 kHz WAV to 44.1 kHz? If it can at least avoid degrading the sample, I'd be satisfied with that.
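    [As a starting point, here's a minimal plain-C sketch of what naive linear-interpolation upsampling of a mono float buffer would look like. The function name is hypothetical, and this does no low-pass filtering, so it is not production-quality resampling -- but it should at least avoid the gross degradation of a raw rate mismatch.]

    <code>
    #include <math.h>
    #include <stdlib.h>

    /* Naive linear-interpolation resampler: converts a mono float buffer
       from inRate (e.g. 16000) to outRate (e.g. 44100). Caller frees the
       returned buffer. Sketch only: no anti-aliasing/low-pass filter. */
    float *ResampleLinear(const float *in, long inFrames,
                          double inRate, double outRate, long *outFrames) {
        long n = (long)floor((inFrames - 1) * outRate / inRate) + 1;
        float *out = malloc(sizeof(float) * n);
        if (!out) return NULL;
        double step = inRate / outRate;   /* input frames per output frame */
        for (long i = 0; i < n; ++i) {
            double pos = i * step;
            long j = (long)pos;           /* index of the left-hand sample */
            double frac = pos - j;
            float a = in[j];
            float b = (j + 1 < inFrames) ? in[j + 1] : in[j];
            out[i] = (float)(a + (b - a) * frac);
        }
        *outFrames = n;
        return out;
    }
    </code>

    [For 16000 -> 44100 the ratio is 2.75625, so a one-second file comes back with roughly 44,098 frames under this scheme.]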

    #1017013
    Halle Winkler
    Politepix

    Welcome,

    You definitely can’t change the output rate of the Flite speech. The runtime value you’re referencing is for Pocketsphinx, unfortunately, and it is for yet-unreleased features.

    #1017014
    ransomweaver
    Participant

    Hi,

    Thought as much. Actually I am already doing an amplification pass on the file (using a method from a Stack Overflow thread you directed me to some time ago). That method (which uses the ExtAudioFile API) reads a WAV file, including its sample rate, then sets up an AudioStreamBasicDescription to indicate the format for the returned samples. If I set that to 44100, the audio actually sounds fine over AirPlay, BUT the file is not complete; less than half the audio is there.

    I wonder if I could add interpolation into this method to raise the sample count to the number that should be in a 44.1 kHz file.

    Here's the full method. Presumably that would happen in the loop over the buffers. Any thoughts?

    <code>
    void ScaleAudioFileAmplitude(NSURL *theURL, float ampScale) {
        OSStatus err = noErr;

        ExtAudioFileRef audiofile;
        ExtAudioFileOpenURL((CFURLRef)theURL, &audiofile);
        assert(audiofile);

        // get some info about the file's format
        AudioStreamBasicDescription fileFormat;
        UInt32 size = sizeof(fileFormat);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileDataFormat, &size, &fileFormat);

        // we'll need to know what type of file it is later when we write
        AudioFileID aFile;
        size = sizeof(aFile);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_AudioFile, &size, &aFile);

        AudioFileTypeID fileType;
        size = sizeof(fileType);
        err = AudioFileGetProperty(aFile, kAudioFilePropertyFileFormat, &size, &fileType);

        // tell the ExtAudioFile API what format we want samples back in
        AudioStreamBasicDescription clientFormat;
        bzero(&clientFormat, sizeof(clientFormat));
        clientFormat.mChannelsPerFrame = fileFormat.mChannelsPerFrame;
        clientFormat.mBytesPerFrame = 4;
        clientFormat.mBytesPerPacket = clientFormat.mBytesPerFrame;
        clientFormat.mFramesPerPacket = 1;
        clientFormat.mBitsPerChannel = 32;
        clientFormat.mFormatID = kAudioFormatLinearPCM;
        clientFormat.mSampleRate = fileFormat.mSampleRate;
        NSLog(@"Sample Rate is %1.2f", clientFormat.mSampleRate);
        clientFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kAudioFormatFlagIsNonInterleaved;
        err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);

        // find out how many frames we need to read
        SInt64 numFrames = 0;
        size = sizeof(numFrames);
        err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileLengthFrames, &size, &numFrames);

        // create the buffers for reading in data
        AudioBufferList *bufferList = malloc(sizeof(AudioBufferList) + sizeof(AudioBuffer) * (clientFormat.mChannelsPerFrame - 1));
        bufferList->mNumberBuffers = clientFormat.mChannelsPerFrame;
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            bufferList->mBuffers[ii].mDataByteSize = sizeof(float) * numFrames;
            bufferList->mBuffers[ii].mNumberChannels = 1;
            bufferList->mBuffers[ii].mData = malloc(bufferList->mBuffers[ii].mDataByteSize);
        }

        // read in the data
        UInt32 rFrames = (UInt32)numFrames;
        err = ExtAudioFileRead(audiofile, &rFrames, bufferList);

        // close the file
        err = ExtAudioFileDispose(audiofile);

        // process the audio
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            float *fBuf = (float *)bufferList->mBuffers[ii].mData;
            for (int jj = 0; jj < rFrames; ++jj) {
                *fBuf = *fBuf * ampScale;
                fBuf++;
            }
        }

        // open the file for writing
        err = ExtAudioFileCreateWithURL((CFURLRef)theURL, fileType, &fileFormat, NULL, kAudioFileFlags_EraseFile, &audiofile);

        // tell the ExtAudioFile API what format we'll be sending samples in
        err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);

        // write the data
        err = ExtAudioFileWrite(audiofile, rFrames, bufferList);

        // close the file
        ExtAudioFileDispose(audiofile);

        // destroy the buffers
        for (int ii = 0; ii < bufferList->mNumberBuffers; ++ii) {
            free(bufferList->mBuffers[ii].mData);
        }
        free(bufferList);
        bufferList = NULL;
    }
    </code>
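    [One possible explanation for the missing audio -- an assumption, not something I've verified: when the client format's mSampleRate differs from the file's, ExtAudioFile converts on read, but kExtAudioFileProperty_FileLengthFrames reports frames at the *file's* rate (16 kHz), while ExtAudioFileRead delivers frames at the *client* rate (44.1 kHz), i.e. about 2.76x as many. Buffers sized from numFrames would then truncate the converted audio. A plain-C sketch of the sizing arithmetic, with a hypothetical helper name:]

    <code>
    #include <math.h>

    /* Frames the client-rate buffers would need to hold, given a frame
       count reported at the file's native rate. Round up so the last
       partial frame is not dropped. */
    long ClientFramesNeeded(long fileFrames, double fileRate, double clientRate) {
        return (long)ceil((double)fileFrames * clientRate / fileRate);
    }
    </code>

    [If that is the cause, sizing the buffers with this count, reading in a loop until ExtAudioFileRead returns 0 frames, and writing the output with a 44.1 kHz fileFormat might produce the complete converted file; I haven't tested this against AirPlay.]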

     

    #1017015
    Halle Winkler
    Politepix

    I don’t have advice on this off the top of my head, but interpolating 16 kHz to 44.1 kHz is the kind of requirement that would probably make me wonder if the situation had become overly complicated in general. Maybe there’s something simpler than combining all of those different technologies?

    #1017016
    ransomweaver
    Participant
    #1017017
    ransomweaver
    Participant

    Well, I have two basic requirements:

    1) the app can play iPod music

    2) the app can play iPod music and the app’s own wav files in the background.

    And the only problem I have right now is that AirPlay doesn’t like an audio stream with 44.1 kHz and 16 kHz audio in it at the same time.

    I’m not sure how I would go about fixing that, even with a completely different way of playing audio files, without changing the sample rate of one or the other kind of file I’m playing.

    #1017018
    Halle Winkler
    Politepix

    I guess I’d be curious about what AirPlay does when it gets songs with different sampling rates, since it’s normal for both 44.1k and 48k to be found in audio libraries.

    #1017019
    ransomweaver
    Participant

    I will look into that. Maybe the problem is my kind of audio session (MediaPlayback), but I need that to keep the app alive in the background.

    #1017020
    Halle Winkler
    Politepix

    My instinct is that nothing should really object to sample rate changes per se: the whole point of playing back a formatted file rather than a raw buffer stream is that the required data is encapsulated in the header, so a qualified player can deal with differences in file details such as sample rate, bitrate, endianness, and codec. This will also be true of other media-player-type objects such as videos. So I might be suspicious of implementation details other than the sample rates of the files being played.

    I don’t really have insight into what this particular issue is, and you might be 100% right that the most direct fix is to change the sample rate; I’m just sharing what my thought process would be if it were my implementation to debug.

    #1017021
    Halle Winkler
    Politepix

    Oh, and here’s a hint that just occurred to me: remember that you have control of the voice’s speed and pitch, so you can use naive sample-rate-changing methods that alter perceived speed and pitch, and then compensate for that in the original voice settings, at least to a certain extent. That might get you as far as successfully resampling to a rate that fits into 44.1 kHz better than 16 kHz does.
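    [To put rough numbers on that idea: if a 16 kHz file is naively relabeled to play at, say, 22.05 kHz, perceived speed and pitch both rise by 22050/16000 ≈ 1.378, so the voice settings would be pre-adjusted by the same factor (I believe OpenEars exposes a duration_stretch setting for speech speed, but treat that and the helper names below as assumptions). A trivial C sketch of the arithmetic:]

    <code>
    /* Naively relabeling a file's header from oldRate to newRate speeds it
       up and raises pitch by newRate/oldRate. To keep perceived duration
       the same, pre-stretch the synthesized speech by that same ratio. */
    double RateRatio(double oldRate, double newRate) {
        return newRate / oldRate;
    }

    double CompensatedDurationStretch(double currentStretch, double ratio) {
        return currentStretch * ratio;
    }
    </code>

    [Pitch compensation would work the same way, scaling the voice's target pitch down by the ratio -- within whatever range the voice still sounds acceptable.]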
