OpenEars reduces sound playback quality

    #10943
    hohl
    Participant

    When using OpenEars, the sound playback quality gets reduced. Is there a way to prevent the AVAudioSession from being reconfigured when PocketsphinxController is used for the first time?

    #10944
    Halle Winkler
    Politepix

    Can you describe the reduction more specifically? It shouldn’t be possible for the bitrate or sample rate to be changed so I’m unclear on what aspect of playback is different. You can’t use PocketsphinxController without the audio session settings it needs.

    #10961
    hohl
    Participant

    What I’ve found out when using logging is:
    2012-09-05 12:13:51.599 Autoradio[5729:707] preferredBufferSize is incorrect, we will change it.
    2012-09-05 12:13:51.604 Autoradio[5729:707] PreferredBufferSize is now on the correct setting of 0.128000.
    2012-09-05 12:13:51.609 Autoradio[5729:707] preferredSampleRateCheck is incorrect, we will change it.
    2012-09-05 12:13:51.698 Autoradio[5729:707] preferred hardware sample rate is now on the correct setting of 16000.000000.

    Could this cause the reduction?
    It’s hard to describe, maybe because I am not a musician. I would say everything sounds duller. Could it be a lowering of the bitrate?

    #10962
    hohl
    Participant

    I extended the logging a bit and recompiled the lib. That’s what I am getting:
    2012-09-05 12:29:57.733 Autoradio[5778:707] preferredBufferSize is incorrect, we will change it. Current value: 0.023000
    2012-09-05 12:29:57.747 Autoradio[5778:707] PreferredBufferSize is now on the correct setting of 0.128000.
    2012-09-05 12:29:57.755 Autoradio[5778:707] preferredSampleRateCheck is incorrect, we will change it. Current value: 44100.000000
    2012-09-05 12:29:57.945 Autoradio[5778:707] preferred hardware sample rate is now on the correct setting of 16000.000000.

    Sounds like a reduction of the hardware sample rate? Could I change the check to something like “if it is the preferred kSamplesPerSecond or better”, or would that break the functionality of OpenEars?
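
To put the logged values in concrete terms, here is a minimal sketch in plain C (which also compiles as Objective-C) that converts a buffer duration and a sample rate into a frame count. The helper name `frames_per_buffer` is hypothetical and is not part of OpenEars.

```c
/* Hypothetical helper (not part of OpenEars): number of audio frames that fit
   in an I/O buffer of the given duration at the given sample rate,
   rounded to the nearest whole frame. Assumes non-negative inputs. */
static long frames_per_buffer(double buffer_seconds, double sample_rate_hz) {
    return (long)(buffer_seconds * sample_rate_hz + 0.5);
}
```

With the values from the log, 0.128 s at 16000 Hz works out to 2048 frames per buffer, while the earlier 0.023 s at 44100 Hz is about 1014 frames.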

    #10963
    hohl
    Participant

    Changed it to:
    if (fabs(preferredSampleRateCheck - kSamplesPerSecond) 0.0) { in AudioSessionManager.m:400 and it still works and the reduction doesn’t take place anymore.

    #10965
    Halle Winkler
    Politepix

    Ah, I understand now, you’re using a full 44.1k rate and PocketsphinxController requires (really requires) a 16k rate. If you convince it not to sample at 16k you will reduce the recognition accuracy severely. You’re correct that 16k recordings won’t sound as nice as 44.1k (CD quality) but if Pocketsphinx analyzed a 44.1k recording it would take forever.

    #10966
    hohl
    Participant

    Something is wrong with the code tag in this forum so I uploaded the change to line 400 in AudioSessionManager.m here: https://www.sourcedrop.net/Tyj72cb2147c9

    Will this have an influence on OpenEars?

    #10967
    Halle Winkler
    Politepix

    Why do you need to make a CD-quality recording using the same stream that Pocketsphinx is using?

    #10968
    hohl
    Participant

    Ah ok. I understand. But since it still works with the small dictionary I am using, I’ll leave it like that.

    #10969
    Halle Winkler
    Politepix

    “Will this have an influence on OpenEars?”

    Yup, see my answer that slipped in ahead of your last post.

    #10970
    Halle Winkler
    Politepix

    OK, that’s your call. I think the speech will seem quite different as far as Pocketsphinx is concerned. I’ve also had the experience that it does perform the recognition after all, but with a big loss of accuracy. For a small vocabulary it’s true that you might find it tolerable regardless, so I’m glad to hear it works all right for your application. Do me a favor and mention your override in future support questions so that I can distinguish between potential issues that are normal and potential issues which could be a side effect of your change.

    #10971
    Halle Winkler
    Politepix

    Just for some background on why it’s like this: for speech perception purposes there isn’t a big improvement for sampling rates higher than 16k (and mono), which means that most speech recognition software will attempt recognition with a maximum of a 16k sample rate because there are far fewer samples that have to be analyzed. For non-speech applications such as music it’s naturally always going to be better to use a higher sample rate and stereo if possible. But generally, even for speech that humans listen to, you don’t get a lot of extra “bang for the buck” for going from 16k to 44.1k because the comparison standard is telephone bandwidth, which is generally standardized at 8k and compressed, making 16k PCM already a big step up. The reason that the recognition is compromised is that it assumes that a “chunk” of speech is likely to occur within a certain number of samples in a timeframe, and at 44.1k the speech is occurring across more like 3x as many samples, so it is really not going to map well to the recordings which are in the acoustic model (which are actually 8k, but the input functions compensate for the doubling of the input rate).
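
The sample-count argument above can be made concrete with a little arithmetic. The sketch below is plain C (it also compiles as Objective-C). The 10 ms window is an assumption chosen for illustration, not a value taken from OpenEars or Pocketsphinx, and both function names are hypothetical.

```c
/* Hypothetical helper: samples contained in a window of window_ms
   milliseconds at sample_rate_hz. */
static long samples_in_window(long sample_rate_hz, long window_ms) {
    return sample_rate_hz * window_ms / 1000;
}

/* Hypothetical helper: how many times more samples arrive at actual_hz
   than the recognizer expects at expected_hz. */
static double rate_ratio(long actual_hz, long expected_hz) {
    return (double)actual_hz / (double)expected_hz;
}
```

For a 10 ms window this gives 160 samples at 16k and 441 samples at 44.1k, a ratio of about 2.76, which is where the “more like 3x” figure comes from.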

    #10972
    hohl
    Participant

    But I need high-quality playback since my application is a media player, and 16k isn’t acceptable for that kind of application. Why does OpenEars need to change the global playback quality?

    #10973
    Halle Winkler
    Politepix

    It isn’t actively changing the sample rate for playback, it is using the required recording and playback audio session type with a 16k record rate, which might override the playback rate as an unintended side effect. It’s actually a bit surprising to me that the playback rate of a media object is being affected at all, can you show me your object playback code as a test sample so I can replicate and look into it when there is time?

    #10974
    hohl
    Participant

    #10975
    Halle Winkler
    Politepix

    Right, but you must be using it during the recognition activity because otherwise the AVPlayer audio session would completely override the OpenEars audio session, so my interest is in how you are using it so it is possible for its playback settings to conflict with those of the OpenEars audio session.

    #10976
    hohl
    Participant

    Are you looking for this?
    // Configure the shared session for simultaneous playback and recording, then activate it.
    NSError *audioSessionError = nil;
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:&audioSessionError];
    [[AVAudioSession sharedInstance] setActive:YES error:&audioSessionError];
    if (audioSessionError != nil) {
        NSLog(@"Something went wrong with initialising the audio session!");
    }

    // Also activate the session through the C Audio Session API and listen for route changes.
    AudioSessionSetActive(true);
    AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange, ARAudioSessionPropertyListener, NULL);

    The AVPlayer just plays, and the OpenEars session starts when triggered by the user. AVPlayer keeps playing in the background, but in the future I’m going to lower its volume during the OpenEars session to get better results.

    #10977
    Halle Winkler
    Politepix

    Why are you doing your own Audio Session management (serious question, maybe there is a good reason for it despite it being in conflict with the OpenEars instructions)?

    #1017602
    markmakingmusic
    Participant

    Is there a fix or workaround for this? Our app is a streaming music player, which uses OpenEars to detect speech, which is then used to trigger commands for the player (play, stop, pause, next, etc.).

    We need playback to be set at 44.1k (which we set when our audio session is set up). However, there is an extreme degradation in quality when we enable OpenEars speech detection. Any ideas?

    #1017603
    Halle Winkler
    Politepix

    Welcome Mark,

    The issue is that PocketsphinxController can’t perform speech recognition on a sample rate other than 16k, so your choice is between 16k playback with less sound resolution or 44.1k input with lower-accuracy recognition and other potential problems such as buffer overruns. Option 3 is to separate the two functions and set the session as needed when switching between them, which doesn’t sound like it’s available to you in your use case.

    The best bet for a permanent workaround would be to put some research into how to change the audio driver so that it mixes its own input requirements with external output requirements instead of using its own output settings. I can also do this when there is time, but there are a number of things ahead of that feature at the moment so it will be a bit. Have you tried setting the AudioSessionManager allowMixing property to TRUE? A quick search of the forums should explain more about that.

    #1020046
    fdim
    Participant

    I am bringing this topic back from the dead. I am running a fairly simple experiment. I don’t do any music manipulation in my app; I use the device’s iPod app. When I start OpenEars, by default it takes over the audio session and the music stops.

    If I set the “audioSessionMixing” property to “YES” then the audio is indeed mixed, but with the above problem of reduced sound quality, probably due to the lowered sample rate. Questions:

    – Is there any way to set different sample rates for recording and playback?
    – Is it possible to hold two different channels or streams (I’m not sure about the terminology) that each handle audio at a different sample rate?

    #1020047
    Halle Winkler
    Politepix

    Sorry, neither of those things is currently possible. The PlayAndRecord audio session does actually force the playback stream to be enabled and to be the same sample rate as the record stream. I’ve put in a large period of research very recently trying to see if there is any way to decouple them in the driver so that OpenEars could use the mic stream without having any effect on playback, and I had no success and found no reports of other successful experiments with this. I’m probably not 100% done experimenting with this since I would also like to release the playback settings entirely, but the last round made no headway after a lot of investigation and it will be a few months at the earliest before it’s possible to delve into it again.

    The sample rate that the driver sets for the mic stream is the only one that can be used with acceptable recognition results.

    #1020071
    Halle Winkler
    Politepix

    Quick question, since it seems to be causing unexpectedly good results in iOS7 in other playback-related areas that had issues due to the audio session requirements in previous versions: have you tried setting PocketsphinxController’s audioMode setting to @"VoiceChat" to see if that helps at all with playback sample rates?

    #1020076
    fdim
    Participant

    Yes, I have. From what I can hear, @"VoiceChat" only differs in volume compared to @"Default". Quality is unfortunately the same.

    After a quick look, I think a possible way to solve this is to leave the default sample rate intact (44.1k) and then use an audio converter audio unit that downsamples the recording input to 16k in real time and feeds it into OpenEars. Have you been down that road?
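
As a rough illustration of the downsampling idea only, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not OpenEars code and not the audio unit approach itself. It assumes mono 16-bit PCM and uses naive linear interpolation with no anti-aliasing filter, and the function name `resample_linear` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive linear-interpolation resampler for mono 16-bit PCM.
   Illustration only: there is no anti-aliasing low-pass filter, so content
   above half the output rate will alias when downsampling.
   Returns the number of samples written to out (at most out_cap). */
static size_t resample_linear(const int16_t *in, size_t in_len, double in_rate,
                              int16_t *out, size_t out_cap, double out_rate) {
    if (in_len == 0 || in_rate <= 0.0 || out_rate <= 0.0) {
        return 0;
    }
    double step = in_rate / out_rate; /* input samples advanced per output sample */
    size_t n = 0;
    for (;;) {
        double pos = (double)n * step;   /* fractional read position in the input */
        size_t i = (size_t)pos;          /* index of the sample at or before pos */
        if (n >= out_cap || i >= in_len) {
            break;
        }
        double frac = pos - (double)i;
        double a = in[i];
        double b = (i + 1 < in_len) ? in[i + 1] : in[i]; /* hold the last sample at the end */
        out[n++] = (int16_t)(a + (b - a) * frac);
    }
    return n;
}
```

Calling it with `in_rate` of 44100.0 and `out_rate` of 16000.0 on each captured buffer would yield roughly one output sample per 2.76 input samples. In practice a proper low-pass filter before decimation, or Core Audio’s own converter services, would be the sturdier route, and state would need to be carried across buffer boundaries.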

    #1020127
    fdim
    Participant

    Sorry to bug you again on this. I am willing to tamper with the code to try to fix this myself in the way suggested in my previous post. I just want to know whether you’ve ever tried it that way and given up on it because it failed.

    #1020129
    Halle Winkler
    Politepix

    Hi,

    No, I have more constraints there. It would have to take in and convert any sample rate and any other characteristics that might come in (such as interleaved, compressed, stereo, VBR, etc.), be tested across all of those kinds of input, and work without having any effect on performance or overhead, including on old phones and including in RapidEars. From my perspective the way to fix that annoyance in the long term is to figure out how to successfully decouple the input callback from the output callback because that decreases the overall complexity and the complexity in this particular case, but that seems to be sensitive Core Audio code and extremely underdocumented, so I haven’t gotten anywhere with it yet, though not for lack of trying recently.

    This is a little outside of the scope of support that I offer because it’s getting pretty low-level and questions in this area usually lead to new questions (which is reasonable and there’s absolutely nothing wrong with it, but there aren’t necessarily the resources on my end to talk it through a lot). I completely understand that it’s important to your spec and I’m sorry I can’t support that feature right now.

    #1022655
    dandoen
    Participant

    Halle,

    Love your work.

    I’m having the same issue as above and am wondering if you’ve gotten any further with this?

    Thanks,
    Dan

    #1022658
    Halle Winkler
    Politepix

    Hi Dan,

    Thank you! Yes, this is under development right now.

    #1022660
    dandoen
    Participant

    Thanks, great to hear.
    Any news on when you’re planning to release the next update?

    Also, I want to try out what the user hohl (in this thread) suggested (keeping the higher sample rate) as I have a super small vocabulary. Where can I find the source so I can alter and build it myself?

    #1022661
    Halle Winkler
    Politepix

    “Any news on when you’re planning to release the next update?”

    When it’s ready ;) . It’s the only thing I’m working on, so when all the old stuff works and the new stuff works I will be very happy to release it, but I think if I gave a release date I’d probably end up making a liar of myself.

    “Also, I want to try out what the user hohl (in this thread) suggested (keeping the higher sample rate) as I have a super small vocabulary. Where can I find the source so I can alter and build it myself?”

    I really question whether this could work acceptably as a user experience but there is no harm in experimenting. The source is all in the distribution you downloaded, just open up OpenEars.xcodeproj.

    #1022662
    dandoen
    Participant

    Totally get it re: release date ;)
    And again, thanks. I’ll do some testing and see if it’s acceptable for my use case.

    #1022666
    Halle Winkler
    Politepix

    Thanks for your understanding!
