Halle Winkler

Forum Replies Created

Viewing 100 posts - 2,001 through 2,100 (of 2,164 total)

  • in reply to: Changing voice level of flitecontroller #11450
    Halle Winkler
    Politepix

    There are several issues here. updateLevelsUI in the sample app is only intended to show how you can read the read-only levels property on a separate thread so that it doesn’t block. It is called many times a second. That means your slider/volume code is being hit continuously, whether or not the user is interacting with it, since the method just reads the read-only Flite level property and displays it in the UI; it also means your volume is being changed from a thread that the AVAudioPlayer it addresses is definitely not on. I would first completely decouple your volume-changing/volume-slider code from the UI-updating example in the sample app. You should be able to change the volume and update the volume slider UI on the main thread in a normal IBAction method that is only concerned with letting the user change the volume. The updateLevelsUI method can continue to just read the level and display it in the UI.

    If the “Welcome to OpenEars” statement continues to be played at unexpected times after that, it probably means that something in the code is triggering an interruption that resets the entire listening loop, so that pocketsphinxDidCompleteCalibration is called again, which in the sample app results in “Welcome to OpenEars” being spoken.

    in reply to: OpenEars integration with camera control #11449
    Halle Winkler
    Politepix

    Hi,

    I don’t think that logging has verbosePocketSphinx enabled. If it does, it means that your app has an issue that blocks before [self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO]; rather than during it, since the logging would show it starting and then stopping somewhere, but this logging shows nothing after the language model is generated. I recommended before showing the relationship between your picker code and the OpenEars code; without that, or the output with verbosePocketSphinx, there’s no way to know what is happening, since the code above is the code which works in the sample app.

    in reply to: OpenEars integration with camera control #11423
    Halle Winkler
    Politepix

    What is the relationship between the picker code and the code above? Probably it’s not the case that startListeningWithLanguageModelAtPath: doesn’t trigger but rather that it gets to a certain point in the loop and has trouble. If you turn on verbosePocketSphinx and OpenEarsLogging the output will probably tell you a lot about the reason that startListeningWithLanguageModelAtPath: isn’t getting good results. You can search the log output for the words “error” or “warning” specifically or you can post it here (but please make sure both forms of logging have been turned on first so I can really see everything that is happening).

    in reply to: OpenEars integration with camera control #11420
    Halle Winkler
    Politepix

    Hi Jay,

    This question is a little too broad for the forum, sorry. Feel free to ask specific questions about your code.

    in reply to: Optimization for short utterances #11371
    Halle Winkler
    Politepix

    I’ve fixed the NSLog statement for the next version so the sample app doesn’t create confusion about the framework behavior and updated the online documentation and tutorial.

    in reply to: TTS say phonemes #11369
    Halle Winkler
    Politepix

    Take a further look at the GraphemeGenerator.h source; it is there for getting phonemes out of words. That’s really all the help I can provide on this one, sorry.

    in reply to: TTS say phonemes #11368
    Halle Winkler
    Politepix

    Hi Hubbo,

    Not sure what the issue is, but in my experience the phonemes-only speech is harder to understand and more unpleasant to listen to than the basic speech so I don’t recommend bothering. Definitely do your experiments using a better voice than KAL since there’s no way of knowing how much of its comprehensibility comes from the features that are removed when doing phoneme-only speech such as all variance and inflection.

    in reply to: Optimization for short utterances #11367
    Halle Winkler
    Politepix

    The log always says “a second of silence” because that’s just what an NSLog statement says in the sample app. It isn’t related to the functionality of the property secondsOfSilenceToDetect and the log statement doesn’t come from the framework.

    secondsOfSilenceToDetect currently defaults to .7 seconds, and if you change it the pause will be shorter or longer accordingly, but the difference between .7 seconds and, for instance, .33 isn’t going to be a big perceptual difference (although a very short delay can cause issues, since any intermittent noise followed by a pause can trigger recognition), because you will still have the following sequence of events, which all take time: the speech continuing to completion, the silence after the complete speech, and then the time to process the complete speech.

    RapidEars doesn’t use a period of silence at all because it recognizes speech while the speech is in-progress rather than performing recognition on a completed statement (for instance, if you say “go right” it will first return the live hypotheses “go” and then “go right” as you are in the process of speaking the phrase — RapidEars doesn’t wait for a silence period to recognize). For your goal of using OpenEars-style speech recognition that only happens after a silence but with a shorter silence period it isn’t necessary for you to use RapidEars. But, since OpenEars defaults to a short period of silence out of the box, the differences from shortening it more than the default aren’t going to be dramatic; expect it to be a smaller change in the user experience.

    in reply to: Report each word heard? #11360
    Halle Winkler
    Politepix

    I believe that this would now be possible using RapidEars: https://www.politepix.com/rapidears

    in reply to: Return hypothesis #11353
    Halle Winkler
    Politepix

    Thanks for the heads-up that this isn’t self-evident, I have added it to the documentation.

    in reply to: Return hypothesis #11351
    Halle Winkler
    Politepix

    Correct, that was actually the origin of the crash — when there is a null hyp in nbest, Pocketsphinx doesn’t malloc an nbest structure at all, and my function was unaware of that and trying to release the same number of nbests as was requested. Seeing fewer than your maximum is a sign that everything is working.

    Halle Winkler
    Politepix

    Thanks :)

    Halle Winkler
    Politepix

    If you don’t mind, I’d like to change the title to mention the Nuance issue so it’s Google-able and so that casual readers of the forum don’t get the impression that OpenEars isn’t compatible with iOS6, OK with you?

    Halle Winkler
    Politepix

    Yup, that matches my expectation. Here is what I would guess is going on:

    There are a few different iOS objects which take over the audio session and do not return it to the state it was in after they are no longer instantiated. Specifically, some of them turn off recording. So if you are responsible for an SDK that does continuous listening (e.g. both OpenEars and Nuance) you will get a lot of complaints about it ceasing to work after using AVPlayer or similar, because the iOS object removes the audio input. So you will put in a sanity check that makes sure to fix the audio session before your product does its thing.

    OpenEars does its sanity check when it re-opens the audio unit so that it is possible to play video while speech recognition is in suspended mode, since this is a very frequently-requested feature.

    The call you make to the Nuance object above almost certainly initializes their own listeners for audio session changes, which they deal with in their own way.

    I think that what you are seeing is that OpenEars is performing its sanity check before it tries to open its audio unit, discovers that Nuance has changed the audio session, fixes it, and Nuance discovers that OpenEars has changed the audio session, fixes it, comedy ensues.

    I can’t speculate about why it’s only happening with iOS6 but it could easily be a race condition that was always there but was resolving differently in iOS5.

    Halle Winkler
    Politepix

    Another approach for locating the conflict would be to make a copy of your app and start removing functionality in stages until PocketsphinxController starts working again, at which point you can suspect the last thing you removed as at least a partial cause. I would be very interested in more info about this, since it is certainly not something I’m blasé about if it is an interaction with OpenEars.

    Halle Winkler
    Politepix

    OK, since that means that it is something relating to code that I don’t have access to, let me explain what is weird in the logging and maybe it will point you towards what to check out.

    Early on in the app session your category is set (correctly) the first time, and we’d normally expect it to remain that way for the rest of the app session unless it is overridden by a call to AudioSession or AVAudioSession, or by the use of a media object like AVPlayer or MPMoviePlayerController:

    2012-09-25 15:38:46.061 test[819:907] audioCategory is now on the correct setting of kAudioSessionCategory_PlayAndRecord.

    Then some very normal stuff is done to bluetooth, default to speaker, etc, with no errors returned. At this point the audio session settings look very normal and if an attempt were made here to start the audio unit I would expect it to work.

    Then the device has some kind of DNS library issue which might be unrelated:

    2012-09-25 15:38:46.194 test[819:907] [NMSP_ERROR] check status Error: 696e6974 init -> line: 318

    However, if this is an automatic function of the Nuance SDK it is a sign that its objects might be instantiated at this point.

    PocketsphinxController continues starting up and gets as far as it can without any errors before it needs to start the audio unit, which is when things get weird:

    2012-09-25 15:38:46.407 test[819:907] Audio route has changed for the following reason:
    2012-09-25 15:38:46.408 test[819:907] There has been a change of category
    2012-09-25 15:38:46.409 test[819:907] The previous audio route was SpeakerAndMicrophone
    2012-09-25 15:38:46.535 test[819:907] This is not a case in which OpenEars performs a route change voluntarily. At the close of this function, the audio route is Speaker
    2012-09-25 15:38:46.536 test[819:907] Audio route has changed for the following reason:
    2012-09-25 15:38:46.541 test[819:907] There has been a change of category
    2012-09-25 15:38:46.542 test[819:6607] Set audio route to Speaker
    2012-09-25 15:38:46.542 test[819:907] The previous audio route was MicrophoneBuiltIn
    2012-09-25 15:38:46.544 test[819:6607] Checking and resetting all audio session settings.
    2012-09-25 15:38:46.545 test[819:907] This is not a case in which OpenEars performs a route change voluntarily. At the close of this function, the audio route is Speaker
    2012-09-25 15:38:46.546 test[819:6607] audioCategory is incorrect, we will change it.
    2012-09-25 15:38:46.782 test[819:6607] audioCategory is now on the correct setting of kAudioSessionCategory_PlayAndRecord.
    2012-09-25 15:38:46.786 test[819:6607] bluetoothInput is incorrect, we will change it.
    2012-09-25 15:38:46.791 test[819:5c03] 15:38:46.792 shm_open failed: “AppleAURemoteIO.i.724ba” (23) flags=0x2 errno=2
    2012-09-25 15:38:46.795 test[819:5c03] 15:38:46.795 AURemoteIO::ChangeHardwareFormats: error 3
    2012-09-25 15:38:46.796 test[819:5c03] 15:38:46.797 shm_open failed: “AppleAURemoteIO.i.724ba” (23) flags=0x2 errno=2
    2012-09-25 15:38:46.797 test[819:5c03] 15:38:46.798 AURemoteIO::ChangeHardwareFormats: error 3
    2012-09-25 15:38:46.805 test[819:6607] bluetooth input is now on the correct setting of 1.
    2012-09-25 15:38:46.808 test[819:6607] categoryDefaultToSpeaker is incorrect, we will change it.
    2012-09-25 15:38:46.808 test[819:907] Audio route has changed for the following reason:
    2012-09-25 15:38:46.810 test[819:907] There has been a change of category
    2012-09-25 15:38:46.811 test[819:907] The previous audio route was Speaker
    2012-09-25 15:38:46.893 test[819:6607] CategoryDefaultToSpeaker is now on the correct setting of 1.
    2012-09-25 15:38:46.895 test[819:6607] preferredBufferSize is correct, we will leave it as it is.
    2012-09-25 15:38:46.897 test[819:6607] preferredSampleRateCheck is correct, we will leave it as it is.
    2012-09-25 15:38:46.896 test[819:5c03] 15:38:46.896 shm_open failed: “AppleAURemoteIO.i.724ba” (23) flags=0x2 errno=2
    2012-09-25 15:38:46.898 test[819:6607] Setting the variables for the device and starting it.
    2012-09-25 15:38:46.900 test[819:6607] Looping through ringbuffer sections and pre-allocating them.
    2012-09-25 15:38:46.895 test[819:907] This is not a case in which OpenEars performs a route change voluntarily. At the close of this function, the audio route is SpeakerAndMicrophone
    2012-09-25 15:38:46.899 test[819:5c03] 15:38:46.899 AURemoteIO::ChangeHardwareFormats: error 3
    2012-09-25 15:38:46.907 test[819:907] Audio route has changed for the following reason:
    2012-09-25 15:38:46.908 test[819:907] There has been a change of category
    2012-09-25 15:38:46.909 test[819:907] The previous audio route was ReceiverAndMicrophone
    2012-09-25 15:38:47.497 test[819:6607] Started audio output unit.
    2012-09-25 15:38:47.499 test[819:907] This is not a case in which OpenEars performs a route change voluntarily. At the close of this function, the audio route is SpeakerAndMicrophone

    I count 4 category changes and 4 resulting route changes, all in approximately one second. That seems like something trying to override the OpenEars settings in an automatic way. You can see that at the moment the audio unit begins, it isn’t set to a category which has a recording input, because the next thing that happens is that OpenEarsLogging announces a route change to a route that contains a recording input (the last line above). The fact that the audio unit tries to start at a time when it seems to have a mic input, which then disappears while the unit starts, is probably the origin of the crash.

    Are you completely positive that there is nothing going on with your Nuance SDK at the time that PocketsphinxController starts? Because that would make a lot of sense as the source of objects which are listening for audio session changes and automatically reacting to them. Otherwise, do a case-insensitive search of your app for “audiosession” and see if there are any AudioSession or AVAudioSession calls made by the app. You might also want to look for AVPlayer, AVAudioRecorder and/or MPMoviePlayerController objects (or other objects that assert their own audio session settings) that are instantiated at that time. I will keep looking into it in the meantime, so please let me know if you find a cause in your app.

    Halle Winkler
    Politepix

    Do you see this on the same device and OS with the sample app?

    Halle Winkler
    Politepix

    OK, I’ll check it out and get back to you.

    Halle Winkler
    Politepix

    Which iPhone?

    Halle Winkler
    Politepix

    OK, I haven’t heard of this before but it looks like something is going awry with the audio session. OpenEarsLogging turns on verbosity for the audio session, can you run your test again with [OpenEarsLogging startOpenEarsLogging] running and show me the output? Is there anything special about the app as far as the audio session or other media objects goes?

    in reply to: TTS say phonemes #11258
    Halle Winkler
    Politepix

    You can also just change the variance on the SLT voice to a very low value in order to get that zero-inflection phoneme effect.

    in reply to: TTS say phonemes #11257
    Halle Winkler
    Politepix

    Hi, sorry that I didn’t see this. Here is a function I use in GraphemeGenerator to obtain phones for arbitrary text:

    const char * flite_text_to_phones(const char *text,
                                      cst_voice *voice,
                                      const char *outtype)
    {
        const char *phones;
        cst_utterance *u;

        u = flite_synth_text(text, voice);
        flite_process_output(u, outtype, FALSE);
        phones = print_phones(u);

        delete_utterance(u);

        return phones;
    }

    But this of course involves two synthesis passes. I do it with a really fast voice in OpenEars so it isn’t that arduous but it’s probably still noticeable.

    If I recall correctly, the phonemes used in Flite are the same ones used in Pocketsphinx, with the exception that Pocketsphinx’s ah needs to be turned into ax.

    in reply to: [Resolved] Flite question #11238
    Halle Winkler
    Politepix

    Sounds good!

    in reply to: [Resolved] Flite question #11234
    Halle Winkler
    Politepix

    Welcome,

    Check out the FAQ for these answers: https://www.politepix.com/openears/support

    I think it’s fine to use the text I mentioned in the FAQ as long as you link to the CMU license somewhere (either to a reprint of it on your site or on one of their sites). Conforming to the CMU license isn’t something I can speak definitively about since it isn’t my license but in my experience the goal isn’t to put long licenses into your apps but to make sure that app users can access the licenses. Thanks for paying attention to crediting and licensing!

    in reply to: [Resolved] Avoid Junk Words Detection #11117
    Halle Winkler
    Politepix

    Updated: you can now handle out of vocabulary rejection using Politepix’s Rejecto plugin for OpenEars.

    in reply to: [Resolved] OutOfVocabulary #11116
    Halle Winkler
    Politepix

    Updated: you can now handle out of vocabulary rejection using Politepix’s Rejecto plugin for OpenEars.

    in reply to: [Resolved] How to reject out-of-vocabulary utterances? #11115
    Halle Winkler
    Politepix

    Updated: you can now handle out of vocabulary rejection using Politepix’s Rejecto plugin for OpenEars.

    in reply to: Keeping audio running when app goes into background #10999
    Halle Winkler
    Politepix

    Awesome, good to have the confirmation.

    in reply to: Keeping audio running when app goes into background #10997
    Halle Winkler
    Politepix

    OK, good luck!

    in reply to: Keeping audio running when app goes into background #10995
    Halle Winkler
    Politepix

    I’m surprised to hear that it’s possible and in the thread it’s actually the other developer who has a method, so I can’t really help with implementing his approach unfortunately.

    in reply to: Keeping audio running when app goes into background #10992
    Halle Winkler
    Politepix

    This is the only information I have on the subject:
    https://www.politepix.com/forums/topic/flite-in-background/

    in reply to: Exact Hypothesis #10986
    Halle Winkler
    Politepix

    On the engine side of things, only if you use JSGF, but this will significantly slow down your recognition and IMO it also reduces accuracy.

    You can also just screen your hypotheses for the results you are looking for, i.e. if you receive something other than the complete phrase in the order you expect it, ignore it.

    in reply to: TTS say phonemes #10984
    Halle Winkler
    Politepix

    OK, but this is a standard complaint about the KAL voices and one I’ve rarely heard about the higher-quality 16-bit voices:

    I think the current TTS isn’t always accurate and sometimes hard to understand unless the word is in a sentence

    There is no OpenEars function to just say phonemes. If you’re handy with C and want to read up on the Flite public API, you can change FliteController’s implementation of Flite so that it accepts an input of phonemes instead of words, use Flite’s flite_synth_phones function on a returned CST utterance (which then needs to be turned into a CST wave), and recompile the framework to give your app access to the changed method. It’s possible, but the steps involved are unfortunately outside of the support scope of this forum.

    in reply to: Putting in pauses in flitecontroller #10980
    Halle Winkler
    Politepix

    Sorry, this isn’t currently possible, but I’ll take it under advisement as a useful future feature.

    in reply to: OpenEars reduces sound playback quality #10977
    Halle Winkler
    Politepix

    Why are you doing your own Audio Session management (serious question, maybe there is a good reason for it despite it being in conflict with the OpenEars instructions)?

    in reply to: OpenEars reduces sound playback quality #10975
    Halle Winkler
    Politepix

    Right, but you must be using it during the recognition activity because otherwise the AVPlayer audio session would completely override the OpenEars audio session, so my interest is in how you are using it so it is possible for its playback settings to conflict with those of the OpenEars audio session.

    in reply to: OpenEars reduces sound playback quality #10973
    Halle Winkler
    Politepix

    It isn’t actively changing the sample rate for playback, it is using the required recording and playback audio session type with a 16k record rate, which might override the playback rate as an unintended side effect. It’s actually a bit surprising to me that the playback rate of a media object is being affected at all, can you show me your object playback code as a test sample so I can replicate and look into it when there is time?

    in reply to: OpenEars reduces sound playback quality #10971
    Halle Winkler
    Politepix

    Just for some background on why it’s like this: for speech perception purposes there isn’t a big improvement from sampling rates higher than 16k (and mono), which means that most speech recognition software will attempt recognition with a maximum of a 16k sample rate, because that means there are far fewer samples to analyze. For non-speech applications such as music it’s naturally always going to be better to use a higher sample rate, and stereo if possible. But generally, even for speech that humans listen to, you don’t get a lot of extra “bang for the buck” going from 16k to 44.1k, because the comparison standard is telephone bandwidth, which is generally standardized at 8k and compressed, making 16k PCM already a big step up. The reason that recognition is compromised is that the engine assumes a “chunk” of speech is likely to occur within a certain number of samples in a timeframe, and at 44.1k it’s more like 3x the samples in which the speech is occurring, so it is really not going to map well to the recordings in the acoustic model (which are actually 8k, but the input functions compensate for the doubling of the input rate).

    in reply to: OpenEars reduces sound playback quality #10970
    Halle Winkler
    Politepix

    OK, that’s your call. I think that the perceived speech as far as pocketsphinx is concerned will seem quite different but I’ve also had the experience that it does perform the recognition after all, but there is a big loss of accuracy. For a small vocabulary it’s true that you might find it tolerable regardless so I’m glad to hear it works all right for your application. Do me a favor and mention your override in future support questions so that I can distinguish between potential issues that are normal and potential issues which could be a side-effect of your change.

    in reply to: OpenEars reduces sound playback quality #10969
    Halle Winkler
    Politepix

    Will this have an influence on OpenEars?

    Yup, see my answer that slipped in ahead of your last post.

    in reply to: OpenEars reduces sound playback quality #10967
    Halle Winkler
    Politepix

    Why do you need to make a CD-quality recording using the same stream that Pocketsphinx is using?

    in reply to: OpenEars reduces sound playback quality #10965
    Halle Winkler
    Politepix

    Ah, I understand now, you’re using a full 44.1k rate and PocketsphinxController requires (really requires) a 16k rate. If you convince it not to sample at 16k you will reduce the recognition accuracy severely. You’re correct that 16k recordings won’t sound as nice as 44.1k (CD quality) but if Pocketsphinx analyzed a 44.1k recording it would take forever.

    in reply to: TTS say phonemes #10956
    Halle Winkler
    Politepix

    Yup, it’s the second-worst voice out of eight. I think it would be a good use of time to brush up on the documentation about the different voices and try the better ones first.

    in reply to: TTS say phonemes #10954
    Halle Winkler
    Politepix

    Which voice are you using?

    in reply to: FliteController properties #10949
    Halle Winkler
    Politepix

    variance is the degree to which inflection is given a perceived sense of randomness.

    in reply to: OpenEars reduces sound playback quality #10944
    Halle Winkler
    Politepix

    Can you describe the reduction more specifically? It shouldn’t be possible for the bitrate or sample rate to be changed so I’m unclear on what aspect of playback is different. You can’t use PocketsphinxController without the audio session settings it needs.

    in reply to: Using AVFoundation and OpenEars #10941
    Halle Winkler
    Politepix

    You could try keeping your sample buffers and writing them out to a WAV file and submitting the WAV file to the runRecognitionOnWavFileAtPath: method. You won’t get voice audio detection/continuous recognition but you can submit the speech at the end of the capture.

    in reply to: Using AVFoundation and OpenEars #10939
    Halle Winkler
    Politepix

    I don’t think that is going to be trivial. I’m sure it is in some way possible but I doubt it can be done while enjoying any of the convenience functions of AVCaptureSession or AudioSessionManager/ContinuousAudioUnit. It’s unfortunately outside of the scope of the support I can give here.

    in reply to: Using AVFoundation and OpenEars #10937
    Halle Winkler
    Politepix

    There’s only one audio stream, it can’t be streamed into two objects simultaneously.

    in reply to: OpenEar crashes when detecting voice #10930
    Halle Winkler
    Politepix

    Ah, gotcha. OK, glad it’s working for you!

    in reply to: OpenEar crashes when detecting voice #10928
    Halle Winkler
    Politepix

    You just need to get the verbosePocketSphinx logging turned on and it will tell you what is going wrong.

    in reply to: Using AVFoundation and OpenEars #10926
    Halle Winkler
    Politepix

    Are you capturing audio at the same time as you are trying to do speech recognition?

    in reply to: OpenEar crashes when detecting voice #10925
    Halle Winkler
    Politepix

    OpenEarsLogging and verbosePocketSphinx aren’t related. OpenEarsLogging logs the basic functionality of the audio driver etc, and verbosePocketSphinx logs what is going on under the surface for pocketsphinx, which is where your issue is. I don’t think it’s possible that you won’t get any new logging output when you turn on verbosePocketSphinx since the crash is occurring after pocketsphinx is starting. Please double-check that it is turned on so you can show your logs.

    Acoustic model and language model is generated dynamically, so this shouldn’t be missing.

    The language model can be generated dynamically, but the acoustic model is part of the “framework” folder that has to be dragged into an app and cannot be dynamically generated. My guess is that the acoustic model isn’t in your new app.

    in reply to: OpenEar crashes when detecting voice #10922
    Halle Winkler
    Politepix

    To find out why PocketsphinxController is crashing, set verbosePocketSphinx to true. It probably can’t find all or part of the acoustic model or the language model in your new app.

    in reply to: force speech analysis to begin?? #10914
    Halle Winkler
    Politepix

    I don’t see that as a big performance issue for audio of that length.

    in reply to: force speech analysis to begin?? #10912
    Halle Winkler
    Politepix

    What specifically do you think would be an I/O issue?

    in reply to: force speech analysis to begin?? #10910
    Halle Winkler
    Politepix

    (I should say: I don’t expect that there is any way, which doesn’t mean that it’s impossible, just that my educated guess is any workaround will lead to more problems down the road than it solves right now).

    in reply to: force speech analysis to begin?? #10909
    Halle Winkler
    Politepix

    You can’t do that with secondsOfSilenceToDetect. You can fake the first part by immediately suspending listening when listening begins (using the relevant OpenEarsEventsObserver callbacks) and then unsuspending it when you want to begin your arbitrary interval. But there is no way to force recognition/avoid voice audio detection submitting recognition in its own time.

    My first suggestion would probably work very similarly to your wish though — instead of starting up recognition and then starting a timer, start a timer that starts an AVAudioRecorder and when your timer runs out, submit the PCM audio to runRecognitionOnWavFileAtPath. It should be functionally the same as what you want as far as I can tell.

    in reply to: force speech analysis to begin?? #10907
    Halle Winkler
    Politepix

    It’s only possible to do voice audio detection recognition with OpenEars on recordings using its driver. What you could try is to make a WAV recording of the speech and then submit it at the end to the method runRecognitionOnWavFileAtPath:usingLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF:

    in reply to: Scores used in Openears #10902
    Halle Winkler
    Politepix

    Sorry, I don’t have a lot of info on hand about the way that the scoring works. I made a decision to pass the score along via the API because it was available as data and it seemed like overreaching to decide not to pass it back through the callback, but based on my discussions with the CMU Sphinx folks it doesn’t provide a lot of viable info for language models of the size that are appropriate for iPhone apps, so I haven’t done a lot of investigation of its intricacies myself. My general advice is that you should base any logic that makes use of the score on data that emerges from well-organized and diverse tests rather than on an interpretation of the scoring method.

    in reply to: Scores used in Openears #10901
    Halle Winkler
    Politepix

    I would just recommend looking into the source in the framework project to get the exact formula/e.

    in reply to: Scores used in Openears #10899
    Halle Winkler
    Politepix

    recognitionScore is equivalent to confidence score, but in pocketsphinx that is extraordinarily dependent on size of language model and environment and speaker, meaning that you have to be very conservative about using it for any program logic. I recommend testing a lot under many circumstances before deciding how to use scoring or whether to use it.

    For n-best, are all the scores not being returned?

    in reply to: Optimization for short utterances #10887
    Halle Winkler
    Politepix

    I would recommend reducing it and doing some user testing to see what the minimum is for your application before you have an issue with utterances being cut off.

    in reply to: Conflict with AudioToolBox #10866
    Halle Winkler
    Politepix

    No prob!

    in reply to: Conflict with AudioToolBox #10860
    Halle Winkler
    Politepix

    Hi,

    This is the authoritative discussion on this, but my impression is that it’s a bug or questionable feature of the audio session: https://www.politepix.com/forums/topic/keep-system-sounds-while-listening/

    I unfortunately don’t have any insight into this issue beyond the thread discussion.

    Say, would you be so kind as to remove the salty language in your language model when you post your logs? I don’t have a content filter on the forum because it’s almost never needed, but it’s better kept out of Google’s index for the site. Thank you!

    in reply to: change AVAudioSessionCategory #10812
    Halle Winkler
    Politepix

    Hi,

    There’s no built-in mechanism for changing the audio session category inside the framework (in fact, it’s required by the framework that you let it set its own audio session settings for good results) but you can always make your own calls to the audio session using standard AVAudioSession methods in parts of your app which don’t need to actively use OpenEars classes.
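    For example, a minimal sketch of making such a call with the standard AVAudioSession API, in a part of your app where OpenEars classes aren’t actively listening or speaking (the category chosen here is just an illustration):

    ```objc
    // Temporarily switch to a playback-only category while OpenEars is not in use.
    NSError *categoryError = nil;
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
                                           error:&categoryError];
    if (categoryError != nil) {
        NSLog(@"Audio session category change failed: %@", categoryError);
    }
    ```

    Keep in mind that OpenEars will need to re-establish its own required settings before you start listening again.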

    in reply to: Speech not always detected #10808
    Halle Winkler
    Politepix

    I don’t really want to give advice here regarding forcing the audio session to reset because in 99% of cases the issue is not due to the audio session and folks will read the steps here and do stuff to the audio session directly and end up with messed-up apps that are very confusing for me to troubleshoot. That said, in this one case it might be worth your while to go investigate how the shared audio session manager is started by the internal classes and give it a go.

    in reply to: Speech not always detected #10806
    Halle Winkler
    Politepix

    OK, the issue is that the video player completely changes the audio session, so if you continuously play a video while PocketsphinxController is suspended, it guarantees that its built-in audio session reset behavior won’t work. I think the only option is to find a way to do what you need to do without always running a video.

    in reply to: Speech not always detected #10804
    Halle Winkler
    Politepix

    It might not be an actual mistake, but possibly some kind of limit to how well OpenEars can override the audio session with respect to the timing of your video if it’s close. What is in your log excerpt isn’t anything bad, but it might be helpful to see the whole thing (minus your own app logging which I don’t need to see). It does sound like the video might be changing the audio session and your recognition loop has quiet input as a result or possibly a wrong sample rate or something similar.

    Are you playing the video before starting the recognition loop or during it?

    in reply to: Speech not always detected #10802
    Halle Winkler
    Politepix

    Can you turn on OpenEarsLogging and show the log? Is the same issue there if you use the sample app and make “NO” one of the dynamically-created words in the dynamic model without any of your video code?

    in reply to: Optimization for short utterances #10776
    Halle Winkler
    Politepix

    You could try RapidEars and see if it helps if you’re open to non-free solutions. If I recall correctly, your implementation isn’t a supported method, so you might have audio session problems.

    in reply to: Optimization for short utterances #10725
    Halle Winkler
    Politepix

    Sure, check out the float property of PocketsphinxController “secondsOfSilenceToDetect”. I just moved it into the class so you could set it programmatically.
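    For instance, a minimal sketch (the 0.5 value is just an illustration; test with your users to find the shortest pause that doesn’t cut off their utterances):

    ```objc
    // Shorter values end an utterance after a shorter pause.
    self.pocketsphinxController.secondsOfSilenceToDetect = 0.5;
    ```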

    in reply to: OpenEars and RapidEars Delegates #10630
    Halle Winkler
    Politepix

    Hi,

    What I would like to know is the order in which the delegates are called.

    Good question — this is really pretty dependent on what is happening/what you are doing. It isn’t so much that there is a particular order to expect but that there are particular events which will result in a certain delegate callback. The basic thing that you will see is the start event, then lots and lots of updates of the live speech event (as you mention) followed by a finalized speech event.

    However pocketsphinxDidStopListening doesn’t appear to be called, should this not be called as some point before pocketsphinxDidStartListening is called? Or should pocketsphinxDidStartListening not be called except for the very first time?

    I think this is an example of flawed naming on my part — pocketsphinxDidStartListening and pocketsphinxDidStopListening are not actually analogs. pocketsphinxDidStartListening is called when entering the listening loop, pocketsphinxDidStopListening is called when turning off the recognition engine finally.

    What causes rapidEarsDidDetectFinishedSpeechAsWordArray to be called? Does it still work on the second of silence?

    Correct, there are lots of attempts to recognize during the speech, and then once there is a pause there is a finalized, higher-accuracy attempt that is very similar to the default recognition behavior of OpenEars. It can be turned off (and should be turned off to save a few cycles) if you are only interested in the live speech but I left the option in there of using it so you aren’t excluded from the old-style pause-based recognition if you choose RapidEars. You can turn it off by setting this:

    [self.pocketsphinxController setFinalizeHypothesis:TRUE];

    to this:

    [self.pocketsphinxController setFinalizeHypothesis:FALSE];

    Just to confirm, but pocketsphinxDidReceiveHypothesis should no longer be called?

    Correct.

    What sort of delay, if any, will be caused when it’s switching between these states? I’m mainly interested in finding out whether any words will be lost if they are said between rapidEarsDidDetectFinishedSpeechAsWordArray and pocketsphinxDidStartListening being called. How is the recognition loop affected? Should I make the user wait before continuing to speak?

    Just like with OpenEars, the engine is not taking in new audio while it is performing that pause-based finalized recognition (if you tell it to stop finalizing the expected behavior is that it shouldn’t have gaps in listening — let me know if that isn’t the case). But there shouldn’t be a delay in the time between returning the hypothesis and going back to listening.

    in reply to: Using RapidEars and the SampleApp #10583
    Halle Winkler
    Politepix

    Excellent :) . Give it a try on a device for the best recognition quality (the simulator recognition is not so great).

    in reply to: Using RapidEars and the SampleApp #10581
    Halle Winkler
    Politepix

    I think for clarity I would prefer it, at first I thought I was doing something wrong.

    OK, good to know that should be improved. I will need to take a look at your project in-depth a little later when I can reattach the references, but the only thing I noticed right off the bat is that you’ve added the -ObjC flag correctly but only for release and not debug. I guess if you’re then running it in debug it will probably be sad :) . Is it possible this is the issue?

    in reply to: Using RapidEars and the SampleApp #10579
    Halle Winkler
    Politepix

    OK, that line is just intended to refer to changing all of the instances of the old-style recognition start to the new-style recognition start (since it occurs many times in the sample app) in the same way as the example that comes right before it, but if it’s unclear as-is I can revise it to be clearer, thanks for the feedback.

    All righty, I’m not sure what the issue is with the project so could you put up your modified sample app somewhere for me to download and take a look at? You can remove the OpenEars framework from the sample app folder to save file size if you want (but don’t remove the RapidEars plugin since I need to see how that is connected to the project). Thanks!

    in reply to: Using RapidEars and the SampleApp #10569
    Halle Winkler
    Politepix

    Based on the log, the plugin isn’t being used at all in the project since the error is that a method that is in the plugin isn’t available to the project. Can you elaborate on your comment:

    had to adapt the guide slightly this line is not right:

    “Then replace all of the other occurences of startListeningWithLanguageModelAtPath: with startRealtimeListeningWithLanguageModelAtPath:”

    Instead I replaced it with

    [self.pocketsphinxController startRealtimeListeningWithLanguageModelAtPath:self.pathToGrammarToStartAppWith andDictionaryAtPath:self.pathToDictionaryToStartAppWith]; // Starts the rapid recognition loop.

    Since the instructions say to replace all of the references to startListeningWithLanguageModelAtPath: with startRealtimeListeningWithLanguageModelAtPath:, which appears to be what you did? I’m still confused about what part of the instructions you are saying wasn’t right and I think maybe this could be related to the issue you’re experiencing.

    My impression is that the plugin isn’t added to the project target, or it isn’t being imported into the class in which you are using it by following these lines from the instructions:

    1. Open up ViewController.m in the editor and up at the top where the header imports are, after the line:

    #import <OpenEars/PocketsphinxController.h>
    add the following lines:

    #import <RapidEarsDemo/PocketsphinxController+RapidEars.h>
    #import <RapidEarsDemo/OpenEarsEventsObserver+RapidEars.h>

    in reply to: Using RapidEars and the SampleApp #10567
    Halle Winkler
    Politepix

    Hi,

    Sorry you are having difficulty integrating the plugin. Please turn on logging and show the output so I can assist — the instructions are known to work so there must be a minor implementation issue which the OpenEars logging may assist with. Here is a link to how to turn logging on: https://www.politepix.com/openears/yourapp/#logging
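    For reference, a hedged sketch of the two logging switches discussed at that link (assuming the OpenEars 1.x API names; check the linked page for the authoritative calls):

    ```objc
    // Turn on general OpenEars logging before doing anything else:
    [OpenEarsLogging startOpenEarsLogging];

    // And turn on verbose Pocketsphinx output on your controller instance:
    self.pocketsphinxController.verbosePocketSphinx = TRUE;
    ```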

    Can you elaborate on what the difference is between your correction and the original instructions? They look like they are the same to me but maybe I am overlooking something.

    Just checking since I recall you are using an earlier version of OpenEars, did you follow the instructions to update to OpenEars 1.1 before adding RapidEars?

    in reply to: Keyword spotting using OpenEars #10541
    Halle Winkler
    Politepix

    Sounds a little big for local recognition (I think 500-1000 words is probably better) but the only way to know for sure is to test.

    Try setting n-best to return the 1 best hyp with a score and I think you should receive the score per word.

    in reply to: Keyword spotting using OpenEars #10539
    Halle Winkler
    Politepix

    Hi Maria,

    Take a look at the sample app and search for “nbest”.

    in reply to: Pausing app while FliteController is speaking #10534
    Halle Winkler
    Politepix

    Sounds like a nice app, always happy to hear about that kind of use of OpenEars. This is more of a general application design question but I don’t mind taking a stab at it.

    I think that the most efficient way to deal with this kind of issue is to launch the follow-up method (in this case whatever the “ask next question” method is) _from_ fliteDidFinishSpeaking (as part of an OpenEarsEventsObserver instance that is instantiated in the class where the follow-up method lives). Generally, a good pattern for this kind of approach is to keep a “queue” of questions in an NSMutableArray, adding them at whatever point you know what they are supposed to be, and every time fliteDidFinishSpeaking is called you check the queue to see if there are any questions left to ask. If there is a next question in the queue, you launch your question method with that question from fliteDidFinishSpeaking and also remove it from the queue. Eventually you will run out of questions and fliteDidFinishSpeaking will not result in another question being asked. If logic dictates that there are new questions that should be asked, you add them to the queue. Does that make sense?
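    A minimal sketch of that queue pattern (questionQueue is a hypothetical NSMutableArray property, and the say:withVoice: call assumes a FliteController and Slt voice set up as in the sample app):

    ```objc
    // OpenEarsEventsObserver delegate method, called when Flite finishes speaking.
    - (void) fliteDidFinishSpeaking {
        if ([self.questionQueue count] > 0) {
            NSString *nextQuestion = [self.questionQueue objectAtIndex:0];
            [self.questionQueue removeObjectAtIndex:0]; // Take it out of the queue
            [self.fliteController say:nextQuestion withVoice:self.slt];
        }
        // When the queue is empty, no further question is asked.
    }
    ```

    New questions can be appended to questionQueue at any point your logic dictates, and they will be spoken in order as each previous utterance finishes.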

    in reply to: Pausing app while FliteController is speaking #10525
    Halle Winkler
    Politepix

    What do you mean by pausing the main application?

    in reply to: How does recognition score works? #10204
    Halle Winkler
    Politepix

    Hi Luis,

    It’s very much relative to that particular speaker and session and the size of the language model. It can’t be reliably used to create arbitrary cutoffs in my experience except perhaps with extremely low values (like -500000 or lower). I wish it were more useful for the task of evaluating accuracy but I haven’t encountered a case yet where it was possible to rely on the score so I’ve reversed my old advice that it could be used in this way.

    in reply to: Identifying the time when a particular word is spoken #10027
    Halle Winkler
    Politepix

    There is no feature that can do this, sorry.

    in reply to: Problems with Audio Session #10023
    Halle Winkler
    Politepix

    OK, so that means the issue is that it used to work because your AIR app was overriding the audio session, after OpenEars first established it, in some way that the AIR app requires, and it had the random luck not to break recognition (or, if it had a negative effect on recognition, which is possible, we aren’t directly aware of it). But now that OpenEars does a sanity check for the required settings at every recognition round, the AIR app doesn’t control the audio session setting, so it can’t do its audio playback in whatever form it requires.

    You have a fix, which is to break AudioSessionManager, but this will also probably have negative effects on OpenEars’ performance. However, the previous AIR overriding that was working with the versions before 1.0 most likely had the same effects, so it might not seem like a problem. My random guess about the iPad is that of all your 4.x-compatible devices, it may be the only one with only a single mic, and whatever the AIR audio session settings are, they may force recognition to occur on a second mic.

    in reply to: Problems with Audio Session #10021
    Halle Winkler
    Politepix

    I’m a bit confused by the description “main app”, are there multiple apps in some sense?

    in reply to: Problems with Audio Session #10018
    Halle Winkler
    Politepix

    What kind of sounds are these, sounds you are actively playing or system sounds?

    in reply to: Problems with Audio Session #10012
    Halle Winkler
    Politepix

    Hmm, just to rule out any of the early bugs, can you upgrade to 1.1? I don’t really think that is your issue but it would be a good first step to set the level. Also it has nice new features, one of which is easier logging which might show us if there are any errors in the audio session manager.

    in reply to: Problems with Audio Session #10010
    Halle Winkler
    Politepix
    in reply to: Detecting single letters in the alphabet #9960
    Halle Winkler
    Politepix

    This is not a good application of the library, unfortunately.

    in reply to: Keyword spotting using OpenEars #9781
    Halle Winkler
    Politepix

    Hi Maria,

    The new version is up (I haven’t had time to update the detailed documentation so I haven’t announced it yet but you can see it at https://www.politepix.com/openears) so you can give it a try. There is a preprocessor define that turns on n-best in the new sample app so you can uncomment it to experiment with n-best and scoring.

    in reply to: Small Phrase Problem #9750
    Halle Winkler
    Politepix

    Hi,

    ARPA is probabilistic and increasing the probability of a phrase to 100% would break it as designed. What you are looking for is JSGF, which is a rules-based grammar. If you search for JSGF on this forum you should get a lot of starting info to research it further.

    in reply to: Keyword spotting using OpenEars #9718
    Halle Winkler
    Politepix

    OK, I think that the upcoming n-best feature should handle this for you but there isn’t anything in the current version which I can think of which will help. I’m hoping to release the next version around next Monday.

    in reply to: Keyword spotting using OpenEars #9716
    Halle Winkler
    Politepix

    OK, to clarify my own understanding of the issue, is what is happening that the actual phrase is “call my friend Maxim” and what is being reported is “Molly Glen Maxim” or something along those lines with the non-name words being replaced by similar-sounding names? Or is the issue that the rest of the sentence is being recognized correctly but you want to disregard non-name words which are in your language model?

    The end goal is just to be informed that “Maxim” was detected in the sentence without needing to know the specifics about the other words, is that correct? I don’t think there is a way to get per-word scores for a multiple word sentence, but n-best scoring will be coming up in the next version of OpenEars.

    in reply to: Politepix Public License and Library Credits #9674
    Halle Winkler
    Politepix

    Congratulations on your app! Feel free to promote it in the sticky topic at the top of the forum once it’s accepted.

    The license is at the root of the distribution, it’s called license.txt. It has some credit boilerplate language.

    in reply to: Gender and Detection #9652
    Halle Winkler
    Politepix

    Hello,

    Nope, no gender switch. The most effective thing I can recommend would be adapting the acoustic model with a large set of speech recordings of female speakers, using only speech related to your language model:

    http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    in reply to: How can I increase accuracy? #9644
    Halle Winkler
    Politepix

    No problem. There is another potential complication that isn’t immediately obvious but that I’ve been trying to make a point of mentioning more frequently here, which is that a lot of developers specify apps with the idea that the device can be pretty far away from the user, but this actually gives the device speech recognition task an additional disadvantage that a desktop speech recognition application would be unlikely to have: a big mismatch between the design of the available microphone and the use that is being made of it. You can even see this with Siri if you open Notes and do dictation from a distance; return time from the server will get slower and accuracy will decrease because the iPhone mic is designed to be spoken directly into and to reject “background noise” which might be your user if they are far enough away and there are competitive sounds.

    This isn’t as big a deal with command and control language models/grammars, but as soon as you’re past 20 words or so you can start to see an impact. So another approach is to see if you can educate your users to not put too much distance between themselves and the device during app use.

    in reply to: How can I increase accuracy? #9638
    Halle Winkler
    Politepix

    Hi,

    Yes, recognizing numbers in isolation seems to be a difficult task for speech recognition engines.

    1) Trying to create a better language model by using a different toolkit such as SRILM, MITLM or IRSTLM

    3) Using LanguageModelGenerator

    Most language modeling software uses a set or subset of a few existing algorithms, so I don’t think you need to do a lot of experimentation there. The LanguageModelGenerator uses another good package so you could probably just try out whether its output is preferable and then call it a day.

    Build an acoustic model with just the numbers 1-10

    Don’t you need 1-100? But you might want to investigate this approach and/or adapting the existing model with your new data: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    It seems like the task of creating an acoustic model that just recognizes 1-100 with a number of different voice contributors and accents is constrained enough to be feasible for an app project.

    Using JSGF instead of ARPA

    In my opinion, after some recent experimentation, JSGF is too slow for a good UX. Other developers do use it, so as I said this is a matter of opinion. You can use the garbage loop approach for out-of-vocabulary rejection with ARPA as well as with JSGF: http://sourceforge.net/p/cmusphinx/discussion/help/thread/cefe4df3 which could improve your results if the issue is too many false positives rather than too many false negatives or transposed recognitions.

    in reply to: [Resolved] Performance on Mobile Device #9629
    Halle Winkler
    Politepix

    OK, just for future reference there is a FAQ here where some similar questions are answered: https://www.politepix.com/openears/support

    The intention of the simulator driver is basically to let you debug everything else in your app in the simulator without Pocketsphinx breaking — that is pretty much the full extent of its ambitions :) .

    The device driver is tuned for Pocketsphinx and the iOS audio unit so it will be fast and reliable under high load with a small memory footprint (and so it won’t decrease the lifetime of the device flash drive by reading/writing continuously when Pocketsphinx is running). It can’t be translated into a simulator driver that works identically with all the different desktop devices that it might host without that becoming its own project, and I’d rather put the time into the device code. I also suspect that if the simulator driver were really good, some developers might test on it and be really surprised by the real-world performance of their app.

    in reply to: [Resolved] Performance on Mobile Device #9623
    Halle Winkler
    Politepix

    Hang on, when you say simulator performance, are you talking about performance or word accuracy rates? You will see terrible accuracy if you only test on the simulator, there are warnings about it everywhere.

Viewing 100 posts - 2,001 through 2,100 (of 2,164 total)