Halle Winkler

Forum Replies Created

Viewing 100 posts - 1,801 through 1,900 (of 2,166 total)

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015716
    Halle Winkler
    Politepix

    Also turn on OpenEarsLogging and verboseCMUCLMTK so you get any relevant output from the process of generating the language models.

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015715
    Halle Winkler
    Politepix

    Yes, step one is definitely making sure that these new words are present in your language model and phonetic dictionary. Also, turn on verbosePocketsphinx so you receive any complaints from pocketsphinx about your language model or dictionary.
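    As a sketch of turning on these logging switches (OpenEarsLogging and verboseCMUCLMTK from the previous reply, plus verbosePocketsphinx; the receiver objects and exact casing are my assumption, so verify them against the current OpenEars documentation):

    [OpenEarsLogging startOpenEarsLogging]; // general OpenEars logging
    self.languageModelGenerator.verboseCMUCLMTK = TRUE; // language model generation output
    self.pocketsphinxController.verbosePocketsphinx = TRUE; // recognition engine output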

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015712
    Halle Winkler
    Politepix

    Recognition of several individual words that are only a syllable long and all rhyme with each other is not a satisfactorily solved problem in speech recognition. It is another variation of the general problem of recognizing the English alphabet, which, unfortunately, people can be seen trying to find fixes for in every speech-recognition-related resource. In the case you’re describing there is no contextual cue for which one is the “real one”, so as soon as there is any distance from the mic, the sounds are going to get mixed up.

    The strategy for dealing with it is going to be some combination of removing confusing words from the model and fusing multiple words together that you know will be spoken together.

    An example is that you don’t need the loose letter “U” if its presence there is just in order to let “U.S.” be recognized. In that case, make the word “U.S.”:

    U.S. Y UW AH S

    This will also improve the accuracy of words that are spoken near utterances of “U.S.”.

    The next issue I see is that the “Q1” etc segment has a couple of obscure words before and after it, which suggests to me that this is a big language model. Do you have the opportunity to switch between smaller, more contextually-specific language models?

    Can you do counting in either its own language model that you switch to, or with some kind of prefix? e.g. “Category 2” instead of just “2”.

    The last thing is that you haven’t shown the entry in the language model or the pocketsphinx logging output, so I don’t know for sure whether your alteration is actually in your language model as far as pocketsphinx is concerned. If you remove “U” and “2”, are you able to recognize “Q1”? If not, there might be an issue in the language model in general.

    In case you have confirmed that the language model is OK, and none of these approaches are options for you (although they are almost always options for an app that you can make design decisions about), the last possibility is to do it as a JSGF ruleset rather than a statistical ARPA model. Searching this forum for JSGF should help you get started.

    in reply to: Won't Recognize Q, CUE, or QUEUE #1015708
    Halle Winkler
    Politepix
    in reply to: Won't Recognize Q, CUE, or QUEUE #1015707
    Halle Winkler
    Politepix

    Hmm, single letters that rhyme with other single letters are very challenging for recognition.

    Since you already know the number of required quarters, something sneaky you can try is to have “Q1” etc. be the entire word. That is, instead of trying to recognize the combination of Q and 1, you will have a word “Q1” in your language model, and you’ll edit the dictionary used so that the entry for Q1 reads as follows:

    Q1   K Y UW W AH N
    

    You’d do this for each quarter. Having the multiple syllable/sound combinations available for distinguishing between the quarters should make them recognizable.
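    Extending that idea to all four quarters, the hand-edited dictionary entries might look like this (a sketch: the Q1 entry is from above, and the phonemes for “two”, “three”, and “four” are my assumption based on the standard CMU English phone set, so verify them against the entries in your phonetic dictionary):

    Q1   K Y UW W AH N
    Q2   K Y UW T UW
    Q3   K Y UW TH R IY
    Q4   K Y UW F AO R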

    Halle Winkler
    Politepix

    Hi Dalee,

    Nope, that won’t work for an .epub. Ebook reading is an implementation question that is part of how you design and construct your app. OpenEars is a speech API so it doesn’t try to address app implementation questions like the best way to process ebooks into readable NSStrings. I’m sure that searching Stack Overflow for the keywords that are part of your implementation question will result in lots of useful information about where to get started.

    Halle Winkler
    Politepix

    Welcome Dalee,

    You can’t pass in a document, but the code to convert a text file into an NSString is just 1-2 lines; here’s an example:

    http://stackoverflow.com/questions/2171341/how-to-get-the-contents-of-a-text-file-stored-locally-in-the-documents-director
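    For reference, a minimal sketch of that conversion, assuming a UTF-8 text file already in the app’s Documents directory (the file name is just an illustration):

    NSString *documentsDirectory = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0];
    NSString *filePath = [documentsDirectory stringByAppendingPathComponent:@"book.txt"]; // hypothetical file name
    NSError *error = nil;
    NSString *contents = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:&error];
    if (!contents) NSLog(@"Couldn't read file: %@", error);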

    Halle Winkler
    Politepix

    I wanted to mention that in order to look into this issue, it is still important to receive an example that occurs naturally in your app, since that is needed in order to design an appropriate fix and to assign the fix a priority. So far there has never been a bug report of this occurring “in the wild”, because it was designed to be an improbable event, so it would be good to get a real example of how it occurred in your app in a form that prevents the pause token from being used, in order to understand the first case of it appearing as an issue.

    If your app reads different sources, just let me know about a source which leads to this issue — you can also inform me by email in order to keep it private.

    Halle Winkler
    Politepix

    Hi Hitoshi,

    Your English is great and I know how it is to have to speak with subtlety in a second language, since I have to use a second language frequently and also worry that I sound too brusque when I am making requests.

    I will definitely take your suggestion on board and consider the best way to integrate it in a future version, thank you for your suggestions. The most likely solution to the long unpunctuated speech question will just be to force a split on long unpunctuated text streams, since the lack of punctuation means there will be no contextual cues and the location of the split can be arbitrary (meaning: without any punctuation, we don’t know where the writer of the sentence meant to have clauses or emphasis, so we can split it anywhere because there is no better option). I’m not sure if I want to change the API to accomplish this goal but I will consider what you said.

    To give you a preview of how I would be likely to do the long sentence splitting: I will probably implement an NSScanner to count incidents of whitespace between words, pick an arbitrary value that counts as “too many” spaces without any intervening punctuation or pauses, and insert a pause token before sending the text to synthesis. So you could also use this as your workaround right now if you need this immediately.
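    A sketch of that workaround as you could write it today (using componentsSeparatedByCharactersInSet: rather than NSScanner for brevity; kPauseToken is a placeholder that you should replace with the actual pause token string from the NeatSpeech documentation, and the 20-word threshold is arbitrary):

    static NSString * const kPauseToken = @"YOUR_PAUSE_TOKEN"; // placeholder, see the NeatSpeech docs

    static NSString *insertPauses(NSString *text) {
        NSCharacterSet *punctuation = [NSCharacterSet characterSetWithCharactersInString:@".,;:!?"];
        NSArray *words = [text componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSMutableArray *output = [NSMutableArray array];
        NSInteger wordsSincePause = 0;
        for (NSString *word in words) {
            if ([word length] == 0) continue; // skip runs of whitespace
            [output addObject:word];
            if ([word rangeOfCharacterFromSet:punctuation].location != NSNotFound) {
                wordsSincePause = 0; // existing punctuation already causes a split
            } else if (++wordsSincePause >= 20) {
                [output addObject:kPauseToken]; // force a split at an arbitrary point
                wordsSincePause = 0;
            }
        }
        return [output componentsJoinedByString:@" "];
    }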

    Halle Winkler
    Politepix

    Understood, but I’ve also recommended a couple of other approaches — you could keep using partials but use the higher-quality algorithm which will return fewer times (but be more accurate) by setting:

    [self.pocketsphinxController setFasterPartials:FALSE];
    [self.pocketsphinxController setFasterFinals:FALSE];

    Those will still return partials, so that isn’t the OpenEars-style approach.

    Or you could just ignore partials that don’t match the utterance you are trying to detect, and short-circuit your listening when you receive a partial that does match it. There’s no requirement to display every partial to the user or run a method based on every one; you can also just check in the callback to see if it is the matching hypothesis and only react to the first one that is.

    My other suggestion was to compare an incoming hypothesis to the previous one and only display/invoke new methods when it no longer matches the hypothesis that came before, i.e. throwing out any repeated hypotheses for logical purposes.

    So there are a few ways that you can get the results you are looking for — it’s really a question of what is the best approach for your application goals.

    Halle Winkler
    Politepix

    That’s what I would expect – first the first word in the utterance is spoken, then the second, and the hypothesis grows along with the number of words spoken. Your utterance starts with a single word and then a second is added, so the hypothesis matches that.

    Something you could try if you want fewer callbacks is to experiment with the following settings in your initialization code for the PocketsphinxController+RapidEars object:

    To use a slightly-less “live” method you can use these settings:

    [self.pocketsphinxController setFasterPartials:FALSE];
    [self.pocketsphinxController setFasterFinals:FALSE];

    Alternately, you can ignore partial hypotheses (like your initial ones that just say “WORD” once) and wait for the final hypotheses in the rapidEarsDidDetectFinishedSpeechAsWordArray callback, but request them faster using these settings:

    [self.pocketsphinxController setFinalizeHypothesis:TRUE];
    [self.pocketsphinxController setFasterFinals:TRUE];

    Halle Winkler
    Politepix

    A hypothesis should appear as “word word” if they say “word word” and just as “word” if they just say “word”. So in that case, I think you’d be fine just suspending listening when you catch and react to the hypothesis you are waiting for the first time. Would that fit your requirement?

    Halle Winkler
    Politepix

    RapidEars works a bit differently than stock OpenEars and needs a slightly different approach to programming. Stock OpenEars waits until speech is complete and returns a single hypothesis, so you can assume a one-to-one relationship between receiving a hypothesis and your programmatic reaction to it. RapidEars processes continuously once it has started to detect speech, meaning that it will keep reasserting the present hypothesis until it changes or the utterance ends, because it is continuously re-scoring the hypothesis (seeing if it becomes more confident in a different hypothesis or in the current one, seeing if more words are spoken after the current hypothesis, etc.).

    You can work with this style in a couple of ways. One way is just to react to your keyword the first time you catch it and not worry about the rest of the hypotheses, i.e. short-circuit the listening process when you get a match. Another way (I think this is closer to your request) would be to call a “keywordDetected:” method in the callback, but only call it for a new word (meaning you store the hypothesis and only forward the method if the new hypothesis doesn’t match the stored hypothesis, i.e. it is a different hypothesis). Does that make sense?
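    A sketch of that second approach; the surrounding callback is whichever RapidEars partial-hypothesis delegate method you have implemented, and lastHypothesis and keywordDetected: are hypothetical members you would add to your own class:

    // In your RapidEars partial-hypothesis callback:
    if (![hypothesis isEqualToString:self.lastHypothesis]) { // only react to a changed hypothesis
        self.lastHypothesis = hypothesis;
        if ([hypothesis rangeOfString:@"KEYWORD"].location != NSNotFound) { // KEYWORD is a placeholder
            [self keywordDetected:hypothesis]; // your own method
        }
    }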

    in reply to: Application size is 30mb after implementation #1015655
    Halle Winkler
    Politepix

    Welcome,

    Sure, check out the FAQ: https://www.politepix.com/openears/support

    Halle Winkler
    Politepix

    Hi Hitoshi,

    Can you give an example of a real sentence from your app that you need to say which encounters this limit and in which it is not possible to place any commas, periods, or the pause token? The sentence consisting of >20 repetitions of the letter w in a row doesn’t look like something that occurs in a real app interaction, but if I’m not correct about that, you could perform it in your app without a problem by programmatically placing a comma or a pause token between the middle two “w”s, or between all of them. Unfortunately it isn’t possible to return anything from say:withNeatSpeech:usingVoice: because it is an asynchronous method, and the duration of the utterance is only known after synthesizing it.

    The size of the maximum unpunctuated utterance was intentionally chosen based on the fact that it is several times larger than real sentence clauses in English. You can see this in the case of your choice of “w”, which contains the syllables of two complete words (‘double’ and ‘you’). In order for a sentence clause to occur which needed to render as many syllables as your test sentence, it would need to be an unpunctuated clause containing ~52 words. I’m not aware of a clause like this. But the pause token was added to the API specifically so that you would never need to have your users hear speech that is cut off, since you can programmatically insert it into long text that lacks punctuation.

    I’m not opposed to examining this in the long term, but I’d want to start with a real usage case that is creating an issue for someone in their app.

    in reply to: T2S breaks video/audio capturing in progress #1015638
    Halle Winkler
    Politepix

    Super, happy to hear it was so easy!

    in reply to: T2S breaks video/audio capturing in progress #1015636
    Halle Winkler
    Politepix

    Got it, OK. TTS is the conventional term (wanted to let you know so that if you need to ask questions elsewhere, folks will immediately know what it’s about). You’re correct, FliteController is reasserting a PlayAndRecord audio session, and you’re right that it only needs playback if you have no use for speech recognition. What you can try is to change the audio session that AudioSessionManager asserts, then recompile the framework and see if it helps. You’ll make the change in AudioSessionManager.m. If you look around line 271 of that file you’ll see a commented-out list of audio session types; you’ll basically want to do a search and replace for kAudioSessionCategory_PlayAndRecord, replacing it with the other audio session types you want to try as the session that FliteController reasserts before its playback. You could also comment out invocations of AudioSessionManager in the framework and see if it just does the right thing without any interaction with the audio session.
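    As an illustration of the kind of replacement meant here (kAudioSessionCategory_MediaPlayback is a real Audio Session Services category; the variable name is a guess at what you’ll find in AudioSessionManager.m, and whether this category coexists with your capture session is exactly what you’d be testing):

    // Before: the category OpenEars asserts for simultaneous recognition and playback:
    // UInt32 sessionCategory = kAudioSessionCategory_PlayAndRecord;
    // After: a playback-only category to try when speech recognition isn't needed:
    UInt32 sessionCategory = kAudioSessionCategory_MediaPlayback;
    AudioSessionSetProperty(kAudioSessionProperty_AudioCategory, sizeof(sessionCategory), &sessionCategory);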

    in reply to: T2S breaks video/audio capturing in progress #1015634
    Halle Winkler
    Politepix

    Welcome,

    Is T2S referring to OpenEars’ text to speech, or is that another framework you are also using?

    in reply to: (NeatSpeech) the question about the speaking speed #1015625
    Halle Winkler
    Politepix

    Hello Hitoshi,

    It is dependent on encapsulated details of the voice (it is specific to each voice) and it is also subject to change, so I can’t easily answer this question with a formula. If you’re curious about this for a specific voice, I think the full logging might give you enough information that you could time the playback and find out.

    Halle Winkler
    Politepix

    Hello Hitoshi,

    This isn’t a bug; there is a maximum length possible for a single utterance with no punctuation, in order to prevent unacceptable memory overhead. The maximum length is longer than unpunctuated sentences ever are in English. Just continue to use punctuation, or add the pause token that is described in the documentation.

    in reply to: NeatSpeech Model #1015617
    Halle Winkler
    Politepix

    Yes, it is a required part of the framework.

    in reply to: When will a Wave file start saving? #1015562
    Halle Winkler
    Politepix

    If you are worried that a recognition might sometimes take less time than a file write (I’m not sure I would worry about this in practice), you can keep an int property that is the number of WAV writeouts and another that is the number of returned hypotheses and do something special when a hyp returns and the number of hyps is larger than the number of WAV writeouts. It is also a good way to test whether I’m right.
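    A sketch of that bookkeeping (the two int properties are hypothetical additions to your own class; wire the increments into whichever SaveThatWave and hypothesis callbacks you are using):

    // Hypothetical properties on your class:
    // @property (nonatomic) int wavWriteoutCount;
    // @property (nonatomic) int hypothesisCount;

    // In the SaveThatWave "wav was saved" callback:
    self.wavWriteoutCount++;

    // In the hypothesis callback:
    self.hypothesisCount++;
    if (self.hypothesisCount > self.wavWriteoutCount) {
        // The hypothesis arrived before its WAV finished writing; defer pairing
        // it with a file until the next writeout callback arrives.
        NSLog(@"Hypothesis %d returned before its WAV was written.", self.hypothesisCount);
    }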

    in reply to: When will a Wave file start saving? #1015560
    Halle Winkler
    Politepix

    That would definitely be my expectation. Since this is the first time I’ve heard of this use, it would be great if you could let me know if that expectation is correct, but I’d be pretty surprised if it weren’t.

    in reply to: When will a Wave file start saving? #1015558
    Halle Winkler
    Politepix

    If so, then if that date is earlier than the hypothesis callback timestamp, I can safely suppose that they’re paired (the wave file with a given hypothesis)?

    I would expect this assumption to be correct. It is difficult to imagine a circumstance under which a hypothesis will return before the wav is able to be written out. But you won’t always have a matching hypothesis for a WAV because not every hypothesis has content. You can decide what you’d like to do about those cases since it will vary from project to project.

    Should I stop my saveThatWaveController on every detectFinishedSpeech/stopListening/suspendRecognition, or will it be stopped automatically (so that wavWasSaved will be invoked in every case)?

    You should just leave it running; it will only do something when it has speech to save.

    Sorry for so many questions, but wavWasSaved really holds no information about the content in it.

    The only thing it knows about is the timestamp, since it isn’t bound to recognition. That is a positive thing, because it means you can run it with recognition turned off in order to use OpenEars as a voice activity detector, with n-best recognition on (many hypotheses for a single utterance, some of which are null), or when a null hypothesis is silently returned, as happens for the majority of hypotheses when using Rejecto. Although SaveThatWave works well with hypotheses, it saves all detected speech, not all non-null hypotheses.

    in reply to: PocketSphinx stops listening after I change language file. #1015517
    Halle Winkler
    Politepix

    Which name are you referring to? The file name that you pass to LanguageModelGenerator when you are requesting the generated model or something else?

    in reply to: PocketSphinx stops listening after I change language file. #1015512
    Halle Winkler
    Politepix

    Hi Giebler,

    It’s a requirement that language models have unique names, not really a bug, but I’ll mention it in the docs for clarity.

    in reply to: PocketSphinx stops listening after I change language file. #1015498
    Halle Winkler
    Politepix

    The first step is turning on verbosePocketsphinx so you can see any recognition errors or warnings in the console. Is it an ARPA model or a JSGF grammar? You do need to give your models unique names.

    in reply to: Soft-start Engine #1015476
    Halle Winkler
    Politepix

    Sort of — these are all things that happen when the engine is started (calibration, listening, language model switching), so they aren’t responsible for starting it. Switching language models is something you can do while listening is in progress so the impression that it starts listening comes from the context in which you are preventing entry into the listening loop.

    I think what you’re seeing is that the overall listening method is recursive, so events which return it to the top of the loop will end-run your method of preventing recognition. I think the startup time is just a second or so, are you seeing significantly longer waits to start?

    in reply to: Soft-start Engine #1015474
    Halle Winkler
    Politepix

    Yup, for speech recognition the optimal environment is always as quiet as possible, since background noise will either occlude the speech or cause an attempt to recognize it. So if the users are in the car and they are just using the built-in phone mic, it’s a good suggestion for them to turn off the radio. The important thing about calibration is that it is done on an environment that matches the speech environment, meaning that if the user is going to talk over the radio even if you suggest that they not do that, you want the radio on during calibration because silence in that case means “the user isn’t talking but there is quieter radio noise running in the background”.

    in reply to: Soft-start Engine #1015472
    Halle Winkler
    Politepix

    Welcome,

    This is not actually advisable, because the lag is the voice activity detection checking the noise levels in the room and calibrating itself to distinguish between silence and speech in the current conditions before the user starts speaking. If this is done at some arbitrary time before the user is just about to talk, the calibration isn’t being performed for the environment which exists in the timeframe in which the user is speaking. This will lead to error-prone recognition.

    in reply to: Optimizing open ears for single word recognition #1015467
    Halle Winkler
    Politepix

    Looks that way to me. I haven’t tested that code so you’re a bit on your own with it, but those look like constants so I expect that you’d want to uncomment the one corresponding to the recording type you want to use.

    in reply to: Any callback on sayWithNeatSpeech; completition? #1015465
    Halle Winkler
    Politepix

    OK, I will enter it as a feature request.

    in reply to: Any callback on sayWithNeatSpeech; completition? #1015463
    Halle Winkler
    Politepix

    You should be able to catch the end of a NeatSpeech utterance using the standard OpenEars OpenEarsEventsObserver delegate method – (void) fliteDidFinishSpeaking, is that not working for you? Receiving the actual string is not available.
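    A minimal sketch of catching it (OpenEarsEventsObserver and fliteDidFinishSpeaking are the standard OpenEars names mentioned above; the surrounding class setup is abbreviated):

    // In setup, after creating your OpenEarsEventsObserver instance:
    [self.openEarsEventsObserver setDelegate:self];

    // OpenEarsEventsObserverDelegate method, also fired when a NeatSpeech utterance ends:
    - (void) fliteDidFinishSpeaking {
        NSLog(@"Finished speaking."); // react here; the spoken string itself isn't passed back
    }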

    in reply to: Optimizing open ears for single word recognition #1015457
    Halle Winkler
    Politepix

    OK, yes the steps described in that post should help you with insufficient sensitivity, although please keep the downside in mind — if you increase the sensitivity you will also have more incidental noises triggering recognition (chirping birds, etc). Sometimes sensitivity issues are related to doing testing in a single environment and if you optimize for increasing or decreasing sensitivity based on one environment you will see a decline in performance on the other end of the spectrum (developers usually like to work in quiet environments but users like to do speech recognition in noisy ones). Just wanted to mention that issue in advance.

    When you make a change to the framework, you need to clean and build the framework project (OpenEars.xcodeproj). Then the new framework should be picked up in your app the next time you clean and build it. You can test this by selecting the framework file in your app project, choosing “view in finder”, and when it takes you to the file in the Finder, doing “get info” to see if its last-modified date is the time that you built the framework project.

    in reply to: Optimizing open ears for single word recognition #1015454
    Halle Winkler
    Politepix

    You can’t adjust the sensitivity of the voice activity detection, sorry. It sounds like it is too sensitive, is that correct? If the issue is that the users are speaking too quietly or from too far away for recognition to be accurate when it is triggered, this is generally something that is best addressed with user education: “MyApp works best if you speak clearly from no more than $DISTANCE away”. If the issue is that utterances are triggering recognition that are not user speech that relates to the app, Rejecto was designed to give a null result under that circumstance rather than a wrong hypothesis. If I have it backward and the issue is insufficient mic sensitivity, you could experiment with the suggestions in this thread:

    https://www.politepix.com/forums/topic/add-mode-options-next-version/

    in reply to: Any recommendation on Hypothesis scoring limits? #1015441
    Halle Winkler
    Politepix

    Hello,

    Here are the previous discussions:

    https://www.politepix.com/forums/topic/how-does-recognition-score-works/
    https://www.politepix.com/forums/topic/scores-used-in-openears/

    You can’t use fixed score ranges as limits because you don’t know what the environmental factors are, but something you can experiment with is detecting cases where the 1-best and 2-best scores are close while the overall range is large, i.e. a 1-best of -50000 and a 2-best of -50500 suggests to me that it’s a close call. But you can’t say “below -50000 is wrong”.
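    A sketch of that heuristic (this assumes an n-best output format of an array of dictionaries with “Hypothesis” and “Score” keys, and the 2% gap is an arbitrary starting point to experiment with, not a recommendation):

    BOOL isCloseCall(NSArray *nBestArray) {
        if ([nBestArray count] < 2) return NO;
        float best = [[nBestArray[0] objectForKey:@"Score"] floatValue];   // e.g. -50000
        float second = [[nBestArray[1] objectForKey:@"Score"] floatValue]; // e.g. -50500
        // A small gap relative to the magnitude of the scores suggests the
        // engine couldn't cleanly separate the top two hypotheses.
        return fabsf(best - second) < 0.02f * fabsf(best);
    }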

    Halle Winkler
    Politepix

    Glad it’s working for you.

    Halle Winkler
    Politepix

    Welcome,

    This is happening because you dragged in the voice frameworks, but you didn’t drag in the folder called VoiceData that is also in the demo disk image. So the frameworks are added to your project but the actual voice data is not (those are the models that it is complaining about not having). I can work on returning a more informative error under this circumstance.

    When you drag in the VoiceData folder from the disk image, make sure that in the add dialog “Create groups for any added folders” is selected and NOT “Create folder references for any added folders” so that the VoiceData folder models are added at the right location in your bundle, and clean your project before rebuilding to reduce the likelihood of Xcode being spooky. Let me know if that helps.

    in reply to: Pause playback #1015421
    Halle Winkler
    Politepix

    Cool, glad to hear it.

    in reply to: Pause playback #1015419
    Halle Winkler
    Politepix

    Hello,

    This isn’t a feature of FliteController, but it could be added to stock OpenEars pretty easily by adding a new method to pause/unpause FliteController’s AVAudioPlayer if it is playing. In NeatSpeech your option is to send a request to stop, but this will stop at the earliest opportunity to finish an utterance rather than pausing immediately.

    in reply to: Dynamic Grammar Generation #1015412
    Halle Winkler
    Politepix

    Sorry about the tag removal, I haven’t yet figured out a way to let people paste their whole JSGF grammar without also allowing arbitrary HTML (which is a security issue).

    Now my question is whether it’s better to supply the generateLanguageModelFromArray method with a list of the single words that comprise the sentences and let OpenEars/RapidEars figure out the sentences or should I give it a list of all the possible sentences?

    I would give the entire sentences; this will cause LanguageModelGenerator to give increased probability to the sentence word sequence.

    Is there a way of obtaining something similar to what I’m doing with JSGF using the LanguageModelGenerator?

    Not exactly since JSGF and ARPA models address two very different design issues. Unfortunately RapidEars doesn’t currently support JSGF, but lately there has been a lot more usage of JSGF in OpenEars (I think this is because performance on the device has improved to the extent that the performance hit for using JSGF isn’t as arduous as it used to be) so I will give adding it to RapidEars some thought, even though JSGF will definitely be less rapid.

    One thing that occurs to me that you can try (and I guess this is how I would attempt to solve this) would be to change the probabilities in your language model by hand: raise them for bigrams and trigrams which represent the target sentences, and lower them for unigrams that represent “loose” words. You’ll need to use the LanguageModelGenerator-produced .arpa file as your language model rather than the .DMP, and open it up in a text editor. I would start by only changing the probability of the trigrams (the probability is the value at the start of the line). The probability runs from a negative number to zero, where zero is neutral and less than zero is a lowered probability.
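    For orientation, the relevant parts of an .arpa file look something like the following (the words and values are invented for illustration; the leading number on each line is the log10 probability you would raise for trigrams representing target sentences and lower for loose unigrams):

    \1-grams:
    -2.9 TURN -0.3
    -2.9 VOLUME -0.3

    \3-grams:
    -0.3 TURN VOLUME UP

    \end\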

    in reply to: #1015304
    Halle Winkler
    Politepix

    Awesome! You’re very welcome.

    in reply to: #1015300
    Halle Winkler
    Politepix

    It is possible to turn a DMP back into a text file, but you don’t actually need to do that. The DMP isn’t part of the acoustic model, it is just a plain probability model that says “this or that word is most likely to be combined with the other word when the user speaks, or be by itself, etc”. It doesn’t know anything about what the words sound like, it just helps accuracy by trying to predict tendencies in speech based purely on the input strings that you give it. LanguageModelGenerator should be able to create your DMP just fine so you don’t have to do anything with that. The issue is that the phonetic dictionary lookup method that LanguageModelGenerator uses currently expects that the phonemes will match an acoustic model with English language phonemes, so it’s the .dic that it produces that you have to throw out. If you start Pocketsphinx using a DMP that LanguageModelGenerator creates, but you create your own .dic by hand using selected entries from the acoustic model’s .dic file, I think it should work for you.

    in reply to: #1015297
    Halle Winkler
    Politepix

    Ha, no problem, that’s why everyone edits the logs :) . My biggest objection to folks reporting issues with the simulator is when they post OMG TEH ACCURACY !!!ELEVEN!1! on Stack Overflow and then it turns out that they were testing accuracy on the simulator.

    But LanguageModelGenerator and basic things like finding file paths should be the same between the simulator and the device, because they are deterministic and don’t rely on an audio driver, and if you’ve read up enough to have gotten my repeated suggestions not to test accuracy on the simulator, then it’s fine to test something like this. Keep in mind that one thing you won’t get to see when you use the simulator is what will happen with constrained resources, meaning that if the DMP and dic files with the Russian model contain tens of thousands of words, they will probably work on the simulator but possibly crash, or at least ridiculously underperform, on the device.

    OK, so let’s level-set again, because I’m a little confused about the Russian model/Spanish model situation. It looks to me like you are now just testing the Spanish model, is that correct? Can you first make absolutely sure that you remove all acoustic model files from your app (I would just start a new app from scratch at this point) and then re-add just the model that you are using? It’s really common that when developers are working with multiple acoustic models they get issues due to having files mixed from two different models at the root of their bundle, and these issues are really hard for me to troubleshoot because the files are found by Pocketsphinx, so there is no overt error or warning.

    Next, I think there is going to be a general issue with the fact that LanguageModelGenerator only works with English-language phonemes and the acoustic models you are using do not use the same phoneme set. You can see that right here:

    ERROR: “dict.c”, line 193: Line 1: Phone ‘EY’ is mising in the acoustic model; word ‘A’ ignored
    ERROR: “dict.c”, line 193: Line 2: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONA’ ignored
    ERROR: “dict.c”, line 193: Line 3: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONADA’ ignored
    ERROR: “dict.c”, line 193: Line 4: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONADAS’ ignored
    ERROR: “dict.c”, line 193: Line 5: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONADO’ ignored
    ERROR: “dict.c”, line 193: Line 6: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONADOS’ ignored
    ERROR: “dict.c”, line 193: Line 7: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONAN’ ignored
    ERROR: “dict.c”, line 193: Line 8: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONAR’ ignored
    ERROR: “dict.c”, line 193: Line 9: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONARA’ ignored
    ERROR: “dict.c”, line 193: Line 10: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONARLA’ ignored
    ERROR: “dict.c”, line 193: Line 11: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONARON’ ignored
    ERROR: “dict.c”, line 193: Line 12: Phone ‘AH’ is mising in the acoustic model; word ‘ABANDONE’ ignored
    ERROR: “dict.c”, line 193: Line 13: Phone ‘EH’ is mising in the acoustic model; word ‘BREGO’ ignored
    ERROR: “dict.c”, line 193: Line 14: Phone ‘EH’ is mising in the acoustic model; word ‘F’ ignored
    ERROR: “dict.c”, line 193: Line 15: Phone ‘IY’ is mising in the acoustic model; word ‘FRICA’ ignored
    ERROR: “dict.c”, line 193: Line 16: Phone ‘AY’ is mising in the acoustic model; word ‘I’ ignored
    ERROR: “dict.c”, line 193: Line 17: Phone ‘EY’ is mising in the acoustic model; word ‘K’ ignored
    ERROR: “dict.c”, line 193: Line 18: Phone ‘EH’ is mising in the acoustic model; word ‘LVAREZ’ ignored
    ERROR: “dict.c”, line 193: Line 19: Phone ‘EH’ is mising in the acoustic model; word ‘LVARO’ ignored
    ERROR: “dict.c”, line 193: Line 20: Phone ‘AA’ is mising in the acoustic model; word ‘R’ ignored

    What is happening is that you have these words transcribed phonetically with English-language phonemes like AY and EH (probably via the fallback method), but those phonemes are not present in your acoustic model so the words have to be excluded by Pocketsphinx. Rather than generating these dictionaries dynamically you will have to create them by hand. The .dic file that comes with the Spanish acoustic model ought to have the correct phonetic transcriptions for its words. It is possible to use the DMP file that LanguageModelGenerator generates, since that is a probability model and doesn’t directly interact with the acoustic model.
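
    To illustrate the format (these transcriptions are hypothetical; the real phoneme symbols depend entirely on the phoneme set of your Spanish acoustic model, so copy them from the .dic file that ships with the model rather than inventing them), a hand-made phonetic dictionary entry is just the word followed by its phonemes, one word per line:

    ABANDONA	a b a n d o n a
    ABANDONAR	a b a n d o n a r
    ABANDONE	a b a n d o n e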

    in reply to: #1015295
    Halle Winkler
    Politepix

    OK, go ahead and post the complete log rather than an excerpt and we’ll see what it says. The error you posted is weird because it seems to think that mainBundle is the folder above the sample app (../OpenEarsSampleApp.app) and I doubt it should be referencing that path using OpenEarsSampleApp as its reference point at all, let alone a directory that is outside of the sandbox. Have you made any changes to any part of the code relating to that path?

    in reply to: #1015293
    Halle Winkler
    Politepix

    Let’s take this one at a time. We can troubleshoot the mdef first since it is going to cause a crash. How did you verify that there is a file called mdef in the bundle? It has to be in the root of the bundle so something to check is whether it is actually inside of a folder in mainBundle.
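
    If it helps, here is a minimal runtime check (a sketch, not part of the OpenEars API) that you can drop into your app to see whether mdef is really at the root of mainBundle; pathForResource:ofType: will return nil if the file was only added inside a subfolder, or was not added to the target at all:

    NSString *mdefPath = [[NSBundle mainBundle] pathForResource:@"mdef" ofType:nil];
    if (mdefPath == nil) {
        NSLog(@"mdef is missing from the root of the app bundle.");
    } else {
        NSLog(@"mdef found at %@", mdefPath);
    }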

    in reply to: #1015291
    Halle Winkler
    Politepix

    OK, my curiosity got the better of me and I checked out the model myself. Do you get the same error if you rename the mixture_weights file to sendump?

    in reply to: #1015290
    Halle Winkler
    Politepix

    Hi Guntis, welcome.

    That error is due to the missing sendump you mentioned. I’m pretty sure that Pocketsphinx requires that sendump file, so I’m a bit confused about why the Russian model doesn’t have one, but there must be a reason, because nsh is the main Pocketsphinx developer so it isn’t a mistake. My suggestions for proceeding are as follows. The first step is to level-set and make sure that this works with Pocketsphinx outside of OpenEars (i.e. to rule out, or discover, an issue with OpenEars that is causing this). Do you have a Linux VM or dedicated box where you can install Pocketsphinx and test the model? Alternatively (depending on how complicated it is for you to test on Linux), ask Nickolay whether the sendump file is required and/or how to use the model with Pocketsphinx without one, which you can do at the CMU Sphinx forums (or he might pop in and answer your SO question, since he also follows the OpenEars tag): http://sourceforge.net/p/cmusphinx/discussion/help/

    Something else to keep in mind is that if the DMP/dic files are large-vocabulary recognition files (i.e. contain a vocabulary that is large enough for general dictation tasks, with tens of thousands of words) they will be too big for offline speech recognition on a handheld device.

    in reply to: Enabling Bluetooth Support #15171
    Halle Winkler
    Politepix

    Thanks for the offer, I’d love to borrow the device for a few days of testing but I think we might be impractically distant — I’ll get in touch though, maybe we can figure something out.

    in reply to: Enabling Bluetooth Support #15169
    Halle Winkler
    Politepix

    Great, thank you for this info. I will see if I can get a hold of one of those devices.

    in reply to: Enabling Bluetooth Support #15166
    Halle Winkler
    Politepix

    Welcome,

    Sorry you’re seeing an issue. The reason that I’ve labeled the bluetooth support experimental is that I can’t test against every device, so it is possible that a particular device has quirks. The good news is that so far, this is the first time a developer has reported an issue with a particular bluetooth device in combination with the sample app since I added bluetooth support a year ago. So please treat it as a bug report: show me the full logging output from the sample app if it manifests the same issue, let me know the device, and if the opportunity to test and fix it comes up I will do so. I’d appreciate getting to see the full OpenEarsLogging output and the verbosePocketSphinx output since that will tell the whole story.

    It does sound like there could be a general configuration issue in your own app if it performs notably differently from the sample app. I would investigate whether your app makes changes to the audio session, either through calls to AVAudioSession or lower-level audio session calls, since that is the most likely way for an app to change OpenEars’ audio handling. Something else that will change the audio session is using certain media objects such as video players or some audio players. The last thing that I think might lead to issues is if you are doing anything that might override OpenEars’ threading behavior.

    You can always access the source code and change it and recompile it for your own app — the framework source is right in the distribution in the OpenEars folder.

    in reply to: Error while integrating Neatspeech #15165
    Halle Winkler
    Politepix

    Welcome Ravi,

    This can happen if you didn’t add the -ObjC other linker flag or if the voices weren’t added to your target when you imported the voices folder, usually due to something going wrong with this step:

    “In order to use NeatSpeech, as well as importing the framework into your OpenEars-enabled project, it is also necessary to import the voices and voice data files by dragging the “Voice” folder in the disk image into your app project. Make sure that in Xcode’s “Add” dialog, “Create groups for any added folders” is selected. Make sure that “Create folder references for any added folders” is not selected or your app will not work.”

    Also make sure that your app target is checked in that “Add file” dialog so the items which are being added are also being added to your target.

    The error just means that the code for the NeatSpeechVoice “Emma” is not available to your project.

    in reply to: Knowing the present string being spoken by TTS #15133
    Halle Winkler
    Politepix

    OK, that makes sense, but there is no callback which identifies the word that is being spoken, because the entire phrase is synthesized at once when you use FliteController. The best workaround would be to send a series of very short statements, highlighting each statement as you send it.

    in reply to: Knowing the present string being spoken by TTS #15131
    Halle Winkler
    Politepix

    Welcome,

    Isn’t it just the string that you have entered into FliteController’s say:withVoice: method? Maybe I’m not understanding your question 100%, can you elaborate on why you can’t use your own string that you entered into say:withVoice:?

    in reply to: Delay pocketsphinxDidDetectSpeech #15085
    Halle Winkler
    Politepix

    Sorry, that is something you’d need to troubleshoot further on your own. It could be that you aren’t instantiating it at the right point in the logical flow of the app, it could be that there is an issue in the logical flow of your app with where you are trying to initiate the vibration effect (similarly to with the original question in this topic) or it could be that soundMixing isn’t a fix for what you are trying to do. It isn’t a supported feature so regretfully there is a time issue for me with getting too deeply into exploring the different potential reasons it might not be working yet.

    in reply to: Delay pocketsphinxDidDetectSpeech #15083
    Halle Winkler
    Politepix

    OK, glad that was helpful. You don’t actually have to modify the framework in order to turn on sound mixing, it is not currently part of the public API and therefore likely to change in future versions but for the time being you can turn on sound mixing simply by including the line you referenced above right before you do startListeningWithLanguageModelAtPath:. It might be necessary for you to import AudioSessionManager.h in the view controller from which you want to do that.

    in reply to: Delay pocketsphinxDidDetectSpeech #15081
    Halle Winkler
    Politepix

    OK, can you explain to me a little more about why you can’t suspend recognition at the time that you start the playback of your own AVAudioPlayer and resume it when you receive the delegate callback that your own AVAudioPlayer has completed playback? It doesn’t yet make sense to me why you’d need to suspend for an arbitrary period of time when you know the moment that you can suspend and the moment that you can resume in order to not have recognition in progress during your sound playback.

    in reply to: Delay pocketsphinxDidDetectSpeech #15078
    Halle Winkler
    Politepix

    Welcome,

    pocketsphinxDidDetectSpeech is a delegate method, so you don’t want to delay it since you don’t call it directly, you just want to address the underlying functionality that you control directly which is whether recognition is engaged or not. Suspending recognition before playing your sound and resuming it afterwards should work perfectly for the goal of halting recognition during other media playback, so if it isn’t working perfectly we should figure out why. What happens when you suspend before you play your sound back and resume after your sound is done playing?
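
    As a sketch of what I mean (assuming you have pocketsphinxController and audioPlayer properties, and that this view controller is the AVAudioPlayer’s delegate):

    // Before starting your own sound playback:
    [self.pocketsphinxController suspendRecognition];
    [self.audioPlayer play];

    // And in the AVAudioPlayerDelegate callback, once playback is done:
    - (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag {
        [self.pocketsphinxController resumeRecognition];
    }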

    in reply to: Double commands in hypothesis #14932
    Halle Winkler
    Politepix

    OK, thanks for the logging. I’ve never received a report of this issue before (not that I don’t believe it, just that it’s not a common issue), and the OS and device you’re using are part of the testbed, so I think I’d want to check out the app code in order to learn more about what is happening — would it be possible for you to make a stripped-down sample app that manifests the issue and send it to me so I can see it?

    in reply to: New version Flite issue #14929
    Halle Winkler
    Politepix

    Heh, I was just coming in here to see if I could find the old guide to definitively removing the old version somewhere, when you posted that you sorted it out yourself :) . Nice work and thank you for updating me.

    Halle Winkler
    Politepix

    OK, that shouldn’t cause an issue as long as you are positive that you are linking to the demo framework from the non-licensed apps and not the licensed framework.

    Are you positive that you set the -ObjC “Other Linker Flag” in the target of the app that is having the issue? It looks like the RapidEars demo was somehow just not quite successfully installed by the exact steps in the tutorial. That is usually the step that would cause a method that is in the plugin to not work.

    Halle Winkler
    Politepix

    OK, could you tell me what the versions are that are shown on the front page of the pdf of the OpenEars documentation and the pdf of the RapidEars documentation? The Info.plist isn’t used for frameworks.

    Quick question, didn’t you have a working install of RapidEars previously? Just checking if something has changed in your setup.

    https://www.politepix.com/forums/topic/problem-switching-between-openears-and-rapidears/

    Halle Winkler
    Politepix

    Hi Matt,

    Which version of RapidEars and which version of OpenEars? Do other methods of RapidEars work? You can find version numbers in the included documentation with both downloads.

    in reply to: Optimizing open ears for single word recognition #14908
    Halle Winkler
    Politepix

    This is generally due to speaking too far away from the built-in device mic. Its optimal distance is telephoning distance so if the device is far away you won’t get as good results as with the headset mic.

    What exactly is happening when the recognition is wrong, is it something like the kid said “cat” but it recognized “hat”, where both “cat” and “hat” are words that are in the language model, or more like the kid said something unrelated but it was recognized as either “cat” or “hat”?

    The best advice I can give is to optimize a language model so it doesn’t have a lot of very similar-sounding short words in it, because that is the most challenging circumstance to get right. In that case I might want to use smaller language models, and maybe try Rejecto to see if it handles rejecting out-of-vocabulary speech (that’s only helpful if you are getting recognitions of words which aren’t in the language model at all).

    I will take the request about having an option for putting the language models elsewhere under advisement for the next version of OpenEars, you make a good point.

    in reply to: Error when integrating the NeatSpeech demo #14906
    Halle Winkler
    Politepix

    Fantastic! Glad it’s working for you and enjoy the party.

    in reply to: Error when integrating the NeatSpeech demo #14904
    Halle Winkler
    Politepix

    by the way, the NeatSpeech voices are really good compared to the free ones…

    And thanks for this! Very nice to hear.

    in reply to: Error when integrating the NeatSpeech demo #14903
    Halle Winkler
    Politepix

    This would be the lazy instantiation approach:

    1. Make sure you’ve imported FliteController+NeatSpeech.h in the VC header after the import of FliteController.h,
    2. Create an ivar and property of the voice and of the FliteController in the VC header, synthesize both in the VC implementation, and for each, override their accessor method with the following lazy accessors:

    - (Emma *)emma {
    	if (emma == nil) {
    		emma = [[Emma alloc] initWithPitch:0.0 speed:0.0 transform:0.0];
    	}
    	return emma;
    }

    - (FliteController *)fliteController {
    	if (fliteController == nil) {
    		fliteController = [[FliteController alloc] init];
    	}
    	return fliteController;
    }

    Then, you don’t initialize either ever, or do any checking of whether they are instantiated, and you don’t have to queue, you just reference them like so:

    [self.fliteController sayWithNeatSpeech:@"I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." withVoice:self.emma];

    Also, just for sanity, double-check that you’ve added the -ObjC other linker flag to the target.

    in reply to: Error when integrating the NeatSpeech demo #14902
    Halle Winkler
    Politepix

    Any VC. They can also be instantiated in a model that is controlled in a VC without any multithreading; the only reason I say to put them in a VC is that they should be on mainThread and not in a singleton but instead something which has a particular location in the view hierarchy and is normally memory managed.

    How are you struggling? Are you instantiating the voice and the fliteController in the emma and fliteController lazy instantiation method that is shown in the tutorial and then referencing them with self. as in:

    [self.fliteController sayWithNeatSpeech:@"I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." withVoice:self.emma];

    ?

    I’m here to help, just let me know what the hangup is and I’m sure we can figure it out.

    in reply to: Error when integrating the NeatSpeech demo #14896
    Halle Winkler
    Politepix

    Just to explain a bit more about the internal queueing, you can send text to sayWithNeatSpeech: whenever you want, and if speech is currently in progress the new text will be queued behind the scenes and spoken when previous queued speech is done. Or you can send a single very large piece of text and NeatSpeech will break it down and queue it up on its own. You can also dump the queue. It’s built on the assumption that you will need to queue and manages its whole process of putting synthesis on a secondary thread and keeping the results that are delivered by OpenEarsEventsObserver on mainThread.
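
    In practice that means back-to-back calls are fine (the phrases here are just examples); the second one is queued internally and spoken when the first finishes:

    [self.fliteController sayWithNeatSpeech:@"First phrase." withVoice:self.emma];
    [self.fliteController sayWithNeatSpeech:@"Second phrase, spoken after the first." withVoice:self.emma];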

    in reply to: Error when integrating the NeatSpeech demo #14895
    Halle Winkler
    Politepix

    OK, I see a few issues. The first is that the tutorial gives an example of how to do the memory management for both FliteController and FliteController+NeatSpeech voices, and it’s a good idea to use it since it avoids issues related to memory management. It looks like the initialization occurs inside an instance method of a shared object, which opens up a few ways it could be going wrong. There’s no need to put NeatSpeech inside a singleton or do anything with queueing, since NeatSpeech manages its own queue internally, and it is multithreaded and expects to be instantiated in one view controller, not in a singleton whose thread we don’t know.

    I would just set it up like the tutorial example:

    https://www.politepix.com/openears/tutorial

    in reply to: Error when integrating the NeatSpeech demo #14893
    Halle Winkler
    Politepix

    You can also contact me through the contact form and I’ll give you an address to email your code or project to if you want to use your free support email.

    in reply to: Error when integrating the NeatSpeech demo #14892
    Halle Winkler
    Politepix

    Hi,

    Can you show the code you used? It just sounds a bit like the Emma voice (or whichever voice) is not instantiated at the time you are calling it.

    in reply to: The pocketsphinxDidReceiveHypothesis is never fired #14844
    Halle Winkler
    Politepix

    That’s great! I’m happy I could help.

    in reply to: The pocketsphinxDidReceiveHypothesis is never fired #14842
    Halle Winkler
    Politepix

    Oh, I just noticed this from your question — there is no public method called startVoiceRecognitionThread, so calling a method with this name is probably the issue. If you are doing anything with PocketsphinxController’s threading it will probably cause issues since PocketsphinxController handles its own multithreading. Maybe the best approach is to do a new installation based on the tutorial: https://www.politepix.com/openears/tutorial

    in reply to: The pocketsphinxDidReceiveHypothesis is never fired #14839
    Halle Winkler
    Politepix

    Hmm, this is known working without any issues, so I think it’s just going to turn out to be OpenEarsEventsObserver delegate method connection issue.

    in reply to: Recording OpenEars Audio Input to File #14833
    Halle Winkler
    Politepix

    The underscore is part of the linker’s reporting; it isn’t related to the binary, which definitely works with the current version of OpenEars.

    This is a common issue when installing the plugin — can you make sure that you’ve followed all of the steps in the tutorial at https://www.politepix.com/openears/tutorial including adding the -ObjC linker flag in the right place and making sure that your project isn’t still linked to an old version of OpenEars?

    in reply to: Rejecto – LanguageModel #14824
    Halle Winkler
    Politepix

    Hiya and welcome back,

    I can take this under advisement as a requested feature, but the main thing Rejecto does is create a language model that incorporates the rejection features and has the rejecting elements added to the language model’s probability calculation, so there is no getting around the requirement to recalculate the lm’s probability model even if you start with a completed one. It also has to check and make sure that you aren’t already using one of the rejection phonemes in your real model and if so remove that phoneme from the rejection model, meaning that a premade .dic would also still need processing.

    Not saying there is no way, just that it isn’t trivial and it won’t vastly cut down on processing time.

    If you are seeing unpleasantly slow generation for a dynamic model, maybe you want to look at this tip I wrote up which had a suggestion at the end for avoiding any repeated use of the fallback pronunciation generation technique (i.e. the slow one):

    https://www.politepix.com/2012/12/04/openears-tips-and-tricks-5-customizing-the-master-phonetic-dictionary-or-using-a-new-one/

    in reply to: OpenEars on Mac #14636
    Halle Winkler
    Politepix

    Welcome,

    Unfortunately it can’t because the audio driver is extremely adapted to iOS audio, and I haven’t ported it because I think dealing with all of the possible variations in OS X desktop audio would be a huge support job but for a much smaller userbase, so basically not a good fit for a project such as this one. Sorry I can’t help you with that, pf.

    in reply to: NeatSpeech Problem #14431
    Halle Winkler
    Politepix

    OK, if the files are in there, there shouldn’t be an error that the files can’t be found, so why don’t you send the unhappy project over with any private stuff stripped out and I’ll investigate.

    in reply to: NeatSpeech Problem #14429
    Halle Winkler
    Politepix

    OK, the app bundle should not have folders in it called Voices or VoiceData. If the app bundle has these folders in it, the radio button selection in the “Add” dialog box at the time of importing the voice files is definitely on the wrong setting and the app will not be able to use NeatSpeech voices — you will get the exact error that you received.

    The only thing you should see in the app bundle as a result of adding NeatSpeech are the loose files that can be found within the folder VoiceData (but not the folder itself) at the root level of the app bundle. The folder that says Voices has no purpose inside the app bundle since it just contains frameworks which get compiled directly into the app product binary. You should see groups called Voices and VoiceData in your project file navigator, but never in your app bundle.

    Just remove the added folders from your project and add them again with the correct settings in the dialog box, exactly as they are described in the tutorial. This is the important line from the tutorial:

    Make sure that in Xcode’s “Add” dialog, “Create groups for any added folders” is selected. Make sure that “Create folder references for any added folders” is not selected or your app will not work.

    “Create folder references” is selected in your add dialog currently — Xcode only has a single way of creating subfolders within an app bundle, and that radio button selection is it, so it’s the cause of the issue.

    SLT will work fine because it doesn’t rely on any data files that need to be found in the app bundle.

    If you want, you’re welcome to email me your project (with the classes and resources stripped out) and I can double-check what the issue is. If you want to do that, just send me a note via the contact form and I’ll send you the email address.

    in reply to: NeatSpeech Problem #14426
    Halle Winkler
    Politepix

    What do you see when you look at the inside of your app bundle?

    in reply to: NeatSpeech Problem #14424
    Halle Winkler
    Politepix

    You can definitively verify whether this is the issue by selecting your app product under the Products group in the file navigator, right-clicking on it, and selecting show in finder. When you see the app product in the finder, you can right-click on it and request viewing the package contents. Once you are at the root level of the bundle (where your image files are added, etc), if the files were successfully added they will be visible in the app bundle at root level. If they aren’t in there at all, they weren’t added to the app target. If they are in there but they aren’t in the root level (they are inside of a folder that is at the root level) that means that when they are added, the add dialog box settings are incorrect.

    I suppose one last option is that it is possible that you are dragging the folder containing the voice data files into a group that already represents a folder within your app bundle. See if you get better results from dragging the folder in to the file navigator right into the project file icon rather than a subfolder.

    in reply to: NeatSpeech Problem #14422
    Halle Winkler
    Politepix

    Welcome,

    So far, this kind of error has always been due to the wrong settings on the “Add” import dialog such as the ones referenced in this similar issue with acoustic model resources being added or not added:

    https://www.politepix.com/openears/support/#Q_My_app_crashes_when_listening_starts

    Basically, the voice resources were not successfully added to your more complex project in the expected location in the bundle, but they were successfully added to the brand-new project in the expected location in the bundle. In the more complex project they are probably in a subfolder inside the app bundle where NeatSpeech can’t find them because they are expected to be at root level. This difference is a result of the setting mentioned here:

    Make absolutely sure that in the add dialog “Create groups for any added folders” is selected and NOT “Create folder references for any added folders”

    Halle Winkler
    Politepix

    Yeah, I suppose that somehow or other it has to be a project settings issue if the tutorial method works but it doesn’t work in the existing project. Glad to hear you are getting better results with a new project and I hope it continues to go smoothly.

    Halle Winkler
    Politepix

    Hi Matthew,

    Generally this only happens if the settings in the “Add files” dialog when doing the import are wrong with regard to these instructions:

    Make absolutely sure that in the add dialog “Create groups for any added folders” is selected and NOT “Create folder references for any added folders” because the wrong setting here will prevent your app from working.

    The other possibility is that the version of OpenEars in your existing app is an old version, or the path to the framework in the search path leads to an old version.

    Otherwise, it’s always a good step to clean the project, and to quit and restart Xcode. Let me know if any of this helps.

    in reply to: Playing a pre-recorded sound before synthesized speech. #14189
    Halle Winkler
    Politepix

    Hello,

    The ‘audioPlayerDidFinishPlaying:successfully:’ would only allow audio to be inserted afterwards.

    Only if you start playing the AVAudioPlayer sound after the speech has finished and OpenEarsEventsObserver has returned its callback. If you initiate the speech as a result of the audioPlayerDidFinishPlaying:successfully: method being called for a sound that you play using AVAudioPlayer, the sound will precede the speech.

    in reply to: Playing a pre-recorded sound before synthesized speech. #14144
    Halle Winkler
    Politepix

    Hello,

    You can play a sound whenever you like using standard AVAudioPlayer methods and their delegates. Just initiate speech once the AVAudioPlayer delegate method audioPlayerDidFinishPlaying:successfully: returns.

    in reply to: Double commands in hypothesis #13839
    Halle Winkler
    Politepix

    That’s funny, thanks for letting me know. It’s a big hint about what the underlying issue might be and I’m glad you have a workaround for now. When you get the time to send me the full log output I will see if I can track down the issue.

    in reply to: Double commands in hypothesis #13835
    Halle Winkler
    Politepix

    Also let me know which mic you are using when you’re getting these results and how far you are from the device, thanks.

    in reply to: Double commands in hypothesis #13834
    Halle Winkler
    Politepix

    OK, that sounds a bit buggy. It’s possibly an iPad 3 issue with the cmninit value that we could probably fix right now. Your code looks reasonable to me.

    Can I ask you to turn on verbosePocketsphinx and verboseLanguageModelGenerator and OpenEarsLogging and then print the log here? I’d like to see a log for 5 recognition rounds and it would be great if you would separately tell me what you really said.

    in reply to: Double commands in hypothesis #13832
    Halle Winkler
    Politepix

    Oh, another question — you posted this in the OpenEars plugins section, but from the info in the question I’ve been assuming that the question is actually about OpenEars without a plugin. Is this incorrect and the question is about one of the OpenEars plugins, or is it just about OpenEars itself?

    in reply to: Double commands in hypothesis #13831
    Halle Winkler
    Politepix

    OK, is it happening on the first hypothesis of the session or does it also happen afterwards?

    in reply to: Double commands in hypothesis #13829
    Halle Winkler
    Politepix

    Any chance you’re testing on the Simulator?

    in reply to: ConvertInput error in pocketsphinxDidReceiveHypothesis #13608
    Halle Winkler
    Politepix

    Cool, I’m glad to hear you’ve seen an improvement and I appreciate your updating the thread.

    in reply to: Timing of Open Ears Word Recognition #13547
    Halle Winkler
    Politepix

    Hi Matt,

    OpenEars uses pause-based continuous recognition, so it always has to wait for a half-second (or so) pause before it knows it can perform recognition on the entire utterance. RapidEars is a plugin for OpenEars which does realtime recognition: the speech is analyzed as it enters the microphone, and results are returned with the least latency that the speed of the device CPU allows.

    in reply to: Problem switching between OpenEars and RapidEars #13501
    Halle Winkler
    Politepix

    Hello,

    It shouldn’t be necessary to have two instances, and it is probably harmful since they may both be accessing the driver and the VAD in a way that is unexpected due to ARC.

    Both PocketsphinxController and PocketsphinxController+RapidEars use a PocketsphinxController instance, so you should be able to use a single instance of PocketsphinxController for both, and when you want to listen with RapidEars use RapidEars’ start method of startRealtimeListeningWithLanguageModelAtPath: and when you want to listen without RapidEars use the basic PocketsphinxController startListeningWithLanguageModelAtPath: method. Just be sure that you use the stopListening method for either before you start the other one. I’ve personally used both from the same PocketsphinxController instance in the same session so I would expect it to work.
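
    As a sketch (the full argument lists are elided here since they depend on your OpenEars and RapidEars versions, so check your headers for the exact signatures):

    // Switch from standard listening to RapidEars:
    [self.pocketsphinxController stopListening];
    [self.pocketsphinxController startRealtimeListeningWithLanguageModelAtPath: /* ... */];

    // Switch back to standard pause-based listening:
    [self.pocketsphinxController stopListening];
    [self.pocketsphinxController startListeningWithLanguageModelAtPath: /* ... */];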

    Let me know if this helps.

    in reply to: Need clarification on reducing binary size. #13473
    Halle Winkler
    Politepix

    Now that I’ve had time to double-check, confirming the fact that the “Deployment Postprocessing” build setting hasn’t had a change of name in recent versions of Xcode.

    in reply to: Changing Noise Level for Detecting Speech #13241
    Halle Winkler
    Politepix

    Hi Matt,

    There is no built-in way to do this, but you can investigate this approach with using a different audio unit/audio session type in this thread:

    https://www.politepix.com/forums/topic/add-mode-options-next-version/

    in reply to: Need clarification on reducing binary size. #13236
    Halle Winkler
    Politepix

    Hello,

    1) I believe it should still be called Deployment Postprocessing. I would search for that phrase in the search field.

    2) Correct, that is what it says in the FAQ as well.

    in reply to: Problems using AudioServicesPlaySystemSound with openEars #13128
    Halle Winkler
    Politepix

    This is due to the audio session settings used by the framework. If you want to sidestep the entire issue, just play the sound with AVAudioPlayer. Otherwise you can take a look at the approaches from me and others in these threads:

    https://www.politepix.com/forums/topic/keep-system-sounds-while-listening/
    https://www.politepix.com/forums/topic/conflict-with-audiotoolbox/
    https://www.politepix.com/forums/topic/pocketsphinx-disables-vibrate/
    https://www.politepix.com/forums/topic/simultaneous-mpmovieplayercontroller-video-and-speech-recognition/

Viewing 100 posts - 1,801 through 1,900 (of 2,166 total)