OpenEars and the main thread

  • #12045
    hohl
    Participant

    Does OpenEars (or especially the language model creator) need to be launched on the main thread? Or does the language model creator automatically use the main thread on the lines where it needs it?

    I am debugging an application which tries to run the language model creation in the background, but sometimes the main thread gets blocked while the language model creation is running (even though the language model creator is launched on a background thread). I am not yet sure whether there is some failure in my implementation or whether it's something internal to OpenEars.

    #12046
    Halle Winkler
    Politepix

    OpenEars expects to be launched on mainThread and to handle its own multithreading, but I wouldn't automatically expect a problem with launching it on a secondary thread. Certain operations will always be returned to mainThread (notifications sent to OpenEarsEventsObserver and delegate callbacks of OpenEarsEventsObserver will always go to mainThread, I think that audio playing in FliteController must be initiated on mainThread IIRC, and maybe there are some other examples of things which expect to initiate from or return to mainThread). But if all you are doing is running LanguageModelGenerator on a background thread, my guess is that the issue is in the actual threading, because LanguageModelGenerator works as a series of single-threaded activities, so it should be possible to background it. Are you using blocks, NSTimer or NSThread?
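
    For illustration, a minimal GCD sketch of backgrounding the generation might look like the following. This assumes ARC and the generateLanguageModelFromArray:withFilesNamed: call from the version being discussed; swap in whatever call and word list you are actually using.

        #import "LanguageModelGenerator.h" // or however LanguageModelGenerator is already imported

        // Sketch: run the generation off the main thread with GCD and hop back to the
        // main queue for anything that needs to touch the UI.
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];
            NSArray *words = @[@"WORD", @"STATEMENT", @"OTHER WORD"]; // placeholder entries
            NSError *error = [generator generateLanguageModelFromArray:words
                                                         withFilesNamed:@"MyLanguageModel"];
            dispatch_async(dispatch_get_main_queue(), ^{
                if ([error code] == noErr) {
                    // Generation succeeded; safe to update the UI or hand the new
                    // files to PocketsphinxController from here on the main thread.
                } else {
                    NSLog(@"Language model generation error: %@", [error localizedDescription]);
                }
            });
        });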

    #12181
    hohl
    Participant

    Thanks for your response.

    While debugging the application without finding anything specific that blocks, I've started thinking that the blocking is produced by I/O to the device's flash memory. Does OpenEars make heavy use of the disk, which might block other resources (an SQLite database) from loading? What would be a good approach to throttle the disk I/O usage of OpenEars?

    #12183
    Halle Winkler
    Politepix

    Nope, it doesn’t make any notable continuous use of the disk. Have you completely ruled out something going on in your app? If you move the operations in the sample app to another thread, do you see the same results? I would try to simplify the test case as much as possible until you can replicate the issue with only one code addition.

    #12659
    hohl
    Participant

    I'll let you know if I can find out what causes the lags. But at least there aren't any exceptions or unexpected results when running it in the background. It's just that in the case of my application (with around 3x~100 entries generated in the background) it takes some time, which also blocks the main thread (noticeable as a non-reacting UI).

    #12660
    Halle Winkler
    Politepix

    Yup, I would expect that to take a bit of time. Is there a requirement that they be dynamic, or would this blog post help: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/

    My other question is about the kind of multithreading. Have you experimented with doing it with a block (or even a timer) if you’ve been using NSThread? I do a lot of multithreaded coding and I used to use NSThread and I’ve entirely switched over to blocks/GCD because it’s so much cleaner.
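
    Just to make the comparison concrete (generateLanguageModel here is a hypothetical method wrapping your generation code):

        // With NSThread the work has to live in a separate selector...
        [NSThread detachNewThreadSelector:@selector(generateLanguageModel)
                                 toTarget:self
                               withObject:nil];

        // ...whereas with GCD the same work stays inline in a block.
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            [self generateLanguageModel];
        });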

    #13103
    hohl
    Participant

    I am only using GCD since it is much cleaner. It must be dynamic since I am creating a language map of the artist and album names on the user's device. The problem is, there isn't any nice kind of notification when the user adds new music to the library, so I need to update the language model in the background on a schedule. And while this is happening the user shouldn't be blocked from using other parts of the app or even other parts of the iOS system.

    Maybe it is because the artist names aren't classic English words? When using OpenEarsLogging I get log entries about the fallback method being used for nearly all the words in the created language model. ( https://www.sourcedrop.net/4Loa58d7ba3b3 )

    #13104
    Halle Winkler
    Politepix

    Have you taken the advice from OpenEarsLogging to convert everything to uppercase first? You can do this easily by taking your NSString and messaging [myString uppercaseString].
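
    A minimal sketch of that step (the names here are placeholders; the same applies to whatever array you feed into the generator):

        // Sketch: uppercase every name before handing the array to LanguageModelGenerator.
        NSArray *rawNames = @[@"Kontrust", @"Skrillex", @"Deichkind"]; // e.g. gathered from the media library
        NSMutableArray *upperCaseNames = [NSMutableArray arrayWithCapacity:[rawNames count]];
        for (NSString *name in rawNames) {
            [upperCaseNames addObject:[name uppercaseString]];
        }
        // upperCaseNames is what goes into the language model generation call.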

    #13105
    hohl
    Participant

    I just looked up what ‘convertGraphemes’ does, and it looks like a very heavy task (it seems to create pronunciations with text-to-speech machinery). And yes, everything is upper case. The problem is that the language map only contains names! And I have ~100 names like ‘KONTRUST’, ‘SKRILLEX’ or ‘DEICHKIND’, none of which are English words.

    How much work is it to create such a phonetic dictionary? Would it help to create such a dictionary for artist names?

    #13106
    Halle Winkler
    Politepix

    That is my suggestion. Since 1.2 LanguageModelGenerator has been able to take a custom or customized master phonetic dictionary in place of cmu07a.dic, and for this task I think the best approach is to add to cmu07a.dic and keep using it. In order to get the pronunciations to add I would do the following steps (this involves a little hacking to OpenEars and I’m leaving implementation up to you):

    1. Source some kind of comprehensive list of the most popular tracks with whatever cutoff for popularity and timeframe makes sense for you. Feed the whole thing into LanguageModelGenerator using the simulator and save everything that triggers the fallback method.

    2. Attempt to pick all the low-hanging fruit by trying to identify any compound nouns which consist of words which are actually in the master phonetic dictionary, just not with each other (see the sketch after this list).

    3. Do the tedious step with the remaining words of transcribing their phonemes using the same phoneme system found in cmu07a.dic.

    4. Add them all to cmu07a.dic.

    5. Sort cmu07a.dic alphabetically (very important), use.
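
    For step 2, a rough sketch of how the compound check could look (the path and the candidate name are made up, and the lines of cmu07a.dic are assumed to be the word followed by whitespace and its phonemes):

        // Sketch for step 2: flag names whose two halves are both already in cmu07a.dic.
        NSString *dicContents = [NSString stringWithContentsOfFile:@"/path/to/cmu07a.dic"
                                                          encoding:NSUTF8StringEncoding
                                                             error:NULL];
        NSMutableSet *knownWords = [NSMutableSet set];
        for (NSString *line in [dicContents componentsSeparatedByString:@"\n"]) {
            NSArray *columns = [line componentsSeparatedByCharactersInSet:
                                [NSCharacterSet whitespaceCharacterSet]];
            if ([columns count] > 0 && [[columns objectAtIndex:0] length] > 0) {
                [knownWords addObject:[columns objectAtIndex:0]]; // keep only the word column
            }
        }

        // If the master dictionary uses a different case than your names, normalize both sides first.
        NSString *candidate = @"SKYTRAIN"; // hypothetical fallback-triggering name
        for (NSUInteger i = 1; i < [candidate length]; i++) {
            NSString *left = [candidate substringToIndex:i];
            NSString *right = [candidate substringFromIndex:i];
            if ([knownWords containsObject:left] && [knownWords containsObject:right]) {
                NSLog(@"%@ could reuse existing pronunciations for %@ + %@", candidate, left, right);
                break;
            }
        }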

    With some good choices about your source data you could make a big impact on the processing time. The rest of it might have to be handled via UX methods such as further restricting the number of songs parsed or only using songs which are popular with the user, and informing the user in cases where this could lead to confusion.

    Remember that for Deichkind and similar, you aren't just dealing with an unknown word but also a word that will be pronounced with phonemes which aren't present in the default acoustic model, since it is made from North American speech. There's nothing like that “ch” sound in English, at least as it is pronounced in Hochdeutsch. If your target market is German-speaking and you expect some German-named bands in there, it might be worthwhile to think about doing some acoustic model adaptation to German-accented speech. I just did a blog post about it.

    #13107
    Halle Winkler
    Politepix

    BTW, it’s a good idea to do this in any case, because the fallback method is quite a shot in the dark for this kind of application (i.e. an application like band name listing, where spelling something weirdly or cryptically is part of the aesthetic of the field in question), so if you have too much fallback you are just going to have holes in the pronunciation matching and the user might say it correctly without getting a result.

    #13108
    hohl
    Participant

    The recognition already works well; only the creation takes very long, which surprised me.

    Another thought of mine was to create a cache for “- (NSString *) convertGraphemes:(NSString *)phrase”. But I haven't debugged how long this method takes or whether this would improve anything.
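
    Roughly what I have in mind (just a sketch; cachedPronunciationForPhrase: is a made-up wrapper name, and convertGraphemes: is the existing heavy method it would sit in front of):

        // Sketch: memoize grapheme conversion so repeated names are only converted once.
        static NSMutableDictionary *pronunciationCache = nil;

        - (NSString *)cachedPronunciationForPhrase:(NSString *)phrase {
            if (pronunciationCache == nil) {
                pronunciationCache = [[NSMutableDictionary alloc] init];
            }
            NSString *cached = [pronunciationCache objectForKey:phrase];
            if (cached != nil) {
                return cached; // skip the expensive conversion for phrases seen before
            }
            NSString *pronunciation = [self convertGraphemes:phrase]; // the existing call
            if (pronunciation != nil) {
                [pronunciationCache setObject:pronunciation forKey:phrase];
            }
            return pronunciation;
        }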

    I'll have a look at creating a custom cmu07a.dic addition which adds the most-used names from some kind of public charts and gets updated regularly by an automated routine which doesn't run on the device itself (it only delivers the result).

    #13109
    Halle Winkler
    Politepix

    Yup, the fallback can’t be sped up particularly. It’s actually vastly faster since 1.1 and just needs a fraction of a second now, but as you’ve observed it is a (very) stripped-down form of speech synthesis so there’s no getting around the CPU load and time.

    If this were my thing I think I would keep my own Core Data model of the songs on the device along with their already-created pronunciation, and on launching the app I’d do a comparison of the user’s songs and my database of the user’s songs and just add pronunciations for new ones.
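
    Just to illustrate the comparison part (Core Data details left out for brevity; this sketch keeps a plain plist of already-handled names, and the path and names are made up):

        // Sketch: work out which names are new since the last run and only process those.
        NSString *knownNamesPath = @"/path/to/Documents/knownNames.plist"; // hypothetical; use your real Documents path
        NSArray *previouslyKnown = [NSArray arrayWithContentsOfFile:knownNamesPath];
        NSMutableSet *knownSet = [NSMutableSet setWithArray:(previouslyKnown ? previouslyKnown : @[])];

        NSArray *currentNames = @[@"KONTRUST", @"SKRILLEX", @"DEICHKIND"]; // e.g. from the media library
        NSMutableArray *newNames = [NSMutableArray array];
        for (NSString *name in currentNames) {
            if (![knownSet containsObject:name]) {
                [newNames addObject:name]; // only these need new pronunciations
                [knownSet addObject:name];
            }
        }
        // Run the (slow) pronunciation generation for newNames only,
        // then remember everything for the next launch.
        [[knownSet allObjects] writeToFile:knownNamesPath atomically:YES];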

    #13110
    hohl
    Participant

    Yes, the last approach sounds like the best solution. I just have to look up how the LanguageModelGenerator works and how to use it with already pre-created pronunciations.

    #13121
    Halle Winkler
    Politepix

    How about this very simple approach: every time you launch, you generate a complete language model; however, every time you generate a phonetic dictionary, you add any unique entries to cmu07a.dic (alphabetized). Consequently, the first run will be the big time-consuming generation, and every subsequent generation will never use the fallback for a name that has ever been generated via the fallback before. No Core Data/synchronization/repeated fallbacks.
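
    Purely as a sketch of the “add unique entries and keep it alphabetized” part (the paths are made up, the bundled cmu07a.dic would first need to be copied somewhere writable, and it is worth verifying that a plain compare: sort matches the exact ordering the dictionary expects):

        // Sketch: merge freshly generated pronunciation lines into the master dictionary,
        // de-duplicated and sorted.
        NSString *masterPath = @"/path/to/writable/cmu07a.dic"; // hypothetical writable copy of the master dictionary
        NSString *newDicPath = @"/path/to/MyLanguageModel.dic"; // .dic produced by the latest generation

        NSMutableSet *allLines = [NSMutableSet set];
        for (NSString *path in @[masterPath, newDicPath]) {
            NSString *contents = [NSString stringWithContentsOfFile:path
                                                           encoding:NSUTF8StringEncoding
                                                              error:NULL];
            for (NSString *line in [contents componentsSeparatedByString:@"\n"]) {
                if ([line length] > 0) {
                    [allLines addObject:line]; // identical entries collapse automatically
                }
            }
        }
        NSArray *sortedLines = [[allLines allObjects] sortedArrayUsingSelector:@selector(compare:)];
        [[sortedLines componentsJoinedByString:@"\n"] writeToFile:masterPath
                                                       atomically:YES
                                                         encoding:NSUTF8StringEncoding
                                                            error:NULL];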
