TTS say phonemes

  • #10953
    Hubbo
    Participant

    How can I use OpenEars to say phonemes?

    I would like to build an app that reads out single pre-programmed words, coded in phonemes.
    I think the current TTS isn’t always accurate and sometimes hard to understand unless the word is in a sentence, so I would like to use phonemes instead.

    I see FliteController doesn’t really have many methods other than ‘say’. However, I did come across this thread: here, which mentions:
    float flite_phones_to_speech(const char *text, cst_voice *voice, const char *outtype)
    which appears to me to be in C++ (or similar). How do I access this function/method in Objective-C?

    Cheers!

    #10954
    Halle Winkler
    Politepix

    Which voice are you using?

    #10955
    Hubbo
    Participant

    After a quick play around I am using “cmu_us_kal16”. Why? Does this make a difference?

    #10956
    Halle Winkler
    Politepix

    Yup, it’s the second-worst voice out of eight. I think it would be a good use of time to brush up on the documentation about the different voices and try the better ones first.

    #10982
    Hubbo
    Participant

    I must be missing something then. I’d tried all of them, and I preferred this one.

    How do I just say phonemes? Is this possible? Is it in the documentation? (Please point me in the right direction – Thanks!!!!)

    #10984
    Halle Winkler
    Politepix

    OK, but this is a standard complaint about the KAL voices and one I’ve rarely heard about the higher-quality 16-bit voices:

    I think the current TTS isn’t always accurate and sometimes hard to understand unless the word is in a sentence

    There is no OpenEars function to just say phonemes. If you’re handy with C and want to read up on the Flite public API, you can change FliteController’s implementation of Flite to accept an input of phonemes instead of words and use Flite’s flite_synth_phones function on a returned CST utterance that then needs to be turned into a CST wave, and recompile the framework to give your app access to the changed method. It’s possible but the steps involved are unfortunately outside of the support scope of this forum.

    #11186
    Hubbo
    Participant

    Thx Halle, I’ve updated to 1.2 and gone back to SLT voices, but still trying to get phonemes to work.

    I’ve ended up creating my own flite_phones_to_wave function (essentially copying flite_text_to_wave and replacing the reference to flite_synth_text with flite_synth_phones). However, I’m still having dramas.
    Do you know the format of the phonemes for the parameter of the flite_synth_phones function? I can’t seem to find any documentation for this. Cheers.

    #11257
    Halle Winkler
    Politepix

    Hi, sorry that I didn’t see this. Here is a function I use in the grapheme generator to obtain phones for arbitrary text:

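    /* Returns the phone sequence Flite produces for the given text.
       print_phones() is a separate helper that is not shown here; it must
       return storage that stays valid after the utterance is deleted. */
    const char *flite_text_to_phones(const char *text,
                                     cst_voice *voice,
                                     const char *outtype)
    {
        const char *phones;
        cst_utterance *u;

        u = flite_synth_text(text, voice);        /* synthesize once just to obtain the phones */
        flite_process_output(u, outtype, FALSE);
        phones = print_phones(u);                 /* collect the phone names from the utterance */

        delete_utterance(u);                      /* frees everything owned by u */

        return phones;
    }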

    But this of course involves two synthesis passes. I do it with a really fast voice in OpenEars so it isn’t that arduous but it’s probably still noticeable.

    If I recall correctly, the phonemes used in Flite are the same ones used in Pocketsphinx, with the exception that Pocketsphinx’s ah needs to be turned into ax.
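
The ah-to-ax substitution described above can be sketched as a small string conversion in plain C. This is only an illustration: the helper name sphinx_phones_to_flite is hypothetical and not part of OpenEars, Flite, or Pocketsphinx, and the lowercasing step rests on the assumption (not confirmed in this thread) that Flite wants lowercase, space-separated phone names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenEars, Flite, or Pocketsphinx function).
 * Converts a space-separated Pocketsphinx-style phone string into the form
 * ASSUMED for Flite: lowercase, with the phone "ah" rewritten as "ax".
 * Writes the result into out (capacity out_size, including the NUL).
 * Returns 0 on success, -1 on a NULL argument, an over-long phone name, or
 * a buffer that is too small; out is unspecified when -1 is returned. */
int sphinx_phones_to_flite(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (in == NULL || out == NULL || out_size == 0)
        return -1;

    while (*in != '\0') {
        char token[16];
        size_t len = 0;

        /* Skip the spaces between phones. */
        while (*in == ' ')
            in++;
        if (*in == '\0')
            break;

        /* Copy one phone name, lowercased. */
        while (*in != '\0' && *in != ' ') {
            if (len + 1 >= sizeof token)
                return -1;
            token[len++] = (char)tolower((unsigned char)*in);
            in++;
        }
        token[len] = '\0';

        /* The one substitution mentioned in the thread: ah -> ax. */
        if (strcmp(token, "ah") == 0)
            strcpy(token, "ax");

        /* Room for a separator (if not the first phone), the phone, the NUL. */
        if (o + (o > 0 ? 1 : 0) + len + 1 > out_size)
            return -1;
        if (o > 0)
            out[o++] = ' ';
        memcpy(out + o, token, len);
        o += len;
    }

    out[o] = '\0';
    return 0;
}
```

With a 64-byte buffer, the input "AH P AA R T M AH N T" comes out as "ax p aa r t m ax n t". Whether that string is then acceptable to flite_synth_phones still has to be checked against the voice in use.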

    #11258
    Halle Winkler
    Politepix

    You can also just change the variance on the SLT voice to a very low value in order to get that zero-inflection phoneme effect.

    #11364
    Hubbo
    Participant

    Thanks Halle!

    I’m really struggling here; my functions are playing really weird sounds that don’t resemble anything close to actual speech.

    I tried your function, thanks, but run into trouble because I don’t have the print_phones() function.

    Finding the right documentation is proving to be a nightmare.

    Do you know what format the ‘text’ should be in when passing it to flite_synth_phones(text,voice)? All I manage to get is a jumbled mess of noise.

    #11365
    Hubbo
    Participant

    For example, if I pass “AX P AA R T M AX N T” to flite_synth_phones it sounds like a machine gun. Maybe I’m doing something else wrong somewhere.

    #11366
    Hubbo
    Participant

    Now that I think about it, print_phones() would help me out heaps :-)

    #11368
    Halle Winkler
    Politepix

    Hi Hubbo,

    Not sure what the issue is, but in my experience the phonemes-only speech is harder to understand and more unpleasant to listen to than the basic speech so I don’t recommend bothering. Definitely do your experiments using a better voice than KAL since there’s no way of knowing how much of its comprehensibility comes from the features that are removed when doing phoneme-only speech such as all variance and inflection.

    #11369
    Halle Winkler
    Politepix

    Take a further look at the GraphemeGenerator.h source; it is there for getting phonemes out of words. That’s really all the help I can provide on this one, sorry.

    #11747
    Halle Winkler
    Politepix

    Just wanted to follow up on this issue with the TTS voice quality and mention that there is now a plugin for OpenEars which lets it use better TTS voices which are as fast as the Flite voices but much clearer, and it can process long statements and multiple statements in a row much faster than the Flite voices. It’s called NeatSpeech and you can read more about it here: https://www.politepix.com/neatspeech
