TTS say phonemes

Tagged: phonemes

This topic has 14 replies, 2 voices, and was last updated 11 years, 5 months ago by Halle Winkler.

Viewing 15 posts - 1 through 15 (of 15 total)

Author

Posts
September 5, 2012 at 7:37 am #10953

Hubbo
Participant

How can I use OpenEars to say phonemes?

I would like to build an app that reads out single pre-programmed words, coded in phonemes.
I think the current TTS isn’t always accurate and sometimes hard to understand unless the word is in a sentence, so I would like to use phonemes instead.

I see FliteController doesn’t have many methods other than ‘say’. However, I did come across this thread (here), which mentions:
float flite_phones_to_speech(const char *text, cst_voice *voice, const char *outtype)
which appears to me to be in C++ (or similar). How do I access this function/method in Objective-C?

Cheers!

September 5, 2012 at 9:02 am #10954

Halle Winkler
Politepix

Which voice are you using?

September 5, 2012 at 9:04 am #10955

Hubbo
Participant

After a quick play around I am using “cmu_us_kal16”. Why? Does this make a difference?

September 5, 2012 at 9:16 am #10956

Halle Winkler
Politepix

Yup, it’s the second-worst voice out of eight. I think it would be a good use of time to brush up on the documentation about the different voices and try the better ones first.

September 5, 2012 at 11:53 pm #10982

Hubbo
Participant

I must be missing something then. I’d tried all of them, and I preferred this one.

How do I just say phonemes? Is this possible? Is it in the documentation? (Please point me in the right direction. Thanks!!!!)

September 6, 2012 at 8:20 am #10984

Halle Winkler
Politepix

OK, but this is a standard complaint about the KAL voices and one I’ve rarely heard about the higher-quality 16-bit voices:

“I think the current TTS isn’t always accurate and sometimes hard to understand unless the word is in a sentence”

There is no OpenEars function to just say phonemes. If you’re handy with C and want to read up on the Flite public API, you can change FliteController’s implementation of Flite to accept phonemes instead of words: call Flite’s flite_synth_phones function, turn the returned CST utterance into a CST wave, and recompile the framework so that your app has access to the changed method. It’s possible, but the steps involved are unfortunately outside of the support scope of this forum.

September 18, 2012 at 7:45 am #11186

Hubbo
Participant

Thx Halle, I’ve updated to 1.2 and gone back to SLT voices, but still trying to get phonemes to work.

I’ve ended up creating my own flite_phones_to_wave function (essentially copying flite_text_to_wave and replacing the reference to flite_synth_text with flite_synth_phones). However, I’m still having dramas.
Do you know the format of the phonemes for the parameter in flite_synth_phones function? I can’t seem to find any documentation for this. Cheers.

September 24, 2012 at 4:04 pm #11257

Halle Winkler
Politepix

Hi, sorry that I didn’t see this. Here is a function I use in GraphemeGenerator to obtain phones for arbitrary text:

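const char *flite_text_to_phones(const char *text,
                                 cst_voice *voice,
                                 const char *outtype)
{
    const char *phones;
    cst_utterance *u;

    /* Synthesize the text so the utterance carries its phone sequence. */
    u = flite_synth_text(text, voice);
    flite_process_output(u, outtype, FALSE);

    /* print_phones() is a custom helper (see the GraphemeGenerator
       source), not a stock Flite call. */
    phones = print_phones(u);

    delete_utterance(u);

    return phones;
}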

But this of course involves two synthesis passes. I do it with a really fast voice in OpenEars so it isn’t that arduous but it’s probably still noticeable.

If I recall correctly, the phonemes used in Flite are the same ones used in Pocketsphinx, with the exception that Pocketsphinx’s ah needs to be turned into ax.
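
As a minimal sketch of that conversion: the helper below rewrites a Pocketsphinx-style pronunciation string before it is handed to flite_synth_phones. It takes the ah-to-ax substitution recalled above at face value, and it also lowercases every phoneme, which is an assumption about what Flite expects rather than something confirmed in this thread. The name sphinx_to_flite_phones is hypothetical and is not part of Flite, Pocketsphinx, or OpenEars.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any library mentioned in this thread.
 * Copies a whitespace-separated phoneme string from `in` to `out`,
 * lowercasing each phoneme (assumption) and replacing "ah" with "ax"
 * (per the recollection above). Phonemes are joined by single spaces.
 * Returns 0 on success, -1 if `out` is too small. */
int sphinx_to_flite_phones(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0') {
        char token[16];
        size_t len = 0;
        size_t need;

        /* Skip whitespace between phonemes. */
        while (*in != '\0' && isspace((unsigned char)*in))
            in++;
        if (*in == '\0')
            break;

        /* Read one phoneme, lowercased; overly long tokens are truncated. */
        while (*in != '\0' && !isspace((unsigned char)*in)) {
            if (len < sizeof token - 1)
                token[len++] = (char)tolower((unsigned char)*in);
            in++;
        }
        token[len] = '\0';

        /* Same length, so `len` stays valid after the substitution. */
        if (strcmp(token, "ah") == 0)
            strcpy(token, "ax");

        /* Room for an optional separator, the token, and the final NUL. */
        need = len + (used > 0 ? 1 : 0);
        if (used + need + 1 > out_size)
            return -1;
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, token, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

The result would then be passed as the phones argument in place of the original string. Whether this alone fixes garbled output depends on the voice and on how the wave is produced afterwards, so treat it as one thing to rule out rather than a guaranteed fix.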

September 24, 2012 at 4:04 pm #11258

Halle Winkler
Politepix

You can also just change the variance on the SLT voice to a very low value in order to get that zero-inflection phoneme effect.

September 28, 2012 at 1:45 am #11364

Hubbo
Participant

Thanks Halle!

I’m really struggling here. My functions are playing really weird sounds that don’t resemble anything close to actual speech.

I tried your function, thanks, but ran into trouble because I don’t have the print_phones() function.

Finding the right documentation is proving to be a nightmare.

Do you know what format the ‘text’ should be in when passing it to flite_synth_phones(text,voice)? All I manage to get is a jumbled mess of noise.

September 28, 2012 at 2:26 am #11365

Hubbo
Participant

For example, if I pass “AX P AA R T M AX N T” to flite_synth_phones it sounds like a machine gun. Maybe I’m doing something else wrong somewhere.

September 28, 2012 at 2:43 am #11366

Hubbo
Participant

Now that I think about it, print_phones() would help me out heaps :-)

September 28, 2012 at 6:45 am #11368

Halle Winkler
Politepix

Hi Hubbo,

Not sure what the issue is, but in my experience the phonemes-only speech is harder to understand and more unpleasant to listen to than the basic speech so I don’t recommend bothering. Definitely do your experiments using a better voice than KAL since there’s no way of knowing how much of its comprehensibility comes from the features that are removed when doing phoneme-only speech such as all variance and inflection.

September 28, 2012 at 7:10 am #11369

Halle Winkler
Politepix

Take a closer look at the GraphemeGenerator.h source; it is there for getting phonemes out of words. That’s really all the help I can provide on this one, sorry.

October 26, 2012 at 10:53 am #11747

Halle Winkler
Politepix

Just wanted to follow up on this issue with the TTS voice quality and mention that there is now a plugin for OpenEars which lets it use better TTS voices. They are as fast as the Flite voices but much clearer, and it can process long statements and multiple statements in a row much faster than the Flite voices. It’s called NeatSpeech and you can read more about it here: https://www.politepix.com/neatspeech