Using Flite to pronounce a word a certain way



    #7111
    rmangino
    Participant

    Hi,

    I am using OpenEars 0.911. I am using the speech recognition functionality with .dic & .languagemodel files built from a domain-specific vocabulary (none of the words are “normal” words – they do not exist in the cmu07a.dic file).

    I have been hand-correcting the .dic file to help with recognition.

    What I would *like* to do is use Flite to help my users correctly pronounce a given word. It isn’t apparent to me whether I can have any control over the way Flite pronounces a given word – is that correct?

    For example, lmtool creates the following for the name “Bryson”:

    BRYSON B R AY S AH N

    I have changed it to be:

    BRYSON B R IY S AH N
    [again, this is being done for a domain-specific reason]

    I would like Flite to be able to pronounce the word using the 2nd set of phones.

    Any suggestions would be greatly appreciated.

    Thanks!

    #7114
    Halle Winkler
    Politepix

    The simplest way to fake out Flite with an alternate pronunciation, if you have the luxury of knowing the words needed in advance, is to have Flite say a series of single-syllable words with the phonemes you actually want. I did this many times in AllEars because the Flite voice I chose didn’t say email, iPhone or mobile in a way that I expected to be immediately clear to the user. So instead I gave it @”eem ale”, @”eye phone”, @”moe bile”, which worked out pretty well. Is that kind of workaround an option for you?
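    If it helps, this is roughly how I would keep that kind of mapping – a plain lookup table from the display word to the respelled text you actually hand to Flite. The pairs below are only illustrations (the Bryson respelling is just my guess at B R IY S AH N):

    /* Sketch: map each display word to a phonetic respelling, and pass the
       respelling (not the original word) to whatever Flite/FliteController
       call you use for speech. The entries are illustrations only. */
    #include <stddef.h>
    #include <string.h>

    typedef struct {
        const char *word;       /* what the app shows / recognizes */
        const char *respelling; /* what Flite is actually asked to say */
    } respell_entry;

    static const respell_entry respellings[] = {
        { "email",  "eem ale"   },
        { "iPhone", "eye phone" },
        { "mobile", "moe bile"  },
        { "Bryson", "bree sun"  },  /* assumed approximation of B R IY S AH N */
    };

    /* Return the respelling if we have one, otherwise the word itself. */
    static const char *text_for_flite(const char *word)
    {
        size_t i;
        for (i = 0; i < sizeof(respellings) / sizeof(respellings[0]); i++) {
            if (strcmp(respellings[i].word, word) == 0)
                return respellings[i].respelling;
        }
        return word;
    }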

    #7123
    rmangino
    Participant

    Hi Halle,

    As always, thank you for the great (and prompt) response.

    I can definitely create a mapping between phones and the “correct” pronunciation text for use with Flite.

    The issue I’m running into right now is that Flite seems to have its own ideas as to which parts of a word should be emphasized (which nullifies the entire point of providing a spoken pronunciation example in the first place).

    I’ve read through much of the material at http://www.speech.cs.cmu.edu/flite/ but I can’t seem to find anything pertaining to how I can modify/impact Flite’s emphasis choices.

    Have you looked into this at all?

    Thanks!

    #7124
    Halle Winkler
    Politepix

    I have looked into it briefly in the course of my own attempts to improve the pronunciation in AllEars, but I don’t have functional code to share with you because I ended up choosing not to do this with Flite. I can’t remember what the exact reason was, but early on in the troubleshooting process I decided to keep it simpler.

    My not-finished idea was to give Flite SSML text:

    http://www.w3.org/TR/speech-synthesis/

    Which it is meant to support. I can’t remember if I couldn’t get it working, or got it partially working and then found out that it didn’t give fine-grained enough control, or what the issue was.

    You can give Flite SSML input as a text file with the function:

    float flite_ssml_to_speech(const char *filename,
                               cst_voice *voice,
                               const char *outtype)

    You can search the library for it. I have no advice on getting the output over to the FliteController method as a waveform, but I’m sure it’s possible. Hope this is helpful.
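    To make that concrete, here is a rough, untested outline of driving that function from plain C, modeled loosely on the main() in flite.c. The voice registration call depends on which voice you compiled in (cmu_us_kal is only an example), and how much of SSML Flite really honors is exactly what I never pinned down, so treat it as a starting point rather than working code:

    /* Untested outline: speak an SSML file through Flite. */
    #include "flite.h"

    /* Voice registration function; the name and signature depend on which
       voice is compiled into your build (kal is just an example). */
    cst_voice *register_cmu_us_kal(const char *voxdir);

    int main(void)
    {
        cst_voice *voice;

        flite_init();
        voice = register_cmu_us_kal(NULL);

        /* bryson.ssml might contain something like:
           <speak><phoneme ph="b r iy s ah n">Bryson</phoneme></speak>
           Whether the phoneme element is actually respected is the open
           question here. */
        flite_ssml_to_speech("bryson.ssml", voice, "play");

        return 0;
    }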

    There is also a function for giving Flite explicit phones:

    float flite_phones_to_speech(const char *text,
                                 cst_voice *voice,
                                 const char *outtype)

    Which may be helpful if there is some method that I haven’t come across for marking a phone for emphasis, although my recollection is that it results in equal emphasis for all phonemes, i.e. robot voice.
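    Along the same lines, here is a sketch of what calling it might look like; the phone names (and whether stress digits are accepted) depend on the voice’s phoneset, so check how flite.c handles its -p option before relying on this:

    /* Untested sketch of the explicit-phones route. "pau" is a pause;
       phone names are assumed to be the lowercase CMU-style set used by
       the US English voices. */
    #include "flite.h"

    cst_voice *register_cmu_us_kal(const char *voxdir);

    int main(void)
    {
        cst_voice *voice;

        flite_init();
        voice = register_cmu_us_kal(NULL);

        /* Roughly B R IY S AH N, padded with pauses. */
        flite_phones_to_speech("pau b r iy s ah n pau", voice, "play");

        return 0;
    }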

    A good place to look for implementation templates is the main() of flite.c that is part of the Flite download.

    #7125
    Halle Winkler
    Politepix

    Oh, something else to keep in mind is that the Flite phonemes do not map identically to the US English Pocketsphinx phonemes. Spending a little bit of time looking at the implementation of GraphemeGenerator from the library should point up any important differences; I know it was an issue I needed to solve in order to set up the fallback pronunciation technique for creating a dictionary entry when a word isn’t found in the main dictionary file. It might be as minor as AX -> AH – just take a look (keeping in mind that you are doing the opposite of GraphemeGenerator: taking a Pocketsphinx phoneme and converting it into a Flite one).
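    As a rough illustration of the direction you would be going (AX/AH is the one difference I actually remember; everything else here is just a lowercasing guess to be verified against GraphemeGenerator):

    /* Sketch of a Pocketsphinx-phone -> Flite-phone translation, i.e. the
       opposite direction from GraphemeGenerator. Only the AX/AH pair comes
       from this thread; the lowercasing default is an assumption. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    static const char *flite_phone_for(const char *ps_phone)
    {
        static char buf[8];
        size_t i;

        /* Pocketsphinx AH may correspond to Flite "ax" (or "ah") depending
           on the word – verify against GraphemeGenerator. */
        if (strcmp(ps_phone, "AH") == 0)
            return "ax";

        /* Default assumption: same phone name, lowercased for Flite. */
        for (i = 0; ps_phone[i] && i < sizeof(buf) - 1; i++)
            buf[i] = (char)tolower((unsigned char)ps_phone[i]);
        buf[i] = '\0';
        return buf;
    }

    int main(void)
    {
        /* BRYSON  B R IY S AH N  ->  "b r iy s ax n" (assumed) */
        const char *ps[] = { "B", "R", "IY", "S", "AH", "N" };
        size_t i;

        for (i = 0; i < sizeof(ps) / sizeof(ps[0]); i++)
            printf("%s ", flite_phone_for(ps[i]));
        printf("\n");
        return 0;
    }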

    #7126
    rmangino
    Participant

    Hah – thanks Halle – that is *exactly* what I was just doing! :)
