Flite speech synthesis times for the top quality voices

This topic has 4 voices, contains 17 replies, and was last updated by  oakdemirci 286 days ago.

Viewing 18 posts - 1 through 18 (of 18 total)
Author Posts
July 24, 2011 at 7:40 am #7371

sarinsukumar

Hi halle,
I am ready to get on to this and test it. My application has long text to pass to Flite. Does this give streaming, or is it just multithreaded with the same delay as the last one?

July 24, 2011 at 9:08 pm #7375

Halle

It’s identical to the stock code, but the synthesis occurs on a background thread so it doesn’t block the main thread.

July 26, 2011 at 10:27 am #7377

marco

Hi halle,
OpenEars is amazing. I’m currently using it for an app that I’m developing. I’m not sure whether I should post this here or start another topic, but is there a way to make the FliteController method say:withVoice: execute faster? I’m using it in my app to implement a 10-second countdown timer. It’s taking at least 2 seconds for every “shout” it performs. Any help would be much appreciated.

July 26, 2011 at 10:33 am #7378

Halle

Hi Marco,

Two seconds to speak a single number seems awfully long. Can you tell me which voice, OS and device that is with?

If you turn on OPENEARSLOGGING it will actually time the voice synthesis for you, so you can see what the exact number is to tell me.

July 26, 2011 at 10:44 am #7380

Halle

Another question: are you creating a new FliteController for every shout, or are you using the recommended memory management from http://www.politepix.com/openears/yourapp in the part which begins “The last convention is that when the instructions say to instantiate an object”? Because if you are initializing the entire controller each time that will really add some setup time.

July 26, 2011 at 10:58 am #7382

marco

Hi Halle,

Didn’t expect you to reply so fast! Awesome. Anyway, the voice I’m using is “cmu_us_slt”, as it is the “clearest” and “sexiest” voice IMO; the OS is iOS 4.3.3 and the device is an iPhone 4. I enabled OPENEARSLOGGING and it takes an average of 0.745 seconds to run Flite. I’m using the same FliteController for every shout.

I did, however, add an if statement in the delegate method “fliteDidFinishSpeaking” that checks whether the number is still greater than zero; if so, it decrements the number and shouts the current value, until it reaches zero.

July 26, 2011 at 11:05 am #7383

Halle

OK, if I’m not mistaken, that is the most complex voice on offer, so you are seeing slightly better than realtime performance which may be the best we can hope for. The 8k voices will perform better but they won’t sound as good.

If you are always using the voices to count down, you could actually not use voice synthesis but pre-recorded audio clips since you know the entire set of required statements in advance.
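The pre-recorded approach boils down to a lookup from number to audio file. Here is a minimal, language-neutral sketch in Python; the directory name, the file naming scheme and both function names are made up for illustration and are not part of OpenEars. In an app, each returned path would be handed to whatever audio player is already in use.

```python
# Hypothetical naming scheme: one pre-recorded clip per countdown number,
# e.g. clips/count_10.wav ... clips/count_01.wav. Nothing here is OpenEars API.

def clip_for(number, directory="clips"):
    """Return the path of the pre-recorded clip for one countdown number."""
    return f"{directory}/count_{number:02d}.wav"


def countdown_playlist(start):
    """Return the clips to play, in order, counting from `start` down to 1."""
    return [clip_for(n) for n in range(start, 0, -1)]
```

Because the whole set of statements is known in advance, no synthesis runs at countdown time, so the per-number delay is only the time needed to start playing a file.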

July 26, 2011 at 11:14 am #7384

marco

Wow. I never thought of using pre-recorded audio clips. Thank you for the tip! The reason I’m using this is that part of my app also uses the voice recognition, so I figured I would just use the FliteController to do TTS because it’s already part of the library. Thanks again for the tip!

July 26, 2011 at 11:21 am #7385

Halle

No problem, glad to hear that’s an option for you.

July 26, 2011 at 11:37 am #7386

sarinsukumar

Hi halle,
I am experiencing the same thing. I am doing voice synthesis, and the voice “cmu_us_slt” is taking 1 or 2 seconds for the initial sentence synthesis. I found there is an option for streaming, and I am going to try that.
I found that there is a callback each time 256 samples are generated, and we can grab those samples and give them to the player. But how do I give these chunks to the player? Please advise.

July 26, 2011 at 2:39 pm #7388

Halle

You can just add another audio unit driver that uses the same buffer size as the existing audio unit driver, that queues up the streamed data and plays it back through the callback. You’ll need to do some testing to find the different amounts that need to be buffered before playback on the different devices and with the different voices so that they don’t skip.
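The queue-and-prebuffer idea can be sketched independently of any audio API. The Python below is only an illustration under stated assumptions: the class name, its methods and the chunk counts are hypothetical, and a real audio unit driver would be written in C with a thread-safe (ideally lock-free) buffer, which this single-threaded sketch does not attempt.

```python
from collections import deque


class PrebufferedQueue:
    """Queue of streamed audio chunks that holds playback back until
    `prebuffer_chunks` chunks have arrived (hypothetical helper)."""

    def __init__(self, prebuffer_chunks):
        self.prebuffer_chunks = prebuffer_chunks
        self.chunks = deque()
        self.started = False   # True once enough audio has been buffered
        self.finished = False  # True once the synthesizer has sent its last chunk
        self.underruns = 0     # how often playback ran dry mid-utterance

    def push(self, samples):
        """Called from the synthesis side each time a chunk is produced."""
        self.chunks.append(samples)
        if len(self.chunks) >= self.prebuffer_chunks:
            self.started = True

    def finish(self):
        """Called when synthesis is complete; releases short utterances
        that never reached the prebuffer threshold."""
        self.finished = True
        self.started = True

    def pull(self, chunk_size):
        """Called from the playback side; always returns one chunk."""
        if self.started and self.chunks:
            return self.chunks.popleft()
        if self.started and not self.finished:
            self.underruns += 1  # an audible skip: the prebuffer was too small
        return [0] * chunk_size  # silence
```

Counting underruns is one way to do the per-device, per-voice testing described above: raise `prebuffer_chunks` until the counter stays at zero for representative text.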

July 26, 2011 at 2:58 pm #7391

sarinsukumar

Hi Halle,
Did you mean AVQueuePlayer? I am not able to find out how to initialize the AVPlayerItem with a buffer. There seem to be methods to initialize it from files but not from buffers. Can you point me to something helpful?

July 26, 2011 at 2:59 pm #7392

Halle

Nope, I mean audio unit but you should code your driver in whatever technology you are already proficient in. There are many different approaches for playback from a buffer (FliteController already uses one, it just isn’t a buffer whose contents change over time so there is no need for a driver that needs to interface with the Flite streaming functionality).

August 6, 2011 at 2:19 pm #7466

oakdemirci

Hi Halle, thanks very much for your effort and for OpenEars. I’m trying to develop a TTS application with OpenEars. Like Sarin and Marco, I have a problem with the performance of the cmu_us_slt voice. It takes “some” time to synthesize the voice.
I think I’m using “the recommended memory management from your help files” =)). I’m pasting the output below. (BTW, I also downloaded the multithreaded Flite but have not tried it yet. It would be great if you could write some hints about multithreading.)
I’d rather not use prerecorded audio, because the app generates speech from a long description in a text column of a database.

Thanks in advance.

2011-08-06 16:01:13.546 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.
2011-08-06 16:01:13.547 MyApp[107:707] OPENEARSLOGGING: I’m running flite
2011-08-06 16:01:16.874 MyApp[107:707] OPENEARSLOGGING: I’m done running flite and it took 3.326091 seconds
2011-08-06 16:01:16.876 MyApp[107:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
2011-08-06 16:01:16.877 MyApp[107:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
2011-08-06 16:01:17.236 MyApp[107:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
2011-08-06 16:01:20.474 MyApp[107:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
2011-08-06 16:01:20.627 MyApp[107:707] OPENEARSLOGGING: Flite sending resume recognition notification.
2011-08-06 16:01:21.134 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.
2011-08-06 16:01:21.136 MyApp[107:707] OPENEARSLOGGING: I’m running flite
2011-08-06 16:01:23.810 MyApp[107:707] OPENEARSLOGGING: I’m done running flite and it took 2.672839 seconds
2011-08-06 16:01:23.815 MyApp[107:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
2011-08-06 16:01:23.818 MyApp[107:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
2011-08-06 16:01:23.882 MyApp[107:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
2011-08-06 16:01:26.442 MyApp[107:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
2011-08-06 16:01:26.594 MyApp[107:707] OPENEARSLOGGING: Flite sending resume recognition notification.
2011-08-06 16:01:27.104 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.
2011-08-06 16:01:27.106 MyApp[107:707] OPENEARSLOGGING: I’m running flite
2011-08-06 16:01:29.734 MyApp[107:707] OPENEARSLOGGING: I’m done running flite and it took 2.625272 seconds
2011-08-06 16:01:29.736 MyApp[107:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
2011-08-06 16:01:29.737 MyApp[107:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
2011-08-06 16:01:29.781 MyApp[107:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
2011-08-06 16:01:32.319 MyApp[107:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
2011-08-06 16:01:32.473 MyApp[107:707] OPENEARSLOGGING: Flite sending resume recognition notification.
2011-08-06 16:01:32.976 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.
2011-08-06 16:01:32.978 MyApp[107:707] OPENEARSLOGGING: I’m running flite
2011-08-06 16:01:36.921 MyApp[107:707] OPENEARSLOGGING: I’m done running flite and it took 3.940393 seconds
2011-08-06 16:01:36.922 MyApp[107:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
2011-08-06 16:01:36.923 MyApp[107:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
2011-08-06 16:01:36.978 MyApp[107:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
2011-08-06 16:01:40.769 MyApp[107:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
2011-08-06 16:01:40.922 MyApp[107:707] OPENEARSLOGGING: Flite sending resume recognition notification.
2011-08-06 16:01:41.427 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.
2011-08-06 16:01:41.429 MyApp[107:707] OPENEARSLOGGING: I’m running flite
2011-08-06 16:01:44.807 MyApp[107:707] OPENEARSLOGGING: I’m done running flite and it took 3.375439 seconds
2011-08-06 16:01:44.808 MyApp[107:707] OPENEARSLOGGING: Flite audio player was nil when referenced so attempting to allocate a new audio player.
2011-08-06 16:01:44.809 MyApp[107:707] OPENEARSLOGGING: Loading speech data for Flite concluded successfully.
2011-08-06 16:01:44.872 MyApp[107:707] OPENEARSLOGGING: Flite sending suspend recognition notification.
2011-08-06 16:01:48.129 MyApp[107:707] OPENEARSLOGGING: AVAudioPlayer did finish playing with success flag of 1
2011-08-06 16:01:48.283 MyApp[107:707] OPENEARSLOGGING: Flite sending resume recognition notification.
2011-08-06 16:01:48.788 MyApp[107:707] OPENEARSLOGGING: Flite interrupting existing talking if necessary.

  • This reply was modified 286 days ago by  oakdemirci.
August 6, 2011 at 2:48 pm #7468

sarinsukumar

Hi,
Even with multithreading, you would not be able to reduce the delay. The only way, I think, is to implement it using the Flite streaming option with the help of an audio unit.

August 6, 2011 at 2:51 pm #7469

Halle

Hi oakdemirci,

I can only tell you what I told marco and sarinsukumar — you are using the most complex voice and if it performs better than realtime you’re getting good results. Expected behavior for those 16k voices is that they will take more time to synthesize speech than the actual duration of the speech waveform, so they are not recommended for slow devices or very long statements.
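The “realtime” comparison can be made concrete as a real-time factor: synthesis time divided by the duration of the audio produced, where values above 1 mean synthesis is slower than the speech it generates. The sketch below applies this to the five runs in the log posted above. One assumption to note: the audio duration is estimated as the gap between the “sending suspend recognition notification” line and the “did finish playing” line, which is only an approximation of the true waveform length.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration (> 1 is slower than realtime)."""
    return synthesis_seconds / audio_seconds


# (synthesis time reported by OPENEARSLOGGING, estimated playback duration)
# Playback duration is an assumption: the time between the "suspend
# recognition" line and the "did finish playing" line of the same run.
runs = [
    (3.326091, 3.238),  # 16:01:17.236 -> 16:01:20.474
    (2.672839, 2.560),  # 16:01:23.882 -> 16:01:26.442
    (2.625272, 2.538),  # 16:01:29.781 -> 16:01:32.319
    (3.940393, 3.791),  # 16:01:36.978 -> 16:01:40.769
    (3.375439, 3.257),  # 16:01:44.872 -> 16:01:48.129
]

factors = [real_time_factor(synth, audio) for synth, audio in runs]

for factor in factors:
    print(f"real-time factor: {factor:.3f}")
```

Under that assumption, every run comes out a little above 1, which matches the expectation that the 16k voices take somewhat longer to synthesize than the speech lasts.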

There isn’t actually anything to document with the multithreading, it’s a drop-in replacement for the files and the replacement files are well commented.

August 6, 2011 at 2:53 pm #7470

Halle

Even with multithreading, you would not be able to reduce the delay.

True, but it will prevent the synthesis from blocking the app, which is an issue for a synthesis time that is longer than a fraction of a second.
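The distinction between latency and blocking can be shown with a generic threading sketch. This is not the OpenEars multithreaded code: `synthesize` is a stand-in that just sleeps, and `speak_in_background` and its callback are hypothetical names. It shows that the caller gets control back immediately while the result still arrives only after the full synthesis time.

```python
import threading
import time


def synthesize(text):
    """Stand-in for a slow synthesis call; sleeps instead of calling Flite."""
    time.sleep(0.2)
    return f"<audio for {text!r}>"


def speak_in_background(text, on_done):
    """Run synthesis on a worker thread and pass the result to `on_done`.

    Returns the started thread so the caller can join it if needed.
    """
    worker = threading.Thread(target=lambda: on_done(synthesize(text)))
    worker.start()
    return worker


results = []
worker = speak_in_background("hello", results.append)
# The calling thread is free here; the audio is not ready any sooner.
worker.join()
```

The total wait before audio is available is unchanged, which is why streaming was suggested for reducing the initial delay and threading only for keeping the app responsive.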

August 6, 2011 at 3:10 pm #7471

oakdemirci

OK, by multithreading I thought I would have to implement different Flite instances and manage them accordingly, not just replace files. I have already replaced the files but it made no difference, as you said. Anyway, thank you both. =)
