NeatSpeech: Speech is interrupted (long sentences with no punctuation)

Tagged: neatspeech

This topic has 7 replies, 2 voices, and was last updated 11 years, 1 month ago by hitoshi.

Viewing 8 posts - 1 through 8 (of 8 total)

Author

Posts
February 14, 2013 at 9:29 am #1015623

hitoshi
Participant

Hello, Halle.

While running a speed test with the licensed version of NeatSpeech
on an iPhone 4 (iOS 6.1),
I found what appears to be a bug.

I created a long text that contains no punctuation
and passed it as the argument to “sayWithNeatSpeech:withVoice:”.

As a result, the speech stopped 21.62 seconds after it started (measured from the point where ‘FliteController.audioPlayer.isPlaying=YES’).

The result is the same even if I change to a slower speed (it still stops after 21.62 seconds).
The result is also the same on the iPhone 5.

Thank you for your consideration.
Your anticipated assistance in this matter is greatly appreciated.

Best regards,
Hitoshi

February 14, 2013 at 9:36 am #1015624

Halle Winkler
Politepix

Hello Hitoshi,

This isn’t a bug: there is a maximum length for a single utterance that has no punctuation, in order to prevent unacceptable memory overhead. The maximum length is longer than unpunctuated sentences ever are in English. Just continue to use punctuation or add the pause token that is described in the documentation.

February 16, 2013 at 8:25 pm #1015647

hitoshi
Participant

Umm.. This isn’t a bug.
Is that so? I’m sorry to hear that…

I think I should not let users listen to incomplete speech.
In the next version, I’d like you to consider having “sayWithNeatSpeech:withVoice:” return a BOOL value.
(When the speech would be incomplete, I would like the engine to return NO without producing the sound.)

As far as I could tell, when the sound is incomplete, “FliteController.audioPlayer” is in the following state:
FliteController.audioPlayer.duration = 20.83 sec
FliteController.audioPlayer.data.length = 2,000,044 bytes

This means that incomplete data can also result from a short text when it is synthesized slowly.
For example, it occurs in the following case:
speed=-1.0f
text=”w w w w w w w w w w w w w w w w w w w w w w w w w w ” (Only 52 characters!)

I think it is difficult to judge by text length alone.
It would be more appropriate to judge by the result of creating the speech data, wouldn’t it?

Thank you very much for your understanding.
Hitoshi
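
The two figures reported above are consistent with a fixed-size audio buffer rather than a limit on text length. 2,000,044 bytes is 2,000,000 bytes plus 44, the size of a canonical PCM WAV header, and 2,000,000 bytes over 20.83 seconds is about 96,000 bytes per second, which is what 16-bit mono PCM at 48 kHz would produce. The audio format is an assumption inferred from these numbers; nothing in the thread confirms it. A minimal sketch in plain C (which also compiles inside an Objective-C file) that reproduces the arithmetic, with a hypothetical function name:

```c
#include <stddef.h>

/* Assumed values, inferred from the numbers in the post above;
   the thread does not confirm them. */
#define WAV_HEADER_BYTES 44     /* canonical PCM WAV header size */
#define BYTES_PER_SAMPLE 2      /* 16-bit mono (assumption) */
#define SAMPLE_RATE_HZ   48000  /* assumption */

/* Seconds of audio held in a WAV buffer of total_bytes bytes.
   Returns 0.0 if the buffer holds no samples beyond the header. */
double pcm_duration_seconds(size_t total_bytes)
{
    if (total_bytes <= WAV_HEADER_BYTES) {
        return 0.0;
    }
    size_t payload = total_bytes - WAV_HEADER_BYTES;
    return (double)payload / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ);
}
```

Under these assumptions, `pcm_duration_seconds(2000044)` comes out at roughly 20.83 seconds, matching the reported duration. It would also explain why a slower speed hits the limit with a shorter text: the same words produce more samples, so the buffer fills sooner.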

February 17, 2013 at 1:09 am #1015649

Halle Winkler
Politepix

Hi Hitoshi,

Can you give an example of a real sentence from your app that you need to say, that encounters this limit, and in which it is not possible to place any commas, periods, or the pause token? The sentence consisting of >20 repetitions of the letter w in a row doesn’t look like something that occurs in a real app interaction, but if I’m not correct about that, you could perform it in your app without a problem by programmatically placing a comma or a pause token in between the middle two “w”s, or between all of them. Unfortunately it isn’t possible to return anything from say:withNeatSpeech:usingVoice: because it is an asynchronous method and the time of the utterance is known only after synthesizing it.

The size of the maximum unpunctuated utterance was intentionally chosen based on the fact that it is several times larger than real sentence clauses in English. You can see this in the case of your choice of “w”, which contains the syllables of two complete words (‘double’ and ‘you’). In order for a sentence clause to occur which needed to render as many syllables as your test sentence, it would need to be an unpunctuated clause containing ~52 words. I’m not aware of a clause like this. But the pause token was added to the API specifically so that you would never need to have your users hear speech that is cut off, since you can programmatically insert it into long text that lacks punctuation.

I’m not opposed to examining this in the long term, but I’d want to start with a real usage case that is creating an issue for someone in their app.

February 19, 2013 at 10:32 am #1015682

hitoshi
Participant

Hi, Halle.

I gave an extreme example. I know that this case is very rare in real text.
However, I have heard a sentence get interrupted in our app with NeatSpeech.
(Sorry, I do not remember what kind of sentence it was.
I am certain that I had set the app to its slowest speaking speed.)

Our application deals with a variety of text data.
It is too difficult to determine where to insert punctuation.

If I insert a lot of punctuation just in case (that is, to cover speed=-1.0f),
the engine would produce unnatural speech at normal speed.
And yet I worry that the speech will be interrupted.

I suggest the following:
Would you consider providing new methods that split “sayWithNeatSpeech:withVoice:” into separate steps?
e.g.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[FliteController Class]
-(void)makeSayWithNeatSpeech:(NSString *)statement withVoice:(NeatSpeechVoice *)voiceToUse;
…This only creates the speech data. It does not start the speech.

-(BOOL)sayQueue;
…Speaks the sentence prepared by “makeSayWithNeatSpeech:withVoice:”.
If the engine has not created the data yet, it returns NO.

[OpenEarsEventsObserverDelegate Protocol]
-(void)fliteDidMakeSpeaking:(BOOL)result;
…This notifies the delegate that the engine has created the speech data.
If there is a problem with the created data, result = NO.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This proposal has another benefit.
Currently, when we call “sayWithNeatSpeech:withVoice:”,
we need to wait a moment before speech begins.

If we could create the speech data in advance,
we could play it immediately when the user taps the play button.

Your kind consideration of this matter would be sincerely appreciated.
Best regards,
Hitoshi

P.S.
I’m afraid my expressions may be rude or hard to read, because I’m not so good at English.
I will try hard to learn English.

February 19, 2013 at 10:47 am #1015683

Halle Winkler
Politepix

Hi Hitoshi,

Your English is great and I know how it is to have to speak with subtlety in a second language, since I have to use a second language frequently and also worry that I sound too brusque when I am making requests.

I will definitely take your suggestion on board and consider the best way to integrate it in a future version, thank you for your suggestions. The most likely solution to the long unpunctuated speech question will just be to force a split on long unpunctuated text streams, since the lack of punctuation means there will be no contextual cues and the location of the split can be arbitrary (meaning: without any punctuation, we don’t know where the writer of the sentence meant to have clauses or emphasis, so we can split it anywhere because there is no better option). I’m not sure if I want to change the API to accomplish this goal but I will consider what you said.

To give you a preview of how I would be likely to do the long sentence splitting, I will probably implement an NSScanner to count instances of whitespace between words, pick an arbitrary value that counts as “too many” spaces without there also being any punctuation or pauses, and insert a pause token before sending the text to synthesis. So you could also use this as your workaround right now if you need this immediately.
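
The workaround described above could be sketched roughly as follows. This is not the NeatSpeech implementation: it uses a plain C loop instead of NSScanner (plain C also compiles inside an Objective-C file), the function name `insert_pauses` is hypothetical, and the pause token is passed in as a parameter because the real token is defined in the NeatSpeech documentation, not in this thread. It counts whitespace-separated words, resets the count at any of `,.;:!?`, and inserts the token before the next word once `max_words` words have gone by without punctuation. It does not recognise pause tokens already present in the text.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of text with pause_token (followed by a
   space) inserted before the next word whenever max_words consecutive words
   contained no punctuation. The caller must free() the result.
   Returns NULL on invalid arguments or allocation failure. */
char *insert_pauses(const char *text, const char *pause_token, size_t max_words)
{
    if (text == NULL || pause_token == NULL || max_words == 0) {
        return NULL;
    }
    size_t text_len = strlen(text);
    size_t token_len = strlen(pause_token);
    /* Upper bound: at most one insertion per max_words characters. */
    size_t cap = text_len + (text_len / max_words + 1) * (token_len + 1) + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        return NULL;
    }

    size_t o = 0;      /* write position in out */
    size_t words = 0;  /* completed words since the last punctuation or pause */
    int in_word = 0;

    for (size_t i = 0; i < text_len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (isspace(c)) {
            if (in_word) {      /* a word just ended */
                words++;
                in_word = 0;
            }
        } else if (strchr(",.;:!?", c) != NULL) {
            words = 0;          /* punctuation already gives a natural break */
            in_word = 0;
        } else {
            if (!in_word && words >= max_words) {
                /* Too many unpunctuated words: pause before this word. */
                memcpy(out + o, pause_token, token_len);
                o += token_len;
                out[o++] = ' ';
                words = 0;
            }
            in_word = 1;
        }
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return out;
}
```

With a placeholder token such as `"#P#"` (not the real one) and `max_words` set to 2, the input `"a b c d e"` becomes `"a b #P# c d #P# e"`. The threshold is a judgment call: the test case earlier in the thread overflowed with 26 repetitions of “w” at the slowest speed, so a value well below that is the cautious choice if the slowest speed must be supported.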

February 19, 2013 at 3:12 pm #1015685

Halle Winkler
Politepix

I wanted to mention that in order to look into this issue it is still important to receive an example of this which occurs naturally in your app, since it is needed in order to design an appropriate fix and to assign the fix a priority. So far, there has never been a bug report of this occurring “in the wild” because it was designed to be an improbable event, so it would be good to get a real example of how it occurred in your app in a form that prevents the pause token from being used, in order to understand the first case of it appearing as an issue.

If your app reads different sources, just let me know about a source which leads to this issue — you can also inform me by email in order to keep it private.

February 21, 2013 at 6:47 am #1015702

hitoshi
Participant

Hi, Halle.

Thank you so much.

So far, I have not found the text that caused the problem.
If I find it, I will report it to you immediately.

I will try the approach of forcing a split on long unpunctuated text streams.

I appreciate all your help.
Hitoshi