Method and apparatus for controlling a speech synthesis...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S260000

active

06810378

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to the field of text-to-speech conversion (i.e., speech synthesis) and more particularly to a method and apparatus for capturing personal speaking styles and for driving a text-to-speech system so as to convey such specific speaking styles.
BACKGROUND OF THE INVENTION
Although current state-of-the-art text-to-speech conversion systems are capable of providing reasonably high quality and close to human-like sounding speech, they typically train the prosody attributes of the speech based on data from a specific speaker. In certain text-to-speech applications, however, it would be highly desirable to be able to capture a particular style, such as, for example, the style of a specifically identifiable person or of a particular class of people (e.g., a southern accent).
While the value of a style is subjective and involves personal, social and cultural preferences, the existence of style itself is objective and implies that there is a set of consistent features. These features, especially those of a distinctive, recognizable style, lend themselves to quantitative studies and modeling. A human impressionist, for example, can deliver a stunning performance by dramatizing the most salient feature of an intended style. Similarly, at least in theory, it should be possible for a text-to-speech system to successfully convey the impression of a style when a few distinctive prosodic features are properly modeled. However, to date, no such text-to-speech system has been able to achieve such a result in a flexible way.
SUMMARY OF THE INVENTION
In accordance with the present invention, a novel method and apparatus for synthesizing speech from text is provided, whereby the speech may be generated in a manner so as to effectively convey a particular, selectable style. In particular, repeated patterns of one or more prosodic features, such as, for example, pitch (also referred to herein as “f0”, the fundamental frequency of the speech waveform, since pitch is merely the perceptual effect of f0), amplitude, spectral tilt, and/or duration, occurring at characteristic locations in the synthesized speech, are advantageously used to convey a particular chosen style. In accordance with one illustrative embodiment of the present invention, for example, one or more of such feature patterns may be used to define a particular speaking style, and an illustrative text-to-speech system then makes use of such a defined style to adjust the specified parameter or parameters of the synthesized speech in a non-uniform manner (i.e., in accordance with the defined feature pattern or patterns).
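As a rough illustration of adjusting a parameter "in a non-uniform manner", the sketch below scales only part of an f0 contour by a feature pattern placed at a chosen location. The function name `apply_pattern`, the per-frame list representation, and the multiplicative form of the pattern are assumptions made for this example; the source does not specify how a pattern is represented or combined with the contour.

```python
def apply_pattern(f0_hz, pattern, start):
    """Return a copy of an f0 contour with `pattern` applied from frame `start`.

    f0_hz   -- list of f0 values in Hz, one per frame (assumed representation)
    pattern -- list of multiplicative factors (hypothetical pattern format)
    start   -- index of the frame where the pattern begins
    Frames outside the pattern's span are left unchanged.
    """
    out = list(f0_hz)
    for offset, factor in enumerate(pattern):
        idx = start + offset
        if 0 <= idx < len(out):
            out[idx] = out[idx] * factor
    return out


# A flat 120 Hz contour and a hypothetical rise-fall pattern.
contour = [120.0] * 8
rise_fall = [1.0, 1.1, 1.2, 1.1, 1.0]
styled = apply_pattern(contour, rise_fall, start=2)
```

Only frames 2 through 6 are touched here, so the adjustment follows the shape of the pattern rather than shifting the whole contour uniformly.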
More specifically, the present invention provides a method and apparatus for synthesizing a voice signal based on a predetermined voice control information stream (which, illustratively, may comprise text, annotated text, or a musical score), where the voice signal is selectively synthesized to have a particular desired prosodic style. In particular, the method and apparatus of the present invention comprise steps or means for analyzing the predetermined voice control information stream to identify one or more portions thereof for prosody control; selecting one or more prosody control templates based on the particular prosodic style which has been selected for the voice signal synthesis; applying the one or more selected prosody control templates to the one or more identified portions of the predetermined voice control information stream, thereby generating a stylized voice control information stream; and synthesizing the voice signal based on this stylized voice control information stream so that the synthesized voice signal advantageously has the particular desired prosodic style.
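The four steps named above (analyze, select, apply, synthesize) can be sketched as a small pipeline. Everything concrete in this sketch is an assumption for illustration: the `Unit` type, the `TEMPLATES` table, the "phrase_final" location tag, the punctuation-based analysis, and the scale factors are hypothetical, and `synthesize` is a stand-in that returns per-unit parameters instead of producing audio.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One element of the voice control information stream (here, a word)."""
    text: str
    f0_scale: float = 1.0
    duration_scale: float = 1.0


# Hypothetical template library: style -> {location tag: (f0 scale, duration scale)}
TEMPLATES = {
    "emphatic": {"phrase_final": (1.25, 1.5)},
    "neutral": {},
}


def analyze(text):
    """Step 1: split the stream into units and tag portions for prosody control."""
    words = text.split()
    portions = {i: "phrase_final" for i, w in enumerate(words) if w[-1] in ".,!?"}
    return [Unit(w) for w in words], portions


def select_templates(style):
    """Step 2: pick the templates associated with the selected style."""
    return TEMPLATES[style]


def apply_templates(units, portions, templates):
    """Step 3: apply templates to the tagged portions, giving a stylized stream."""
    stylized = []
    for i, unit in enumerate(units):
        tag = portions.get(i)
        if tag in templates:
            f0, dur = templates[tag]
            unit = Unit(unit.text, unit.f0_scale * f0, unit.duration_scale * dur)
        stylized.append(unit)
    return stylized


def synthesize(stylized):
    """Step 4 (stand-in): report the parameters a synthesizer would receive."""
    return [(u.text, u.f0_scale, u.duration_scale) for u in stylized]


units, portions = analyze("Hello there, world.")
result = synthesize(apply_templates(units, portions, select_templates("emphatic")))
```

Keeping the template lookup separate from the analysis step means the same analyzed stream can be rendered in a different style simply by selecting a different template set.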


REFERENCES:
patent: 4692941 (1987-09-01), Jacks et al.
patent: 5615300 (1997-03-01), Hara et al.
patent: 5860064 (1999-01-01), Henton
patent: 6185533 (2001-02-01), Holm et al.
patent: 6260016 (2001-07-01), Holm et al.
patent: 6594631 (2003-07-01), Cho et al.
patent: 411143483 (1999-05-01), None
U.S. patent application Ser. No. 09/711,563, Shih et al., filed Nov. 13, 2000.
U.S. patent application Ser. No. 09/845,561, Kochanski et al., filed Apr. 30, 2001.
“A Singing Voice Synthesis System Based on Sinusoidal Modeling”, Macon, M.W., et al, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 435-438, 1997.
“Generating Pitch Accent Distributions That Show Individual and Stylistic Differences”, Cahn, J.E.; Third ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Blue Mountains, Australia, Nov. 26-29, 1998.
“Speaking Styles: Statistical Analysis and Synthesis by a Text-to-Speech System” by M. Abe, Progress in Speech Synthesis, Jan P.H. van Santen, et al., editors, Springer-Verlag New York, Inc., pp. 495-511, 1996.
“Effect of Speaking Style on Parameters of Fundamental Frequency Contour” by N. Higuchi, et al., Progress in Speech Synthesis, Jan P.H. van Santen, et al., editors, Springer-Verlag New York, Inc., pp. 417-429, 1996.
“A Quantitative Model of F0 Generation and Alignment” by Jan P.H. van Santen, et al., Intonation Analysis, Modelling and Technology, Antonis Botinis, editor, Kluwer Academic Publishers, Boston, pp. 269-287, 2000.
“Suprasegmental and segmental timing models in Mandarin Chinese and American English” by Jan P.H. van Santen, et al., J. Acoustical Society of America 107(2), pp. 1012-1026, Feb., 2000.
“Sable: A Standard For TTS Markup” by R. Sproat, et al., The 5th International Conference on Spoken Language Processing, Sydney Convention Centre, Sydney, Australia, 1998.
