Speech synthesis employing prosody templates

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S258000, C704S200000, C704S200100

active

06260016

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to text-to-speech (TTS) systems and speech synthesis. More particularly, the invention relates to a system for providing more natural-sounding prosody through the use of prosody templates.
The task of generating natural, human-sounding prosody for text-to-speech and speech synthesis has historically been one of the most challenging problems that researchers and developers have had to face. Text-to-speech systems have in general become infamous for their “robotic” intonations. To address this problem, some prior systems have used neural networks and vector clustering algorithms in an attempt to simulate natural-sounding prosody. Aside from being only marginally successful, these “black box” computational techniques give the developer no feedback regarding which parameters are crucial for natural-sounding prosody.
The present invention takes a different approach, in which samples of actual human speech are used to develop prosody templates. The templates define a relationship between syllabic stress patterns and certain prosodic variables such as intonation (F0) and duration. Thus, unlike prior algorithmic approaches, the invention uses naturally occurring lexical and acoustic attributes (e.g., stress pattern, number of syllables, intonation, duration) that can be directly observed and understood by the researcher or developer.
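As a rough illustration of how templates might be derived from speech samples, the sketch below groups observations by stress pattern and averages the per-syllable F0 and duration values. The text does not specify the derivation procedure; the averaging step, the stress encoding ("1" for a stressed syllable, "0" for an unstressed one), the sample values, and all names are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations from recorded speech:
# (stress pattern, per-syllable F0 in Hz, per-syllable duration in ms).
SAMPLES = [
    ("10", [220.0, 180.0], [210.0, 150.0]),
    ("10", [200.0, 160.0], [190.0, 130.0]),
    ("01", [170.0, 230.0], [120.0, 240.0]),
]


def build_templates(samples):
    """Average F0 and duration per syllable for each (syllable count, stress) key."""
    grouped = defaultdict(list)
    for stress, f0, duration in samples:
        if len(f0) != len(stress) or len(duration) != len(stress):
            raise ValueError("expected one F0 and one duration value per syllable")
        grouped[(len(stress), stress)].append((f0, duration))

    templates = {}
    for key, observations in grouped.items():
        f0_rows = [f0 for f0, _ in observations]
        duration_rows = [duration for _, duration in observations]
        templates[key] = {
            # zip(*rows) walks the rows syllable by syllable.
            "f0": [mean(column) for column in zip(*f0_rows)],
            "duration": [mean(column) for column in zip(*duration_rows)],
        }
    return templates


templates = build_templates(SAMPLES)
```

Because each template is a plain average over observed values, a developer can inspect it directly, which is the kind of transparency the passage contrasts with "black box" techniques.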
The presently preferred implementation stores the prosody templates in a database that is accessed by specifying the number of syllables and stress pattern associated with a given word. A word dictionary is provided to supply the system with the requisite information concerning number of syllables and stress patterns. The text processor generates phonemic representations of input words, using the word dictionary to identify the stress pattern of the input words. A prosody module then accesses the database of templates, using the number of syllables and the stress pattern as the lookup key. A prosody template for the given word is then obtained from the database and used to supply prosody information to the sound generation module, which generates synthesized speech based on the phonemic representation and the prosody information.
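The lookup flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary entries, phoneme symbols, template values (written here as relative scaling factors), and every function and variable name are hypothetical.

```python
from typing import NamedTuple


class DictionaryEntry(NamedTuple):
    syllables: list  # phonemes grouped per syllable (hypothetical symbols)
    stress: str      # assumed encoding: "1" = stressed, "0" = unstressed


# Hypothetical word dictionary supplying syllables and stress patterns.
WORD_DICTIONARY = {
    "table": DictionaryEntry([["t", "ey"], ["b", "ah", "l"]], "10"),
    "above": DictionaryEntry([["ah"], ["b", "ah", "v"]], "01"),
}

# Hypothetical template database keyed by (number of syllables, stress pattern).
TEMPLATE_DB = {
    (2, "10"): {"f0": [1.10, 0.90], "duration": [1.20, 0.80]},
    (2, "01"): {"f0": [0.90, 1.10], "duration": [0.80, 1.20]},
}


def text_processor(word):
    """Return the dictionary entry (phonemic form and stress) for a word."""
    return WORD_DICTIONARY[word.lower()]


def prosody_module(entry):
    """Fetch the template matching the word's syllable count and stress pattern."""
    key = (len(entry.syllables), entry.stress)
    return TEMPLATE_DB.get(key)  # None if no template exists for this key


def annotate(word):
    """Pair each syllable with its template values, ready for a sound generator."""
    entry = text_processor(word)
    template = prosody_module(entry)
    if template is None:
        raise KeyError(f"no prosody template for {word!r}")
    return list(zip(entry.syllables, template["f0"], template["duration"]))


annotated = annotate("table")
```

Here `annotate` stands in for the hand-off to the sound generation module: each syllable's phonemes arrive together with the intonation and duration values taken from the matching template.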
The presently preferred implementation focuses on speech at the word level. Words are subdivided into syllables and thus represent the basic unit of prosody. The preferred system assumes that the stress pattern defined by the syllables determines the most perceptually important characteristics of both intonation (F0) and duration. At this level of granularity, the template set is quite small in size and easily implemented in text-to-speech and speech synthesis systems. While a word level prosodic analysis using syllables is presently preferred, the prosody template techniques of the invention can be used in systems exhibiting other levels of granularity. For example, the template set can be expanded to allow for more feature determiners, both at the syllable and word level. In this regard, microscopic F0 perturbations caused by consonant type, voicing, intrinsic pitch of vowels and segmental structure in a syllable can be used as attributes with which to categorize certain prosodic patterns. In addition, the techniques can be extended beyond the word level F0 contours and duration patterns to phrase-level and sentence-level analyses.
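One way to picture an expanded template set is a lookup key that carries an extra feature determiner alongside the syllable count and stress pattern. The sketch below is illustrative only: the feature name, the template values, and the fallback to the coarser word-level template are assumptions, not details given in the text.

```python
# Word-level templates keyed by (number of syllables, stress pattern).
COARSE_TEMPLATES = {
    (2, "10"): {"f0": [1.10, 0.90], "duration": [1.20, 0.80]},
}

# Finer templates add one hypothetical feature determiner to the key,
# here a label describing the voicing of the word's first consonant.
FINE_TEMPLATES = {
    (2, "10", "voiced_onset"): {"f0": [1.05, 0.90], "duration": [1.20, 0.80]},
}


def select_template(stress, feature=None):
    """Prefer a feature-specific template; otherwise use the word-level one."""
    syllable_count = len(stress)
    if feature is not None:
        fine = FINE_TEMPLATES.get((syllable_count, stress, feature))
        if fine is not None:
            return fine
    # Assumed fallback: reuse the coarser template when no finer one exists.
    return COARSE_TEMPLATES.get((syllable_count, stress))
```

Falling back to the coarse key keeps the finer template set optional, so the small word-level set remains usable on its own.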
For a more complete understanding of the invention, its objectives and advantages, refer to the following specification and to the accompanying drawings.


