Reexamination Certificate
1999-03-01
2002-11-05
Knepper, David D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S260000
active
06477495
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to synthesizing speech from text. In particular, the invention relates to prosodic control, which governs the intonation and duration of a sentence.
2. Description of the Related Art
In general, text-to-speech synthesis is performed by the following procedure. First, the text to be synthesized is input and intermediate phonetic symbol sequences are produced. Then, prosodic parameters and vocal tract transfer functions are acquired on the basis of the intermediate phonetic symbol sequences. A prosodic parameter may be, for example, a fundamental frequency pattern or the duration of a phoneme. Synthetic speech is subsequently obtained by use of these parameters. For instance, a speech synthesis system is described in Keikichi Hirose, “Speech Synthesis Technology”, Speech Processing Technology and its Applications, Information Processing, pages 984-991 (November 1997).
If the procedure described above is used, the prosodic parameters determine the naturalness of the speech with respect to intonation, rhythm and smoothness, and the vocal tract transfer functions determine the intelligibility of the individual syllables that make up a word or a sentence.
Among the prosodic parameters, the “added-type model” is a typical model for generating fundamental frequency pattern parameters. This generation model adds a rising or falling accent component, e.g. one corresponding to the accent type of a syllable in the sentence, to a phrase component in which the fundamental frequency declines smoothly over the course of a phrase. Although the added-type model is easy to understand intuitively and matches actual speech phenomena because it imitates the structure of human vocalization, there is a problem in that sophisticated language processing is required to make the model work.
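The superposition described above can be illustrated with a minimal sketch. It is not taken from the cited literature: the linear declination, the rectangular accent shape, the frame-based representation, and all names and default values (`phrase_component`, `accent_component`, `added_type_f0`, `start_hz`, `end_hz`) are assumptions chosen only to show a phrase component and an accent component being added together.

```python
def phrase_component(n_frames, start_hz=140.0, end_hz=100.0):
    """Smoothly falling phrase contour (linear declination is an assumption)."""
    if n_frames == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (n_frames - 1)
    return [start_hz + step * i for i in range(n_frames)]


def accent_component(n_frames, accents):
    """Sum of accent bumps; each accent is (first_frame, last_frame, height_hz).

    A positive height models a rising accent, a negative one a falling accent.
    """
    contour = [0.0] * n_frames
    for first, last, height in accents:
        for i in range(max(first, 0), min(last + 1, n_frames)):
            contour[i] += height
    return contour


def added_type_f0(n_frames, accents, start_hz=140.0, end_hz=100.0):
    """Fundamental frequency pattern = phrase component + accent component."""
    phrase = phrase_component(n_frames, start_hz, end_hz)
    accent = accent_component(n_frames, accents)
    return [p + a for p, a in zip(phrase, accent)]


# Five frames with one rising accent of 20 Hz on frames 1 and 2.
f0 = added_type_f0(5, [(1, 2, 20.0)])
print(f0)  # [140.0, 150.0, 140.0, 110.0, 100.0]
```

Deciding where the accents fall and how large they are is the part that depends on language processing, which is the difficulty noted above.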
The duration of a phoneme, as a prosodic parameter, depends on the context in which the phoneme is placed, i.e. the context of the syllable. Many factors affect the duration of a phoneme, such as modulation constraints, timing, importance of a word, indication of speech boundaries, tempo within speech areas, and syntactical meaning. Statistical analysis is typically performed on actual measurements of duration data in order to determine the degree to which each of these factors affects duration, and the rules thus obtained are applied. However, maintaining the large-scale database that is needed to construct duration modules in a variety of contexts is a problem.
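As a rough illustration of the kind of statistical analysis mentioned above, the following sketch estimates how strongly one context factor shifts phoneme duration by comparing per-level mean durations with the overall mean. The factor names (`phrase_final`, `stressed`), the function `factor_effects`, and the four measurements are invented toy values, not data from any real corpus; a practical analysis would use far more data and a more careful statistical model.

```python
from collections import defaultdict
from statistics import mean


def factor_effects(samples, factor):
    """Return, per level of `factor`, the mean duration minus the grand mean."""
    grand = mean(s["duration_ms"] for s in samples)
    groups = defaultdict(list)
    for s in samples:
        groups[s[factor]].append(s["duration_ms"])
    return {level: mean(values) - grand for level, values in groups.items()}


# Toy measurements (hypothetical): duration of one phoneme in four contexts.
samples = [
    {"phrase_final": True, "stressed": True, "duration_ms": 120},
    {"phrase_final": True, "stressed": False, "duration_ms": 100},
    {"phrase_final": False, "stressed": True, "duration_ms": 90},
    {"phrase_final": False, "stressed": False, "duration_ms": 70},
]

effects = factor_effects(samples, "phrase_final")
```

With these toy values, a phrase-final position lengthens the phoneme by 15 ms relative to the grand mean and a non-final position shortens it by 15 ms. Covering every factor in every context in this way is what requires the large-scale database.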
Apart from these prosodic parameters, there have been proposals for a variety of control modes for power-related parameters. However, all of these models treat each prosodic parameter independently, and there is a natural limit to the extent to which the performance of such independent control models can be improved. It has also been pointed out that, for some prosodic phenomena, modeling sentence speech by rules is difficult.
The creation of a database built from prosodic parameters selected from natural speech has been proposed. The database would be used by a prosodic parameter model to calculate prosodic parameters, as proposed, for instance, in Katae et al., “A Domain Specific Text-to-Speech System Using a Prosody Database Retrieved with a Sentence Structure”, Studies in Sound, pages 275-276 (March 1996); or in Saito et al., “A Rule-Based Speech Synthesis Method Using Fuzokugo-Sequence Unit”, Studies in Sound, pages 317-319 (June 1998). However, these publications introduce only the fundamental frequency pattern as a prosodic parameter and are insufficient for improving the naturalness of sentence speech (speaking in sentences).
SUMMARY OF THE INVENTION
The present invention relates to a speech synthesis system for synthesizing speech with improved naturalness by editing and processing each prosodic parameter (fundamental frequency pattern, duration of a phoneme, etc.) of natural speech.
The present invention provides a text-to-speech synthesis system for synthesizing speech that is more natural than speech produced by the conventional method by: providing a speech corpus that includes a speech sentence, prosodic parameters of the speech sentence, and morphological element/structured sentence analysis data; searching the speech corpus and extracting the data whose degree of similarity to an input sentence is greatest; creating and correcting prosodic parameters for the extracted data; and thereby producing the prosodic parameters to be used in the synthesizing.
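The corpus search step summarized above might be sketched as follows. The text does not state how the similarity degree is computed, so this sketch assumes a simple sequence-matching ratio over morpheme lists (Python's `difflib.SequenceMatcher`); the corpus layout, the field names (`morphemes`, `f0_hz`, `durations_ms`), the function names, and the sample values are all hypothetical. The subsequent creation and correction of prosodic parameters is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two morpheme sequences in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def retrieve_closest(corpus, input_morphemes):
    """Return the corpus entry whose morphemes are most similar to the input."""
    return max(corpus, key=lambda e: similarity(e["morphemes"], input_morphemes))


# Hypothetical corpus: each entry pairs analysis data with prosodic parameters
# measured from natural speech (the numbers here are placeholders).
corpus = [
    {
        "morphemes": ["the", "train", "leaves", "at", "nine"],
        "f0_hz": [140.0, 150.0, 135.0, 120.0, 105.0],
        "durations_ms": [80, 210, 230, 70, 260],
    },
    {
        "morphemes": ["please", "open", "the", "door"],
        "f0_hz": [145.0, 150.0, 120.0, 100.0],
        "durations_ms": [220, 200, 70, 280],
    },
]

best = retrieve_closest(corpus, ["the", "bus", "leaves", "at", "ten"])
print(best["morphemes"])  # ['the', 'train', 'leaves', 'at', 'nine']
```

The retrieved entry's prosodic parameters would then serve as the starting point that is corrected to fit the input sentence.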
REFERENCES:
patent: 4771385 (1988-09-01), Egami et al.
patent: 4931936 (1990-06-01), Kugimiya et al.
patent: 5475796 (1995-12-01), Iwata
patent: 5633984 (1997-05-01), Aso et al.
patent: 5842167 (1998-11-01), Miyatake et al.
patent: 5845047 (1998-12-01), Fukada et al.
patent: 6035272 (2000-03-01), Nishimura et al.
Ando Haru
Fujita Keiko
Kitahara Yoshinori
Nukaga Nobuo
Yajima Shunichi
Hitachi, Ltd.
Knepper David D.
Mattingly Stanger & Malur, P.C.