Reexamination Certificate
1999-11-02
2002-12-17
Šmits, Tālivaldis Ivars (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
704/267
active
06496801
ABSTRACT:
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech synthesis and, more particularly, to producing natural-sounding computer-generated speech by identifying and applying speech patterns in a voice dialog scenario.
In a typical voice dialog scenario, the structure of the spoken messages is fairly well defined. Typically, each message consists of a fixed portion and a variable portion. For example, in a vehicle speech synthesis system, a spoken message may comprise the sentence “Turn left on Mason Street.” The spoken message consists of a fixed or carrier portion and a variable or slot portion. In this example, “Turn left on ______” defines the fixed or carrier portion, and the street name “Mason Street” defines the variable or slot portion. As the term implies, the speech synthesis system may change the variable portion, so that the system can direct a driver to follow directions involving multiple streets or highways.
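To make the carrier/slot terminology concrete, the following minimal sketch (with hypothetical names and a plain string format, not the patented system's representation) shows a fixed carrier whose single slot is filled at run time:

```python
# A minimal, hypothetical sketch of the carrier/slot message structure
# described above. Names and formatting are illustrative assumptions only.

CARRIER = "Turn left on {street}."          # fixed (carrier) portion with one slot

def build_message(street: str) -> str:
    """Fill the variable (slot) portion of the carrier with a street name."""
    return CARRIER.format(street=street)

# The same carrier can direct the driver along any street:
print(build_message("Mason Street"))        # "Turn left on Mason Street."
print(build_message("Fifth Avenue"))        # "Turn left on Fifth Avenue."
```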
Existing speech synthesis systems typically handle the insertion of the variable portion into the fixed portion rather poorly, producing a choppy and unnatural speech pattern. One approach to improving the quality of generated voice dialog can be found in U.S. Pat. No. 5,727,120 (Van Coile), issued Mar. 10, 1998. The system of the Van Coile patent receives a message frame having a fixed and a variable portion and generates a markup for the entire message frame. The entire message frame is broken down into phonemes, which necessarily requires a uniform representation of the message frame. In the speech markup, an enriched phonetic transcription formulated from the phonemes, the control parameters are provided at the phoneme level. Such a markup does not guarantee optimal selection of acoustic sound units when rebuilding the message frame. Further, the pitch and duration of the message frame, known as the prosody, are selected for the entire message frame rather than for the individual fixed and variable portions. Such a construction makes building the frame inflexible, as the prosody of the message frame remains fixed, yet it is often desirable to change the prosody of the variable portion of a given message frame.
The present invention takes a different, more flexible approach to building the fixed and variable portions of the message frame. The acoustic part of each of the fixed and variable portions is constructed from a predetermined set of acoustic sound units. A number of prosodic templates are stored in a prosodic template database, so that one or several prosodic templates can be applied to the fixed and variable portions of a particular message frame. This provides great flexibility in building message frames. For example, one, two, or more prosodic templates can be generated for association with each fixed and variable portion, thereby providing various inflections in the spoken message. Further, the prosodic templates for the fixed portion and the variable portion can be generated separately, providing greater flexibility in building a library database of spoken messages. For example, the acoustic and prosodic fixed portion can be generated at the phoneme, word, or sentence level, or simply be pre-recorded. Similarly, templates for the variable portion may be generated at the phoneme, word, or phrase level, or simply be pre-recorded. The fixed and variable portions of the message frame are then concatenated to define a unified acoustic template and a unified prosodic template, as sketched in the example below.
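The sketch below is a rough, hypothetical illustration of this idea: separate acoustic and prosodic templates are kept for the fixed and variable portions, one prosodic variant is selected for each, and the selections are concatenated into a unified acoustic template and a unified prosodic template. All class and field names are assumptions made for illustration, not the patent's actual data structures.

```python
# Hypothetical sketch: per-portion templates concatenated into unified templates.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AcousticTemplate:
    units: List[str]                  # predetermined acoustic sound units

@dataclass
class ProsodicTemplate:
    pitch: List[float]                # pitch value per unit
    duration: List[float]             # duration per unit (e.g., milliseconds)

@dataclass
class PortionTemplates:
    acoustic: AcousticTemplate
    prosodic_variants: List[ProsodicTemplate]   # several inflections per portion

def build_message_frame(fixed: PortionTemplates,
                        variable: PortionTemplates,
                        fixed_variant: int = 0,
                        variable_variant: int = 0
                        ) -> Tuple[AcousticTemplate, ProsodicTemplate]:
    """Concatenate the selected fixed and variable templates into
    a unified acoustic template and a unified prosodic template."""
    unified_acoustic = AcousticTemplate(
        units=fixed.acoustic.units + variable.acoustic.units)
    fp = fixed.prosodic_variants[fixed_variant]
    vp = variable.prosodic_variants[variable_variant]
    unified_prosodic = ProsodicTemplate(
        pitch=fp.pitch + vp.pitch,
        duration=fp.duration + vp.duration)
    return unified_acoustic, unified_prosodic
```

Because each portion carries its own list of prosodic variants, the same carrier can be rendered with different inflections simply by choosing a different variant index, without rebuilding the acoustic units.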
For a more complete understanding of the invention, its objects and advantages, reference should be made to the following specification and to the accompanying drawings.
REFERENCES:
patent: 5727120 (1998-03-01), Van Coile et al.
patent: 5905972 (1999-05-01), Huang et al.
patent: 6052664 (2000-04-01), Van Coile et al.
patent: 6175821 (2001-01-01), Page et al.
patent: 6185533 (2001-02-01), Holm et al.
patent: 6260016 (2001-07-01), Holm et al.
Junqua Jean-Claude
Pearson Steve
Veprek Peter
Harness Dickey & Pierce PLC
Matsushita Electric Industrial Co., Ltd.
Šmits, Tālivaldis Ivars