Speech synthesis apparatus and method

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate



Filed: 1998-07-13
Issued: 2001-04-03
Examiner: Hudspeth, David (Department: 2641)

Classification codes: C704S260000, C704S270000
Status: active
Patent number: 06212501

BACKGROUND OF THE INVENTION
The present invention relates to a speech synthesis apparatus, and a method therefor, which takes an exemplary text segment consisting of a fixed form portion with fixed contents and an unfixed form portion with varying contents, embeds an arbitrary user-specified text segment at the position of the unfixed form portion, and generates synthesized speech of the exemplary text segment with that text segment embedded.
In recent years, a variety of speech synthesis apparatuses have been developed that analyze text written in a mixture of Japanese kana and Chinese characters (kanji), synthesize speech information for the text by synthesis by rule, and output the result as voiced speech.
The basic arrangement of a speech synthesis apparatus of this type employing the synthesis-by-rule method is as follows. Speech utterances are analyzed in predetermined units, e.g., CV (consonant/vowel), CVC (consonant/vowel/consonant), VCV (vowel/consonant/vowel), or VC (vowel/consonant) units, by LSP (line spectrum pair) analysis or cepstrum analysis to obtain phonetic information, which is registered in a speech segment file. Synthesized speech is then generated by voice source generation and synthesis filtering, based on this speech segment file and on the synthesis parameters (a phonetic string and prosodic information) obtained by analyzing the text.
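As a rough illustration of the voice source generation and synthesis filtering step, the following Python sketch assumes the simplest source-filter model: an impulse train at the desired pitch excites an all-pole filter. The function names, the 8 kHz sampling rate and the single 700 Hz resonance are hypothetical; the sketch uses direct all-pole (LPC-style) coefficients rather than the LSP or cepstral parameters named above, and it does not model speech segment selection.

```python
import math


def pulse_train(f0_hz, duration_s, sample_rate=8000):
    """Voice source: unit impulses one pitch period apart (a simplified,
    hypothetical model of glottal excitation for voiced speech)."""
    n_samples = int(duration_s * sample_rate)
    period = sample_rate / f0_hz
    source = [0.0] * n_samples
    t = 0.0
    while int(t) < n_samples:
        source[int(t)] = 1.0
        t += period
    return source


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate=8000):
    """Coefficients [a1, a2] of A(z) = 1 + a1 z^-1 + a2 z^-2 for a single
    resonance (formant) at freq_hz with the given bandwidth."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [-2.0 * r * math.cos(theta), r * r]


def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): y[n] = x[n] - sum_k a_k * y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * y[n - k]
        y.append(acc)
    return y


# Hypothetical example: 100 ms of a 120 Hz voice through one 700 Hz resonance.
source = pulse_train(120.0, 0.1)
speech = all_pole_filter(source, resonator_coeffs(700.0, 100.0))
```

A real synthesizer would cascade several resonances (or use a full predictor polynomial per frame) and update the filter as the phonetic string advances.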
In text-to-speech synthesis by rule, both the phonetic string and the prosodic information are generated from the text by rule. Because the rules are imperfect, the resulting speech inevitably contains unnatural portions.
When the text to be spoken is known in advance, a technique called analysis synthesis is used instead. A person actually utters the text, the utterance is analyzed to generate the various parameters, and speech is synthesized from those parameters. Because these parameters are of higher quality than those generated by rule, more natural speech can be synthesized.
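The analysis step of analysis synthesis can be sketched in the same spirit. The code below assumes linear prediction (LPC) analysis: it computes predictor coefficients for one frame of recorded speech with the autocorrelation method and the Levinson-Durbin recursion. LSP parameters are conventionally derived from this same predictor polynomial, but that conversion is omitted, and all function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum_n frame[n] * frame[n + lag] for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients [a1..ap] of A(z) = 1 + sum_k a_k z^-k.

    Returns (coefficients, residual prediction-error energy)."""
    if r[0] == 0.0:
        return [0.0] * order, 0.0  # silent frame: nothing to predict
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def analyze_frame(frame, order=10):
    """Hypothetical per-frame analysis of a recorded utterance."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

In practice the recorded utterance would be cut into short overlapping, windowed frames, and one coefficient set (plus pitch and energy) would be stored per frame.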
Some applications require changing only part of the text: that part is synthesized by rule, and the remaining portion is synthesized using parameters generated by analysis. The result is more natural than synthesizing the full text by rule, while retaining part of the flexibility of synthesis by rule.
In this prior art, however, the text to be embedded is synthesized by rule on its own and then concatenated to the analysis-based remainder, and the resulting junction does not sound natural.
For example, consider the sentence “Mr. Tanaka is waiting” (“/ta/na/ka/sa/ma/ga/o/ma/chi/de/go/za/i/ma/su/” in Japanese), in which “Mr. Tanaka” (“/ta/na/ka/sa/ma/ga/”) is synthesized by rule and “is waiting” (“/o/ma/chi/de/go/za/i/ma/su/”) is synthesized on the basis of analysis. If “/ta/na/ka/sa/ma/ga/” is synthesized by rule without taking into account that “/o/ma/chi/de/go/za/i/ma/su/” follows it, the synthesized speech sounds as if the sentence ended at “/ta/na/ka/sa/ma/ga/”. When “/o/ma/chi/de/go/za/i/ma/su/” is then spoken after it, the result is unnatural.
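The problem can be made concrete with a toy pitch rule. In the hypothetical Python sketch below, a rule-based generator assigns one F0 value per mora with a steady declination and applies an extra final lowering when it treats the phrase as the end of a sentence. All numbers are invented for illustration and are not the prosody rules of any particular synthesizer; the point is only that the same mora string receives a different contour depending on whether the generator knows that more speech follows.

```python
def rule_f0_contour(morae, sentence_final, start_hz=180.0, step_hz=4.0,
                    final_drop_hz=40.0, final_span=2):
    """Hypothetical rule-based pitch contour, one F0 value (Hz) per mora.

    Applies a steady declination; if the phrase is treated as sentence-final,
    the last `final_span` morae are additionally lowered (final lowering),
    the last mora the most."""
    contour = [start_hz - step_hz * i for i in range(len(morae))]
    if sentence_final:
        span = min(final_span, len(morae))
        for j in range(1, span + 1):
            contour[-j] -= final_drop_hz * (span - j + 1) / span
    return contour


morae = "ta/na/ka/sa/ma/ga".split("/")
# Synthesized in isolation: the contour falls as if the sentence ended here.
isolated = rule_f0_contour(morae, sentence_final=True)
# Synthesized knowing that "o/ma/chi/de/go/za/i/ma/su" follows: no final fall.
in_context = rule_f0_contour(morae, sentence_final=False)
```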
BRIEF SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a speech synthesis apparatus, and a method therefor, which changes only part of a text by synthesis by rule and synthesizes the remaining portion using synthesis parameters or speech waveform data generated by analysis, and which concatenates the synthesis-by-rule portion to the analysis-synthesis portion naturally, without any incongruity in prosody.
It is another object of the present invention to provide a speech synthesis apparatus, and a method therefor, which changes only part of a text by synthesis by rule and synthesizes the remaining portion using synthesis parameters or speech waveform data generated by analysis, and which concatenates the synthesis-by-rule portion to the analysis-synthesis portion naturally, without any incongruity in prosody, even within a speech unit in which the changeable (unfixed form) portion and the fixed form portion are uttered without an intervening pause.
According to one aspect of the present invention, there is provided a speech synthesis apparatus comprising: means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion in which an arbitrary text segment can be specified by a user, exemplary text segment data including context information representing the context around the unfixed form portion and parameter data obtained by analyzing speech corresponding to the fixed form portion; means, responsive to an instruction by a user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data; means for generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and the corresponding context information; and means for concatenating the generated parameter data of the unfixed form portion to the stored parameter data of the fixed form portion and generating synthesized speech from the concatenated parameter data.
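The data flow of this aspect can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed apparatus: the class and function names are hypothetical, the parameter data is reduced to a mora list plus one pitch value per mora, the context information is reduced to a single `sentence_final` flag, and `generate_by_rule` stands in for a real synthesis-by-rule module.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParamData:
    phonemes: List[str]  # phonetic string (here, one entry per mora)
    f0_hz: List[float]   # prosodic information: one pitch value per entry


@dataclass
class ExemplarySegment:
    fixed_portions: List[ParamData]  # parameters obtained by analyzing recorded speech
    slot_index: int                  # position of the unfixed form portion
    context: Dict[str, bool]         # context around the unfixed form portion


def generate_by_rule(text: str, context: Dict[str, bool]) -> ParamData:
    """Hypothetical rule synthesizer: declining pitch, with final lowering
    only when the context says the unfixed portion ends the sentence."""
    morae = [m for m in text.split("/") if m]
    f0 = [180.0 - 4.0 * i for i in range(len(morae))]
    if morae and context.get("sentence_final", True):
        f0[-1] -= 40.0
    return ParamData(morae, f0)


def synthesize_parameters(db: Dict[str, ExemplarySegment], template_id: str,
                          user_text: str) -> ParamData:
    """Select an exemplary text segment, generate the unfixed portion using its
    context, and concatenate it with the stored fixed-portion parameters."""
    segment = db[template_id]
    unfixed = generate_by_rule(user_text, segment.context)
    parts = (segment.fixed_portions[:segment.slot_index] + [unfixed]
             + segment.fixed_portions[segment.slot_index:])
    phonemes: List[str] = []
    f0: List[float] = []
    for part in parts:
        phonemes += part.phonemes
        f0 += part.f0_hz
    return ParamData(phonemes, f0)


# Hypothetical database entry for "<name> is waiting"; the fixed-portion
# pitch values are placeholders, not measurements.
db = {
    "waiting": ExemplarySegment(
        fixed_portions=[ParamData("o ma chi de go za i ma su".split(),
                                  [170.0, 165.0, 160.0, 150.0, 140.0,
                                   130.0, 120.0, 110.0, 100.0])],
        slot_index=0,
        context={"sentence_final": False},
    )
}
result = synthesize_parameters(db, "waiting", "/ta/na/ka/sa/ma/ga/")
```

Keeping the context with each stored exemplary segment, rather than with the user's input, means the rule generator automatically knows whether the embedded name ends the sentence.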
In this apparatus, the parameter data obtained by analysis may consist of a phonetic string and prosodic information. The exemplary text segment data may further include positional information of the unfixed form portion within the exemplary text segment. The pitch of the unfixed form portion may be shifted so as to be substantially equal to that of the fixed form portion at their concatenation point, either when generating the parameter data of the unfixed form portion or when generating the synthesized speech. Where a pause period is provided between the unfixed form portion and the fixed form portion, the pause period may be adjusted when generating the synthesized speech.
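The pitch alignment and pause adjustment might look like the following Python sketch. It assumes that the whole unfixed-portion contour is scaled by one constant ratio so that the two F0 values meet at the concatenation point; the pause rule, its 100 to 400 ms range and all function names are hypothetical, since the text does not specify how the pause is adjusted.

```python
def shift_pitch_to_join(unfixed_f0, fixed_f0, unfixed_precedes=True):
    """Scale the unfixed portion's F0 contour by a constant ratio so that its
    boundary value equals the fixed portion's F0 at the concatenation point.
    Using a ratio (a shift on a log-frequency scale) preserves the contour's
    shape in musical intervals."""
    if unfixed_precedes:
        ratio = fixed_f0[0] / unfixed_f0[-1]
    else:
        ratio = fixed_f0[-1] / unfixed_f0[0]
    return [f * ratio for f in unfixed_f0]


def adjust_pause_ms(pause_ms, min_ms=100.0, max_ms=400.0):
    """Hypothetical pause rule: clamp the pause between the two portions to a
    range that sounds like a phrase boundary rather than a sentence end."""
    return max(min_ms, min(max_ms, pause_ms))


# Hypothetical values: the rule-generated name ends at 160 Hz, but the
# recorded fixed portion starts at 150 Hz.
aligned = shift_pitch_to_join([180.0, 170.0, 160.0], [150.0, 140.0])
```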
According to another aspect of the present invention, there is provided a speech synthesis apparatus comprising: means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion in which an arbitrary text segment can be specified by a user, exemplary text segment data including context information representing the context around the unfixed form portion and speech waveform data of the fixed form portion; means, responsive to an instruction by a user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data; means for generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and the corresponding context information, and generating synthesized speech from the generated parameter data; and means for concatenating the speech waveform data of the generated synthesized speech of the unfixed form portion to the stored speech waveform data of the fixed form portion and generating synthesized speech from the concatenated speech waveform data.
In this apparatus, the exemplary text segment data may further include positional information of the unfixed form portion within the exemplary text segment. The pitch of the unfixed form portion may be shifted so as to be substantially equal to that of the fixed form portion at their concatenation point, either when generating the parameter data of the unfixed form portion or when generating the synthesized speech. The waveform phase of the unfixed form portion may be adjusted so as to be substantially equal to that of the fixed form portion at their concatenation point when generating the synthesized speech.
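For the waveform-based aspect, one simple way to make the phases agree at the concatenation point is to splice both waveforms at a rising zero crossing, so that each side is at the start of a cycle. The Python sketch below assumes that approach; it is not necessarily the method of the invention, the function names are hypothetical, and a practical system would also take amplitude and pitch-mark information into account.

```python
import math


def first_rising_zero_crossing(x):
    """Index of the first sample where the waveform crosses zero going upward."""
    for n in range(1, len(x)):
        if x[n - 1] < 0.0 <= x[n]:
            return n
    return 0  # no crossing found: keep the whole waveform


def last_rising_zero_crossing(x):
    """Index of the last sample where the waveform crosses zero going upward."""
    for n in range(len(x) - 1, 0, -1):
        if x[n - 1] < 0.0 <= x[n]:
            return n
    return len(x)  # no crossing found: keep the whole waveform


def concatenate_phase_aligned(unfixed, fixed):
    """Splice so that both sides meet at a rising zero crossing, which keeps
    the waveform phases approximately equal at the join."""
    cut = last_rising_zero_crossing(unfixed)
    start = first_rising_zero_crossing(fixed)
    return unfixed[:cut] + fixed[start:]


# Hypothetical example: two 150 Hz sine segments with different starting phases.
fs = 8000
unfixed = [math.sin(2 * math.pi * 150 * n / fs + 0.3) for n in range(400)]
fixed = [math.sin(2 * math.pi * 150 * n / fs + 2.0) for n in range(400)]
joined = concatenate_phase_aligned(unfixed, fixed)
```

Trimming at a zero crossing also avoids an amplitude discontinuity (a click) at the join, which a blind splice at an arbitrary sample would introduce.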
According to another aspect of the present invention, there is provided a speech synthesis method comprising the steps of: providing a database for s

Inventor: Kaseno Osamu
Corporate assignee: Kabushiki Kaisha Toshiba
Law firm: Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Examiners: Hudspeth David; Wieland Susan
