Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis
Reexamination Certificate
1998-07-13
2001-04-03
Hudspeth, David (Department: 2641)
C704S260000, C704S270000
active
06212501
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a speech synthesis apparatus and method that take an exemplary text segment consisting of a fixed form portion, whose contents are fixed, and an unfixed form portion, whose contents vary; embed an arbitrary user-specified text segment at the position of the unfixed form portion; and generate synthesized speech of the exemplary text segment with that text segment embedded.
In recent years, a variety of speech synthesis apparatuses have been developed that analyze text written in a mixture of Japanese kana and Chinese characters (kanji), generate speech information for the text by synthesis by rule, and output it as voiced speech.
The basic arrangement of a speech synthesis apparatus of this type, which employs the synthesis-by-rule method, is as follows. Speech utterances are analyzed in predetermined units, e.g., CV (consonant/vowel), CVC (consonant/vowel/consonant), VCV (vowel/consonant/vowel), or VC (vowel/consonant) units, by LSP (line spectrum pair) analysis or cepstrum analysis to obtain phonetic information, which is registered in a speech segment file. Synthesized speech is then generated by voice source generation and synthesis filtering on the basis of this speech segment file and the synthesis parameters (a phonetic string and prosodic information) obtained by analyzing the text.
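A minimal sketch of the voice source generation and synthesis filtering stage may help. It assumes a pulse-train source for voiced speech and an all-pole synthesis filter. In a real system the filter coefficients would be derived from the LSP or cepstral parameters stored in the speech segment file; here a single hypothetical formant resonator (`resonator_coeffs`) stands in for them, and all function names are illustrative.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Voiced source: one unit impulse per pitch period (a crude glottal model)."""
    period = sample_rate / f0_hz
    source = [0.0] * n_samples
    t = 0.0
    while int(t) < n_samples:
        source[int(t)] = 1.0
        t += period
    return source


def all_pole_filter(source, a):
    """Synthesis filter 1/A(z), where A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out


def resonator_coeffs(formant_hz, bandwidth_hz, sample_rate):
    """Coefficients of A(z) for one complex pole pair (a single formant).

    Poles at r*exp(+-j*theta) give A(z) = 1 - 2r cos(theta) z^-1 + r^2 z^-2.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * formant_hz / sample_rate
    return [-2 * r * math.cos(theta), r * r]


SR = 8000
source = impulse_train(100.0, SR, 800)       # 100 Hz pitch, 0.1 s of source
coeffs = resonator_coeffs(500.0, 80.0, SR)   # hypothetical formant near 500 Hz
speech = all_pole_filter(source, coeffs)
```

The source carries the prosody (pitch), while the filter carries the phonetic (spectral) information, which is why the two can be controlled separately in synthesis by rule.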
In text-to-speech synthesis by rule, both the phonetic string and the prosodic information are generated from the text by rule. Because the rules are imperfect, the resulting speech inevitably contains unnatural portions.
When the text to be spoken is determined in advance, a technique called analysis synthesis is used instead: a person actually utters the text, the utterance is analyzed to generate various parameters, and speech is synthesized from those parameters. Because these parameters are of higher quality than those generated by rule, the synthesized speech is more natural.
Some applications require that only part of the text be changeable. That part is synthesized by rule, and the remaining portion is synthesized using parameters generated by analysis. The result is more natural than speech synthesized entirely by rule, while the changeable part retains the flexibility of synthesis by rule.
In this prior art, however, if only the text to be embedded is synthesized by rule and the result is simply concatenated to the analysis-based remainder, the junction does not sound natural.
For example, consider the sentence “Mr. Tanaka is waiting” (“/ta/na/ka/sa/ma/ga/o/ma/chi/de/go/za/i/ma/su/” in Japanese), in which “Mr. Tanaka” (“/ta/na/ka/sa/ma/ga/”) is synthesized by rule and “is waiting” (“/o/ma/chi/de/go/za/i/ma/su/”) is synthesized on the basis of analysis. If “/ta/na/ka/sa/ma/ga/” is synthesized by rule without taking into account that “/o/ma/chi/de/go/za/i/ma/su/” follows it, the synthesized speech sounds as if the sentence ended after “/ta/na/ka/sa/ma/ga/”, and the “/o/ma/chi/de/go/za/i/ma/su/” that follows sounds unnatural.
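The problem can be illustrated with a toy rule for the F0 (pitch) contour. The rule below, including the name `phrase_f0` and all of its numeric values, is a hypothetical simplification rather than the rule of any particular synthesizer: pitch declines linearly across the morae, and an extra sentence-final fall is applied over the last morae when the phrase is treated as the end of a sentence.

```python
def phrase_f0(morae, start_hz=180.0, decline_hz=4.0,
              sentence_final=True, final_fall_hz=40.0, fall_morae=2):
    """Toy rule-based F0 contour, one value (Hz) per mora.

    All parameter values are hypothetical. Pitch declines linearly; if the
    phrase is treated as sentence-final, an extra fall is spread over the
    last `fall_morae` morae.
    """
    n = len(morae)
    contour = []
    for i in range(n):
        f0 = start_hz - decline_hz * i
        if sentence_final and i >= n - fall_morae:
            step = i - (n - fall_morae) + 1  # runs from 1 to fall_morae
            f0 -= final_fall_hz * step / fall_morae
        contour.append(f0)
    return contour


name = ["ta", "na", "ka", "sa", "ma", "ga"]
as_final = phrase_f0(name)                            # rule unaware of what follows
as_continuation = phrase_f0(name, sentence_final=False)
print(as_final[-1], as_continuation[-1])              # 120.0 160.0
```

Treated as sentence-final, the name ends 40 Hz lower than it would as a continuation; if the recorded “/o/ma/chi/de/go/za/i/ma/su/” starts near the continuation pitch, the junction contains an audible pitch jump.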
BRIEF SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a speech synthesis apparatus and method which, when only part of a text is changed by synthesis by rule and the remaining portion is synthesized using synthesis parameters or speech waveform data generated by analysis, concatenate the synthesis-by-rule portion to the analysis synthesis portion without prosodic incongruity, so that the synthesized speech sounds natural.
It is another object of the present invention to provide a speech synthesis apparatus and method which achieve such natural, prosodically consistent concatenation even within a speech unit in which the changeable portion (unfixed form portion) and the fixed form portion are uttered without any pause between them.
According to one aspect of the present invention, there is provided a speech synthesis apparatus comprising: means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion in which an arbitrary text segment can be specified by a user, exemplary text segment data including context information representing the context around the unfixed form portion and parameter data obtained by analyzing speech corresponding to the fixed form portion; means, responsive to an instruction by the user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data; means for generating parameter data of at least the unfixed form portion on the basis of the input text segment of the unfixed form portion and the corresponding context information; and means for concatenating the generated parameter data of the unfixed form portion to the stored parameter data of the fixed form portion and generating synthesized speech from the concatenated parameter data.
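The storing, selecting, generating, and concatenating means of this aspect can be sketched as a small data structure and pipeline. This is a hypothetical illustration: the class and field names, the dictionary standing in for the stored database, the toy rule in `rule_params`, and all numeric values are assumptions. The context information is reduced to two numbers: the pitch at which the rule-generated portion starts and the pitch at which the following fixed portion begins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """One prosodic unit: a mora with its duration and fundamental frequency."""
    mora: str
    duration_ms: float
    f0_hz: float


@dataclass
class ExemplarySegment:
    """Stored record for one exemplary text segment (field names are hypothetical)."""
    fixed_params: List[Frame]   # parameters obtained by analyzing recorded speech
    slot_position: int          # where the unfixed portion goes in fixed_params
    context: Dict[str, float]   # context around the unfixed portion


def rule_params(text: str, context: Dict[str, float]) -> List[Frame]:
    """Toy synthesis by rule for the unfixed portion.

    Because the context says a fixed portion follows, the F0 contour is
    steered toward the pitch at which that portion starts instead of
    applying a sentence-final fall.
    """
    morae = [m for m in text.split("/") if m]
    start, end = context["start_f0_hz"], context["next_f0_hz"]
    frames = []
    for i, mora in enumerate(morae):
        frac = i / (len(morae) - 1) if len(morae) > 1 else 1.0
        frames.append(Frame(mora, 120.0, start + (end - start) * frac))
    return frames


def synthesize_params(segment: ExemplarySegment, user_text: str) -> List[Frame]:
    """Generate the unfixed portion by rule and splice it into the stored parameters."""
    unfixed = rule_params(user_text, segment.context)
    pos = segment.slot_position
    return segment.fixed_params[:pos] + unfixed + segment.fixed_params[pos:]


# Placeholder "analysis" parameters for "/o/ma/chi/de/go/za/i/ma/su/".
# The values are made up for illustration, not measured from speech.
waiting = [Frame(m, 110.0, 165.0 - 3.0 * i)
           for i, m in enumerate("o ma chi de go za i ma su".split())]

database = {
    "is_waiting": ExemplarySegment(
        fixed_params=waiting,
        slot_position=0,  # the user-supplied name comes first
        context={"start_f0_hz": 180.0, "next_f0_hz": 165.0},
    ),
}

# The user selects an exemplary segment and supplies the unfixed text.
params = synthesize_params(database["is_waiting"], "/ta/na/ka/sa/ma/ga/")
```

Because the stored context tells the rule that a fixed portion follows, the last mora of the name ends at the pitch where the analyzed portion begins rather than falling as at a sentence end.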
In the apparatus, the parameter data obtained by analysis may consist of a phonetic string and prosodic information. The exemplary text segment data may further include positional information on the unfixed form portion within the exemplary text segment. When the parameter data of the unfixed form portion or the synthesized speech is generated, the pitch of the unfixed form portion may be shifted so as to be substantially equal to that of the fixed form portion at their concatenation point. When a pause period is provided between the unfixed form portion and the fixed form portion, the pause period may be adjusted when the synthesized speech is generated.
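The pitch shift and pause adjustment can be sketched on F0 contours. The sketch assumes the shift is a constant frequency ratio (a constant offset on a log-frequency scale), which preserves the shape of the rule-generated contour; an additive shift in Hz would be another reasonable reading. The 10 ms frame period and the pause policy in `adjust_pause` are hypothetical.

```python
FRAME_MS = 10.0  # hypothetical frame period of the F0 contour


def match_pitch_at_join(unfixed_f0, fixed_f0, unfixed_first=True):
    """Scale the unfixed F0 contour by a constant ratio so that it meets
    the fixed contour at the concatenation point."""
    if unfixed_first:
        ratio = fixed_f0[0] / unfixed_f0[-1]
    else:
        ratio = fixed_f0[-1] / unfixed_f0[0]
    return [f * ratio for f in unfixed_f0]


def adjust_pause(recorded_pause_ms, speaking_rate, lo_ms=50.0, hi_ms=400.0):
    """Hypothetical policy: scale the recorded pause by speaking rate, then clamp."""
    return min(hi_ms, max(lo_ms, recorded_pause_ms / speaking_rate))


def join_contours(unfixed_f0, fixed_f0, pause_ms):
    """Concatenate contours, filling the pause with unvoiced (0 Hz) frames."""
    n_pause = round(pause_ms / FRAME_MS)
    return unfixed_f0 + [0.0] * n_pause + fixed_f0


fixed = [120.0, 115.0]                               # illustrative values
shifted = match_pitch_at_join([180.0, 170.0, 160.0], fixed)
joined = join_contours(shifted, fixed, adjust_pause(200.0, 2.0))
```

Here the ratio is 120/160 = 0.75, so the rule contour becomes 135, 127.5, 120 Hz and meets the fixed portion exactly at the joint.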
According to another aspect of the present invention, there is provided a speech synthesis apparatus comprising: means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion in which an arbitrary text segment can be specified by a user, exemplary text segment data including context information representing the context around the unfixed form portion and speech waveform data of the fixed form portion; means, responsive to an instruction by the user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data; means for generating parameter data of at least the unfixed form portion on the basis of the input text segment of the unfixed form portion and the corresponding context information, and generating synthesized speech from the generated parameter data; and means for concatenating the speech waveform data of the generated synthesized speech of the unfixed form portion to the stored speech waveform data of the fixed form portion and generating synthesized speech from the concatenated speech waveform data.
In the apparatus, the exemplary text segment data may further include positional information on the unfixed form portion within the exemplary text segment. When the parameter data of the unfixed form portion or the synthesized speech is generated, the pitch of the unfixed form portion may be shifted so as to be substantially equal to that of the fixed form portion at their concatenation point. When the synthesized speech is generated, the waveform phase of the unfixed form portion may be adjusted so as to be substantially equal to that of the fixed form portion at their concatenation point.
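One simple way to make the waveform phases agree at the joint, assumed here for illustration and not necessarily the method of the invention, is to cut both waveforms at positive-going zero crossings: the rule-synthesized portion is trimmed back to its last such crossing, and the stored fixed-form waveform starts at its first one. The function names and the sine-wave test signals are hypothetical; real speech would call for pitch-synchronous cut points.

```python
import math


def rising_zero_crossings(x):
    """Indices i where x[i-1] < 0 <= x[i] (start of a positive half-cycle)."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]


def phase_aligned_concat(unfixed, fixed):
    """Join two waveforms at rising zero crossings so their phases match."""
    tail = rising_zero_crossings(unfixed)
    head = rising_zero_crossings(fixed)
    cut_end = tail[-1] if tail else len(unfixed)  # keep up to the last crossing
    cut_start = head[0] if head else 0            # start at the first crossing
    return unfixed[:cut_end] + fixed[cut_start:]


# Demo: two 100 Hz sine waves at 8 kHz that start at different phases.
SR = 8000
unfixed = [math.sin(2 * math.pi * 100 * n / SR + 0.3) for n in range(200)]
fixed = [math.sin(2 * math.pi * 100 * n / SR + 2.0) for n in range(200)]
joined = phase_aligned_concat(unfixed, fixed)
```

For periodic signals, trimming discards less than one period of each waveform at the joint; a short crossfade could be added if the amplitudes on the two sides also differ.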
According to another aspect of the present invention, there is provided a speech synthesis method comprising the steps of: providing a database for s
Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Kabushiki Kaisha Toshiba
Wieland Susan