Reexamination Certificate
2000-01-11
2004-06-15
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S260000
active
06751592
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to a speech synthesizing apparatus that synthesizes speech by selecting and connecting speech segments on the basis of phonetic information for the speech to be synthesized. It also relates to a mechanically readable recording medium that stores a text-to-speech conversion program.
Efforts are now under way to make computers recognize patterns and understand or express natural language. A speech synthesizing apparatus, for example, is one means by which a computer can produce speech, and it enables communication between computers and human beings.
Speech synthesizing apparatuses of this type use various speech output methods, such as the waveform encoding method and the parameter expression method. A typical example is the rule-based synthesizing apparatus, which subdivides sounds into sound components, accumulates them, and combines them to produce an arbitrary sound.
Referring now to FIG. 1, a conventional example of the rule-based synthesizing apparatus will be described.
FIG. 1 is a block diagram illustrating the conventional rule-based synthesizing apparatus. This apparatus performs text-to-speech conversion (hereinafter referred to as “TTS”), in which input text data (hereinafter referred to simply as a “text”) is converted into a phonetic symbol string consisting of phoneme information (information concerning pronunciation) and prosodic information (information concerning the syntactic structure, lexical accent, etc. of a sentence), and speech is then created from the phonetic symbol string. The TTS processing mechanism of the rule-based synthesizing apparatus of FIG. 1 comprises a linguistic processing section 32 for analyzing the language of a text 31, and a speech synthesizing section 33 for performing speech synthesis processing on the basis of the output of the linguistic processing section 32.
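The division of labour between sections 32 and 33 can be pictured as two functions that exchange a phonetic symbol string. The following is a minimal sketch under stated assumptions: the class `PhoneticSymbolString`, the functions `linguistic_processing` and `speech_synthesis`, and the made-up lexicon entries are hypothetical; whitespace splitting stands in for real morphological analysis of Japanese text; and the synthesis stub only reports which downstream section would consume each field.

```python
from dataclasses import dataclass


@dataclass
class PhoneticSymbolString:
    """Data handed from linguistic processing (32) to speech synthesis (33)."""
    phonemes: list[str]                    # phoneme information (pronunciation)
    accent_phrases: list[tuple[int, int]]  # prosodic information: (mora count, accent type)


# Hypothetical toy lexicon: word -> (phonemes, mora count, accent type).
LEXICON = {
    "hashi": (["h", "a", "sh", "i"], 2, 1),
    "kawa": (["k", "a", "w", "a"], 2, 2),
}


def linguistic_processing(text: str) -> PhoneticSymbolString:
    """Stand-in for section 32: whitespace splitting replaces morphological
    analysis, and each word is treated as its own accent phrase."""
    phonemes: list[str] = []
    phrases: list[tuple[int, int]] = []
    for word in text.split():
        word_phonemes, morae, accent_type = LEXICON[word]
        phonemes.extend(word_phonemes)
        phrases.append((morae, accent_type))
    return PhoneticSymbolString(phonemes, phrases)


def speech_synthesis(symbols: PhoneticSymbolString) -> list[str]:
    """Stand-in for section 33: names the stage that consumes each field."""
    return [
        f"34: durations for {len(symbols.phonemes)} phonemes",
        f"35/36: parameter series from {len(symbols.phonemes)} phonemes",
        f"37: pitch pattern for {len(symbols.accent_phrases)} accent phrases",
        "38: synthesizing filter output",
    ]


for stage in speech_synthesis(linguistic_processing("hashi kawa")):
    print(stage)
```

The point of the sketch is the interface: everything section 33 needs travels in the phonetic symbol string, so the two sections can be developed independently.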
For example, rule-based synthesis of Japanese is generally executed as follows:
First, the linguistic processing section 32 performs morphological analysis, in which a text (including Chinese characters and Japanese syllabaries) input from a text file 31 is dissected into morphemes, followed by linguistic processing such as syntactic structure analysis. The linguistic processing section 32 then determines the “type of accent” of each morpheme based on the “phoneme information” and the position of the accent. Subsequently, the linguistic processing section 32 determines the “accent type” of each phrase that serves as a unit of pause during vocalization (hereinafter referred to as an “accent phrase”).
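The text does not say how an accent type is encoded. A common convention for Tokyo Japanese, assumed here, numbers the accent type by the mora after which pitch falls, with type 0 meaning no fall within the phrase. The hypothetical helper `accent_pattern` below expands such an accent type into a per-mora high (H) / low (L) pattern, the kind of information a later stage could turn into point pitches.

```python
def accent_pattern(mora_count: int, accent_type: int) -> str:
    """Return one H/L letter per mora for an assumed Tokyo-style accent type.

    type 0      : first mora L, remaining morae H (no fall)
    type 1      : first mora H, remaining morae L
    type n > 1  : first mora L, morae 2..n H, remaining morae L
    """
    if mora_count < 1 or not 0 <= accent_type <= mora_count:
        raise ValueError("accent type must lie between 0 and the mora count")
    pattern = []
    for i in range(1, mora_count + 1):
        if accent_type == 0:
            high = i > 1
        elif accent_type == 1:
            high = i == 1
        else:
            high = 1 < i <= accent_type
        pattern.append("H" if high else "L")
    return "".join(pattern)


for t in range(5):
    print(t, accent_pattern(4, t))
```

Under this convention, type 0 and type n (equal to the mora count) give the same pattern inside the phrase; they differ only in whether pitch drops on a following particle, which this sketch does not model.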
The text data processed by the linguistic processing section 32 is supplied to the speech synthesizing section 33.
In the speech synthesizing section 33, a phoneme duration determining/processing section 34 first determines the duration of each phoneme included in the above “phoneme information”.
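The rules that section 34 applies are not given here. Purely as an illustration, the sketch below assigns a base duration to each phoneme class and lengthens a phrase-final vowel. The millisecond values, the 120 % stretch, the `pau` pause symbol, and the function names `phoneme_class` and `phoneme_durations` are all assumptions; practical systems use much richer rule sets or trained statistical models.

```python
# Hypothetical base durations in milliseconds (illustrative values only).
BASE_MS = {"vowel": 90, "consonant": 60, "pause": 200}
VOWELS = {"a", "i", "u", "e", "o"}


def phoneme_class(ph: str) -> str:
    """Classify a romanized phoneme symbol; "pau" is an assumed pause symbol."""
    if ph == "pau":
        return "pause"
    return "vowel" if ph in VOWELS else "consonant"


def phoneme_durations(phonemes: list[str], final_stretch_pct: int = 120) -> list[int]:
    """Return one duration (ms) per phoneme of a single accent phrase."""
    durations = []
    for i, ph in enumerate(phonemes):
        kind = phoneme_class(ph)
        ms = BASE_MS[kind]
        # Assumed rule: lengthen a vowel that ends the phrase.
        if kind == "vowel" and i == len(phonemes) - 1:
            ms = ms * final_stretch_pct // 100
        durations.append(ms)
    return durations


print(phoneme_durations(["k", "a", "w", "a"]))  # [60, 90, 60, 108]
```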
Subsequently, a phonetic parameter generating section 36 reads the necessary speech segments, on the basis of the above “phoneme information”, from a speech segment storage 35 that stores a great number of pre-created speech segments. The section 36 then connects the read speech segments while expanding and contracting them along the time axis, thereby generating a characteristic parameter series for the speech to be synthesized.
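The expansion and contraction along the time axis can be illustrated by linearly resampling each segment's parameter frames to a target frame count and appending the results. In this sketch a segment is assumed to be a list of frames, each frame a list of floats (for example spectral parameters), and the target frame counts are assumed to come from the phoneme durations divided by a fixed frame period. The names `stretch` and `concatenate` are hypothetical, and linear interpolation is only one simple choice of time-warping method.

```python
Frame = list[float]


def stretch(frames: list[Frame], target_len: int) -> list[Frame]:
    """Linearly resample a segment's parameter frames to target_len frames."""
    if not frames or target_len < 1:
        raise ValueError("need at least one source frame and one target frame")
    n = len(frames)
    if n == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for j in range(target_len):
        # Map output frame j onto a fractional position in the source segment.
        pos = j * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        i = min(int(pos), n - 2)
        t = pos - i
        out.append([(1 - t) * a + t * b for a, b in zip(frames[i], frames[i + 1])])
    return out


def concatenate(segments: list[list[Frame]], target_lens: list[int]) -> list[Frame]:
    """Stretch each segment to its target length and join them into one series."""
    if len(segments) != len(target_lens):
        raise ValueError("one target length is needed per segment")
    series: list[Frame] = []
    for seg, n in zip(segments, target_lens):
        series.extend(stretch(seg, n))
    return series


print(concatenate([[[0.0], [1.0]], [[2.0]]], [3, 2]))
```

The same function handles both directions: a target longer than the segment expands it, a shorter target contracts it.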
Further, in the speech synthesizing section 33, a pitch pattern creating section 37 sets point pitches on the basis of each accent type and performs linear interpolation between each pair of adjacent set point pitches, thereby creating the accent component of the pitch. The pitch pattern creating section 37 then creates a pitch pattern by superposing the accent component on an intonation component that represents a gradual lowering of pitch.
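The pitch computation in section 37 amounts to piecewise-linear interpolation plus a superposed declining line. The sketch below assumes that pitch values are on a logarithmic scale (so that adding the two components is meaningful), that the intonation component is a straight line falling at a constant rate, and that the point pitches arrive as sorted (time, value) pairs already derived from the accent types. The names `interpolate` and `pitch_pattern` are hypothetical.

```python
def interpolate(points: list[tuple[float, float]], t: float) -> float:
    """Piecewise-linear interpolation between adjacent (time, value) point pitches.

    Points must be sorted by time; values are held constant outside their range.
    """
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v1
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def pitch_pattern(points: list[tuple[float, float]], times: list[float],
                  start: float, fall_per_s: float) -> list[float]:
    """Accent component (interpolated point pitches) plus a declining intonation line."""
    return [interpolate(points, t) + (start - fall_per_s * t) for t in times]


accent_points = [(0.0, 0.0), (0.1, 1.0), (0.3, 0.0)]  # rise, then fall
print(pitch_pattern(accent_points, [0.0, 0.05, 0.1, 0.2, 0.3], 5.0, 1.0))
```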
Finally, a synthesizing filter section 38 synthesizes the desired speech by filtering.
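The type of synthesizing filter is not specified. One common choice in parametric synthesis, assumed here purely for illustration, is a source-filter arrangement in which a pulse train at the fundamental frequency excites an all-pole filter y[n] = x[n] - sum over k of a_k * y[n-k]. The helper names `pulse_train` and `all_pole_filter`, the fixed coefficients, and the constant F0 are assumptions; a real system would update the coefficients frame by frame from the characteristic parameter series and the pulse spacing from the pitch pattern.

```python
def pulse_train(f0_hz: float, n_samples: int, fs: int) -> list[float]:
    """Unit impulses spaced one pitch period apart (voiced excitation)."""
    period = max(1, round(fs / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation: list[float], a: list[float]) -> list[float]:
    """Direct-form all-pole filter: y[n] = x[n] - sum_{k=1..p} a[k-1] * y[n-k]."""
    y: list[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


# 100 ms of a 120 Hz buzz through a stable two-pole resonator at 8 kHz.
speech = all_pole_filter(pulse_train(120.0, 800, 8000), [-1.3, 0.8])
print(len(speech))
```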
In general, when a person speaks, he or she intentionally or unintentionally vocalizes particular portions of the speech so as to make them easier to hear than other portions. Such portions include, for example, places where a word that plays an important role in conveying the meaning of the speech is vocalized, where a word is vocalized for the first time in the speech, or where a word unfamiliar to the speaker or the listener is vocalized. They also include places where a word is vocalized whose meaning the listener might mistake because another word with a similar pronunciation appears in the speech. Elsewhere in the speech, by contrast, a person sometimes vocalizes words in a manner that is not so easy to hear, or that is rather ambiguous, because the listener will easily understand those words even when they are vocalized ambiguously.
However, the conventional speech synthesizing apparatus, as represented by the rule-based synthesizing apparatus described above, has only one type of speech segment for each type of synthesis unit, and hence speech synthesis is always executed using speech segments with the same degree of “intelligibility”. Accordingly, the conventional apparatus cannot adjust the degree of “intelligibility” of synthesized sounds. If only speech segments with an average degree of ease of hearing are used, the listener has difficulty hearing the portions that, as described above, should be vocalized in a manner easy to hear. On the other hand, if only speech segments with a high degree of ease of hearing are used, every portion of every sentence is vocalized with clear pronunciation, so the synthesized speech does not sound smooth to the listener.
In addition, there exists another type of conventional speech synthesizing apparatus, in which a plurality of speech segments are prepared for each type of synthesis unit. It has the same drawback, however, because it chooses among the speech segments for a synthesis unit according to the phonetic or prosodic context, not according to the required degree of “intelligibility”.
BRIEF SUMMARY OF THE INVENTION
The present invention has been developed in light of the above, and aims to provide a speech synthesizing apparatus in which a plurality of speech segments with different degrees of intelligibility are prepared for each type of synthesis unit and are switched during TTS processing in accordance with the state of vocalization, so that speech is synthesized in a manner that the listener can hear easily without tiring, even after listening for a long time. The invention also aims to provide a mechanically readable recording medium that stores a text-to-speech conversion program.
According to an aspect of the invention, there is provided a speech synthesizing apparatus comprising: text analyzing means for dissecting text data to be subjected to speech synthesis into to-be-synthesized units and analyzing each to-be-synthesized unit, thereby obtaining a text analysis result; a speech segment dictionary that stores speech segments prepared for each of a plurality of ranks of intelligibility; determining means for determining, on the basis of the text analysis result, the rank in which the present degree of intelligibility falls; and synthesized-speech generating means for selecting, from the speech segment dictionary, speech segments included in the rank corresponding to the determined rank, and then connecting the selected speech segments to generate synthetic speech.
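The core idea, choosing each unit's speech segment from a dictionary indexed by intelligibility rank, can be sketched as follows. The analysis flags mirror the situations described above in which speakers articulate more clearly (an important word, a first occurrence, an unfamiliar word, a word with a similar-sounding counterpart in the same speech). The class and function names, the three-rank scale with 2 as the clearest rank, and the rule of adding one rank per flag are all assumptions made for illustration, not the determination rule of the apparatus.

```python
from dataclasses import dataclass


@dataclass
class UnitAnalysis:
    """Hypothetical text analysis result for one to-be-synthesized unit."""
    unit: str
    is_key_word: bool = False          # carries important meaning
    first_occurrence: bool = False     # first time the word appears
    is_unfamiliar: bool = False        # unfamiliar to speaker or listener
    has_confusable_word: bool = False  # a similar-sounding word appears in the speech


def determine_rank(a: UnitAnalysis, n_ranks: int = 3) -> int:
    """Assumed rule: one rank step per flag, capped at the clearest rank."""
    score = sum([a.is_key_word, a.first_occurrence,
                 a.is_unfamiliar, a.has_confusable_word])
    return min(score, n_ranks - 1)


def select_segments(analyses: list[UnitAnalysis],
                    dictionary: dict[tuple[str, int], str]) -> list[str]:
    """Look up the segment stored for each unit at its determined rank."""
    return [dictionary[(a.unit, determine_rank(a))] for a in analyses]


# Toy dictionary: segment IDs stand in for stored waveforms or parameters.
DICTIONARY = {(u, r): f"{u}_rank{r}" for u in ("ka", "wa") for r in range(3)}
print(select_segments([UnitAnalysis("ka", is_key_word=True, first_occurrence=True),
                       UnitAnalysis("wa")], DICTIONARY))
# ['ka_rank2', 'wa_rank0']
```

Keying the dictionary on (unit, rank) keeps selection a single lookup; the selected segments would then be connected in the usual way.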
According to another aspect of the invention, there is provided a mechanically readable recording medium storing a text-to-speech conversion program for causing a computer to execute the steps of: dissecting text data to be subjected to speech synthesis into to-be-synthesized units, and analyzing the units to obtain a text analysis result; determining, on the basis of the text analysis result, a degree of intelligibility of each to-be-synthesized unit; and selecting, on the basis of the dete
Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Kabushiki Kaisha Toshiba
Opsasnick Michael N.