Reexamination Certificate
1994-01-19
2001-12-18
Knepper, David D. (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S258000
Reexamination Certificate
active
06332123
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a method for synthesizing a picture through digital processing, and more particularly, to a system for synthesizing a (still or moving) picture of a face which represents changes in the shape of the mouth accompanying the production of a speech output.
When a man utters a vocal sound, vocal information is produced by the articulators, and at the same time, his mouth moves as he utters (i.e., the shape of the mouth changes in outward appearance). A method which converts a sentence input as text to speech information and outputs it is called speech synthesis, and this method has achieved fair success. In contrast, few reports have been published on a method for producing a picture of a face which has mouth-shape variations corresponding to an input sentence, except the following report by Kiyotoshi Matsuoka and Kenji Kurose.
The method proposed by Matsuoka and Kurose is disclosed in a published paper [Kiyotoshi Matsuoka and Kenji Kurose: “A moving picture program for training in speech reading for the deaf,” Journal of the Institute of Electronics, Information and Communication Engineers of Japan, Vol. J70-D, No. 11, pp. 2167-2171 (November 1987)].
Besides, there has also been reported, as related prior art, a method for presuming mouth-shape variations corresponding to an input text. This method is disclosed in a published paper [Shigeo Morishima, Kiyoharu Aizawa and Hiroshi Harashima: “Studies of automatic synthesis of expressions on the basis of speech information,” 4th NICOGRAPH article contest, Collection of Articles, pp. 139-146, Nihon Computer Graphics Association (November 1988)]. This article proposes a method which calculates the logarithmic mean power of input speech information and controls the opening of the mouth accordingly, and a method which calculates a linear prediction coefficient corresponding to the formant characteristic of the vocal tract and presumes the mouth shape.
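The logarithmic-mean-power idea in the Morishima et al. article can be illustrated with a minimal sketch. The dB floor and the linear mapping to a mouth-opening degree below are assumptions for illustration only; the article's actual control law is not reproduced here.

```python
import math

def mouth_opening_from_power(frames, floor_db=-60.0):
    """For each frame of speech samples, compute the logarithmic mean
    power (in dB) and map it linearly to a mouth-opening degree in [0, 1].
    Illustrative sketch: the floor and mapping are assumed, not from the paper."""
    openings = []
    for frame in frames:
        mean_power = sum(s * s for s in frame) / len(frame)
        db = 10.0 * math.log10(mean_power) if mean_power > 0 else floor_db
        db = max(db, floor_db)
        openings.append((db - floor_db) / -floor_db)  # 0 at the floor, 1 at 0 dB
    return openings
```

A silent frame maps to a closed mouth (0.0) and a full-scale frame to a fully open mouth (1.0).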
The method by Matsuoka and Kurose has been described above as a conventional method for producing pictures of a face which have mouth-shape variations corresponding to a sentence (an input text) being input, but this method poses the following problems: Although a vocal sound and the mouth shape are closely related to each other in utterance, the method basically syllabicates the sentence and selects mouth-shape patterns on the basis of the correspondence in terms of characters, and consequently, the correlation between the speech generating mechanism and the mouth-shape generation is insufficient. This introduces difficulty in producing the mouth shape correctly in correspondence to the speech output. Further, although a phoneme (a minimum unit in utterance, a syllable being composed of a plurality of phonemes) differs in duration in accordance with its connection to the preceding and following phonemes, the method by Matsuoka and Kurose fixedly assigns four frames to each syllable, and consequently, it is difficult to represent natural mouth-shape variations in correspondence to the input sentence. Moreover, in the case of outputting the sound and the mouth-shape picture in response to the sentence being input, it is difficult to match them with each other.
The method proposed by Morishima, Aizawa and Harashima is to presume the mouth shape on the basis of input speech information, and hence cannot be applied to the production of a moving picture which has mouth-shape variations corresponding to the input sentence.
SUMMARY OF THE INVENTION
In view of the above, an object of the present invention is to provide a picture synthesizing method and apparatus which permit the representation of mouth-shape variations that correspond accurately to speech outputs and agree with the durations of phonemes.
According to an aspect of the present invention, there is provided a picture synthesizing method for synthesizing a moving picture of a person's face which has mouth-shape variations in case of reading an input sentence of a train of characters,
comprising the steps of:
developing from the input sentence of a train of characters a train of phonemes by utilizing a speech synthesis technique, and outputting, for each phoneme of the train of phonemes, a corresponding vocal sound feature including an articulation mode and a duration of that phoneme;
determining for each phoneme a mouth-shape feature on the basis of the corresponding vocal sound feature, said mouth-shape feature including the degree of opening of the mouth, the degree of roundness of the lips, the height of the lower jaw in a raised or lowered position, and the degree to which the tongue is seen;
determining values of mouth-shape parameters, for each phoneme, for representing a concrete mouth-shape on the basis of the mouth-shape feature; and
controlling the values of the mouth-shape parameters, for each phoneme, for each frame of the moving picture in accordance with the duration of each phoneme, thereby synthesizing the moving picture having mouth-shape variations matched with a speech output audible in case of reading the input sentence of a train of characters.
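The four claimed steps can be sketched end to end: phonemes with durations, a table of mouth-shape features, conversion to concrete parameters, and per-frame control. All phoneme feature values, parameter formulas, and the frame period below are hypothetical placeholders for illustration, not values taken from the patent.

```python
# Hypothetical mouth-shape features for two phonemes (assumed values).
PHONEME_FEATURES = {
    "a": {"opening": 0.9, "roundness": 0.2, "jaw": 0.8, "tongue": 0.1},
    "o": {"opening": 0.6, "roundness": 0.9, "jaw": 0.5, "tongue": 0.0},
}

def features_to_parameters(feat):
    # Toy conversion from abstract features to concrete mouth-shape
    # parameters (arbitrary units); the real conversion table is not shown.
    return {
        "lip_height": 10.0 * feat["opening"],
        "lip_width": 8.0 * (1.0 - feat["roundness"]),
        "jaw_drop": 6.0 * feat["jaw"],
    }

def synthesize_frames(phoneme_train, frame_period=0.04):
    """phoneme_train: list of (phoneme, duration_in_seconds) pairs.
    Emits one parameter set per video frame, so each phoneme holds its
    shape for a number of frames proportional to its duration."""
    frames = []
    for phoneme, duration in phoneme_train:
        params = features_to_parameters(PHONEME_FEATURES[phoneme])
        n_frames = max(1, round(duration / frame_period))
        frames.extend([params] * n_frames)
    return frames
```

Because the frame count is derived from each phoneme's duration rather than fixed per syllable, the resulting picture train stays aligned with the speech output.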
According to another aspect of the present invention, there is provided a picture synthesizing apparatus comprising:
an input terminal for receiving an input sentence of a train of characters;
a speech synthesizer for developing from the input sentence of a train of characters a train of phonemes by utilizing a speech synthesis technique, and outputting, for each phoneme, a corresponding vocal sound feature including an articulation mode and a duration of each corresponding phoneme of the train of phonemes;
a converter for converting the corresponding vocal sound feature for each corresponding phoneme into a mouth-shape feature including the degree of opening of the mouth, the degree of roundness of the lips, the height of the lower jaw in a raised or lowered position, and the degree to which the tongue is seen;
means for defining a conversion table establishing correspondence between various mouth-shape features and mouth-shape parameters for representing concrete mouth-shapes;
means for obtaining from the conversion table mouth-shape parameters each corresponding to an individual mouth-shape feature for each phoneme provided by the converter;
a time adjuster having an output whereby values of the mouth-shape parameters from said means for obtaining are controlled in accordance with the duration of each corresponding phoneme from the speech synthesizer for producing a moving picture as a train of pictures spaced apart for a fixed period of time; and
a picture generator for generating the moving picture having mouth-shape variations matched with a speech output audible in case of reading the input sentence of a train of characters in accordance with the values of the mouth-shape parameters from said means for obtaining mouth-shape parameters under control of the time adjuster.
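The time adjuster's role — emitting parameter values as a train of pictures spaced apart for a fixed period of time — can be sketched as below. The linear interpolation between per-phoneme key points is an assumed control law chosen for illustration; the claim itself does not specify how values are smoothed between phonemes.

```python
def time_adjusted_parameters(targets, frame_period=1.0 / 30.0):
    """targets: list of (time_seconds, parameter_value) key points, one per
    phoneme, sorted by time. Returns the parameter sampled at the fixed
    frame period, linearly interpolating between successive key points
    (an assumed smoothing; the patent leaves the control law unspecified)."""
    frames = []
    t = 0.0
    end = targets[-1][0]
    i = 0
    while t <= end:
        # Advance to the key-point segment that contains time t.
        while i + 1 < len(targets) and targets[i + 1][0] <= t:
            i += 1
        if i + 1 < len(targets):
            t0, v0 = targets[i]
            t1, v1 = targets[i + 1]
            v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        else:
            v = targets[i][1]  # hold the last value
        frames.append(v)
        t += frame_period
    return frames
```

Sampling at a fixed period decouples the picture train from the variable phoneme durations, which is what keeps the generated mouth shapes in step with the audible speech output.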
REFERENCES:
patent: 3364382 (1968-01-01), Harrison, III
patent: 3662374 (1972-05-01), Harrison, III et al.
patent: 4653100 (1987-03-01), Barnett et al.
patent: 4884972 (1989-12-01), Gasper
patent: 5057940 (1991-10-01), Murakami et al.
patent: 5111409 (1992-05-01), Gasper et al.
patent: 5278943 (1994-01-01), Gasper et al.
Hatori Yoshinori
Higuchi Norio
Kaneko Masahide
Koike Atsushi
Yamamoto Seiichi
Knepper David D.
Kokusai Denshin Denwa Kabushiki Kaisha
Siegel Lackenbach