Clustered patterns for text-to-speech synthesis

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate


C704S260000, C704S245000

active

06529874

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a speech information processing apparatus and a method to generate a natural pitch pattern used for text-to-speech synthesis.
BACKGROUND OF THE INVENTION
Text-to-speech synthesis is the artificial generation of a speech signal from an arbitrary sentence. An ordinary text-to-speech system consists of a language processing section, a control parameter generation section, and a speech signal generation section. The language processing section executes morpheme analysis and syntax analysis for an input text. The control parameter generation section processes accent and intonation, and outputs phoneme symbols, a pitch pattern, and the duration of each phoneme. The speech signal generation section synthesizes the speech signal.
In the text-to-speech system, the naturalness of synthesized speech depends on the prosody processing of the control parameter generation section. In particular, the pitch pattern influences the naturalness of synthesized speech. In known text-to-speech systems, the pitch pattern is generated by a simple model. Accordingly, the synthesized speech sounds mechanical and its intonation is unnatural.
Recently, a method to generate the pitch pattern by using a pitch pattern extracted from natural speech has been considered. For example, in Japanese Patent Disclosure (Kokai) “PH6-236197”, unit patterns extracted from the pitch pattern of natural speech, or vector-quantized unit patterns, are stored in advance. A unit pattern is retrieved from memory according to an input attribute or input language information. The pitch pattern is generated by locating the retrieved unit pattern on a time axis and transforming it.
In the above-mentioned text-to-speech synthesis, it is impossible to store unit patterns suitable for all input attributes or all input language information. Therefore, transformation of the unit pattern is necessary. For example, the unit pattern must be stretched or compressed in proportion to the duration. However, even if the unit pattern is extracted from the pitch pattern of natural speech, the naturalness of the synthesized speech falls because of this transformation processing.
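The retrieve-and-stretch step described above can be illustrated with a minimal Python sketch. It is not taken from the cited disclosure: the attribute key (accent type, mora count), the stored values in `UNIT_PATTERNS`, and the choice of linear interpolation for time-axis stretching are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical unit-pattern store: (accent type, mora count) -> pitch contour in Hz.
# The key layout and the values are illustrative assumptions only.
UNIT_PATTERNS: Dict[Tuple[int, int], List[float]] = {
    (1, 3): [120.0, 150.0, 130.0, 110.0],
    (0, 3): [110.0, 125.0, 130.0, 128.0],
}


def stretch_pattern(pattern: List[float], target_len: int) -> List[float]:
    """Resample a pattern to target_len points by linear interpolation.

    This stands in for the time-axis stretching or compression of a unit
    pattern in proportion to the required duration.
    """
    if target_len < 1 or not pattern:
        raise ValueError("pattern must be non-empty and target_len >= 1")
    if target_len == 1 or len(pattern) == 1:
        return [pattern[0]] * target_len
    # Map output index i onto a fractional position in the source pattern.
    scale = (len(pattern) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(pattern) - 2)
        frac = pos - lo
        out.append(pattern[lo] * (1.0 - frac) + pattern[lo + 1] * frac)
    return out


def generate_pitch(attribute: Tuple[int, int], n_frames: int) -> List[float]:
    """Retrieve the unit pattern for an attribute and fit it to n_frames."""
    return stretch_pattern(UNIT_PATTERNS[attribute], n_frames)
```

Calling `generate_pitch((1, 3), 7)` fits the four-point stored contour to seven frames. The endpoints are preserved, but every interior frame is interpolated rather than observed in natural speech, which is the kind of transformation the passage identifies as a source of lost naturalness.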
SUMMARY OF THE INVENTION
It is one object of the present invention to provide a speech information processing apparatus and a method to improve the naturalness of synthesized speech in text-to-speech synthesis.
The above and other objects are achieved according to the present invention by providing a novel apparatus, method, and computer program product for generating clustered patterns for text-to-speech synthesis. In the apparatus, a representative pattern memory stores a plurality of initial representative patterns, each as a noise pattern. A different attribute is affixed in advance to each initial representative pattern. A pitch pattern memory stores a large number of natural pitch patterns, each in units of an accent phrase. A clustering unit classifies each natural pitch pattern to an initial representative pattern based on the attribute of the accent phrase. A transformation parameter generation unit evaluates an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern, and generates a transformation parameter for each natural pitch pattern based on the evaluation result. A representative pattern generation unit calculates an evaluation function of the sum of the errors between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern, and updates each initial representative pattern based on a result of the evaluation function. The representative pattern memory stores each updated representative pattern as a clustered pattern of the attribute affixed to the corresponding initial representative pattern.
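The training procedure summarized above can be sketched in Python under simplifying assumptions. The summary does not state the form of the transformation or of the error measure, so this sketch assumes that the transformation is a single additive pitch offset per natural pattern, that the error is a sum of squared differences, and that all patterns in a cluster have the same length. The names `fit_offset`, `total_error`, and `train` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Pattern = List[float]


def fit_offset(rep: Pattern, natural: Pattern) -> float:
    """Transformation parameter: offset b minimizing sum_t (rep[t] + b - natural[t])**2."""
    return sum(n - r for r, n in zip(rep, natural)) / len(rep)


def total_error(rep: Pattern, naturals: List[Pattern], offsets: List[float]) -> float:
    """Evaluation function: summed squared error over all patterns in a cluster."""
    return sum(
        (r + b - n) ** 2
        for nat, b in zip(naturals, offsets)
        for r, n in zip(rep, nat)
    )


def train(
    initial: Dict[Hashable, Pattern],
    data: List[Tuple[Hashable, Pattern]],
    iterations: int = 10,
) -> Dict[Hashable, Pattern]:
    """Cluster natural patterns by attribute, then alternate between fitting
    per-pattern offsets and updating each representative pattern."""
    # Clustering: assign each natural pattern to the representative pattern
    # that carries the same attribute.
    clusters: Dict[Hashable, List[Pattern]] = defaultdict(list)
    for attribute, pattern in data:
        clusters[attribute].append(pattern)

    trained: Dict[Hashable, Pattern] = {}
    for attribute, rep in initial.items():
        rep = list(rep)
        members = clusters.get(attribute, [])
        for _ in range(iterations if members else 0):
            # Transformation parameter generation (assumed: additive offset).
            offsets = [fit_offset(rep, m) for m in members]
            # Representative update: the least-squares minimizer of the
            # evaluation function with the offsets held fixed.
            rep = [
                sum(m[t] - b for m, b in zip(members, offsets)) / len(members)
                for t in range(len(rep))
            ]
        trained[attribute] = rep
    return trained
```

With two natural patterns that share a shape but differ in overall pitch, for example `[100, 120, 110]` and `[150, 170, 160]` under one attribute, the updated representative pattern captures the common shape and the per-pattern offsets absorb the pitch difference. A fuller model would presumably also include time-axis scaling among the transformation parameters; that is omitted here to keep the sketch short.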


REFERENCES:
patent: 4696042 (1987-09-01), Goudie
patent: 5384893 (1995-01-01), Hutchins
patent: 5682501 (1997-10-01), Sharman
patent: 5740320 (1998-04-01), Itoh
patent: 5832434 (1998-11-01), Meredith
patent: 5913193 (1999-06-01), Huang et al.
patent: 5913194 (1999-06-01), Karaali et al.
patent: 5949961 (1999-09-01), Sharman
patent: 5970453 (1999-10-01), Sharman
patent: 6138089 (2000-10-01), Guberman
patent: 6240384 (2001-05-01), Kagoshima et al.
X. Huang, et al., “Recent Improvements on Microsoft's Trainable Text-to-Speech System—Whistler”, Proc. of ICASSP97, Apr. 1997, pp. 959-962.
