Speech synthesis based on cricothyroid and cricoid modeling

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S268000, C704S266000, C704S260000

active

06317713

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to speech synthesis and speech analysis and more particularly to a sound source generator, speech synthesizer, and speech synthesizing system and method having improved versatility and precision of sound source generation.
BACKGROUND
Speech production combines three elements: generation of a sound source, articulation by the vocal tract, and radiation from the lips and nostrils. Simplifying these elements and separating the sound source from articulation yields a generation model of the speech waveform.
Generally, speech has two characteristics. One, relating to articulation, is the phonemic characteristic, which is mainly shown in the change patterns of the spectrum envelope of the sound. The other, relating to the sound source, is the prosody characteristic, which is mainly shown in the fundamental frequency patterns of the sound.
In speech synthesis based on text data, the required information for synthesizing the phonemic characteristic can be obtained from the text data by using morphological analysis. In contrast, the waveform of fundamental frequency required for synthesizing the prosody characteristic is not shown in the text data. Therefore, this waveform must be obtained according to the accent pattern of a word, the syntax of a sentence, the discourse structure of sentences, and so on.
The Fujisaki model is one of the well-known models for generating fundamental frequency. The model rests on the observation that, when the time course of fundamental frequency is expressed on a logarithmic scale, its contour remains nearly constant regardless of the overall fundamental frequency. The model further assumes that the observed fundamental frequency pattern is the sum of a phrase component, which falls gradually from the beginning to the end of the phrase, and an accent component, which indicates the accent on each word. Under this assumption, both components are approximated by the response of a second-order critically damped linear system: the phrase component as the response to an impulse-shaped phrase command, and the accent component as the response to a step-shaped accent command.
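In the literature on the Fujisaki model, this decomposition is commonly written as ln F0(t) = ln Fb + sum of Ap * Gp(t - T0) + sum of Aa * [Ga(t - T1) - Ga(t - T2)], where Gp is the impulse response and Ga the step response of a second-order critically damped system. The following is a minimal Python sketch of that commonly cited form. The function names, the constants alpha, beta, and gamma, and all command values are illustrative assumptions and are not taken from this document.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of a second-order critically damped system.
    alpha (1/s) is an assumed, illustrative time constant."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of a second-order critically damped system,
    capped at gamma. beta and gamma are assumed, illustrative values."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, base_hz, phrase_cmds, accent_cmds):
    """Fundamental frequency in Hz at time t (seconds).

    phrase_cmds: list of (onset_time, magnitude) impulse commands
    accent_cmds: list of (start_time, end_time, amplitude) step commands
    """
    log_f0 = math.log(base_hz)  # components are summed in the log domain
    for onset, magnitude in phrase_cmds:
        log_f0 += magnitude * phrase_response(t - onset)
    for start, end, amplitude in accent_cmds:
        log_f0 += amplitude * (accent_response(t - start)
                               - accent_response(t - end))
    return math.exp(log_f0)

# Hypothetical commands: one phrase impulse at 0 s, one accent from 0.2 s to 0.5 s.
contour = [fujisaki_f0(i * 0.01, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
           for i in range(100)]
```

Because every component is added to ln F0, changing `base_hz` scales the whole contour without changing its shape on a logarithmic axis, which is the property the model relies on.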
As described above, the phrase command and the accent command are calculated from the word's accent pattern, the syntax of the sentence, and the discourse structure of the sentences, and the fundamental frequency is then determined from these commands.
However, this model for generating fundamental frequency cannot control the fundamental frequency with sufficient precision, because it takes only rises in fundamental frequency into consideration. In other words, the range of expression that can be added to synthesized speech is limited. Another problem is that the phrase command and the accent command can be obtained only with uncertainty when analyzing an observed fundamental frequency pattern.
A further problem is that a time lag occurs between the time at which the phrase command is designated and the time at which the phrase component actually appears, because the phrase component is taken to be the response of a second-order critically damped linear system to the impulsive phrase command.
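The size of this lag can be estimated under one assumption: that the phrase component has the commonly cited form Gp(t) = alpha^2 * t * exp(-alpha * t) for an impulse at t = 0. Setting its derivative alpha^2 * exp(-alpha * t) * (1 - alpha * t) to zero places the peak at t = 1 / alpha. The sketch below checks this numerically. The value alpha = 3.0 per second and the helper names are illustrative assumptions, not values from this document.

```python
import math

def phrase_component(t, alpha):
    """Assumed impulse response of a second-order critically damped system."""
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0

def peak_lag(alpha, step=1e-4, horizon=5.0):
    """Scan a time grid and return the time at which the response is largest."""
    n = int(horizon / step)
    best = max(range(n + 1), key=lambda i: phrase_component(i * step, alpha))
    return best * step

alpha = 3.0                    # illustrative value, in 1/s
numeric_lag = peak_lag(alpha)  # lag found by scanning the response
analytic_lag = 1.0 / alpha     # lag predicted by the derivative
```

With this assumed alpha, the phrase component peaks about a third of a second after the command is issued, which illustrates the kind of delay the paragraph above describes.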
SUMMARY OF THE INVENTION
An object of the present invention is to provide speech synthesis and sound source generation capable of solving the problems of the prior art and capable of adding various expressions, and further to provide speech analysis capable of analyzing fundamental frequency precisely.
The sound source generation device is characterized in that the device comprises: a calculating component for sound source generating parameters, which, upon receiving a command concerning prosody, outputs at least a fundamental frequency as a sound source generating parameter according to said command; and a sound source generating component, which, upon receiving the sound source generating parameters from the calculating component, generates a sound source according to said parameters, wherein not only an accent command but also a descent command is given for calculating the fundamental frequency, and the calculating component calculates the sound source generating parameters according to the accent command and the descent command.
The sound source generation device is further characterized in that a rhythm command is also given for calculating the fundamental frequency, and the calculating component for sound source generating parameters calculates the sound source generating parameters according to the accent command, the descent command, and the rhythm command.
The sound source generation device is further characterized in that the rhythm command is represented by a sine wave.
The sound source generation device is further characterized in that the characteristic of the generated sound source is controlled by controlling the amplitude and cycle of the sine wave.
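A sine-wave rhythm command with controllable amplitude and cycle can be sketched as follows. This is a hedged illustration only. The text above does not say how the rhythm command is combined with the other commands, so adding it to ln F0 is an assumption made here, and all function names and parameter values are hypothetical.

```python
import math

def rhythm_command(t, amplitude, cycle_s, phase=0.0):
    """Sine-wave rhythm command at time t (seconds).

    amplitude: peak deviation, here assumed to be in ln(F0) units
    cycle_s:   length of one cycle in seconds
    """
    return amplitude * math.sin(2.0 * math.pi * t / cycle_s + phase)

def apply_rhythm(base_hz, t, amplitude, cycle_s):
    """Assumed combination rule: add the rhythm command to ln(F0)."""
    return base_hz * math.exp(rhythm_command(t, amplitude, cycle_s))

# Hypothetical settings: a 0.5 s cycle with two different amplitudes.
subtle = [apply_rhythm(120.0, i * 0.01, 0.02, 0.5) for i in range(100)]
marked = [apply_rhythm(120.0, i * 0.01, 0.10, 0.5) for i in range(100)]
```

In this sketch a larger amplitude widens the swing of the fundamental frequency around `base_hz`, and a shorter cycle makes the swing repeat more often, which corresponds to the two controls named above.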
The speech synthesis device is further characterized in that the device comprises: a character string analyzing component for analyzing a given character string and generating a command concerning phoneme and a command concerning prosody; a calculating component for sound source generating parameters, which, upon receiving the command concerning prosody generated by the character string analyzing component, outputs at least a fundamental frequency as a sound source generating parameter according to said command; a sound source generating component, which, upon receiving the sound source generating parameters from the calculating component, generates a sound source according to said parameters; and an articulation component for articulating the sound source from the sound source generating component according to the command concerning phoneme received from the character string analyzing component, wherein the character string analyzing component generates not only an accent command but also a descent command as the command concerning prosody, and the calculating component calculates the fundamental frequency according to the accent command and the descent command.
The speech synthesis device is further characterized in that the character string analyzing component also generates a rhythm command as the command concerning prosody, and the calculating component for sound source generating parameters calculates the fundamental frequency according to the accent command, the descent command, and the rhythm command.
The speech synthesis device is further characterized in that the calculating component for sound source generating parameters generates the rhythm command as a sine wave.
The speech synthesis device is further characterized in that the calculating component for sound source generating parameters controls the characteristic of the generated synthesized speech sound by controlling the amplitude and cycle of said sine wave.
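The data flow between the four components described above can be sketched as a small pipeline. Everything below is a hypothetical illustration of that flow, not the device itself. The class and function names, the fixed commands returned by the analyzer, the rectangular (unsmoothed) treatment of accent and descent commands, the sine oscillator used as a sound source, and the pass-through articulation stage are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class ProsodyCommand:
    kind: str         # "accent" or "descent" (hypothetical labels)
    start: float      # seconds
    end: float        # seconds
    amplitude: float  # magnitude in ln(F0) units (assumed)

def analyze_string(text):
    """Placeholder analyzer: returns (phoneme commands, prosody commands)."""
    phonemes = [ch for ch in text if not ch.isspace()]
    prosody = [ProsodyCommand("accent", 0.0, 0.2, 0.3),
               ProsodyCommand("descent", 0.2, 0.4, 0.2)]
    return phonemes, prosody

def calc_f0(t, prosody, base_hz=120.0):
    """Assumed rule: accent raises ln(F0), descent lowers it."""
    log_f0 = math.log(base_hz)
    for cmd in prosody:
        if cmd.start <= t < cmd.end:
            sign = -1.0 if cmd.kind == "descent" else 1.0
            log_f0 += sign * cmd.amplitude
    return math.exp(log_f0)

def generate_source(prosody, duration_s, rate=8000):
    """Sine oscillator whose frequency follows F0 via phase accumulation."""
    phase, samples = 0.0, []
    for n in range(int(duration_s * rate)):
        phase += 2.0 * math.pi * calc_f0(n / rate, prosody) / rate
        samples.append(math.sin(phase))
    return samples

def articulate(source, phonemes):
    """Placeholder for the vocal-tract stage: passes the source through."""
    return list(source)

def synthesize(text, duration_s=0.5):
    phonemes, prosody = analyze_string(text)
    return articulate(generate_source(prosody, duration_s), phonemes)
```

Calling `synthesize("hello")` returns a list of samples whose pitch rises during the assumed accent interval and falls below the base frequency during the assumed descent interval. A real analyzer and articulation stage would replace the two placeholders.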
The speech processing method is further characterized by adopting not only the accent command but also the descent command as elements for controlling fundamental frequency, in any speech processing method that uses at least fundamental frequency as a parameter. The term “speech processing” here refers to any operation that processes speech, characteristics of speech sound, or related parameters, including speech synthesis, sound source generation, speech analysis, and the fundamental frequency generation used for them.
The speech processing method is further characterized by also adopting the rhythm command as an element for controlling fundamental frequency.
The speech analyzing method is further characterized by carrying out analysis using not only the accent command but also the descent command as elements for analyzing fundamental frequency.
The speech analyzing method is further characterized by also adopting the rhythm command as an element for analyzing fundamental frequency.
The storing medium is a computer-readable storing medium for storing programs which are executable by using a compute
