Reexamination Certificate
2000-07-20
2003-01-28
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S263000, C704S265000
active
06513007
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a synthesized sound generating apparatus and method which is suitable for inputting and synthesizing voices and instrumental sounds and outputting synthesized instrumental sounds or the like having characteristic information on the voices.
2. Prior Art
Vocoders, which analyze and synthesize voices, are commonly used with music synthesizers because they can onomatopoeically generate instrumental sounds, noise, and the like. The major known vocoders include formant vocoders, linear predictive analysis and synthesis systems (PARCOR analysis and synthesis), cepstrum vocoders (speech synthesis based on homomorphic filtering), and channel vocoders (the so-called Dudley vocoders).
The formant vocoder uses a terminal analog synthesizer to carry out sound synthesis based on parameters for vocal tract characteristics determined from the formants and anti-formants of a spectral envelope, that is, from its poles and zeros. The terminal analog synthesizer comprises a plurality of resonance circuits and antiresonance circuits connected in cascade to simulate the resonance/antiresonance characteristics of a vocal tract. The linear predictive analysis and synthesis system is an extension of the predictive encoding method, which is the most popular of the speech synthesis methods. The PARCOR analysis and synthesis system is an improved version of the linear predictive analysis and synthesis system. The cepstrum vocoder is a speech synthesis system that uses the logarithmic amplitude characteristic of a filter together with inverse Fourier transformation and inverse convolution of the logarithmic spectrum of a sound source.
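The cascade of resonance circuits described above can be illustrated with a short sketch. It is not taken from the patent: it assumes each resonance circuit is a standard two-pole digital resonator normalized to unity gain at DC, omits the antiresonance circuits, and uses hypothetical names (`resonator_coeffs`, `resonate`, `cascade_formants`) and illustrative formant values.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c

def resonate(signal, coeffs):
    """Run one resonance stage over the signal."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade_formants(source, formants, sample_rate):
    """Pass the source through one resonator per (frequency, bandwidth) pair, in series."""
    out = list(source)
    for freq_hz, bandwidth_hz in formants:
        out = resonate(out, resonator_coeffs(freq_hz, bandwidth_hz, sample_rate))
    return out

# Illustrative use: shape a pulse train with two assumed formants.
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
vowel_like = cascade_formants(pulses, [(700.0, 80.0), (1200.0, 90.0)], 8000.0)
```

Because the stages are in series, each formant needs only a center frequency and a bandwidth, but those values still have to be extracted from the spectral envelope first.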
The channel vocoder uses bandpass filters 10-1 to 10-N for different bands to extract spectral envelope information on an input speech signal, that is, parameters for the vocal tract characteristics, as shown in FIG. 1, for example. On the other hand, a pulse train generator 21 and a noise generator 22 generate two kinds of sound source signals, which are amplitude-modulated using the spectral envelope parameters. This amplitude modulation is carried out by multipliers (modulators) 30-1 to 30-N. Modulated signals output from the multipliers (modulators) 30-1 to 30-N pass through bandpass filters 40-1 to 40-N and are then added together by an adder 50, whereby a synthesized speech signal is generated and output.
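The signal path of FIG. 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the circuit of the patent: the bandpass filters are second-order biquads with unity peak gain, the envelope of each band is taken by rectifying and one-pole smoothing, and all names, band centers and the `q` and `smoothing` values are hypothetical.

```python
import math

def bandpass(signal, center_hz, q, sample_rate):
    """Second-order bandpass filter with unity gain at the center frequency."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def envelope(signal, smoothing=0.99):
    """Rectify and smooth a band signal to estimate its amplitude envelope."""
    level = 0.0
    out = []
    for x in signal:
        level = smoothing * level + (1.0 - smoothing) * abs(x)
        out.append(level)
    return out

def channel_vocoder(speech, source, centers_hz, sample_rate, q=5.0):
    """Analysis filters -> envelopes -> multipliers -> synthesis filters -> adder."""
    n = min(len(speech), len(source))
    mixed = [0.0] * n
    for center in centers_hz:
        env = envelope(bandpass(speech[:n], center, q, sample_rate))  # filters 10-k
        modulated = [e * s for e, s in zip(env, source[:n])]          # multipliers 30-k
        band = bandpass(modulated, center, q, sample_rate)            # filters 40-k
        mixed = [m + b for m, b in zip(mixed, band)]                  # adder 50
    return mixed
```

A call such as `channel_vocoder(speech, pulses, [300.0, 500.0, 1000.0], 8000.0)` imposes the band envelopes of `speech` on the pulse train `pulses`; a practical vocoder would use many more bands.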
In the example of the channel vocoder disclosed in Japanese Laid-Open Patent Publication (Kokai) No. 05-204397, outputs from the bandpass filters 10-1 to 10-N are rectified and smoothed when passing through short-time average-amplitude detection circuits 60-1 to 60-N. A voiced sound/unvoiced sound detector 71 determines a voiced sound component and an unvoiced sound component of the input speech signal, and upon detecting the voiced sound component, the detector 71 operates a switch 23 so as to select and deliver an output (pulse train) from the pulse train generator 21 to the multipliers 30-1 to 30-N. In addition, upon detecting the unvoiced sound component, the voiced sound/unvoiced sound detector 71 operates the switch 23 so as to select and deliver an output (noise) from the noise generator 22 to the multipliers 30-1 to 30-N. At the same time, a pitch detector 72 detects a pitch of the input speech signal to cause it to be reflected in the output pulse train from the pulse train generator 21. Thus, when the voiced sound component is detected, the output from the pulse train generator 21 contains pitch information, which is among the characteristic information on the input speech signal.
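The source selection described above can be sketched as follows. The text does not say how the detector 71 or the pitch detector 72 work internally, so this sketch assumes a zero-crossing-rate and energy test for the voiced/unvoiced decision and an autocorrelation peak search for the pitch; every name and threshold here is hypothetical.

```python
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / max(len(frame) - 1, 1)

def is_voiced(frame, zcr_threshold=0.25, energy_threshold=1e-4):
    """Assumed rule: voiced frames have enough energy and few zero crossings."""
    energy = sum(x * x for x in frame) / max(len(frame), 1)
    return energy > energy_threshold and zero_crossing_rate(frame) < zcr_threshold

def detect_pitch_hz(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def excitation(frame, sample_rate, rng=None):
    """Plays the role of switch 23: pitched pulse train if voiced, noise otherwise."""
    rng = rng or random.Random(0)
    if is_voiced(frame):
        pitch = detect_pitch_hz(frame, sample_rate)
        period = max(int(round(sample_rate / pitch)), 1) if pitch > 0.0 else len(frame)
        return [1.0 if n % period == 0 else 0.0 for n in range(len(frame))]
    return [rng.uniform(-1.0, 1.0) for _ in frame]
```

The frame returned by `excitation` would then be fed to the multipliers in place of a fixed source, so that the pulse spacing follows the pitch of the input speech.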
With the formant vocoder described above, however, the formant and anti-formant cannot be easily extracted from the spectral envelope, so the formant vocoder requires a complicated analysis process or manual operation. The linear predictive analysis and synthesis system uses an all-pole model to generate sounds and uses a simple mean square value of the prediction errors as the evaluative reference for determining the coefficients of the model. Thus, this method does not focus on the nature of voices. The cepstrum vocoder requires a large amount of time for spectral processing and Fourier transformation and is thus insufficiently responsive in real time.
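To make the mean-square criterion of the all-pole model concrete, the sketch below solves for the predictor coefficients with the Levinson-Durbin recursion, which also yields the reflection (PARCOR) coefficients as a by-product. It is a textbook illustration rather than anything specified in this document; the function names are hypothetical, and the sign convention for the reflection coefficients varies between references.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Minimize the mean square prediction error for x[n] ~ sum(a[j] * x[n - j]).

    Returns (predictor coefficients a[1..order], reflection coefficients,
    residual prediction error energy).
    """
    a = [0.0] * (order + 1)
    reflection = []
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate frame: nothing left to predict
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], reflection, error

# Autocorrelation of an ideal first-order process: the second coefficient is not needed.
coeffs, parcor, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The only quantity being minimized is the residual energy, which is the sense in which the criterion does not reflect how voices are perceived.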
On the other hand, the channel vocoder directly expresses the parameters for the vocal tract characteristics as physical quantities in the frequency domain and thus takes the nature of voices into consideration. Because it lacks mathematical rigor, however, the channel vocoder is not suited for digital processing.
SUMMARY OF THE INVENTION
There is provided a synthesized sound generating apparatus and method which can achieve responsive and high-quality speech synthesis based on a real-time convolution operation. Coefficients are generated by using dynamic cutting to extract characteristic information from a first signal. A convolution operation in the time domain is performed on a second signal using the generated coefficients to generate a synthesized signal. An interpolation process is performed on the coefficients to prevent a rapid change in level of the generated synthesized signal upon switching of the coefficients.
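The last two steps of the summary, time-domain convolution and interpolation of the coefficients at a switch, can be sketched as below. The sketch assumes the coefficient sets are already available (the dynamic cutting step is not modelled), that one set applies per fixed-length frame, and that the interpolation is a linear ramp over `ramp_len` samples; these choices and all names are assumptions made for illustration.

```python
def interpolate(old, new, fraction):
    """Linear blend between two coefficient sets; fraction runs from 0 to 1."""
    return [(1.0 - fraction) * o + fraction * n for o, n in zip(old, new)]

def convolve_with_interpolation(signal, coeff_sets, frame_len, ramp_len):
    """FIR convolution whose coefficients ramp from the previous set to the current one."""
    taps = len(coeff_sets[0])
    last = len(coeff_sets) - 1
    history = [0.0] * taps  # most recent input sample first
    out = []
    for index, x in enumerate(signal):
        frame, offset = divmod(index, frame_len)
        target = coeff_sets[min(frame, last)]
        previous = coeff_sets[min(max(frame - 1, 0), last)]
        fraction = min((offset + 1) / ramp_len, 1.0)
        coeffs = interpolate(previous, target, fraction)
        history = [x] + history[:-1]
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out

# Switching a one-tap "filter" from gain 1 to gain 0 on a constant input.
smooth = convolve_with_interpolation([1.0] * 8, [[1.0], [0.0]], frame_len=4, ramp_len=4)
abrupt = convolve_with_interpolation([1.0] * 8, [[1.0], [0.0]], frame_len=4, ramp_len=1)
```

With `ramp_len=4` the output level steps down gradually across the second frame, whereas `ramp_len=1` reproduces the sudden jump that the interpolation is meant to avoid.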
REFERENCES:
patent: 3624301 (1971-11-01), Richeson
patent: 4577343 (1986-03-01), Oura
patent: 4907484 (1990-03-01), Suzuki et al.
patent: 5111727 (1992-05-01), Rossum
patent: 5247130 (1993-09-01), Suzuki et al.
patent: 5250748 (1993-10-01), Suzuki
patent: 5694522 (1997-12-01), Hiratsuka et al.
patent: 5744742 (1998-04-01), Lindemann et al.
patent: 5826232 (1998-10-01), Gulli
patent: 5864812 (1999-01-01), Kamai et al.
patent: 6073100 (2000-06-01), Goodridge, Jr.
patent: 6253182 (2001-01-01), Acero
patent: 5204397 (1993-08-01), None
Gibson et al (“Real-Time Singing Synthesis using a Parallel Processing System”, IEE Colloquium on Audio and Music Technology: The Challenge of Creative DSP, Nov. 18, 1998).
Dorvil Richemond
Nolan Daniel A
Pillsbury & Winthrop LLP
Yamaha Corporation