Reexamination Certificate
2001-04-02
2002-12-31
Šmits, Tālivaldis Ivars (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S220000, C704S221000
active
06502066
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention deals with formant tracking. More specifically, the present invention deals with formant tracking using a formant synthesizer.
The human vocal tract has a number of resonances, referred to as formants. The speaker can change the frequency of these resonances to produce different sounds, for example by moving the tongue or lips, or by including or excluding the nasal tract. The resonances are excited by the movement of the vocal cords or by noise generated at a constriction of the vocal tract. Each sound has an associated set of resonances, and when sounds are strung together over time, they form words.
In speech analysis, the first three resonances (or formants) are generally of primary interest. Higher frequency formants vary minimally, and are usually based on the length of the particular speaker's vocal tract. Thus, the higher frequency formants do not carry a great deal of information with respect to the words being spoken.
The formants associated with each sound can vary a great deal from speaker to speaker. Further, formants can vary from one utterance to another, even for the same speaker. Thus, tracking formants is quite difficult.
Formant trackers are conventionally used to identify and track formants in human speech. This information is useful in speech analysis. Standard formant trackers perform linear prediction on the speech signal in order to identify the resonances or formants associated with the speech signal. In other words, at some point in time, n, the speech signal is represented as follows:
$$s(n) = a_1 s(n-1) + a_2 s(n-2) + \dots + x(n) = \sum_{i=1}^{p} a_i \, s(n-i) + x(n) \tag{1}$$
where $s(n)$ is the speech signal, $x(n)$ is the excitation, and the coefficients $a_i$ are the impulse response of the vocal tract.
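As a rough illustration of the linear prediction model above, the following Python sketch estimates the coefficients a_i for one frame using the autocorrelation method with the Levinson-Durbin recursion. The source does not say which estimation method a standard formant tracker uses, so this choice is an assumption, and the function names `autocorrelation` and `lpc`, the prediction order, and the toy signal are all hypothetical.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum over t of frame[t] * frame[t - k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Estimate [a_1, ..., a_p] in s(n) ~ sum_i a_i * s(n - i) (Levinson-Durbin)."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)      # a[0] is unused; a[i] holds a_i
    err = r[0]                   # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:           # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]

# Toy check (assumed values): impulse response of a two-pole resonator
# tuned to 500 Hz at an 8 kHz sampling rate.
fs = 8000.0
radius, theta = 0.9, 2.0 * math.pi * 500.0 / fs
true_a = [2.0 * radius * math.cos(theta), -radius * radius]
signal = [0.0] * 400
for n in range(400):
    x = 1.0 if n == 0 else 0.0   # excitation x(n): a single impulse
    s1 = signal[n - 1] if n >= 1 else 0.0
    s2 = signal[n - 2] if n >= 2 else 0.0
    signal[n] = true_a[0] * s1 + true_a[1] * s2 + x
estimated_a = lpc(signal, 2)     # should be close to true_a
```

In this toy case the excitation is a single impulse and the signal has fully decayed within the frame, so the estimate closely matches the coefficients used to generate it. Real speech frames would normally be windowed first and analyzed with a higher order.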
The roots of the equation represent poles, and a single pole pair has a specific frequency response. Thus, each formant track (each set of three formants) corresponds to three pole pairs.
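To make the link between a pole pair and a resonance concrete, here is a minimal Python sketch for the second-order case. It assumes the common conversion from a pole's angle and radius to a frequency and a bandwidth, which the source does not spell out; the function name `pole_pair_to_formant` and the example values are hypothetical.

```python
import cmath
import math

def pole_pair_to_formant(a1, a2, fs):
    """Frequency and bandwidth (Hz) of the pole pair of 1 - a1*z^-1 - a2*z^-2.

    Returns None when the poles are real, since they then form no resonance.
    """
    # The poles are the roots of z^2 - a1*z - a2 = 0.
    pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0
    if pole.imag == 0.0:
        return None
    # Assumed conversion: pole angle -> frequency, pole radius -> bandwidth.
    frequency = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return frequency, bandwidth

# Example (assumed values): a pole pair at radius 0.9 and 500 Hz, fs = 8 kHz.
fs = 8000.0
theta = 2.0 * math.pi * 500.0 / fs
a1, a2 = 2.0 * 0.9 * math.cos(theta), -0.81
frequency, bandwidth = pole_pair_to_formant(a1, a2, fs)
# frequency is approximately 500 Hz; bandwidth is roughly 268 Hz
```

For a higher prediction order, the same conversion would be applied to each complex root of the full polynomial, which requires a general root finder not shown here.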
A conventional formant tracker divides the speech signal into consecutive frames of a predetermined duration (such as 10 milliseconds). By taking the roots of the filter defined by Equation 1, the resonances for each frame can be found. However, for each 10-millisecond frame, the linear prediction algorithm may identify a relatively large number of resonances (such as seven). Although this number can be controlled when performing the linear prediction calculations, more than three resonances must be calculated in order to model any noise or non-linearities present in the signal. The formant tracker then attempts to find smooth paths for the three primary formants across frames, given the seven resonances identified at each frame by the linear prediction algorithm.
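The path-finding step of a conventional tracker can be sketched as follows. This is only one simple possibility, a greedy frame-by-frame choice that picks the three candidates closest to the previous frame's formants; the source does not describe the actual smoothing criterion, and the function name `track_formants`, the cost function, and the candidate frequencies are all assumptions made for illustration.

```python
from itertools import combinations

def track_formants(candidates_per_frame, initial):
    """Pick three resonances per frame that stay closest to the previous frame.

    candidates_per_frame: list of lists of candidate frequencies in Hz
                          (each frame needs at least three candidates).
    initial: starting guess for (F1, F2, F3) in Hz.
    """
    tracks = []
    previous = tuple(initial)
    for candidates in candidates_per_frame:
        # combinations() of a sorted list yields trios in ascending order,
        # so each trio lines up with (F1, F2, F3).
        best = min(
            combinations(sorted(candidates), 3),
            key=lambda trio: sum(abs(f - p) for f, p in zip(trio, previous)),
        )
        tracks.append(best)
        previous = best
    return tracks

# Two frames with seven candidate resonances each (made-up values).
frames = [
    [480, 1500, 2500, 3400, 120, 3900, 900],
    [510, 1450, 2550, 3300, 90, 4000, 1000],
]
tracks = track_formants(frames, initial=(500, 1500, 2500))
# tracks == [(480, 1500, 2500), (510, 1450, 2550)]
```

A greedy choice like this depends entirely on the candidates offered at each frame. If a true formant is missing from one frame's candidates, the track jumps to a spurious resonance and later frames then follow it.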
Conventional formant trackers have problems. The primary problem associated with conventional formant trackers is that they fail to select the proper resonances identified by linear prediction, and thus fail to find the proper formants. Also, conventional formant trackers can provide discontinuous formant tracks based on inaccurate identification of resonances.
Formant synthesizers are a type of speech synthesizer used to produce speech from a phonetic description of an utterance. Formant synthesizers are generally trained by phoneticians, who in essence codify their knowledge of speech production into the mathematical codes and data tables that the formant synthesizer uses to generate formants from a phonetic representation of an utterance.
During synthesis, the input text is typically broken into the phonemic units, and those units are provided to the formant synthesizer. The formant synthesizer then generates formants or formant tracks which are reasonable and expected based on the speech units input into the synthesizer. Normally, the formant tracks are then used to create synthetic speech.
SUMMARY OF THE INVENTION
Formants corresponding to input speech units are generated from a formant synthesizer. A frequency response is generated based on the synthesized formants. A second frequency response is generated based on a speech signal which is received and which corresponds to utterances of the speech units. The synthesized formants are modified based on a comparison of the frequency response corresponding to the synthesized formants and the frequency response of the input speech signal.
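The summary does not state how the frequency responses are computed, how they are compared, or how the synthesized formants are modified. The Python sketch below therefore rests on assumptions throughout: each formant is modeled as a two-pole resonator with a fixed bandwidth, the comparison is a mean squared difference of log-magnitude spectra, and the modification is a simple coordinate search over formant frequencies. All names (`formant_response`, `spectral_distance`, `refine_formants`) and numeric values are hypothetical, and the "speech" response in the example is itself synthetic.

```python
import cmath
import math

def formant_response(formants, bandwidth, fs, n_bins):
    """Log-magnitude response (dB) of cascaded two-pole resonators."""
    radius = math.exp(-math.pi * bandwidth / fs)
    response = []
    for b in range(n_bins):
        z_inv = cmath.exp(-1j * math.pi * b / n_bins)   # z^-1 on the unit circle
        h = 1.0 + 0.0j
        for f in formants:
            theta = 2.0 * math.pi * f / fs
            h /= 1.0 - 2.0 * radius * math.cos(theta) * z_inv + radius ** 2 * z_inv ** 2
        response.append(20.0 * math.log10(abs(h)))
    return response

def spectral_distance(resp_a, resp_b):
    """Mean squared difference between two log-magnitude responses."""
    return sum((x - y) ** 2 for x, y in zip(resp_a, resp_b)) / len(resp_a)

def refine_formants(synth, target_resp, bandwidth, fs, step=200.0, min_step=1.0):
    """Nudge synthesized formants so their response moves toward target_resp."""
    formants = list(synth)
    n_bins = len(target_resp)
    best = spectral_distance(formant_response(formants, bandwidth, fs, n_bins), target_resp)
    while step >= min_step:
        improved = False
        for i in range(len(formants)):
            for delta in (-step, step):
                trial = formants[:]
                trial[i] += delta
                if not 0.0 < trial[i] < fs / 2.0:
                    continue
                d = spectral_distance(formant_response(trial, bandwidth, fs, n_bins), target_resp)
                if d < best:
                    formants, best, improved = trial, d, True
        if not improved:
            step /= 2.0          # no move helped: search more finely
    return formants, best

# Stand-in for the response of the received speech signal (assumed values).
fs, bandwidth, n_bins = 8000.0, 80.0, 64
speech_resp = formant_response([550.0, 1400.0, 2600.0], bandwidth, fs, n_bins)
synthesized = [500.0, 1500.0, 2500.0]    # stand-in for formant synthesizer output
initial_distance = spectral_distance(
    formant_response(synthesized, bandwidth, fs, n_bins), speech_resp)
refined, final_distance = refine_formants(synthesized, speech_resp, bandwidth, fs)
```

The point of the sketch is the direction of the process. The synthesizer supplies a plausible starting set of formants, and the comparison with the speech signal only adjusts them, rather than selecting formants from raw linear prediction candidates.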
REFERENCES:
patent: 4424415 (1984-01-01), Lin
patent: 5146539 (1992-09-01), Doddington et al.
patent: 5313555 (1994-05-01), Kamiya
patent: 5325462 (1994-06-01), Farrett
patent: 5625747 (1997-04-01), Goldberg et al.
patent: 5913193 (1999-06-01), Huang et al.
patent: 6101469 (2000-08-01), Curtin
“Robust N-best Formant Tracking”, Proceedings of the 4th European Conference on Speech Communication and Technology, vl, p. 737-740, 1995, by P. Schmid and E. Barnard.
R.W. Schafer, L.R. Rabiner, “System for Automatic Formant Analysis of Voiced Speech”, Journal of the Acoustical Society of America, 47, 634-648, 1970.
P. Zolfaagheri and R. Robinson, “Formant Analysis Using Mixtures of Gaussians”, Proceedings of ICSLP, p. 1229-1232, 1996.
Y. Laprie, “A New Paradigm for Reliable Automatic Formant Tracking”, Proceedings of ICASSP, vol. 2, 1994, pp. II/201-4.
IEEE Transactions on “Audio and Electroacoustics” vol. AU-21, No. 2, pp. 69-79, Apr. 1973, Markel and Gray, “On Auto Correlation Equations to Speech”.
The Journal of the Acoustical Society of America. Automatic Formant Tracking by a Newton-Raphson Technique, J.P. Olive, Apr. 14, 1971 pp 661-670.
The Journal of the Acoustical Society of America. Automatic Reduction of Vowel Spectra: An Analysis-By-Synthesis Method and Its Evaluation. By Allan P. Paul et al., vol. 36, No. 2, Feb. 1964, pp. 303-308.
The Journal of the Acoustical Society of America. Reduction of Speech Spectra by Analysis-By Synthesis Techniques. By C.G. Bell et al. vol. 22, No. 12, Dec. 1961 pp. 1725-1736.
Holt Christopher L.
Microsoft Corporation
Storm Donald L.
Westman Champlin & Kelly P.A.
Šmits Tālivaldis Ivars