Reexamination Certificate
2002-11-14
2004-03-16
Dorvil, Richemond (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S264000, C704S266000, C704S209000
active
06708154
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to speech recognition and synthesis systems and in particular to speech systems that exploit formants in speech.
In human speech, a great deal of information is contained in the first three resonant frequencies or formants of the speech signal. In particular, when a speaker is pronouncing a vowel, the frequencies and bandwidths of the formants indicate which vowel is being spoken.
To detect formants, some systems of the prior art utilize the speech signal's frequency spectrum, where formants appear as peaks. In theory, simply selecting the first three peaks in the spectrum should provide the first three formants. However, due to noise in the speech signal, non-formant peaks can be mistaken for formant peaks and true formant peaks can be obscured. To account for this, prior art systems qualify each peak by examining its bandwidth. If the bandwidth is too large, the peak is eliminated as a candidate formant. The lowest three peaks that meet the bandwidth threshold are then selected as the first three formants.
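The peak-qualification procedure described above can be sketched roughly as follows. This is only an illustration, not the method of any particular prior art system: the function name `pick_formants`, the 3 dB width used as the bandwidth estimate, and the 400 Hz default threshold are all assumptions made for the example.

```python
def pick_formants(spectrum_db, bin_hz, max_bandwidth_hz=400.0, count=3):
    """Return up to `count` (frequency_hz, bandwidth_hz) pairs, lowest first.

    spectrum_db: magnitude spectrum of one speech segment, in dB per bin.
    bin_hz: frequency spacing between adjacent bins.
    The names and the default threshold are hypothetical.
    """
    formants = []
    last = len(spectrum_db) - 1
    for i in range(1, last):
        # A candidate is a local maximum of the spectrum.
        if not (spectrum_db[i - 1] < spectrum_db[i] >= spectrum_db[i + 1]):
            continue
        # Assumed bandwidth estimate: width of the region within 3 dB of the peak.
        floor = spectrum_db[i] - 3.0
        lo = i
        while lo > 0 and spectrum_db[lo - 1] >= floor:
            lo -= 1
        hi = i
        while hi < last and spectrum_db[hi + 1] >= floor:
            hi += 1
        bandwidth = (hi - lo + 1) * bin_hz
        # Peaks that are too wide are eliminated as candidate formants.
        if bandwidth <= max_bandwidth_hz:
            formants.append((i * bin_hz, bandwidth))
        if len(formants) == count:
            break
    return formants
```

Note that each call looks at a single segment in isolation, which is exactly the property the following paragraph identifies as the source of undetected errors.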
Although such systems provide a fair representation of the formant track, they are prone to errors such as discarding true formants, selecting peaks that are not formants, and incorrectly estimating the bandwidth of the formants. These errors go undetected during formant selection because prior art systems select formants for one segment of the speech signal at a time, without reference to the formants selected for previous segments.
To overcome this problem, some systems use heuristic smoothing after all of the formants have been selected. Although such post-decision smoothing removes some discontinuities between the formants, it is less than optimal.
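The text does not say which smoothing heuristic such systems use. As one plausible example, assumed here purely for illustration, a running median over an already-selected formant track would look something like this (the function name `median_smooth` and the window size are hypothetical):

```python
from statistics import median


def median_smooth(track_hz, window=3):
    """Replace each value in a formant track with the median of its neighbours.

    track_hz: one formant's selected frequency for each successive segment.
    The window is truncated at the ends of the track.
    """
    half = window // 2
    smoothed = []
    for t in range(len(track_hz)):
        lo = max(0, t - half)
        hi = min(len(track_hz), t + half + 1)
        smoothed.append(median(track_hz[lo:hi]))
    return smoothed
```

A filter of this kind operates only on the values that were already chosen, so it can suppress an isolated outlier but cannot go back to the spectrum to recover a true formant that was discarded earlier.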
In speech synthesis, the quality of the formant track in the synthesized speech depends on the technique used to create the speech. Under a concatenative system, sub-word units are spliced together without regard for their respective formant values. Although this produces sub-word units that sound natural by themselves, the complete speech signal sounds unnatural because of discontinuities in the formant track at sub-word boundaries. Other systems use rules to control how a formant changes over time. Such rule-based synthesizers never exhibit the discontinuities found in concatenative synthesizers, but their simplified model of how the formant track should change over time produces an unnatural sound.
SUMMARY OF THE INVENTION
The present invention utilizes a formant-based model to improve the creation of formant tracks in synthesized speech. Text is divided into a sequence of formant model states, which are used to retrieve a sequence of stored excitation segments. The states are also provided to a formant path generator, which determines a set of most likely formant paths given the sequence of model states and the formant models for each state. The formant paths are then used to control a series of resonators, which introduce the formants into the sequence of excitation segments. This produces a sequence of speech segments that are later combined to form the synthesized speech signal.
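The final stage of this pipeline, in which formant paths control resonators that shape the excitation, might be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: it uses the standard two-pole digital resonator, cascades one resonator per formant, and simply concatenates the resulting speech segments. The function names, the data layout of `formant_path`, and the concatenation step are all hypothetical; the summary does not specify how the segments are combined.

```python
import math


def resonate(excitation, formant_hz, bandwidth_hz, sample_rate_hz, state=(0.0, 0.0)):
    """Filter one excitation segment through a two-pole digital resonator.

    Returns the filtered samples and the filter state (y[n-1], y[n-2]) so
    that the next segment can continue without a discontinuity.
    """
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * formant_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    y1, y2 = state
    out = []
    for x in excitation:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out, (y1, y2)


def synthesize(excitation_segments, formant_path, sample_rate_hz):
    """Apply a per-segment formant path to a sequence of excitation segments.

    formant_path[k] is an assumed layout: a list of (frequency_hz,
    bandwidth_hz) pairs, one per formant, for segment k.
    """
    states = {}
    speech = []
    for segment, formants in zip(excitation_segments, formant_path):
        signal = segment
        # Cascade: the output of one resonator feeds the next.
        for idx, (freq, bw) in enumerate(formants):
            signal, states[idx] = resonate(
                signal, freq, bw, sample_rate_hz, states.get(idx, (0.0, 0.0))
            )
        # Assumption: segments are joined by plain concatenation.
        speech.extend(signal)
    return speech
```

Carrying the resonator state from one segment to the next is a design choice made here so that a change in formant values alters the filter coefficients without resetting the output waveform at segment boundaries.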
Dorvil Richemond
Magee Theodore M.
Microsoft Corporation
Nolan Daniel
Westman Champlin & Kelly P.A.