Reexamination Certificate
2000-12-21
2004-11-23
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S237000
active
06823305
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition, and more particularly relates to an apparatus and method for speaker normalization based on biometrics.
2. Brief Description of the Prior Art
It has been well established in the field of automatic speech recognition that normalizing waveforms to account for the vocal tract differences among speakers yields more accurate results than can be obtained in systems which do not include such normalization. If an open vocal tract model is assumed, such as would be appropriate for an open vowel (for example, /UH/), a uniform tube model provides a good approximation to the vocal tract, as discussed by Lawrence W. Rabiner and Ronald W. Schafer in the text Digital Processing of Speech Signals, published by Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632, in 1978.
In the uniform tube model, scaling the length of the tube by a factor 1/k scales all of the resonances of the tube by k; therefore, a linear scaling of the frequency axis is appropriate. In practice, linear scaling has been shown to be effective in normalizing for differences in vocal tract length. In implementation, once a form of frequency scaling (for example, linear scaling, f* = kf) has been chosen, the remaining question is how to determine a scale factor k_i for each speaker i.
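The relationship between tube length and resonance frequencies can be illustrated with a minimal sketch. It assumes the standard quarter-wavelength formula for a uniform tube closed at one end, F_n = (2n - 1)c / (4L), and an approximate speed of sound of 35,000 cm/s; the function names and the 17.5 cm reference length are illustrative choices and do not come from the patent text.

```python
# Assumption: approximate speed of sound in warm, moist air, in cm/s.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_resonances(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def scale_frequencies(freqs, k):
    """Linear scaling of the frequency axis: f* = k f."""
    return [k * f for f in freqs]


k = 1.25                                      # illustrative scale factor
reference = tube_resonances(17.5)             # hypothetical reference tube length
shorter = tube_resonances(17.5 / k)           # tube length scaled by 1/k
scaled = scale_frequencies(reference, k)      # reference resonances scaled by k

print(reference)
print(shorter)
print(scaled)
```

Under these assumptions, shortening the tube by 1/k and multiplying the reference resonances by k give the same frequencies, which is why a single linear factor per speaker is a plausible normalization.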
It has been known in the prior art to derive an estimated scale factor based on formant positions, as set forth in the paper “A Parametric Approach to Vocal Tract Length Normalization” by Ellen Eide and Herbert Gish as published by the IEEE in the Proceedings of the ICASSP of 1996, at pages 346-48.
Other results have been published for general speech corpora based on exhaustive search; for example, refer to “Speaker Normalization Using Efficient Frequency Warping Procedures” by Li Lee and Richard C. Rose, as published at pages 353-56 of the aforementioned 1996 ICASSP Proceedings, and “Speaker Normalization on Conversational Telephone Speech” by Steven Wegmann et al., as published at pages 339-41 of the aforementioned proceedings.
One case of interest is the situation where a database is available which contains biometric information in the form of a biometric parameter (such as speaker height). Such a parameter would permit the normalization factor for each speaker to be computed by taking the ratio of the value of the speaker's biometric parameter to some measure of an average value of the biometric parameter, such as the average across all speakers in the training database.
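The ratio described above can be sketched in a few lines. This is only an illustration: the speaker identifiers, the heights, and the choice of the arithmetic mean as the "measure of an average value" are assumptions, not values taken from the patent.

```python
from statistics import mean


def scale_factors(heights_cm):
    """Return k_i = height_i / average height for every speaker i.

    The arithmetic mean over the training speakers is assumed here as the
    average; other measures of the average could be substituted.
    """
    average = mean(heights_cm.values())
    return {speaker: height / average for speaker, height in heights_cm.items()}


# Hypothetical training database of speaker heights in centimetres.
heights = {"spk_a": 160.0, "spk_b": 170.0, "spk_c": 180.0}
factors = scale_factors(heights)

for speaker, k in factors.items():
    print(speaker, round(k, 3))
```

A speaker of average height receives a factor of 1, a taller speaker a factor above 1, and a shorter speaker a factor below 1.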
In view of the foregoing, there is a need in the prior art for a speaker normalization apparatus and method which are based on biometrics pertaining to the speaker.
SUMMARY OF THE INVENTION
The present invention, which addresses the needs identified in the prior art, provides a method of speaker normalization. The method includes the steps of receiving a first biometric parameter, calculating a first frequency scaling factor based on the first biometric parameter, and extracting acoustic features from speech of a user in accordance with the first frequency scaling factor. The first biometric parameter is correlated to vocal tract length of a given user of a speech recognition system.
The present invention further provides an apparatus for speaker normalization, which includes a biometric parameter module, a calculation module, and an acoustic feature extractor. The biometric parameter module receives the first biometric parameter which is correlated to the vocal tract length of the user. The calculation module calculates the first frequency scaling factor based on the first biometric parameter. The acoustic feature extractor extracts acoustic features from speech of the user in accordance with the first frequency scaling factor.
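The three steps named above (receiving a biometric parameter, calculating a frequency scaling factor, and extracting features in accordance with that factor) might be sketched as follows. This is a hypothetical illustration and not the patented implementation: the function names, the record layout, the use of height divided by an average height as the factor, and the use of a linearly warped magnitude spectrum as a stand-in for real acoustic features are all assumptions.

```python
import cmath
import math


def receive_biometric_parameter(record):
    """Biometric parameter step (hypothetical): read a stored height in cm."""
    return float(record["height_cm"])


def calculate_scale_factor(parameter, average_parameter):
    """Calculation step: assumed here to be the ratio parameter / average."""
    return parameter / average_parameter


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (kept simple, not fast)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * t / n) for t, x in enumerate(frame)))
        for b in range(n // 2 + 1)
    ]


def warp_spectrum(mags, k):
    """Apply f* = k f: the value at warped bin j is read from original bin j / k."""
    last = len(mags) - 1
    warped = []
    for j in range(len(mags)):
        pos = j / k
        if pos >= last:
            # Beyond the original axis there is nothing to read; pad with zero.
            warped.append(mags[last] if pos == last else 0.0)
            continue
        lo = int(pos)
        frac = pos - lo
        warped.append((1.0 - frac) * mags[lo] + frac * mags[lo + 1])
    return warped


def extract_features(frame, k):
    """Feature extraction step: warped magnitude spectrum as a placeholder feature."""
    return warp_spectrum(magnitude_spectrum(frame), k)


# Illustrative input: one 64-sample frame holding a sinusoid at DFT bin 8.
frame = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
k = calculate_scale_factor(receive_biometric_parameter({"height_cm": 200.0}), 160.0)
features = extract_features(frame, k)
peak_bin = max(range(len(features)), key=features.__getitem__)
print(k, peak_bin)
```

With these illustrative numbers the factor is 1.25, so the spectral peak of the test tone moves from bin 8 to bin 10. A practical system would typically apply the warp inside its existing front end (for example, by moving filterbank centre frequencies) rather than interpolating a raw spectrum.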
The present invention can be implemented in hardware, software, or a combination of hardware and software, and accordingly also encompasses a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for speaker normalization as set forth herein.
Accordingly, it will be appreciated that the method and apparatus of the present invention provide an improvement over prior-art approaches, inasmuch as an appropriate scaling factor can be readily determined, so as to improve the accuracy of an associated speech recognition system, based on biometric data pertaining to users of the system, which may be, for example, pre-stored, or which may be ascertained during an interaction with the speaker.
REFERENCES:
patent: 5121434 (1992-06-01), Mrayati et al.
patent: 5696878 (1997-12-01), Ono et al.
patent: 6236963 (2001-05-01), Naito et al.
patent: 6298323 (2001-10-01), Kaemmerer
patent: 6356868 (2002-03-01), Yuschik et al.
Lawrence W. Rabiner and Ronald W. Schafer, Digital Processing of Speech Signals, published by Prentice-Hall, Inc., Englewood Cliffs, NJ 07632 in 1978, pp. 62-65.
“A Parametric Approach to Vocal Tract Length Normalization” by Ellen Eide and Herbert Gish as published by the IEEE in the Proceedings of the ICASSP of 1996, at pp. 346-348.
“Speaker Normalization Using Efficient Frequency Warping Procedures” by Li Lee and Richard C. Rose, as published by the IEEE in the Proceedings of the ICASSP of 1996, at pp. 353-356.
“Speaker Normalization on Conversational Telephone Speech” by Steven Wegmann et al., as published by the IEEE in the Proceedings of the ICASSP of 1996, at pp. 339-341.
Azad Abul K.
Dang Thu A
Dorvil Richemond
International Business Machines - Corporation