System and method for sampling rate transformation in speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S204000, C704S203000, C704S234000, C704S237000

Reexamination Certificate

active

06199041

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition and, more particularly, to a system and method for transforming sampling rates without retraining.
2. Description of the Related Art
Speech recognition achieves its best performance when test or operating conditions match training conditions. In general, these matched conditions include acoustic environments, speakers, application corpora, etc. An issue arises in conventional systems when a sampling frequency mismatch occurs between the training conditions and the test/operating conditions. The frequency mismatch leads to severe performance degradation in speech recognition.
When a conventional speech recognition system is deployed, it is designed for a specific data sampling frequency. When another sampling rate is considered, it is customary to retrain the system for the new sampling rate. While it is straightforward to transform signals and retrain systems, this presents at least two major problems in many real-time applications. First, extra effort is needed to supply training data at the new sampling frequency, either by collecting new data or by transforming existing training data. Second, the training process must be repeated to generate new system parameters.
For systems that have undergone calibration processes such as speaker adaptation or acoustic adaptation, it is even more tedious to repeat the processes, let alone the complication of maintaining multiple prototypes.
Therefore, a need exists for a system and method for providing sampling frequency change without the burden of retraining.
SUMMARY OF THE INVENTION
A method for transforming a sampling rate in speech recognition systems is disclosed. The method may be implemented by a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps. The method steps include: providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients; converting the cepstral vector coefficients to energy bands in logarithmic spectra; filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency; and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency.
In alternate methods which may be executed by the program storage device, each energy band may be associated with a mel-filter, and the step of filtering may further include the step of rescaling the mel-filters. The step of converting the cepstral vector coefficients to energy bands in logarithmic spectra may employ an inverse discrete cosine transform (IDCT). The step of filtering the energy bands may remove energy bands above one-half the target frequency. The step of converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency may be performed by a discrete cosine transform (DCT). The method may further include the step of estimating maximum and mean values of segment energies at the reference frequency and at the target frequency. The method may further include the step of outputting a global maximum and mean at the reference frequency for denormalizing system prototypes of a speech recognition system. The method may further include the step of outputting a global maximum and mean at the target frequency for energy normalization of system prototypes of a speech recognition system.
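The chain of steps summarized above (cepstra to logarithmic band energies by IDCT, removal of bands above one-half the target frequency, and DCT back to cepstra) can be illustrated with a minimal sketch. It is not the patented implementation: the orthonormal DCT-II convention, the function names, and the list of mel-filter center frequencies passed in as band_centers_hz are all assumptions made for illustration, and the optional rescaling of the mel-filters is omitted.

```python
import math

def dct_matrix(num_cepstra, num_bands):
    """Orthonormal DCT-II basis: rows are cepstral indices, columns are bands."""
    if num_bands < 1:
        raise ValueError("at least one energy band is required")
    rows = []
    for k in range(num_cepstra):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_bands)
        rows.append([scale * math.cos(math.pi * k * (n + 0.5) / num_bands)
                     for n in range(num_bands)])
    return rows

def cepstra_to_log_bands(cepstra, num_bands):
    """IDCT step: map (possibly truncated) cepstra to log band energies."""
    basis = dct_matrix(len(cepstra), num_bands)
    return [sum(cepstra[k] * basis[k][n] for k in range(len(cepstra)))
            for n in range(num_bands)]

def log_bands_to_cepstra(log_bands, num_cepstra):
    """DCT step: map log band energies back to cepstral coefficients."""
    basis = dct_matrix(num_cepstra, len(log_bands))
    return [sum(basis[k][n] * log_bands[n] for n in range(len(log_bands)))
            for k in range(num_cepstra)]

def transform_cepstra(cepstra, band_centers_hz, target_rate_hz):
    """Assumed filtering rule: keep bands centered at or below target_rate/2."""
    log_bands = cepstra_to_log_bands(cepstra, len(band_centers_hz))
    cutoff_hz = target_rate_hz / 2.0
    kept = [e for e, f in zip(log_bands, band_centers_hz) if f <= cutoff_hz]
    return log_bands_to_cepstra(kept, len(cepstra))

# Hypothetical example: 8 bands for a 16 kHz reference, target rate 8 kHz.
centers = [1000.0 * (i + 1) for i in range(8)]
modified = transform_cepstra([1.0, 0.0, 0.0, 0.0], centers, 8000.0)
```

In this hypothetical example only the four bands centered at or below 4 kHz survive the filtering step, and the output keeps the same number of cepstral coefficients as the input.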
Another method for transforming a sampling rate in speech recognition systems is disclosed. The method may be implemented by a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps. The method steps include: providing system prototypes including distributions of normalized cepstral vectors at a reference frequency; denormalizing the normalized cepstral vectors at the reference frequency; converting the denormalized cepstral vectors to energy bands in logarithmic spectra; filtering the energy bands of the logarithmic spectra to truncate energy bands having a frequency above a predetermined portion of a target frequency; converting the filtered energy bands to modified cepstral vectors; and normalizing the modified cepstral vectors at the target frequency such that the system prototypes are sampled at the target frequency.
In alternate methods which may be executed by the program storage device, each energy band is associated with a mel-filter, and the step of filtering may further include the step of rescaling the mel-filters. The step of converting the denormalized cepstral vectors to energy bands in logarithmic spectra may employ an inverse discrete cosine transform (IDCT). The step of filtering the energy bands may remove energy bands above one-half the target frequency. The step of converting the filtered energy bands to modified cepstral vectors may be performed by a discrete cosine transform (DCT). The step of denormalizing the normalized cepstral vectors at the reference frequency may further include the step of inputting global maximum and mean values of segment energies at the reference frequency to denormalize the normalized cepstral vectors of the system prototypes at the reference frequency. The step of normalizing the modified cepstral vectors may further include the step of inputting global maximum and mean values of segment energies at the target frequency to normalize the cepstral vectors of the system prototypes at the target frequency.
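Because the IDCT, the band truncation and the DCT are each linear, the prototype method can be sketched as a single matrix applied between a denormalization step and a normalization step. The sketch below is illustrative only. The section does not give the normalization formula, so the code uses a placeholder in which only the first (energy) coefficient is offset by a global maximum; that placeholder, the orthonormal DCT-II convention, and all function and parameter names are assumptions.

```python
import math

def dct_matrix(num_cepstra, num_bands):
    """Orthonormal DCT-II basis: rows are cepstral indices, columns are bands."""
    rows = []
    for k in range(num_cepstra):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_bands)
        rows.append([scale * math.cos(math.pi * k * (n + 0.5) / num_bands)
                     for n in range(num_bands)])
    return rows

def rate_transform_matrix(num_cepstra, band_centers_hz, target_rate_hz):
    """Compose IDCT at the reference rate, band selection, and DCT at the target."""
    cutoff_hz = target_rate_hz / 2.0
    kept = [n for n, f in enumerate(band_centers_hz) if f <= cutoff_hz]
    if not kept:
        raise ValueError("no energy band lies below half the target rate")
    ref = dct_matrix(num_cepstra, len(band_centers_hz))
    tgt = dct_matrix(num_cepstra, len(kept))
    # m indexes bands at the target rate, n the matching reference band.
    return [[sum(tgt[i][m] * ref[j][n] for m, n in enumerate(kept))
             for j in range(num_cepstra)]
            for i in range(num_cepstra)]

def transform_prototype_mean(mean, matrix, ref_energy_max, tgt_energy_max):
    """Denormalize at the reference rate, transform, renormalize at the target."""
    # Placeholder normalization (assumption): c0 is stored relative to the
    # global maximum segment energy; the other coefficients are untouched.
    raw = [mean[0] + ref_energy_max] + list(mean[1:])
    moved = [sum(row[j] * raw[j] for j in range(len(raw))) for row in matrix]
    return [moved[0] - tgt_energy_max] + moved[1:]

# Hypothetical example: 8 bands for a 16 kHz reference, target rate 8 kHz.
centers = [1000.0 * (i + 1) for i in range(8)]
matrix = rate_transform_matrix(4, centers, 8000.0)
new_mean = transform_prototype_mean([0.5, 0.1, -0.2, 0.3], matrix, 10.0, 7.0)
```

If the prototypes were Gaussian distributions (an assumption; the section says only "distributions"), the same matrix A could in principle be applied to a covariance as A Σ Aᵀ, since the transform is linear.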
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.


