Reexamination Certificate
1999-01-26
2001-06-12
Smits, Talivaldis I. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S239000, C704S246000
active
06246982
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates generally to pattern recognition, which includes automated speech and speaker recognition. In particular, it relates to a computer-implemented data processing method for measuring distance between collections of audio feature distributions or finite mixture models.
In automated speech recognition, input speech is analyzed in small time frames, and the audio content of each frame is characterized by a feature vector. A feature vector is essentially a set of N audio features associated with that frame, typically the spectral or cepstral parameters of the frame's audio. To recognize a spoken word or phoneme, test data comprising a feature vector or feature vector sequence is compared to models (prototypes) of the sound of known vocabulary words or phonemes. These comparisons are performed using a distance measure, which quantifies the closeness between a pair of elements under consideration. Thus, a given feature vector or feature vector sequence is recognized as the phoneme or word whose prototype is the shortest distance away.
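The nearest-prototype comparison described above can be illustrated with a minimal sketch. The text does not specify the distance measure, so Euclidean distance is assumed here purely for illustration; the function names (euclidean, recognize), the labels and the three-element feature vectors are all hypothetical.

```python
import math

def euclidean(a, b):
    # Assumed distance measure; the text does not name one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(feature_vector, prototypes, distance=euclidean):
    # Return the label of the prototype that is the shortest distance away.
    return min(prototypes, key=lambda label: distance(feature_vector, prototypes[label]))

# Hypothetical prototypes: one feature vector per known phoneme.
prototypes = {
    "ah": [1.0, 0.0, 0.5],
    "ee": [0.0, 1.0, 0.2],
}

print(recognize([0.9, 0.1, 0.4], prototypes))  # prints "ah"
```

Passing the distance function as a parameter keeps the lookup independent of the particular measure, so a different measure could be substituted without changing the recognition step.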
In a typical speech recognition system, a separate speaker model is developed for each speaker using the system. Before using the system for the first time, a speaker is prompted to utter a predetermined sequence of words or sentences, thereby supplying training data to the system. The training data is used to develop a speaker-dependent model containing a set of user-specific prototypes. During subsequent use of the system, the user typically must first register his or her identity, and the user's speech is then compared only to the corresponding prototypes. An obvious drawback of this technique is that it cannot practically recognize speech within, for example, a conference of many speakers, because registering the speaker before each utterance is impractical. Hence, there is a need for a practical method of automatic speaker recognition. In a general-use environment, it is also desirable to eliminate the need to collect training data for new users.
SUMMARY OF THE DISCLOSURE
The present disclosure relates to a method for computing a distance between collections of distributions of feature data (e.g., audio features). In an illustrative embodiment, audio data is processed so as to define at least first and second collections of audio feature distributions, where each collection may be derived from a speech sample of an associated speaker. For each distribution of the first collection, the distance to each distribution of the second collection is measured to determine which distribution of the second collection is the closest (most similar). The same process is then carried out in the reverse direction: for each distribution of the second collection, the closest distribution of the first collection is determined. Based on these closest-distance measures, a final distance is computed representing the distance between the first and second collections. This final distance may be a weighted sum of the closest distances. The distance measure may be used in a number of applications, such as speaker classification, speaker recognition and audio segmentation.
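The two-directional closest-distance procedure summarized above can be sketched as follows. This is an illustration under stated assumptions, not the patented formula: the summary does not specify the distance between individual distributions or the weights, so the sketch assumes each distribution is represented only by its mean, uses Euclidean distance between means, and weights each closest distance by a hypothetical per-distribution weight (for example, a frame count), normalized by the total weight. All names (Distribution, pair_distance, closest_distances, collection_distance) are hypothetical.

```python
import math
from typing import List, NamedTuple

class Distribution(NamedTuple):
    mean: List[float]   # assumed representation: the mean only
    weight: float       # hypothetical weight, e.g. number of frames

def pair_distance(p: Distribution, q: Distribution) -> float:
    # Assumed distance between two distributions: Euclidean distance of means.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p.mean, q.mean)))

def closest_distances(src: List[Distribution], dst: List[Distribution]) -> List[float]:
    # For each distribution in src, the distance to its closest match in dst.
    return [min(pair_distance(p, q) for q in dst) for p in src]

def collection_distance(a: List[Distribution], b: List[Distribution]) -> float:
    d_ab = closest_distances(a, b)  # first collection against second
    d_ba = closest_distances(b, a)  # second collection against first
    weighted = sum(p.weight * d for p, d in zip(a, d_ab))
    weighted += sum(q.weight * d for q, d in zip(b, d_ba))
    total = sum(p.weight for p in a) + sum(q.weight for q in b)
    return weighted / total  # assumed normalization of the weighted sum

speaker_a = [Distribution([0.0, 0.0], 10), Distribution([4.0, 0.0], 30)]
speaker_b = [Distribution([0.0, 1.0], 20), Distribution([4.0, 3.0], 20)]

print(collection_distance(speaker_a, speaker_b))  # prints 2.25
```

Because the closest-distance search runs in both directions, the result under these assumptions is the same whichever collection is passed first, and a collection compared with itself yields zero.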
Beigi Homayoon S. M.
Maes Stephane H.
Sorensen Jeffrey S.
F. Chau & Associates LLP
International Business Machines Corporation
Smits Talivaldis I.