Reexamination Certificate
2000-06-13
2004-06-22
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S238000, C704S269000, C704S203000
active
06754628
ABSTRACT:
FIELD OF THE INVENTION
The present invention generally relates to apparatus and methods for providing speaker recognition.
BACKGROUND OF THE INVENTION
Voice-based speaker recognition (or verification) is an important component of personal authentication systems that are employed in controlling access to devices and services. For example, in telephone banking, an individual may provide a claim (e.g., his or her name) either by using the telephone keypad or by saying it. Subsequently, an automated system may prompt the user to issue an utterance (password, answer to a question, etc.). The utterance can be analyzed and compared to the voiceprint of the claimed person previously stored in a database. As a result of this comparison, the speaker could be either accepted or rejected. Other possible applications of voice-based speaker verification include, for example: computer access; database access via computer, cellphone or regular telephone; ATM access; and credit card authorization via telephone.
Typically, in voice-based speaker verification, a sample of the voice properties of a target speaker is taken and a corresponding model (i.e., a voiceprint) is built. In order to improve the system robustness against impostors, it is also usually the case that a large number of non-target speakers (“background speakers”) are analyzed, pre-stored as voiceprint models, and then used to normalize the voiceprint likelihood scores of a target speaker. The discriminative power of the voiceprint models is crucial to the performance of the overall verification system. An example of a conventional arrangement may be found in D. A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models,” Speech Communication 17 (1995), pp. 91-108.
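The normalization described above can be illustrated with a minimal Python sketch. It is not the arrangement of the cited reference: each voiceprint is simplified here to a single diagonal-covariance Gaussian rather than a Gaussian mixture, the normalization is assumed to be the target log-likelihood minus the mean background log-likelihood, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Estimate a per-dimension mean and variance (a one-component stand-in for a GMM voiceprint)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    per_frame = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return per_frame.mean()

def normalized_score(frames, target_model, background_models):
    """Target log-likelihood minus the mean background log-likelihood (assumed normalization)."""
    target_ll = avg_log_likelihood(frames, target_model)
    background_ll = np.mean([avg_log_likelihood(frames, m) for m in background_models])
    return target_ll - background_ll

# Synthetic 4-dimensional "feature frames"; real systems would use acoustic features.
rng = np.random.default_rng(0)
target_model = fit_diag_gaussian(rng.normal(0.0, 1.0, size=(500, 4)))
background_models = [
    fit_diag_gaussian(rng.normal(shift, 1.0, size=(500, 4))) for shift in (1.5, -1.5, 3.0)
]

genuine_utterance = rng.normal(0.0, 1.0, size=(200, 4))   # resembles the target speaker
impostor_utterance = rng.normal(1.5, 1.0, size=(200, 4))  # resembles a background speaker

genuine_score = normalized_score(genuine_utterance, target_model, background_models)
impostor_score = normalized_score(impostor_utterance, target_model, background_models)
print(genuine_score > impostor_score)
```

In a deployed system the normalized score would be compared against a decision threshold; that threshold is application-specific and is deliberately left out of this sketch.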
A need has been recognized, however, in connection with providing voice-based speaker verification that displays even greater system robustness in the face of impostors than has hitherto been the norm.
SUMMARY OF THE INVENTION
In accordance with at least one presently preferred embodiment of the present invention, a “cohort selection” technique is employed in a different manner. Conventionally, cohort selection techniques involve the comparison of the target speaker's data to voiceprints of its closest background neighbors (cohorts) and the use of this information for normalization purposes. In accordance with at least one preferred embodiment of the present invention, however, once the closest voiceprints are selected into a cohort set, the dissimilarity of the cohort models is increased using linear feature transforms. The transforms may be derived either from data relating to the target speaker only or from data relating to all speakers in the cohort, including the target speaker. A combination of these two alternatives is also contemplated herein.
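The two stages named above, cohort selection followed by a linear feature transform derived from the target and cohort data, can be sketched roughly as follows. The text here does not specify the distance measure or the transform, so both are assumptions: cohorts are picked by Euclidean distance between mean feature vectors, and an LDA-style transform (within-speaker whitening followed by a rotation onto the directions of largest between-speaker scatter) is swapped in purely for illustration. All names and the synthetic data are hypothetical.

```python
import numpy as np

def select_cohort(target_frames, background, k):
    """Names of the k background speakers whose mean feature vector is closest to the target's."""
    target_mean = target_frames.mean(axis=0)
    distances = {name: np.linalg.norm(frames.mean(axis=0) - target_mean)
                 for name, frames in background.items()}
    return sorted(distances, key=distances.get)[:k]

def scatter_matrices(classes):
    """Pooled within-speaker and between-speaker scatter for a list of frame arrays."""
    all_frames = np.vstack(classes)
    global_mean = all_frames.mean(axis=0)
    dim = all_frames.shape[1]
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for frames in classes:
        mean = frames.mean(axis=0)
        centered = frames - mean
        within += centered.T @ centered
        diff = (mean - global_mean)[:, None]
        between += len(frames) * (diff @ diff.T)
    return within / len(all_frames), between / len(all_frames)

def lda_style_transform(classes):
    """Linear transform A with A @ within @ A.T == I, rows ordered by between-speaker scatter."""
    within, between = scatter_matrices(classes)
    evals, evecs = np.linalg.eigh(within)
    whitener = (evecs / np.sqrt(evals)).T          # whitens within-speaker scatter
    b_evals, b_evecs = np.linalg.eigh(whitener @ between @ whitener.T)
    order = np.argsort(b_evals)[::-1]              # largest between-speaker scatter first
    return b_evecs[:, order].T @ whitener

# Synthetic, correlated 3-dimensional features for a target and four background speakers.
rng = np.random.default_rng(0)
dim = 3
mixing = np.array([[1.0, 0.8, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])

def make_speaker(center):
    return rng.normal(size=(400, dim)) @ mixing + center

target = make_speaker(np.zeros(dim))
background = {
    "spk_a": make_speaker(np.array([0.5, 0.0, 0.0])),
    "spk_b": make_speaker(np.array([0.0, 0.7, 0.0])),
    "spk_c": make_speaker(np.array([4.0, 4.0, 4.0])),
    "spk_d": make_speaker(np.array([-5.0, 3.0, 2.0])),
}

cohort = select_cohort(target, background, k=2)
classes = [target] + [background[name] for name in cohort]   # target plus its cohort
transform = lda_style_transform(classes)
transformed = [frames @ transform.T for frames in classes]   # features for re-estimating models
```

This sketch derives the transform from all speakers in the cohort, including the target; the target-only alternative mentioned above would need a different criterion and is not shown. Models re-estimated on the `transformed` features then measure differences between speakers relative to within-speaker variability.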
The processes contemplated herein are believed to contribute to improving the discriminative power of the above-mentioned models by employing linear feature transforms derived from specific target speaker data and/or from the specific target's cohort speakers. The inventive processes may be used in a wide range of applications supporting voiceprint-based authentication (e.g., as described in U.S. Pat. No. 5,897,616 to Kanevsky et al., entitled “Apparatus and Methods for Speaker Verification/Identification/Classification Employing Non-Acoustic and/or Acoustic Models and Databases”).
In one aspect, the present invention provides a method of facilitating speaker verification, the method comprising the steps of: providing target data relating to a target speaker; providing background data relating to at least one background speaker; selecting from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and combining the target data and the cohort data to produce at least one new cohort model for use in subsequent speaker verification.
In another aspect, the present invention provides an apparatus for facilitating speaker verification, the apparatus comprising: a target data store which supplies data relating to a target speaker; a background data store which supplies data relating to at least one background speaker; a selector which selects from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and a modeller which combines the target data and the cohort data to produce at least one new cohort model for use in subsequent speaker verification.
In an additional aspect, the present invention provides a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for facilitating speaker verification, the method comprising the steps of: providing target data relating to a target speaker; providing background data relating to at least one background speaker; selecting from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and combining the target data and the cohort data to produce at least one new cohort model for use in subsequent speaker verification.
In a further aspect, the present invention provides a method of facilitating verification, the method comprising the steps of: providing target data relating to a target individual; providing background data relating to at least one background individual; selecting from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and combining the target data and the cohort data to produce at least one new cohort model for use in subsequent verification.
In another aspect, the present invention provides an apparatus for facilitating verification, the apparatus comprising: a target data store which supplies data relating to a target individual; a background data store which supplies data relating to at least one background individual; a selector which selects from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and a modeller which combines the target data and the cohort data to produce at least one new cohort model for use in subsequent verification.
Furthermore, the present invention provides in another aspect a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for facilitating verification, the method comprising the steps of: providing target data relating to a target individual; providing background data relating to at least one background individual; selecting from the background data a set of cohort data having at least one proximate characteristic with respect to the target data; and combining the target data and the cohort data to produce at least one new cohort model for use in subsequent verification.
REFERENCES:
patent: 5675704 (1997-10-01), Juang
patent: 5897616 (1999-04-01), Kanevsky
patent: 5930748 (1999-07-01), Kleider
patent: 6006184 (1999-12-01), Yamada
patent: 6356868 (2002-03-01), Yuschick
patent: 6393397 (2002-05-01), Choi
Isobe, Toshihiro and Takahashi, Jun-ichi, “A New Cohort Normalization Using Local Acoustic Information for Speaker Verification”, Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 2.*
D.A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models”, Speech Communication 17 (1995), pp. 91-108.
R. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions for classification”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), Seattle 1998.
K. Fukunaga, “Statistical Pattern Recognition”, Academic Press 1990.
Chaudhari Upendra V.
Maes Stephane H.
Navratil Jiri
Dorvil Richemond
Ference & Associates
International Business Machines Corporation
Patel Kinari