Reexamination Certificate
1999-06-14
2002-05-21
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S238000, C704S239000
active
06393397
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to an apparatus and method for selecting one or more cohort models for use in a speaker verification system.
BACKGROUND OF THE INVENTION
In various circumstances it is desirable to be able to limit access to a particular location or function to only one or more authorised individuals. Often an identifying badge or Personal Identification Number (PIN) is utilised for such purposes. Increasingly, efforts have been made to supplement such traditional identifiers with one or more biometric indicators. Fingerprints, retinal patterns, hand shape, and voice have, for example, all been considered in this regard, as each of these criteria is largely distinctive to an individual.
In speaker verification systems, the individual person typically speaks a predetermined statement or series of sounds. These sounds are then compared in some way against a previously stored sample of that same person's speech pattern. A sufficiently close match yields a positive verification that the speaker is who he or she claims to be; otherwise, no such verification is given.
In one prior art approach, such speaker verification is accomplished by comparing this person's present voice input against both a previously stored model representing that person's speech, and also against one or more so-called cohort models. The cohort models are typically selected from many (typically hundreds) previously stored speech models of other individuals, in order to locate a sub-set of relatively close models by comparing an original speech utterance of the person with the previously stored speech models. The previously stored speech models that are most similar to the original speech utterance are then used as the cohort models, each of which is close, but not equal, to the target individual's actual speech pattern. Upon comparing a claimed person's present speech utterance against both the previously stored model and the cohort models, a determination can be made as to whether the present utterance is more similar to the stored model or to a cohort model. If more similar to a cohort model, a rejection is returned. If, however, the present utterance is closer to the original model, an acceptance can be returned.
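The accept/reject decision described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name and the use of distance scores (smaller meaning closer) are assumptions, and how those scores are computed from the utterance is left out.

```python
def verify_claim(target_distance, cohort_distances):
    """Accept when the utterance is closer to the claimed speaker's stored
    model than to every cohort model.

    Both arguments are hypothetical distance scores (assumption: smaller
    means the utterance is more similar to that model).
    """
    if not cohort_distances:
        raise ValueError("at least one cohort model is required")
    # Reject if any cohort model matches the utterance at least as well
    # as the claimed speaker's own model.
    return target_distance < min(cohort_distances)
```

For example, verify_claim(1.0, [2.0, 3.0]) accepts the claim, while verify_claim(2.5, [2.0, 3.0]) rejects it because one cohort model is closer to the utterance than the claimed speaker's model.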
Using prior art techniques, determining which of the previously stored speech models are most similar to the original speech utterance involves, in effect, running the original speech utterance through each of the stored speech models to determine the most similar, which is a computationally intensive and time-consuming process. When such a facility is first installed in an existing location having numerous employees, the training activity, including the considerable time spent determining the cohort models, can at best inconvenience the individual and at worst substantially delay clearance and participation for many individuals.
The cohort model approach to speaker verification, however, continues to offer significant promise with respect to robustness, accuracy, and ease of use. A need therefore exists for a way to support cohort model based speaker verification systems while still reducing the amount of time required to select the cohort models for each new person.
In this specification, including the claims, the terms “comprises”, “comprising” or similar terms are intended to mean a non-exclusive inclusion, such that a method or apparatus that comprises a list of elements does not include those elements solely, but may well include other elements not listed.
BRIEF SUMMARY OF THE INVENTION
The present invention therefore seeks to provide a method of and system for selecting a cohort model for use in a speaker verification system which overcomes, or at least reduces, the above-mentioned problems of the prior art.
Accordingly, in one aspect, the invention provides a method of selecting at least one cohort model for use in a speaker verification system, the method including the steps of: providing a group of existing speaker models; receiving target speaker voice utterances from a target speaker; digitizing at least portions of the received utterances to provide at least one speech sample; determining a target speaker model from the at least one speech sample; determining at least one similarity value between each of a plurality of the existing speaker models and the target speaker model; and utilising the at least one similarity value to select at least one similar existing speaker model as a cohort model for the target speaker.
In one preferred embodiment, the method of selecting a cohort model further includes the steps of: determining at least one dissimilarity value between at least some of the plurality of the existing speaker models and each cohort model previously selected; and selecting at least one of the existing speaker models which is similar to the target speaker model and dissimilar to the at least one cohort model previously selected as at least one cohort model for the target speaker.
Preferably, each speaker model and cohort model comprises a set of parameters, each parameter representing a characteristic of the speech of the speaker, and the step of determining at least one similarity value between an existing speaker model and the target speaker model comprises the step of comparing the value of at least one of the parameters of the existing speaker model with the value of at least one corresponding parameter of the target speaker model to determine the similarity value.
In one embodiment, the step of determining the dissimilarity value comprises the step of: comparing the value of at least one of the parameters of the existing speaker model with the value of at least one corresponding parameter of the cohort model to determine the dissimilarity value. Preferably, the step of selecting at least one of the existing speaker models which is similar to the target speaker model but dissimilar to the at least one previously selected cohort model involves combining in a predetermined combination the dissimilarity values of two or more previously selected cohort models and selecting at least one of the existing speaker models which has a high similarity value and a high combined dissimilarity value. Conveniently, the predetermined combination can be normalised to the similarity values. One of the parameters is preferably a vector, which can be quantised, representing the frequency response of a time sample of the utterance.
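One way to read the selection step above is as a greedy loop: pick the existing model most similar to the target, then repeatedly pick the model that scores best on similarity plus a combined dissimilarity to the cohort models already chosen. The sketch below makes several assumptions that the text does not fix: the function and parameter names are hypothetical, higher scores mean "more similar" or "more dissimilar" respectively, the "predetermined combination" is taken to be a plain average, and the weight parameter stands in for normalising that combination to the similarity values.

```python
def select_cohorts(similarity, dissimilarity, count, weight=1.0):
    """Greedily choose `count` cohort models.

    similarity:    dict mapping model name -> similarity to the target model
                   (assumption: higher means more similar).
    dissimilarity: callable (name_a, name_b) -> dissimilarity between two
                   existing models (assumption: higher means more different).
    weight:        hypothetical scaling of the combined dissimilarity.
    """
    remaining = set(similarity)
    selected = []
    while remaining and len(selected) < count:

        def score(name):
            if not selected:
                # First pick: similarity to the target alone.
                return similarity[name]
            # Assumed combination: mean dissimilarity to earlier picks.
            combined = sum(dissimilarity(name, c) for c in selected) / len(selected)
            return similarity[name] + weight * combined

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With this scoring, a model that is slightly less similar to the target but well separated from the cohort models already chosen can be preferred over a near-duplicate of an earlier pick, which appears to be the intent of combining the two values.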
Preferably, each parameter of the set of parameters is represented by a vector and the step of determining at least one similarity value between an existing speaker model and the target speaker model includes the steps of: determining at least two vectors for each existing speaker model and for the target speaker model; for each existing speaker model vector, determining the distance in the n-dimensional space between that existing speaker model vector and each target speaker model vector and, for each existing speaker model vector, storing whichever distance has a minimum value; and summing the stored minimum distances to provide the at least one similarity value.
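The computation just described (for each existing speaker model vector, keep the minimum distance to any target speaker model vector, then sum those minima) can be sketched as below. The function name is hypothetical, and Euclidean distance is an assumption: the text says only "the distance in the n-dimensional space".

```python
import math


def min_distance_sum(existing_vectors, target_vectors):
    """Sum, over the existing model's vectors, of the distance to the
    nearest target model vector.

    Each vector is a sequence of n numbers. Euclidean distance is assumed.
    """
    if not target_vectors:
        raise ValueError("target model needs at least one vector")
    total = 0.0
    for existing in existing_vectors:
        # Keep only the smallest distance for this existing-model vector.
        total += min(math.dist(existing, target) for target in target_vectors)
    return total
```

Because the value is built from distances, a smaller sum indicates that the two models lie closer together. The same routine applies unchanged when the second argument holds a cohort model's vectors instead of the target's.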
Preferably, the step of determining at least one dissimilarity value between an existing speaker model and a cohort model includes the steps of: determining at least two vectors for each existing speaker model and for the cohort model; for each existing speaker model vector, determining the distance in the n-dimensional space between that existing speaker model vector and each cohort model vector and, for each existing speaker model vector, storing whichever distance has a minimum value; and summing the stored minimum distances to provide the at least one dissimilarity value.
According to a second aspect, the invention provides an apparatus for selecting at least one cohort model for use in a speaker verification system, the apparatus including: a database of existing speaker models; a receiver for receiving target speaker voice utterances from a target speaker; a speech digitizer coupled to the receiver to
Choi Ho Chuen
Song Jianming
Zhu Xiaoyuan
Dorvil Richemond
Motorola Inc.
Nichols Daniel K.