Reexamination Certificate
1999-11-09
2002-06-04
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S246000, C704S273000
active
06401063
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to the field of speaker verification and more particularly to a method and apparatus for generating certain data that is specific to a user and that can be used by a speaker verification system to authenticate the user based on a speech pattern. This invention is applicable to speech activated security systems such as access to voice-mail, automated telephone services, automated banking services and voice directed computer applications, among others.
BACKGROUND OF THE INVENTION
Speaker verification is the process of verifying whether a given speaker is a claimed speaker. The process is based on comparing a verification attempt with a speaker specific speech pattern representative of the claimed speaker and then calculating the likelihood that the verification attempt was actually generated by the claimed speaker. A common approach is to determine the likelihood of the verification attempt being generated by the claimed speaker given the speaker specific speech pattern. Typically, if the calculated likelihood is above a certain threshold, the verification attempt is accepted as being generated by the claimed speaker. Otherwise, the verification attempt is rejected. The level of the threshold depends on a number of factors, such as the level of security required and therefore the level of tolerance for false acceptance or false rejection.
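The threshold trade-off described above can be illustrated with a minimal sketch. The function names and the score lists below are hypothetical and chosen only for illustration; they are not taken from the patent or from any measured data.

```python
from typing import Sequence, Tuple


def accept(score: float, threshold: float) -> bool:
    """Accept a verification attempt only if its score exceeds the threshold."""
    return score > threshold


def error_rates(
    genuine_scores: Sequence[float],
    imposter_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at a threshold."""
    false_accepts = sum(1 for s in imposter_scores if accept(s, threshold))
    false_rejects = sum(1 for s in genuine_scores if not accept(s, threshold))
    return (
        false_accepts / len(imposter_scores),
        false_rejects / len(genuine_scores),
    )


# Made-up log-likelihood scores, for illustration only.
genuine = [-1.0, -2.0, -3.5, -0.5]
imposters = [-4.0, -6.0, -2.5, -8.0]

for threshold in (-5.0, -3.0, -1.5):
    far, frr = error_rates(genuine, imposters, threshold)
    print(f"threshold={threshold}: FAR={far}, FRR={frr}")
```

With these illustrative numbers, raising the threshold lowers the false acceptance rate and raises the false rejection rate, which is why the threshold is tied to the level of security required.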
Speaker verification systems can be characterized as being either password non-specific, where the verification is entirely done on the basis of the voice of the speaker, or password specific, where the speaker must utter a specific password in addition to having the proper voice. Password specific speaker verification systems are desirable because an additional level of security is added since the speaker must utter the correct password in addition to having a voice with the correct acoustic properties. In addition, password specific speaker verification systems may be desirable when a given functionality in a system using speaker verification is operatively linked to a given password.
A common approach for improving the speaker verification process is the use of normalizing techniques such as the world normalizing model, the background normalizing model and the cohort normalizing model. The world, background and cohort normalizing models perform verification on the basis of a template representing the claimed speaker and a template that is independent of the claimed speaker. The template representing the claimed speaker is herein referred to as the speaker specific speech pattern. The template that is independent of the claimed speaker is herein referred to as a normalizing template. In broad terms, normalizing techniques involve computing a likelihood score indicative of a probability that the verification attempt was generated by the claimed speaker and normalizing the likelihood score by a second score, herein referred to as the normalizing score. For additional information on the background, cohort and world normalizing methods, the reader is invited to refer to Gu et al. (1998) “An Implementation and Evaluation of an On-line Speaker Verification System for Field Trials”, Proc. ICASSP '98, pp. 125-128 and to Rosenberg et al. (1996) “Speaker Background Models for Connected Digit Password Speaker Verification”, Proc. ICASSP '96, pp. 81-84. The contents of these documents are hereby incorporated by reference.
In the cohort normalizing method, the normalizing template is indicative of a template representing the most competitive speaker specific speech pattern selected from a group of speaker specific speech patterns. This is done by scoring the verification attempt against various speaker specific speech patterns in a set of speaker specific speech patterns excluding the speaker specific speech pattern associated to the claimed speaker. The speaker specific speech patterns in the set are indicative of a same password uttered by different speakers. The highest scoring speaker specific speech pattern in the database of speaker specific speech patterns is retained as the most competitive speaker specific speech pattern for use in the normalizing process. The score of the verification attempt on the speaker specific speech pattern associated to the claimed speaker is compared to the score of the verification attempt on the most competitive speaker specific speech pattern in order to determine whether the given speaker is to be accepted as the claimed speaker or not.
Mathematically the cohort normalizing method can be expressed as follows:
$$\log L(O) = \log p(O \mid \lambda_c) - \max_i \{\log p(O \mid \lambda_i)\}$$
where $L(O)$ is the likelihood of a verification attempt observation $O$, $p(O \mid \lambda_c)$ is a probability that the observation $O$ corresponds to the parameters given by $\lambda_c$, representative of the speaker specific speech pattern associated to the claimed speaker, and $p(O \mid \lambda_i)$ is a probability that an observation $O$ corresponds to the parameters given by $\lambda_i$, which represents a set of speaker specific speech patterns other than the speaker specific speech pattern associated to the claimed speaker; $\max_i \{\log p(O \mid \lambda_i)\}$ represents the logarithmic likelihood of the most competitive speaker specific speech pattern.
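As a minimal sketch of the cohort normalizing expression, the function below subtracts the best competing log-likelihood from the claimed speaker's log-likelihood. It assumes the log-likelihoods of the verification attempt against each speaker specific speech pattern have already been computed; the function name, the speaker labels and the score values are hypothetical.

```python
from typing import Mapping


def cohort_normalized_score(
    log_likelihoods: Mapping[str, float], claimed: str
) -> float:
    """Compute log p(O|lambda_c) - max_i log p(O|lambda_i), i != claimed.

    `log_likelihoods` maps each enrolled speaker to the log-likelihood of the
    verification attempt on that speaker's speech pattern (assumed given).
    """
    claimed_score = log_likelihoods[claimed]
    # Exclude the claimed speaker's own pattern from the cohort.
    competitors = [
        score for speaker, score in log_likelihoods.items() if speaker != claimed
    ]
    if not competitors:
        raise ValueError("cohort needs at least one other speaker pattern")
    # The highest scoring competitor is the most competitive pattern.
    return claimed_score - max(competitors)


# Illustrative values only.
scores = {"alice": -42.0, "bob": -47.5, "carol": -55.0}
print(cohort_normalized_score(scores, "alice"))  # 5.5
```

A positive result means the claimed speaker's pattern scored better than the most competitive pattern; the accept or reject decision would still be made by comparing this normalized score to a threshold.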
In the background normalizing method, the normalizing template is derived by combining speaker specific speech models from a set of speech models associated to possible imposters. The speech models selected to be part of the normalizing template are typically chosen on the basis of a similarity measurement. The score of the verification attempt on the speaker specific speech pattern associated to the claimed speaker is compared to the score of the verification attempt on the normalizing template, in a manner similar to that described in connection with the cohort normalizing method.
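The text does not specify how the selected imposter models are combined. The sketch below assumes one possible choice: the k models most similar to the claimed speaker are selected, and the normalizing score is the logarithm of their average likelihood. The function name, the similarity values and the averaging rule are all assumptions made for illustration.

```python
import math
from typing import Mapping


def background_normalized_score(
    claimed_score: float,
    imposter_scores: Mapping[str, float],
    similarity: Mapping[str, float],
    k: int,
) -> float:
    """Normalize the claimed speaker's log-likelihood by a background score.

    Assumption: the background template is approximated by the average
    likelihood of the k imposter models most similar to the claimed speaker.
    """
    # Pick the k imposter models with the highest similarity measurement.
    selected = sorted(
        imposter_scores, key=lambda speaker: similarity[speaker], reverse=True
    )[:k]
    if not selected:
        raise ValueError("at least one background model is required")
    # Log of the mean likelihood, computed stably with the log-sum-exp trick.
    peak = max(imposter_scores[s] for s in selected)
    total = sum(math.exp(imposter_scores[s] - peak) for s in selected)
    log_mean = peak + math.log(total / len(selected))
    return claimed_score - log_mean


# Illustrative values only.
imposter_ll = {"bob": -47.5, "carol": -55.0, "dave": -60.0}
sim = {"bob": 0.9, "carol": 0.4, "dave": 0.1}
print(background_normalized_score(-42.0, imposter_ll, sim, k=2))
```

With k set to 1 this reduces to normalizing by the single most similar imposter model.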
Methods of the type described above require a database of speaker dependent models to create the normalizing template. Performance is closely tied to the contents of the database of speaker specific models. Optimally, the database should contain the speaker specific models associated to a probable imposter trying to access the system. Creating, a priori, a database containing a complete set of speaker specific models is prohibitive.
Another common method is the world normalization model. In this method, instead of performing verification on the basis of the score of the speaker specific speech pattern associated to the claimed speaker and of many possible speaker specific speech patterns, the verification is done on the basis of a speaker specific speech pattern associated to the claimed speaker and of a single world template or speaker independent template. A speaker independent model set generated from a large number of speech samples collected from a large number of speakers uttering a plurality of words is used to generate a speaker independent template representative of an average pronunciation of a specific word by an average speaker. In other words, the speaker independent model set allows creating an approximation of the actual pronunciation of the specific word since the pronunciation was generated from a plurality of uttered words.
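By analogy with the cohort expression, world normalization can be sketched as subtracting the log-likelihood on the single speaker independent template from the log-likelihood on the claimed speaker's pattern. This is a minimal sketch assuming both log-likelihoods are already available; the function names, values and threshold are hypothetical.

```python
def world_normalized_score(
    claimed_log_likelihood: float, world_log_likelihood: float
) -> float:
    """Compute log p(O|lambda_c) - log p(O|lambda_world)."""
    return claimed_log_likelihood - world_log_likelihood


def verify(
    claimed_log_likelihood: float,
    world_log_likelihood: float,
    threshold: float,
) -> bool:
    """Accept the attempt if the normalized score exceeds the threshold."""
    score = world_normalized_score(claimed_log_likelihood, world_log_likelihood)
    return score > threshold


# Illustrative values only.
print(world_normalized_score(-42.0, -45.0))  # 3.0
print(verify(-42.0, -45.0, threshold=2.0))   # True
```

Only one normalizing score is needed per attempt, regardless of how many other speakers are enrolled.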
The world normalization method does not require a database of speaker dependent models and is therefore more flexible than the cohort and background model methods. A deficiency of the world normalization model is a lower performance in terms of speaker verification for a given acceptance/rejection threshold since the world normalizing model is an overgeneralization of the pronunciation of the specific word considered.
Consequently, there is a need in the industry for providing a method and apparatus for generating an improved normalizing template for use in a speaker verification system.
SUMMARY OF THE INVENTION
In accordance with a broad aspect, the invention provides an apparatus for creating a biased normalizing template suitable for use by a speaker verification system to authent
Hébert Matthieu
Peters Stephen D.
Banks-Harold Marsha D.
McFadden Susan
Nortel Networks Limited