Speaker verification and speaker identification based on a...

Data processing: speech signal processing – linguistics – language – recognition

Reexamination Certificate


Details

Classification: 704/256; 704/246; 704/247; 704/250

Status: active

Patent number: 6697778

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech technology and, more particularly, to a system and method for performing speaker verification or speaker identification.
The problem of authentication lies at the heart of nearly every transaction. Millions of people conduct confidential financial transactions over the telephone, such as accessing their bank accounts or using their credit cards. Authentication under current practice is far from foolproof. The parties exchange some form of presumably secret information, such as social security number, mother's maiden name or the like. Clearly, such information can be pirated, resulting in a false authentication.
One aspect of the present invention addresses the foregoing problem by providing a system and method for performing speaker verification. Speaker verification involves determining whether a given voice belongs to a certain speaker (herein called the “client”) or to an impostor (anyone other than the client).
Somewhat related to the problem of speaker verification is the problem of speaker identification. Speaker identification involves matching a given voice to one of a set of known voices. Like speaker verification, speaker identification has a number of attractive applications. For example, a speaker identification system may be used to classify voice mail by speaker for a set of speakers for which voice samples are available. Such capability would allow a computer-implemented telephony system to display on a computer screen the identity of callers who have left messages on the voice mail system.
While the applications for speaker verification and speaker identification are virtually endless, the solution to performing these two tasks has heretofore proven elusive. Recognizing human speech, and particularly discriminating one speaker from other speakers, is a complex problem. Because of how human speech is produced, a person rarely speaks even a single word the same way twice.
Human speech is the product of air under pressure from the lungs being forced through the vocal cords and modulated by the glottis to produce sound waves that then resonate in the oral and nasal cavities before being articulated by the tongue, jaw, teeth and lips. Many factors affect how these sound producing mechanisms inter-operate. The common cold, for example, greatly alters the resonance of the nasal cavity as well as the tonal quality of the vocal cords.
Given the complexity and variability with which humans produce speech, speaker verification and speaker identification are not readily performed by comparing new speech with a previously recorded speech sample. Employing a high similarity threshold to exclude impostors may also exclude the authentic speaker when he or she has a head cold. On the other hand, employing a low similarity threshold can make the system prone to false verification.
The present invention uses a model-based analytical approach to speaker verification and speaker identification. Models are constructed and trained upon the speech of known client speakers (and, in the case of speaker verification, possibly also upon the speech of one or more impostors). These speaker models typically employ a multiplicity of parameters (such as Hidden Markov Model (HMM) or Gaussian Mixture Model (GMM) parameters). Rather than using these parameters directly, the parameters are concatenated to form supervectors. These supervectors, one supervector per speaker, represent the entire training data speaker population.
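As an illustrative sketch (not the patent's implementation), supervector construction from hypothetical GMM mean parameters might look like the following; the speaker counts, component counts, and feature dimensions are all assumed for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 training speakers, each modeled by a GMM with
# 8 Gaussian components over 12-dimensional acoustic feature vectors.
n_speakers, n_components, n_features = 4, 8, 12

def build_supervector(gmm_means):
    """Concatenate a speaker's GMM mean vectors into one supervector."""
    return gmm_means.reshape(-1)  # shape: (n_components * n_features,)

# One supervector per speaker; the stacked rows represent the population.
speaker_means = rng.normal(size=(n_speakers, n_components, n_features))
supervectors = np.stack([build_supervector(m) for m in speaker_means])
print(supervectors.shape)  # (4, 96)
```

In practice the concatenated parameters could also include variances or mixture weights; the key point is that each speaker collapses to a single fixed-length vector.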
A linear transformation is performed on the supervectors, resulting in a dimensionality reduction that yields a low-dimensional space we call the eigenspace. The basis vectors of this eigenspace are called "eigenvoice" vectors or "eigenvectors". If desired, the eigenspace can be further dimensionally reduced by discarding some of the eigenvector terms.
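One standard way to realize such a linear transformation is principal component analysis via the SVD; the sketch below assumes PCA purely for illustration, with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
supervectors = rng.normal(size=(4, 96))  # 4 speakers, 96-dim supervectors

# PCA via SVD of the mean-centered supervectors: the right singular
# vectors serve as the "eigenvoice" basis of the reduced space.
mean = supervectors.mean(axis=0)
_, s, vt = np.linalg.svd(supervectors - mean, full_matrices=False)

k = 2  # keep only the leading eigenvoices (further reduction is optional)
eigenvoices = vt[:k]                            # (k, 96) basis vectors
coords = (supervectors - mean) @ eigenvoices.T  # each speaker as a point
print(coords.shape)  # (4, 2)
```

Discarding trailing eigenvectors (small `k`) trades a little reconstruction fidelity for an even smaller space.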
Next, each of the speakers comprising the training data is represented in eigenspace, either as a point in eigenspace or as a probability distribution in eigenspace. The former is somewhat less precise in that it treats the speech from each speaker as relatively unchanging. The latter reflects that the speech of each speaker will vary from utterance to utterance.
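The two representation choices can be sketched as follows, assuming several hypothetical utterances per speaker that have already been projected into a 2-dimensional eigenspace:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical: 10 utterances from one speaker, each already projected
# into a 2-dimensional eigenspace.
utterance_points = rng.normal(loc=[1.0, -0.5], scale=0.3, size=(10, 2))

# Point representation: a single centroid (treats the speech as
# relatively unchanging).
point = utterance_points.mean(axis=0)

# Distribution representation: a Gaussian capturing utterance-to-utterance
# variability via a mean and covariance.
dist_mean = utterance_points.mean(axis=0)
dist_cov = np.cov(utterance_points, rowvar=False)
print(point.shape, dist_cov.shape)  # (2,) (2, 2)
```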
Having represented the training data for each speaker in eigenspace, the system may then be used to perform speaker verification or speaker identification.
New speech data is obtained and used to construct a supervector that is then dimensionally reduced and represented in the eigenspace. Speaker verification or speaker identification is then performed by assessing the proximity of the new speech data to the prior data in eigenspace. The new speech from the speaker is verified if its corresponding point or distribution within eigenspace is within a threshold proximity to the training data for that client speaker. The system may reject the new speech as authentic if it falls closer to an impostor's speech when placed in eigenspace.
Speaker identification is performed in a similar fashion. The new speech data is placed in eigenspace and identified with the training speaker whose eigenvector point or distribution is closest.
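Identification then reduces to a nearest-neighbor search over the training speakers' eigenspace points; the names and coordinates below are invented for the sketch:

```python
import numpy as np

def identify(new_point, speaker_points):
    """Return the label of the training speaker whose eigenspace point
    lies closest to the new speech data."""
    return min(speaker_points,
               key=lambda s: np.linalg.norm(new_point - speaker_points[s]))

speaker_points = {
    "alice": np.array([1.0, 0.0]),
    "bob":   np.array([-1.0, 1.0]),
    "carol": np.array([0.0, -1.5]),
}
print(identify(np.array([0.8, 0.2]), speaker_points))  # alice
```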
Assessing proximity between the new speech data and the training data in eigenspace has a number of advantages. First, the eigenspace represents each entire speaker in a concise, low-dimensional way, not merely a selected few features of each speaker. Proximity computations performed in eigenspace can be made quite rapidly, as there are typically considerably fewer dimensions to contend with in eigenspace than in the original speaker model space or feature vector space. Also, the system does not require that the new speech data include each and every example or utterance that was used to construct the original training data. Through techniques described herein, it is possible to perform dimensionality reduction on a supervector for which some of its components are missing. The resulting point or distribution in eigenspace nevertheless will represent the speaker remarkably well.
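One way to see why missing supervector components need not be fatal: the eigenspace coordinates can be fit using only the observed components, e.g. by a least-squares fit restricted to the observed rows of the eigenvoice basis. This is only an illustrative stand-in for the patent's decomposition technique, with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 2-dimensional eigenvoice basis over 96-dim supervectors (orthonormal
# columns from a QR factorization, transposed to basis-per-row form).
eigenvoices = np.linalg.qr(rng.normal(size=(96, 2)))[0].T  # (2, 96)
mean = rng.normal(size=96)

# Synthesize a "true" speaker supervector lying in the eigenspace.
true_coords = np.array([1.5, -0.7])
full = mean + true_coords @ eigenvoices

# Pretend 30 of the 96 supervector components were never observed.
observed = np.ones(96, dtype=bool)
observed[rng.choice(96, size=30, replace=False)] = False

# Least-squares fit over the observed components only still recovers
# the speaker's eigenspace coordinates.
A = eigenvoices[:, observed].T   # (n_observed, 2)
b = (full - mean)[observed]
coords, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(coords, true_coords))  # True
```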


