Method and apparatus for multi-environment speaker verification

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S246000

active

06253179

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to the field of speaker verification.
BACKGROUND OF THE INVENTION
The use of speaker verification systems for security and other purposes has been growing in recent years. In a conventional speaker verification system, speech samples of known speakers are obtained and used to develop a speaker model for each speaker. Each speaker model typically contains clusters or distributions of audio feature data derived from the associated speech sample. In operation, a person (the claimant) wishing to, e.g., access certain data or enter a particular building claims to be a registered speaker who has previously submitted a speech sample to the system. The verification system prompts the claimant to speak a short phrase or sentence. The speech is recorded and analyzed to compare it to the stored speaker model associated with the claimed identification (ID). If the speech is within a predetermined distance (closeness) of the corresponding model, the speaker is verified.
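The accept/reject decision described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a speaker model is a list of cluster centroids, that test speech has already been converted to feature vectors, and that closeness is the average Euclidean distance from each test frame to its nearest centroid. The names `model_distance`, `verify`, and the threshold value are hypothetical and do not come from the patent.

```python
import math

def model_distance(frames, model):
    """Average distance from each test frame to its nearest centroid in the model."""
    return sum(min(math.dist(f, c) for c in model) for f in frames) / len(frames)

def verify(frames, models, claimed_id, threshold):
    """Accept the claimant if the test speech is close enough to the claimed model."""
    return model_distance(frames, models[claimed_id]) <= threshold

# Toy two-dimensional "audio features"; real systems use much richer features.
models = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 4.0)],
}
test_frames = [(0.1, 0.0), (0.9, 1.1)]

print(verify(test_frames, models, "alice", threshold=0.5))  # True
print(verify(test_frames, models, "bob", threshold=0.5))    # False
```

The threshold trades false acceptances against false rejections; how it is chosen is outside the scope of this sketch.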
The environment in which the speech is sampled influences the characteristics of the recorded speech data, both for training data and test data. Thus, one of the design issues of a speaker verification system is how to account for the different environments in which training data and test data (of a claimant) are taken. Varying channels, e.g., different types of microphones, telephones or communication links, affect the parameters of a person's speech on the receiving end. In many speaker verification systems, it must be assumed that any source of speech can be received over any one of a number of channels. Thus, any modifications that the channels cause in the source data must be accounted for, a procedure referred to as environment normalization.
Current approaches to channel (environment) normalization involve, in one form or another, a supervised training phase to separate and group the training and/or testing data according to a predetermined set of “models” corresponding to each of the channels. Channel-dependent background models and statistics are then derived from these groups. A number of existing techniques compare received data to the claimed source model in light of the various background models. A different approach tries to make the data received over any of the channels look as if it were received over some canonical channel, thus mitigating the influence of the channel. Here again, the channels must be known so that they can be inverted. A shortcoming of these supervised training techniques is that, in some applications, they are unrealistic because every channel that may be used must be known and modeled ahead of time.
Environment normalization is likewise a problem that needs to be addressed in pattern matching problems other than speaker verification. The general problem, which includes the speaker verification situation, is how to accept two patterns as being similar when the comparisons are (or may be) performed under mismatched conditions. The mismatched conditions may be, for example, different lighting conditions or shadows for face recognition; different noise conditions for image recognition; different foreground and lighting noise for background texture recognition; and different reception channels for speaker recognition.
SUMMARY OF THE DISCLOSURE
The present disclosure relates to a method for unsupervised environmental normalization for speaker verification using hierarchical clustering. In an illustrative embodiment, training data (speech samples) are taken from T enrolled (registered) speakers over any one of M channels, e.g., different microphones, communication links, etc. A speaker model is generated for each speaker, containing a collection of distributions of audio feature data derived from the speech sample of that speaker. A hierarchical speaker model tree is created, e.g., by merging similar speaker models on a layer-by-layer basis. Each speaker is also grouped into a cohort of similar speakers. For each cohort, one or more complementary speaker models are generated by merging speaker models outside that cohort. The complementary speaker models are used to reduce false acceptances during a subsequent speaker verification operation.
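The tree construction, cohorts, and complementary models summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the patented procedure: a speaker model is assumed to be a list of centroids, "similar" is measured by a symmetric nearest-centroid distance, merging simply pools centroids, each layer pairs a node with its nearest unpaired neighbor, and a cohort is taken to be the set of speakers under one node of a chosen layer. All function names are hypothetical.

```python
import math

def model_distance(a, b):
    """Symmetric average nearest-centroid distance between two models."""
    d_ab = sum(min(math.dist(x, y) for y in b) for x in a) / len(a)
    d_ba = sum(min(math.dist(y, x) for x in a) for y in b) / len(b)
    return (d_ab + d_ba) / 2

def merge(models):
    """Merge models by pooling their centroids (a deliberately crude choice)."""
    return [c for m in models for c in m]

def build_layer(nodes):
    """Pair each node with its nearest unpaired neighbor; an odd node moves up alone."""
    remaining = list(nodes)
    parents = []
    while len(remaining) > 1:
        first = remaining.pop(0)
        i = min(range(len(remaining)),
                key=lambda k: model_distance(first["model"], remaining[k]["model"]))
        nearest = remaining.pop(i)
        parents.append({
            "model": merge([first["model"], nearest["model"]]),
            "speakers": first["speakers"] + nearest["speakers"],
            "children": [first, nearest],
        })
    parents.extend(remaining)
    return parents

def build_tree(speaker_models):
    """Return the layers of the tree, from individual speakers up to a single root."""
    layer = [{"model": m, "speakers": [s], "children": []}
             for s, m in speaker_models.items()]
    layers = [layer]
    while len(layer) > 1:
        layer = build_layer(layer)
        layers.append(layer)
    return layers

def cohort_of(speaker, layers, level=1):
    """Speakers that share a node with `speaker` at the given layer."""
    level = min(level, len(layers) - 1)
    for node in layers[level]:
        if speaker in node["speakers"]:
            return node["speakers"]

def complementary_model(speaker, speaker_models, layers, level=1):
    """Merge every speaker model that lies outside the speaker's cohort."""
    cohort = cohort_of(speaker, layers, level)
    return merge([m for s, m in speaker_models.items() if s not in cohort])

speaker_models = {
    "a": [(0.0, 0.0)], "b": [(0.5, 0.0)],
    "c": [(10.0, 10.0)], "d": [(10.5, 10.0)],
}
layers = build_tree(speaker_models)
print(cohort_of("a", layers))                            # ['a', 'b']
print(complementary_model("a", speaker_models, layers))  # [(10.0, 10.0), (10.5, 10.0)]
```

One plausible way to use the complementary model, which this section does not spell out, is to reject a claimant whose speech is closer to the complementary model than to the claimed speaker's own model.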
When training data from a new speaker to be enrolled is received over a new channel, the speaker model tree as well as the complementary models are updated. Thus, adaptation to data from new environments is possible by incorporating such data into the verification model whenever it is encountered.
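The enrollment update described above might look like the following sketch. It simplifies the tree to a flat set of cohorts and assumes, for illustration only, that a new speaker joins the nearest cohort unless it is farther than a threshold, in which case a new cohort is started (as might happen for an unseen channel). Complementary models are then recomputed for every cohort. The `enroll` function, the cohort naming scheme, and the threshold are hypothetical.

```python
import math

def model_distance(a, b):
    """Symmetric average nearest-centroid distance between two models."""
    d_ab = sum(min(math.dist(x, y) for y in b) for x in a) / len(a)
    d_ba = sum(min(math.dist(y, x) for x in a) for y in b) / len(b)
    return (d_ab + d_ba) / 2

def pooled(models):
    """Merge models by pooling their centroids."""
    return [c for m in models for c in m]

def enroll(new_id, new_model, speaker_models, cohorts, new_cohort_threshold):
    """Add a speaker, place it in a cohort, and rebuild all complementary models."""
    best, best_d = None, math.inf
    for cid, members in cohorts.items():
        d = model_distance(new_model, pooled(speaker_models[s] for s in members))
        if d < best_d:
            best, best_d = cid, d
    speaker_models[new_id] = new_model
    if best is None or best_d > new_cohort_threshold:
        best = f"cohort_{len(cohorts)}"  # hypothetical naming scheme
        cohorts[best] = []
    cohorts[best].append(new_id)
    # Each cohort's complementary model pools every model outside that cohort.
    complements = {
        cid: pooled(m for s, m in speaker_models.items() if s not in members)
        for cid, members in cohorts.items()
    }
    return best, complements

speaker_models = {"a": [(0.0, 0.0)], "b": [(0.5, 0.0)], "c": [(10.0, 10.0)]}
cohorts = {"cohort_0": ["a", "b"], "cohort_1": ["c"]}

# A speaker recorded over a very different channel ends up in a new cohort.
cid_d, complements = enroll("d", [(50.0, 50.0)], speaker_models, cohorts, 5.0)
print(cid_d)                    # cohort_2
print(complements["cohort_0"])  # [(10.0, 10.0), (50.0, 50.0)]

# A speaker similar to an existing cohort joins it.
cid_e, complements = enroll("e", [(10.2, 10.0)], speaker_models, cohorts, 5.0)
print(cid_e)                    # cohort_1
```

Recomputing every complementary model on each enrollment is the simplest choice for a sketch; an incremental update would only need to add the new model to the complements of the cohorts it did not join.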

