Method and apparatus for speaker recognition via comparing...

Data processing: speech signal processing, linguistics, language – Speech signal processing – Recognition

Reexamination Certificate


Details

US Classification: C704S250000
Type: Reexamination Certificate
Status: active
Patent number: 06389392

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to automatic pattern recognition in which an unknown input is compared to reference data representative of allowed patterns and the unknown input is identified as the most likely reference pattern.
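As a concrete illustration of this generic recognition loop, the sketch below compares an unknown input against stored reference data and identifies it as the closest reference pattern. The Euclidean distance and the toy feature vectors are illustrative assumptions only, not the patent's method.

```python
import numpy as np

def recognise(unknown, references):
    """Return the label of the reference pattern closest to the unknown
    input, together with the residual mismatch (Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        dist = np.linalg.norm(unknown - ref)  # mismatch between test and reference
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Example: three allowed patterns and one unknown input
refs = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.0, 1.0]),
    "C": np.array([1.0, 1.0]),
}
print(recognise(np.array([0.9, 0.2]), refs))  # -> ('A', ...)
```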
2. Description of Related Art
Reference data for each member of a set of allowed patterns are stored, and a test input is compared with the reference data to recognise the input pattern. An important factor in automatic pattern recognition is undesired variation in characteristics, for instance in speech or handwriting, caused by time-localised anomalous events. These anomalies can arise from a variety of sources, such as the communication channel, environmental noise, uncharacteristic sounds from speakers, unmodelled writing conditions, etc. The resultant variations cause a mismatch between the corresponding test and reference patterns, which in turn can lead to a significant reduction in recognition accuracy.
The invention has particular, although not exclusive, application to automatic speaker recognition. Speaker recognition covers both speaker identification and speaker verification. In the former case, the task is to identify an unknown speaker as one of a pre-determined set of speakers; in the latter case, the task is to verify that a person is who they claim to be, again from a pre-determined set of speakers. Hereinafter reference will be made to the field of speaker recognition, but the technique is applicable to other fields of pattern recognition.
To improve robustness in automatic speaker recognition, a reference model is usually based on a number of repetitions of the training utterance recorded in multiple sessions. The aim is to increase the chance that at least one repetition in the training set captures recording conditions and speaking behaviour close to those of the test. The enrolled speaker may then be represented by a single reference model formed by combining the given training utterance repetitions. A potential disadvantage of this approach is that a training repetition which is very different from the test utterance may corrupt the combined model and hence seriously degrade verification performance. An alternative is to represent each registered speaker using multiple reference models. However, since the level of mismatch normally varies across the utterance, the improvement achieved in this way may not be significant.
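The trade-off described above can be sketched as follows. The time-aligned, equal-length feature arrays and the Euclidean distance are simplifying assumptions (a real system would align utterances by dynamic time warping or model them statistically); the point is that an outlier repetition distorts a single combined model, whereas the multiple-model approach lets a close match survive.

```python
import numpy as np

# Training repetitions of the same utterance from multiple sessions,
# assumed here to be time-aligned feature arrays of equal shape.
repetitions = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([5.0, 9.0])]

def combined_model(reps):
    """Single reference model: average of all training repetitions.
    An outlier repetition (the third above) pulls the average away
    from the speaker's typical behaviour."""
    return np.mean(reps, axis=0)

def best_of_multiple(test, reps):
    """Multiple reference models: keep every repetition and score the
    test against each, taking the smallest whole-utterance distance."""
    return min(np.linalg.norm(test - rep) for rep in reps)

test = np.array([1.1, 1.9])
print(np.linalg.norm(test - combined_model(repetitions)))  # large: corrupted by outlier
print(best_of_multiple(test, repetitions))                 # small: close match survives
```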
The methods developed previously for introducing robustness into the speaker verification operation have mainly been based on the normalisation of verification scores. The development of these methods has been a direct result of the probabilistic modelling of speakers as described in the article by M. J. Carey and E. S. Parris, “Speaker Verification”, Proceedings of the Institute of Acoustics (UK), vol. 18, pp. 99-106, 1996, and the article by N. S. Jayant, “A Study of Statistical Pattern Verification”, IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, pp. 238-246, 1972. By adopting this method of modelling and using Bayes' theorem, the verification score can be expressed as a likelihood ratio, i.e.
$$
\text{Verification Score} = \frac{\text{likelihood (score) for the target speaker}}{\text{likelihood (score) for any speaker}}
$$
The above expression can be viewed as obtaining the verification score by normalising the score for the target speaker.
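In practice the ratio is computed in the log domain, where the division becomes a subtraction. A minimal sketch, assuming per-utterance log-likelihoods have already been computed by the target-speaker model and an “any speaker” (background) model:

```python
def verification_score(target_loglik, background_loglik):
    """Log-domain likelihood ratio: the target-speaker log-likelihood
    normalised by the log-likelihood for 'any speaker' (e.g. a general
    background model)."""
    return target_loglik - background_loglik

def verify(target_loglik, background_loglik, threshold=0.0):
    """Accept the identity claim if the normalised score clears a tuned
    decision threshold."""
    return verification_score(target_loglik, background_loglik) >= threshold

print(verify(-120.5, -131.0))  # True: the target model explains the utterance better
```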
A well known normalisation method is based on the use of a general (speaker-independent) reference model formed using utterances from a large population of speakers (M. J. Carey and E. S. Parris, “Speaker Verification Using Connected Words”, Proceedings of the Institute of Acoustics (UK), vol. 14, pp. 95-100, 1992). In this method, the score for the general model is used to normalise the score for the target speaker. Another effective method in this category involves calculating a statistic of the scores for a cohort of speakers and using this to normalise the score for the target speaker, as described in A. E. Rosenberg, J. DeLong, C. H. Lee, B. H. Juang, and F. K. Soong, “The Use of Cohort Normalized Scores for Speaker Verification”, Proc. ICSLP, pp. 599-602, 1992, and in T. Matsui and S. Furui, “Concatenated Phoneme Models for Text-Variable Speaker Recognition”, Proc. ICASSP, pp. 391-394, 1993. These normalisation methods essentially operate on the assumption that the mismatch is uniform across the given utterance. On this assumption, the score for the target speaker is first calculated using the complete utterance; this score is then scaled by a factor that depends on the particular method used.
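A minimal sketch of the cohort approach just described, assuming log-domain scores and using the cohort mean as the statistic (other statistics, such as the maximum, also appear in the literature). Note the single scaling applied to the whole-utterance score, which is where the uniform-mismatch assumption enters:

```python
import numpy as np

def cohort_normalised_score(target_score, cohort_scores):
    """Normalise the target speaker's whole-utterance (log-)score by a
    statistic of the scores obtained for a cohort of competing speakers."""
    return target_score - np.mean(cohort_scores)

print(cohort_normalised_score(-120.5, [-130.2, -128.7, -133.1]))
```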
The invention seeks to reduce the adverse effects of variation in patterns.
In accordance with the invention there is provided a method of pattern recognition.
Thus the invention relies on representing each allowed pattern using multiple segmented reference models and minimising the mismatch between the test and reference patterns. This is achieved by selecting, for each pattern, the best segments from the collection of models to form a complete reference template.
Preferably, the mismatch associated with each individual segment is then estimated, and this information is used to compute a weighting factor that corrects each segmental distance prior to the calculation of the final distance.
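A minimal sketch of the segmental scheme the last two paragraphs describe, assuming a fixed segmentation, a Euclidean segmental distance, and externally supplied weighting factors; the specific per-segment mismatch estimator is left by this summary to the detailed description.

```python
import numpy as np

def weighted_segmental_distance(test_segments, reference_models, weights=None):
    """For each segment of the test utterance, pick the best-matching
    segment across the collection of reference models, weight that
    segmental distance, and sum to obtain the final distance.

    test_segments    : list of per-segment feature arrays
    reference_models : list of models, each a list of per-segment arrays
                       of the same shapes as test_segments
    weights          : optional per-segment factors compensating for the
                       estimated mismatch of each segment
    """
    if weights is None:
        weights = [1.0] * len(test_segments)
    total = 0.0
    for i, (seg, w) in enumerate(zip(test_segments, weights)):
        # Best segment: minimum mismatch over all stored reference models
        best = min(np.linalg.norm(seg - model[i]) for model in reference_models)
        total += w * best
    return total
```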


REFERENCES:
patent: 4831551 (1989-05-01), Schalk et al.
patent: 5025471 (1991-06-01), Scott et al.
patent: 5167004 (1992-11-01), Netsch et al.
patent: 5199077 (1993-03-01), Wilcox et al.
patent: 5509104 (1996-04-01), Lee et al.
patent: 5649057 (1997-07-01), Lee et al.
patent: 5651094 (1997-07-01), Takagi et al.
patent: 5839103 (1998-11-01), Mammone et al.
patent: WO 98/54694 (1998-12-01), None
Matsui et al, “Speaker Recognition Using Concatenated Phoneme Models”, pp. 603-606.
Rosenberg et al, “The Use of Cohort Normalized Scores For Speaker Verification”, pp. 599-602.
Carey et al, “Speaker Verification Using Connected Words”, Proceedings of the Institute of Acoustics, Proc.I.O.A., vol. 14, Part 6 (1992), pp. 95-100.
Jayant, “A Study of Statistical Pattern Verification”, IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, No. 2, Apr. 1972, pp. 238-246.
Carey et al, “Speaker Verification”, Proceedings of the Institute of Acoustics, Proc.I.O.A., vol. 18, Part 9 (1996).
Gish et al, “A Robust, Segmental Method for Text Independent Speaker Identification”, International Conference on Acoustics Speech and Signal Processing—ICASSP94, vol. 1, Apr. 19, 1994, pp. I-145-I-148.
Liu, “On Creating Averaging Templates”, International Conference on Acoustics Speech and Signal Processing—ICASSP84, Mar. 19, 1984, pp. 1-4, XP002049465, New York, USA.
Kobatake, et al, “Degraded Word Recognition Based on Segmental Signal-to-Noise Ratio Weighting”, International Conference on Acoustics Speech and Signal Processing—ICASSP94, vol. 1, Apr. 19, 1994, pp. I-425-I-428.
Matsui et al, “Concatenated Phoneme Models for Text-Variable Speaker Recognition”, 1993 IEEE, pp. II-391-II-394.
