Reexamination Certificate
2001-08-21
2004-08-17
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S238000, C704S239000
active
06778957
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to handset detection using cepstral covariance matrices and distance metrics.
BACKGROUND OF THE INVENTION
Automatic verification or identification of a person by their speech is attracting greater interest as an increasing number of business transactions are performed over the phone, where automatic speaker identification is desired or required in many applications. Over the past several decades, three classes of techniques have been developed for speaker recognition: (1) Gaussian mixture model (GMM) methods, (2) vector quantization (VQ) methods, and (3) various distance measure methods. The invention is directed to the last class of techniques.
The performance of current automatic speech and speaker recognition technology is quite sensitive to adverse environmental conditions such as background noise, channel distortion, and speaker variation. Handset distortion is one of the main factors that degrade speech and speaker recognizers. In current speech technology, the common way to remove handset distortion is cepstral mean normalization, which assumes that handset distortion is linear; in fact, the distortion is not linear. This creates a problem in real-world applications because the handset used to record voice samples for identification purposes will more than likely differ from the type of handset used by the person to be identified, a situation commonly referred to as the “cross-handset” identification problem.
When applied to cross-handset speaker identification using the Lincoln Laboratory Handset Database (LLHD), the cepstral mean normalization technique has an error rate of more than about 20%. Given that the error rate for same-handset speaker identification is only about 7%, it can be seen that the channel distortion caused by the handset is not linear. It is therefore desirable to remove the effects of these non-linear distortions, but before that is possible, the handsets must first be identified.
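As a minimal illustration of the linearity assumption behind cepstral mean normalization, the sketch below (not part of the patent text) subtracts the per-coefficient mean over all frames. The function name, array shapes, and synthetic data are assumptions chosen for the example. A linear, time-invariant channel shows up as a fixed additive offset in the cepstral domain, which the subtraction removes; a non-linear handset distortion would not be removed this way.

```python
import numpy as np

def cepstral_mean_normalize(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the per-coefficient mean taken over all frames.

    cepstra: array of shape (num_frames, num_coefficients).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical data: 200 frames of 12 cepstral coefficients.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 12))

# A linear, time-invariant channel adds the same offset to every cepstral frame.
channel_offset = rng.normal(size=12)
distorted = clean + channel_offset

# Largest difference between the normalized clean and normalized distorted frames.
residual = np.max(np.abs(cepstral_mean_normalize(distorted)
                         - cepstral_mean_normalize(clean)))
```

Here `residual` is zero up to floating-point rounding, because the offset is constant across frames.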
SUMMARY OF THE INVENTION
Disclosed is a method of automated handset identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from said first sample speech signal; calculating, with a distance metric, all distances between said sample matrix and one or more cepstral covariance handset matrices, wherein each said handset matrix is derived from a plurality of speech signals taken from different speakers through the same handset; and determining if the smallest of said distances is below a predetermined threshold value.
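The steps of this method can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the handset labels, the synthetic data, and the threshold value are all hypothetical, and the metric used (A/H − 1, with A and H taken to be the arithmetic and harmonic means of the eigenvalues of SΣ⁻¹) is one plausible reading of the first distance metric named in the summary.

```python
import numpy as np

def cepstral_covariance(cepstra):
    """Covariance matrix of cepstral vectors; each row is one frame."""
    return np.cov(cepstra, rowvar=False)

def d1(sample_cov, handset_cov):
    """Assumed metric A/H - 1 on the eigenvalues of S @ inv(Sigma)."""
    eigvals = np.linalg.eigvals(sample_cov @ np.linalg.inv(handset_cov)).real
    arithmetic = eigvals.mean()
    harmonic = 1.0 / np.mean(1.0 / eigvals)
    return arithmetic / harmonic - 1.0

def identify_handset(sample_cepstra, handset_covs, threshold, metric=d1):
    """Return (label, distance) for the nearest handset matrix,
    or (None, distance) if the smallest distance is not below the threshold."""
    sample_cov = cepstral_covariance(sample_cepstra)
    distances = {label: metric(sample_cov, cov)
                 for label, cov in handset_covs.items()}
    best = min(distances, key=distances.get)
    if distances[best] < threshold:
        return best, distances[best]
    return None, distances[best]

# Hypothetical handset matrices and a synthetic sample drawn to match "carbon".
rng = np.random.default_rng(0)
handset_covs = {
    "carbon": np.eye(4),
    "electret": np.diag([1.0, 4.0, 9.0, 16.0]),
}
sample = rng.normal(size=(5000, 4))

label, dist = identify_handset(sample, handset_covs, threshold=0.1)
rejected, _ = identify_handset(sample, handset_covs, threshold=0.0)
```

With a threshold of 0.1 the sample is attributed to the "carbon" entry; with a threshold of 0.0 no handset qualifies and `rejected` is `None`, which corresponds to the case where the sample handset is not in the set of known handsets.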
In another aspect of the method, said distance metric is selected from

$$
\begin{gathered}
d_1(S,\Sigma) = \frac{A}{H} - 1, \qquad
d_5(S,\Sigma) = A + \frac{1}{H} - 2, \\
d_6(S,\Sigma) = \left(A + \frac{1}{H}\right)\left(G + \frac{1}{G}\right) - 4, \\
d_7(S,\Sigma) = \frac{A}{2H}\left(G + \frac{1}{G}\right) - 1, \\
d_8(S,\Sigma) = \frac{A + \frac{1}{H}}{G + \frac{1}{G}} - 1, \\
d_9(S,\Sigma) = \frac{A}{G} + \frac{G}{H} - 2,
\end{gathered}
$$

and fusion derivatives thereof.
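The six metrics named above can be computed from three scalar summaries. The sketch below assumes, since this section does not define them, that A, G, and H are the arithmetic, geometric, and harmonic means of the eigenvalues of SΣ⁻¹, as is usual for this family of distance measures. The function and dictionary names are hypothetical, and fusion derivatives are not covered.

```python
import numpy as np

def eigen_means(S, Sigma):
    """Arithmetic (A), geometric (G), harmonic (H) means of eig(S @ inv(Sigma))."""
    lam = np.linalg.eigvals(S @ np.linalg.inv(Sigma)).real
    A = lam.mean()
    G = np.exp(np.mean(np.log(lam)))
    H = 1.0 / np.mean(1.0 / lam)
    return A, G, H

# Each metric as a function of (A, G, H), following the formulas in the summary.
METRICS = {
    "d1": lambda A, G, H: A / H - 1,
    "d5": lambda A, G, H: A + 1 / H - 2,
    "d6": lambda A, G, H: (A + 1 / H) * (G + 1 / G) - 4,
    "d7": lambda A, G, H: A / (2 * H) * (G + 1 / G) - 1,
    "d8": lambda A, G, H: (A + 1 / H) / (G + 1 / G) - 1,
    "d9": lambda A, G, H: A / G + G / H - 2,
}

def distance(name, S, Sigma):
    """Evaluate the named metric between covariance matrices S and Sigma."""
    return METRICS[name](*eigen_means(S, Sigma))

# Identical matrices should give zero distance under every metric.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
self_distances = {name: distance(name, Sigma, Sigma) for name in METRICS}

# Eigenvalues 1 and 2: A = 1.5, H = 4/3, so d1 = 1.5 / (4/3) - 1 = 0.125.
example_d1 = distance("d1", np.diag([1.0, 2.0]), np.eye(2))
```

Because every metric depends on the matrices only through A, G, and H, the eigenvalues need to be computed just once per pair of matrices even when several metrics are evaluated.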
In another aspect of the method, said handset matrices are stored in a database of handset matrices wherein each handset matrix is derived from a unique make and model of handset.
In another aspect of the method, said different speakers number ten or more.
In another aspect of the method, the number of said different speakers is no less than twenty.
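One way such a database might be assembled is sketched below. The source does not say how speech from the different speakers is combined into a single handset matrix, so pooling all frames before taking the covariance is an assumption here, as are the function names, the make and model strings, and the synthetic data.

```python
import numpy as np

MIN_SPEAKERS = 10  # "ten or more" speakers; another aspect asks for at least twenty

def build_handset_matrix(cepstra_by_speaker, min_speakers=MIN_SPEAKERS):
    """Pool cepstral frames from several speakers recorded through one handset
    and return their covariance matrix (pooling is an assumed choice)."""
    if len(cepstra_by_speaker) < min_speakers:
        raise ValueError(
            f"need at least {min_speakers} speakers, got {len(cepstra_by_speaker)}"
        )
    pooled = np.vstack(list(cepstra_by_speaker.values()))
    return np.cov(pooled, rowvar=False)

def build_database(recordings):
    """recordings maps (make, model) -> {speaker_id: frames array}."""
    return {handset: build_handset_matrix(speakers)
            for handset, speakers in recordings.items()}

# Hypothetical make/model key and synthetic frames (50 frames x 4 coefficients).
rng = np.random.default_rng(0)
recordings = {
    ("ExampleTel", "Model-A"): {
        f"speaker_{i}": rng.normal(size=(50, 4)) for i in range(10)
    },
}
database = build_database(recordings)
```

Keying the database by make and model gives one matrix per unique handset type, and using many speakers per handset is intended to average out speaker-specific characteristics so that the matrix mainly reflects the handset.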
Disclosed is a program storage device, readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for automated handset identification, said method steps comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from said first sample speech signal; calculating, with a distance metric, all distances between said sample matrix and one or more cepstral covariance handset matrices, wherein each said handset matrix is derived from a plurality of speech signals taken from different speakers through the same handset; and determining if the smallest of said distances is below a predetermined threshold value.
In another aspect of the invention, said distance metric is selected from

$$
\begin{gathered}
d_1(S,\Sigma) = \frac{A}{H} - 1, \qquad
d_5(S,\Sigma) = A + \frac{1}{H} - 2, \\
d_6(S,\Sigma) = \left(A + \frac{1}{H}\right)\left(G + \frac{1}{G}\right) - 4, \\
d_7(S,\Sigma) = \frac{A}{2H}\left(G + \frac{1}{G}\right) - 1, \\
d_8(S,\Sigma) = \frac{A + \frac{1}{H}}{G + \frac{1}{G}} - 1, \\
d_9(S,\Sigma) = \frac{A}{G} + \frac{G}{H} - 2,
\end{gathered}
$$

and fusion derivatives thereof.
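A short worked check, offered as a sketch and not taken from the source: suppose A, G, and H are the arithmetic, geometric, and harmonic means of the positive eigenvalues λ_1, …, λ_n of SΣ⁻¹ (an assumption, since this section does not define the symbols). The standard inequality A ≥ G ≥ H then implies that each listed metric is non-negative, and each equals zero when S = Σ.

```latex
% Assumed definitions (not stated in this section):
%   A = \frac{1}{n}\sum_i \lambda_i, \quad
%   G = \Bigl(\prod_i \lambda_i\Bigr)^{1/n}, \quad
%   H = \Bigl(\frac{1}{n}\sum_i \frac{1}{\lambda_i}\Bigr)^{-1}, \quad \lambda_i > 0.
\begin{align*}
A \ge G \ge H > 0
  &\;\Rightarrow\; d_1 = \frac{A}{H} - 1 \ge 0, \qquad
                   d_9 = \frac{A}{G} + \frac{G}{H} - 2 \ge 0, \\
A + \frac{1}{H} = \frac{1}{n}\sum_i \Bigl(\lambda_i + \frac{1}{\lambda_i}\Bigr) \ge 2
  &\;\Rightarrow\; d_5 = A + \frac{1}{H} - 2 \ge 0, \\
G + \frac{1}{G} \ge 2
  &\;\Rightarrow\; d_6 = \Bigl(A + \frac{1}{H}\Bigr)\Bigl(G + \frac{1}{G}\Bigr) - 4 \ge 0, \qquad
                   d_7 = \frac{A}{2H}\Bigl(G + \frac{1}{G}\Bigr) - 1 \ge 0, \\
A \ge G,\ \frac{1}{H} \ge \frac{1}{G}
  &\;\Rightarrow\; d_8 = \frac{A + \frac{1}{H}}{G + \frac{1}{G}} - 1 \ge 0. \\
S = \Sigma \;\Rightarrow\; \lambda_i = 1 \;\Rightarrow\; A = G = H = 1
  &\;\Rightarrow\; d_1 = d_5 = d_6 = d_7 = d_8 = d_9 = 0.
\end{align*}
```

Under these assumptions a smaller value indicates a closer match between the sample matrix and a handset matrix, which is consistent with selecting the smallest distance and comparing it against a threshold.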
In another aspect of the invention, said handset matrices are stored in a database of handset matrices wherein each handset matrix is derived from a unique make and model of handset.
In another aspect of the invention, said different speakers number ten or more.
In another aspect of the invention, the number of said different speakers is no less than twenty.
Disclosed is an automated handset identification system, comprising means for receiving a sample speech input signal from a sample handset; means for deriving a cepstral covariance sample matrix from said first sample speech signal; means for calculating, with a distance metric, all distances between said sample matrix and one or more cepstral covariance handset matrices, wherein each said handset matrix is derived from a plurality of speech signals taken from different speakers through the same handset; and means for determining if the smallest of said distances is below a predetermined threshold value.
In another aspect of the invention, said means for receiving sample speech is in communication with an incoming line of communication.
In another aspect of the invention, said incoming line of communication is a phone line.
REFERENCES:
patent: 5167004 (1992-11-01), Netsch et al.
patent: 5528731 (1996-06-01), Sachs et al.
patent: 5727124 (1998-03-01), Lee et al.
patent: 5765124 (1998-06-01), Rose et al.
patent: 5950157 (1999-09-01), Heck et al.
patent: 5960397 (1999-09-01), Rahim
patent: 5995927 (1999-11-01), Li
patent: 6151573 (2000-11-01), Gong
patent: 6263309 (2001-07-01), Nguyen et al.
patent: 6327565 (2001-12-01), Kuhn et al.
patent: 6449594 (2002-09-01), Hwang et al.
patent: 6615172 (2003-09-01), Bennett et al.
Ivandro Sanches, “Noise-Compensated Hidden Markov Models”, IEEE Transactions on Speech and Audio Processing, vol. 8, No. 5, Sep. 2000, pp. 533 to 540.*
Wang, Zhong-Hua, et al., New Distance Measures for Text-Independent Speaker Identification, International Conference for Spoken Language Processing (2000).
Sonmez, M.K., Progressive Cepstral Normalization for Robust Speech Recognition/Speaker Identification, Institute for Systems Research, Aug. 4, 1999.
Davis, Steven B., Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, IEEE, 1980, pp. 65-74.
Gish, Herbert, Robust Discrimination in Automatic Speaker Identification, IEEE, 1990, pp. 289-292.
Soong, F.K., et al., A Vector Quantization Approach to Speaker Recognition, IEEE, 1985, pp. 387-390.
Rose, Richard C., et al., Text-Independent Speaker Identification Using Automatic Acoustic Segmentation, IEEE, 1990, pp. 293-296.
Reynolds, Douglas A., HTIMIT and LLHDB: Speech Corpora for the Study of Handset Transducer Effects, ICASSP, pp. 1535-1538 (May 1977).
Cohen, Arnon, et al., On Text Independent Speaker Identification using a Quadratic Classifier with Optimal Features, Speech Communication 8 (1989), pp. 35-44.
Johnson, Sue, Speaker Tracking, MPhil Thesis, Jesus College, Aug. 1997, Cambridge University Engineering Department, Cambridge, England.
Lubensky David
Wang Zhong-Hua
Wu Cheng
Dang Thu A.
Dorvil Richemond
F. Chau & Associates LLC
International Business Machines - Corporation
Lerner Martin
Method and apparatus for handset detection