Method and apparatus for speaker recognition

Reexamination Certificate

: 1999-03-04
: 2002-02-19
: Korzuch, William (Department: 2641)
: Data processing: speech signal processing, linguistics, language
: Speech signal processing
: Recognition

: C704S238000
: Reexamination Certificate
: active
: 06349280
: ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for speaker recognition and, more particularly, to a method of and an apparatus for recognizing or identifying a speaker.
Heretofore, speaker recognition independent of speech content has usually been performed on the basis of the distance between a feature parameter of an input speech and a registered parameter of a speech that has been produced by the speaker to be recognized.
Denoting the input speech parameter series by $\vec{x}_i$ ($i = 1, \dots, I$), the registered speech parameter series by $\vec{y}_j$ ($j = 1, \dots, J$), where $I$ and $J$ are the numbers of samples, and the distance between these parameter series by $D_{old}$, $D_{old}$ is obtained from the following formulas. The symbol $\|\cdot\|$ represents the Euclidean distance.

$$D_{old} = \sum_{i=1}^{I} D(i)$$

$$D(i) = \min_{j} \|\vec{x}_i - \vec{y}_j\|^2$$
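The distance above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sq_euclid` and `d_old` and the toy 2-D feature vectors are hypothetical and do not come from the patent.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_euclid(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def d_old(inputs: Sequence[Vector], registered: Sequence[Vector]) -> float:
    """Sum, over input frames, of the squared distance to the nearest registered frame."""
    return sum(min(sq_euclid(x, y) for y in registered) for x in inputs)


# Toy feature vectors (made-up values, for illustration only).
x_frames = [(0.0, 0.0), (1.0, 1.0)]
y_frames = [(0.0, 1.0), (2.0, 1.0)]
total = d_old(x_frames, y_frames)
```

Each input frame costs one pass over every registered frame, which is the computational and memory burden that motivates storing a smaller reference pattern.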
In order to reduce the computational effort and the memory capacity, it is also common practice, instead of directly storing the feature vector series of speeches, to store as a reference pattern a feature vector series $\vec{c}_k$ obtained by vector quantization.

$$D'_{old} = \sum_{i=1}^{I} D'(i)$$

$$D'(i) = \min_{k} \|\vec{x}_i - \vec{c}_k\|^2$$
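As a rough sketch of the vector-quantized variant, the following Python builds a small codebook and scores input frames against it. The text does not say how the codebook is trained; plain k-means (Lloyd iterations) is assumed here purely for illustration, and `train_codebook`, `d_old_vq`, and the sample data are hypothetical names and values.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_euclid(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook element closest to x."""
    return min(range(len(codebook)), key=lambda k: sq_euclid(x, codebook[k]))


def train_codebook(frames: Sequence[Vector], init: Sequence[Vector],
                   iterations: int = 10) -> List[Vector]:
    """Assumed training procedure: k-means starting from the given centroids."""
    codebook = list(init)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for x in frames:
            clusters[nearest(x, codebook)].append(x)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[k] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return codebook


def d_old_vq(inputs: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Sum, over input frames, of the squared distance to the nearest codebook element."""
    return sum(min(sq_euclid(x, c) for c in codebook) for x in inputs)


# Four registered frames are compressed into a two-element reference pattern.
registered = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
codebook = train_codebook(registered, init=[registered[0], registered[2]])
score = d_old_vq([(0.0, 1.0), (9.0, 1.0)], codebook)
```

Only the codebook needs to be stored per speaker, so matching cost scales with the number of codebook elements rather than with the length of the registered speech.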
In the above prior art techniques, accurately determining the distance requires that all the speech sounds contained in an input speech be stored in advance, so a relatively long speech is used to register the speaker to be recognized. From the standpoint of the user's burden, the speech needed for registration should be as short as possible. Reducing it, however, increases the number of non-registered phonemes contained in the input speech, thus reducing the accuracy of collation or matching.
As a means for solving this problem, a method disclosed in Japanese Patent Application No. 2-76296 (hereinafter referred to as Literature 1) is utilized. In this method, sizes of overlap parts of an input speech and a registered speech and also inter-overlap-part distances are utilized to determine the similarity measure.
FIG. 5 shows the system disclosed in Literature 1. As shown, the system comprises an overlap size calculating part, which determines, as the size of the overlap part, the number of input speech samples contained in the overlap part of the distributions of an input speech and a reference speech, and an overlap part inter-element distance calculating part. The distance between the input and reference speech patterns is determined from the results of calculations in these parts according to the following formula.
$$D_{new} = \frac{\sum_{i=1}^{I} d_i + d_{out}\,(U_{max} - U)}{U_{max}}$$

$$d_i = \begin{cases} \min_{k} \|\vec{x}_i - \vec{c}_k\|^2, & \text{for } A_i \neq \emptyset \quad (1) \\ 0, & \text{otherwise} \quad (2) \end{cases}$$

$$A_i = \{\, k \mid 1 \le k \le K \text{ and } \|\vec{x}_i - \vec{c}_k\| \le l_k \,\}$$

$U$: number of samples corresponding to (1)
$U_{max}$: maximum number of samples corresponding to (1) for all reference patterns
$d_{out}$: fixed distance for samples corresponding to (2)
$l_k$: coverage of the k-th element $\vec{c}_k$ of the reference pattern
$\|\cdot\|$: Euclidean distance
More specifically, a coverage $l_k$ of each reference speech pattern element is previously determined, and when the distance $d_i$ between the nearest element in the reference speech pattern and the input speech pattern exceeds its coverage, a separately determined penalty distance $d_{out}$ is added to all input speech pattern feature vectors, and the result is normalized by the overlap part size $U_{max}$.
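The following Python sketch shows one way to read the Literature 1 distance. It makes several assumptions: the minimum in case (1) is taken over all reference elements, $U_{max}$ is computed as the largest in-coverage count over the speakers being compared, and the names `overlap_stats` and `d_new` as well as all numeric values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_euclid(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def overlap_stats(inputs: Sequence[Vector], codebook: Sequence[Vector],
                  coverage: Sequence[float]) -> Tuple[float, int]:
    """Return (sum of d_i, U) for one reference pattern.

    A frame counts toward U when at least one element's coverage contains it;
    frames outside every coverage contribute d_i = 0 (case 2).
    """
    total, u = 0.0, 0
    for x in inputs:
        sq = [sq_euclid(x, c) for c in codebook]
        if any(d <= l * l for d, l in zip(sq, coverage)):
            total += min(sq)  # assumption: minimum over all elements
            u += 1
    return total, u


def d_new(total: float, u: int, u_max: int, d_out: float) -> float:
    """Penalized, normalized distance; undefined when no frame is covered anywhere."""
    if u_max <= 0:
        raise ValueError("u_max must be positive")
    return (total + d_out * (u_max - u)) / u_max


inputs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
# Hypothetical reference patterns: (codebook, coverage per element).
references: Dict[str, Tuple[List[Vector], List[float]]] = {
    "A": ([(0.0, 0.0)], [1.5]),
    "B": ([(5.0, 5.0)], [1.0]),
}
stats = {name: overlap_stats(inputs, cb, cov) for name, (cb, cov) in references.items()}
u_max = max(u for _, u in stats.values())
scores = {name: d_new(t, u, u_max, d_out=4.0) for name, (t, u) in stats.items()}
```

Because `u_max` is shared by every speaker, a speaker whose reference pattern covers few input frames is penalized relative to the best-covered one, which is the dependency on all reference patterns that the surrounding text goes on to criticize.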
In this method, however, the overlap part size $U_{max}$ is determined from all reference patterns. Therefore, where registration is performed by using speeches of different contents with different speakers, the input speech content of a speaker may be close to the registered speech of a different speaker. In such a case, $U_{max}$ may be evaluated as unfairly large, giving rise to performance deterioration. For this reason, substantially the same number of different kinds of phonemes should be contained in the contents of the registered speeches.
In addition, according to Literature 1, the coverage of each reference pattern element is determined on the basis of the distance from the center of a cluster (i.e., element $\vec{c}_k$) to the farthest feature parameter contained in that cluster. However, even with the same phoneme, the feature parameter varies with different speakers, and this means that it is difficult to obtain stable distribution overlap estimation.
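The coverage rule described above, the distance from a cluster center to its farthest member, can be sketched as follows. The assignment of each frame to its nearest center is an assumption, and `cluster_coverages` and the sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def cluster_coverages(frames: Sequence[Vector],
                      codebook: Sequence[Vector]) -> List[float]:
    """For each codebook element, the distance to the farthest frame assigned to it."""
    radii = [0.0] * len(codebook)
    for x in frames:
        # Assumed clustering rule: each frame belongs to its nearest center.
        k = min(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))
        radii[k] = max(radii[k], math.dist(x, codebook[k]))
    return radii


frames = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
codebook = [(0.0, 1.0), (10.0, 2.0)]
coverage = cluster_coverages(frames, codebook)
```

A single outlying frame sets the radius for its whole cluster, so the coverage depends heavily on the particular registration speech, consistent with the instability noted in the text.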
SUMMARY OF THE INVENTION
The present invention, accordingly, has an object of providing a speaker recognition system capable of stable recognition irrespective of speakers and of registration by using various speeches, through an identity/non-identity check of contents of an input speech and a registered speech by speech recognition.
(1) According to a first aspect of the present invention, there is provided a method of recognizing a speaker of an input speech according to the distance between an input speech pattern, obtained by converting the input speech to a feature parameter series, and a reference pattern preliminarily registered as feature parameter series for each speaker, comprising steps of:
obtaining contents of the input and reference speech patterns by recognition;
determining an identical section, in which the contents of the input and reference speech patterns are identical;
determining the distance between the input and reference speech patterns in the calculated identical content section;
normalizing the input speech pattern by one of copying the input speech pattern and weighting the distance determined between the input and reference speech patterns if the distance between the input and reference speech patterns is greater than a predetermined value, in which the distance between the input and reference speech patterns is decreased by normalization to reduce the adverse effects of noise; and
recognizing the speaker of the input speech on the basis of the determined distance.
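The steps of the first aspect can be illustrated with a simplified Python sketch. It assumes that speech recognition has already attached a content label (for example, a phoneme symbol) to every frame, treats frames with matching labels as the identical section, and averages the distance over matched frames. The normalization step is left out, and every name and value here is hypothetical rather than taken from the patent.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
LabeledFrame = Tuple[str, Vector]  # (recognized content label, feature vector)


def sq_euclid(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def identical_section_distance(input_frames: Sequence[LabeledFrame],
                               reference_frames: Sequence[LabeledFrame]
                               ) -> Optional[float]:
    """Mean nearest-frame distance over input frames whose label also occurs in the reference."""
    by_label: Dict[str, List[Vector]] = {}
    for label, y in reference_frames:
        by_label.setdefault(label, []).append(y)
    total, matched = 0.0, 0
    for label, x in input_frames:
        candidates = by_label.get(label)
        if not candidates:
            continue  # content not present in the reference: outside the identical section
        total += min(sq_euclid(x, y) for y in candidates)
        matched += 1
    return total / matched if matched else None


input_speech = [("a", (0.0, 0.0)), ("k", (5.0, 5.0)), ("i", (1.0, 0.0))]
registered = {
    "speaker_1": [("a", (0.0, 1.0)), ("i", (1.0, 1.0))],
    "speaker_2": [("a", (3.0, 0.0)), ("i", (1.0, 3.0))],
}
distances = {name: identical_section_distance(input_speech, ref)
             for name, ref in registered.items()}
# Pick the registered speaker with the smallest distance, ignoring speakers with no overlap.
best = min((n for n, d in distances.items() if d is not None),
           key=lambda n: distances[n])
```

Restricting the comparison to matching content means the unregistered "k" frame adds no distance, so content that was never registered does not distort the score.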
(2) According to a second aspect of the present invention, there is provided a method of recognizing a speaker of an input speech independently of the content thereof by converting the input speech to an input speech pattern as a feature parameter series and determining the difference of the input speech pattern from a reference speech pattern registered for each speaker, the method comprising the steps of:
obtaining the contents of the input and reference patterns by speech recognition, and determining the distance by determining identical content sections of the input and reference speech patterns from the obtained pattern content data.
(3) According to a third aspect of the present invention, there is provided a method of recognizing a speaker of an input speech comprising steps of:
determining an identical section of the input speech and a reference speech;
copying the input speech and reference speech in an unspecified speaker's acoustical model;
determining a distance between the copied input speech and a reference speech at least for the identical section;
normalizing the input speech pattern by one of copying the input speech pattern and weighting the distance determined between the input and reference speech patterns if the distance between the input and reference speech patterns is greater than a predetermined value, in which the distance between the input and reference speech patterns is decreased by normalization to reduce the adverse effects of noise; and
recognizing the speaker of the input speech.
(4) According to a fourth aspect of the present invention, there is provided a method of recognizing a speaker of an input speech comprising steps of:
copying the input speech and reference speech in an unspecified speaker's acoustical model;
determining

Affiliated with

Hattori Hiroaki

Inventor

Also associated with

Foley & Lardner

Law Firm

Korzuch William

Examiner

McFadden Susan

Examiner

NEC Corporation

Corporate Assignee
