Speech recognition apparatus, method and storage medium thereof

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S246000, C704S247000, C704S248000

active

06341263

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a speaker collation apparatus, method, and storage medium, and particularly to a speaker collation apparatus, method, and storage medium characterized by the way in which the standard pattern of inhibition speakers is generated.
A major problem in speaker collation is that differences in ambient noise and in line characteristics (environmental differences) between registration and collation lower the ratio of collation. One approach to this problem is the likelihood normalization method based on the standard pattern of inhibition speakers, proposed by Higgins et al., Rosenberg et al., and Matsui et al. The relevant publications are: A. Higgins, L. Bahler, and J. Porter, "Speaker verification using randomized phrase prompting," Digital Signal Processing, 1, pp. 89-106 (1991) (Reference 1); A. E. Rosenberg, Joel DeLong, Chin-Hui Lee, Biing-Hwang Juang, and Frank K. Soong, "The use of cohort normalized scores for speaker verification," ICSLP 92, pp. 599-602 (1992) (Reference 2); and Tomoko Matsui and Sadaoki Furui, "Speaker adaptation of tied-mixture-based phoneme models for text-prompted speaker recognition," ICASSP 94, pp. 125-128 (1994) (Reference 3).
The likelihood normalization method based on the standard pattern of inhibition speakers normalizes a likelihood by subtracting the likelihood between an inputted voice and the standard pattern of inhibition speakers (the likelihood of inhibition speakers) from the likelihood between the inputted voice and the standard pattern of the identical person (the likelihood of the identical person). Because environmental differences between registration and collation affect both the likelihood of the identical person and the likelihood of inhibition speakers, subtracting the latter from the former yields a likelihood that is not easily affected by environmental differences. Two methods for selecting inhibition speakers are known: selecting inhibition speakers similar to the voice of the identical person in registration, and selecting inhibition speakers similar to the inputted voice in collation. The former method is described in detail in Reference 2, and the latter in Reference 1 and Reference 3.
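The subtraction described above can be sketched as follows. This is a minimal illustration, not the method of the cited references: it assumes each standard pattern is a single diagonal-covariance Gaussian given as a `(mean, var)` pair, that likelihoods are averaged per-frame log-likelihoods, and that the likelihood of inhibition speakers is the average over the selected inhibition speakers. The names `log_likelihood` and `normalized_score` are hypothetical.

```python
import math

def log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian.

    The single-Gaussian model form is an assumption made for this sketch.
    """
    total = 0.0
    for x in frames:
        for xi, m, v in zip(x, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return total / len(frames)

def normalized_score(frames, target, cohort):
    """Likelihood of the identical person minus likelihood of inhibition speakers.

    `target` is a (mean, var) pair; `cohort` is a list of such pairs.
    Averaging over the cohort is one assumed choice; taking the maximum
    is another common option.
    """
    own = log_likelihood(frames, *target)
    inhibition = sum(log_likelihood(frames, *c) for c in cohort) / len(cohort)
    return own - inhibition

# Usage: frames close to the claimed speaker's pattern score above zero.
frames = [[0.1, -0.1], [0.0, 0.2]]
target = ([0.0, 0.0], [1.0, 1.0])
cohort = [([3.0, 3.0], [1.0, 1.0]), ([-3.0, 2.0], [1.0, 1.0])]
score = normalized_score(frames, target, cohort)
```

Working in the log domain turns the likelihood ratio into the difference that the text describes as a subtraction.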
In the likelihood normalization method using the standard pattern of inhibition speakers, a good ratio of collation is obtained when the environmental differences among the registered voice, the collated voice, and the standard pattern of inhibition speakers are as small as possible. A large difference among these environments reduces the ratio of collation. To solve this problem, many standard patterns of candidates of inhibition speakers must be prepared in advance for the respective environments of registration and collation.
However, it is difficult to prepare many standard patterns of candidates of inhibition speakers for the respective environments. A method is therefore required that achieves a good ratio of collation without preparing standard patterns of candidates of inhibition speakers for each environment.
For the case of a large environmental difference between the registered voice and the standard pattern of inhibition speakers, a likelihood normalization method has been proposed in which the standard pattern of inhibition speakers is adapted using the registered voice, the likelihood between the adapted standard pattern of inhibition speakers and the collated voice (the likelihood of inhibition speakers) is acquired, and this likelihood is subtracted from the likelihood between the collated voice and the standard pattern of the identical person.
This method reduces the environmental differences between the registered voice and the standard pattern of inhibition speakers by adapting the standard pattern of inhibition speakers on the basis of the voice of the identical person in registration. It is effective when inhibition speakers are selected in registration, and is described in detail in Reference 4 by Yamada and Hattori (a method and a system for generating a reducing standard pattern, namely a cohort, in speaker recognition, and a speaker collation apparatus including the system; Japanese Patent Application No. 1997-040102).
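As a rough illustration of adapting a standard pattern of inhibition speakers toward the environment of a given voice, the sketch below shifts every mean vector of the pattern by one global bias. The text does not state the form of the adaptation used in Reference 4, so the global-bias mapping, the representation of a pattern as a list of mean vectors, and the names `estimate_bias` and `adapt_pattern` are all assumptions made for this example.

```python
def estimate_bias(frames, pattern_means):
    """Estimate one global bias vector from pattern space to voice space.

    Assumed mapping: the difference between the average feature vector
    of the voice and the average of the pattern's mean vectors.
    """
    dim = len(frames[0])
    frame_avg = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    pattern_avg = [
        sum(m[d] for m in pattern_means) / len(pattern_means) for d in range(dim)
    ]
    return [fa - pa for fa, pa in zip(frame_avg, pattern_avg)]

def adapt_pattern(pattern_means, bias):
    """Apply the bias to every mean vector, keeping their relative layout."""
    return [[m_d + b_d for m_d, b_d in zip(m, bias)] for m in pattern_means]

# Usage: move a two-vector pattern toward the environment of the voice.
registered_frames = [[1.0, 2.0], [3.0, 4.0]]
inhibition_pattern = [[0.0, 0.0], [2.0, 2.0]]
bias = estimate_bias(registered_frames, inhibition_pattern)
adapted = adapt_pattern(inhibition_pattern, bias)
```

A single shared bias compensates for a constant offset between environments, such as a fixed channel effect in the cepstral domain, while leaving the spacing between the pattern's vectors unchanged.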
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a speaker collation apparatus, method, and storage medium that achieve a high ratio of collation when the standard patterns of inhibition speakers are selected in collation, without generating in advance standard patterns of candidates of inhibition speakers for many environments.
Other objects of the present invention will become clear as the description proceeds.
According to an aspect of the present invention, there is provided a speaker collation apparatus comprising:
an analysis section for converting inputted voice data for collation to a characteristic vector;
a storage section of the characteristic vector for storing the characteristic vector converted in said analysis section;
a storage section of a standard pattern of candidates of inhibition speakers, in which one or more standard patterns of candidates of inhibition speakers have been stored;
a selection section for selecting at least one inhibition speaker by calculating a similarity degree between the characteristic vector converted in said analysis section and the standard patterns of the respective speakers stored in said storage section of the standard pattern of candidates of inhibition speakers;
an adaptation section for acquiring a mapping function from a characteristic vector space of a voice of an inhibition speaker to a characteristic vector space of the inputted voice, using the standard pattern of inhibition speakers selected in said selection section and the characteristic vector stored in said storage section of the characteristic vector, and for adapting the standard pattern of inhibition speakers by using the acquired mapping function;
a calculation section of a similarity degree of inhibition speakers for calculating the similarity degree between the characteristic vector stored in said storage section of the characteristic vector and the standard pattern of inhibition speakers adapted in said adaptation section;
a storage section of the standard pattern of the identical person, in which the registered standard pattern of the identical person has been stored;
a calculation section of a similarity degree to the identical person for calculating the similarity degree between the characteristic vector stored in said storage section of the characteristic vector and the standard pattern of the identical person stored in said storage section of the standard pattern of the identical person;
a normalization section of the similarity degree for normalizing the similarity degree by using the similarity degree calculated in said calculation section of a similarity degree to the identical person and the similarity degree calculated in said calculation section of a similarity degree of inhibition speakers;
a threshold value storage section for storing a previously determined threshold value; and
a decision section for deciding the person by using the similarity degree normalized in said normalization section of the similarity degree and the threshold value stored in said threshold value storage section.
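The sections listed above form a processing chain, which the following sketch walks through end to end. It is illustrative only: the similarity degree is assumed to be a negative nearest-vector squared distance, the mapping function is assumed to be a single global bias, normalization is assumed to be subtraction of the best inhibition-speaker similarity, and every name (`similarity`, `verify`, `n_cohort`) is hypothetical rather than taken from the patent.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]

def similarity(frames, pattern):
    """Assumed similarity degree: negative mean squared distance from
    each characteristic vector to its nearest pattern vector."""
    total = 0.0
    for f in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(f, p)) for p in pattern)
    return -total / len(frames)

def verify(frames, own_pattern, candidates, threshold, n_cohort=1):
    """Return (accepted, normalized_similarity) for the inputted voice."""
    # Selection section: keep the candidates most similar to the input.
    ranked = sorted(candidates, key=lambda p: similarity(frames, p), reverse=True)
    selected = ranked[:n_cohort]
    # Adaptation section: map each selected pattern into the input's
    # vector space with an assumed global-bias mapping function.
    frame_avg = mean_vector(frames)
    adapted = []
    for pattern in selected:
        bias = [x - y for x, y in zip(frame_avg, mean_vector(pattern))]
        adapted.append([[a + b for a, b in zip(v, bias)] for v in pattern])
    # Similarity degree of inhibition speakers (best adapted pattern).
    inhibition_sim = max(similarity(frames, p) for p in adapted)
    # Similarity degree to the identical person.
    own_sim = similarity(frames, own_pattern)
    # Normalization section, then decision section against the threshold.
    score = own_sim - inhibition_sim
    return score >= threshold, score

# Usage with one-dimensional characteristic vectors.
frames = [[0.0], [2.0]]
candidates = [[[5.0], [6.0]], [[10.0], [14.0]]]
accepted, score = verify(frames, [[0.0], [2.0]], candidates, threshold=0.1)
rejected, low_score = verify(frames, [[3.0], [7.0]], candidates, threshold=0.1)
```

Because the candidate patterns are adapted to the inputted voice at collation time, this flow needs only one stored set of candidate patterns rather than one set per environment.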
The speaker collation apparatus may further comprise a normalization section for normalizing said characteristic vector converted in said analysis section, said standard pattern of candidates of inhibition speakers stored in said storage section of the standard pattern of candidates of inhibition speakers, and said standard pattern of the identical person stored in said storage section of the standard pattern of the identical person.
According to another aspect of the present invention
