Reexamination Certificate
2000-04-04
2003-05-20
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S245000, C704S250000
active
06567776
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speaker-independent speech recognition method, and more particularly, to a speech recognition method using speaker cluster models, which can be used in products involving speech recognition such as spoken dialogue systems and auto-attendant systems.
2. Description of the Related Art
In the related art, speaker cluster models have been applied to both speaker-independent speech recognition and speaker adaptation. Although used in different application fields, the speaker cluster models are built in the same way. A training phase starts by dividing speakers into different speaker clusters. A cluster-dependent model is then trained independently for each speaker cluster, using the speech data of the speakers belonging to that cluster. The collection of all cluster-dependent models forms a speaker cluster model. Most approaches to building speaker cluster models focus on how to divide speakers into clusters, and especially on how to measure similarities across speakers. Some speaker clustering methods reported in the related art are as follows:
1. Using acoustic distances across speakers to measure similarities across speakers (T. Kosaka and S. Sagayama, “Tree-structured speaker clustering for fast speaker adaptation”, Proceedings of ICASSP94, pp. 245-248, 1994; Y. Gao, M. Padmanabhan and M. Picheny, “Speaker adaptation based on pre-clustering training speakers”, Proceedings of EUROSPEECH97, pp. 2091-2094, 1997).
2. Using vocal-tract-size related articulatory parameters to measure similarities across speakers (M. Naito, L. Deng and Y. Sagisaka, “Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions”, Proceedings of ICASSP98, pp. 981-984, 1998).
3. Clustering the speakers according to three classes of speaking rate: fast, medium and slow (T. J. Hazen and J. R. Glass, “A comparison of novel techniques for instantaneous speaker adaptation”, Proceedings of EUROSPEECH97, pp. 2047-2050, 1997).
The difference among the three aforementioned speaker clustering methods is that their methods for measuring similarities across speakers are different. There are two different speaker cluster algorithms according to clustering structure. The first algorithm is called plain speaker cluster algorithm. This algorithm clusters all speakers directly using one of the aforementioned speaker clustering methods. The second algorithm is called tree-structured speaker cluster algorithm. Please refer to FIG. 1, which illustrates a tree-structured speaker cluster model 10. The speaker cluster model 10 has a root speaker cluster A 100 where all speakers belong. The speakers in the root speaker cluster A 100 are divided into male speaker cluster M 102 and female speaker cluster F 104 according to their gender. The male speakers in the male speaker cluster M 102 are further clustered into speaker clusters M1 112 and M2 114, respectively. The female speakers in the female speaker cluster F 104 are further clustered into speaker clusters F1 122 and F2 124, respectively.
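The tree of FIG. 1 can be sketched as a small data structure. This is only an illustration: the class name `SpeakerCluster`, the speaker IDs, and the assignment of speakers to M1, M2, F1 and F2 are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerCluster:
    """One node of a tree-structured speaker cluster model (hypothetical type)."""
    name: str
    speakers: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the leaf clusters below this node, left to right."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Hypothetical speaker IDs, used only to populate the tree.
m1 = SpeakerCluster("M1", {"m01", "m02"})
m2 = SpeakerCluster("M2", {"m03"})
f1 = SpeakerCluster("F1", {"f01"})
f2 = SpeakerCluster("F2", {"f02", "f03"})

# Gender-level clusters hold the union of their children.
male = SpeakerCluster("M", m1.speakers | m2.speakers, [m1, m2])
female = SpeakerCluster("F", f1.speakers | f2.speakers, [f1, f2])

# Root cluster A contains every speaker.
root = SpeakerCluster("A", male.speakers | female.speakers, [male, female])

leaf_names = [cluster.name for cluster in root.leaves()]
print(leaf_names)
```

In a tree-structured model of this kind, a cluster-dependent model would be trained for each node from the speech data of the speakers listed in that node.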
When the speaker cluster model is applied to speaker-independent speech recognition where the testing speaker who utters a speech signal is unknown, two specific decision rules are commonly employed:
I. Build a cluster pre-selection model in addition to the speaker cluster model; when receiving the speech signal, use the cluster pre-selection model to pre-select a speaker cluster to which the testing speaker who utters the speech signal most probably belongs, and only use the cluster-dependent model of the selected speaker cluster to recognize the speech signal.
II. Find a best candidate for each speaker cluster by using each cluster-dependent model as a recognition model to recognize the speech signal, and choose as the final recognition result the candidate with the highest score across all speaker clusters.
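The two decision rules above can be sketched as follows. This is a minimal illustration under stated assumptions: each cluster-dependent model is represented by a hypothetical callable that returns a dictionary mapping candidates to scores, the pre-selection model is reduced to a dictionary of per-cluster scores, and all numbers are made up.

```python
def rule_one(features, preselect_scores, cluster_models):
    """Rule I: pre-select the most probable cluster, then use only its model."""
    chosen = max(preselect_scores, key=preselect_scores.get)
    candidates = cluster_models[chosen](features)
    return max(candidates, key=candidates.get)


def rule_two(features, cluster_models):
    """Rule II: best candidate per cluster, then the best across all clusters."""
    best_candidate, best_score = None, float("-inf")
    for model in cluster_models.values():
        candidates = model(features)
        top = max(candidates, key=candidates.get)
        if candidates[top] > best_score:
            best_candidate, best_score = top, candidates[top]
    return best_candidate


# Toy stand-ins for two cluster-dependent models (hypothetical log scores).
toy_models = {
    "M": lambda features: {"yes": -10.0, "no": -12.0},
    "F": lambda features: {"yes": -11.0, "no": -9.0},
}
# Toy output of a cluster pre-selection model (hypothetical).
toy_preselect = {"M": 0.6, "F": 0.4}

result_one = rule_one(None, toy_preselect, toy_models)
result_two = rule_two(None, toy_models)
print(result_one, result_two)  # prints: yes no
```

In this toy case the two rules disagree. In both rules each cluster-dependent model is consulted in isolation, so neither combines evidence across cluster-dependent models.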
The present invention uses the speaker cluster model in speaker-independent speech recognition. Therefore, only related techniques are introduced.
In the training phase of the speaker cluster model, the methods of the related art emphasize how to cluster speakers. Their purpose is to group speakers with similar characteristics into the same speaker cluster. The purpose of speech recognition, however, is to recognize a speech signal correctly. The two purposes are therefore not exactly the same. In other words, improving the effectiveness of speaker clustering does not necessarily improve the accuracy of speech recognition. In a recognition phase, regardless of which related art recognition algorithm is used, each cluster-dependent model is treated as an independent recognition model. The dependency among different cluster-dependent models is never considered.
Cleanly clustering all speakers with similar characteristics into the same speaker cluster is a difficult task. Please refer to FIG. 2. FIG. 2 shows two speaker clusters 202, 204. The speaker clusters 202, 204 have an overlapping area 206. That is, although the speakers in each speaker cluster 202, 204 have substantially similar characteristics, some of the speakers in one speaker cluster have characteristics similar to those of speakers in the other speaker cluster. For example, suppose there are four speakers W, X, Y and Z. Speaker W and speaker X have similar characteristics; speaker X and speaker Y have similar characteristics; and speaker Y and speaker Z have similar characteristics. Suppose that, when clustering, the speakers W and X are clustered into the speaker cluster 202 and the speakers Y and Z are clustered into the speaker cluster 204. Because the speakers X and Y have similar characteristics, they form the overlapping area 206. In a speech recognition phase, when a testing speaker who inputs a speech signal has characteristics between those of the speaker X and those of the speaker Y, treating each cluster-dependent model as an independent recognition model, without considering how its dependency on the other cluster-dependent models affects recognition, means that the overlap produced by clustering may have a negative effect on recognition.
SUMMARY OF THE INVENTION
It is therefore an objective of the present invention to provide a speech recognition method for improving the performance of speech recognition.
To achieve the aforementioned objective, the present invention takes the dependency among a plurality of cluster-dependent models into account, so as to overcome recognition problems caused by between-speaker variability and thereby improve the performance of speech recognition. The speech recognition method of the present invention comprises the following steps: receiving a speech signal; recognizing the speech signal using a speaker cluster model obtained in a training phase, wherein the speaker cluster model is a collection of a plurality of cluster-dependent models and a score of each candidate is calculated according to a score function that is defined by taking the dependency among the cluster-dependent models into account; and obtaining a final recognition result according to a decision rule based on the score of each candidate.
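This section does not state the form of the score function. Purely as an illustration of what scoring a candidate across several cluster-dependent models could look like, the sketch below combines per-cluster log scores with a weighted log-sum-exp. The combination rule, the weights, and the function names `combined_score` and `recognize` are assumptions made for this example, not the patent's definition.

```python
import math


def combined_score(cluster_scores, weights):
    """Weighted log-sum-exp of one candidate's per-cluster log scores (assumed form)."""
    terms = [math.log(w) + s for s, w in zip(cluster_scores, weights) if w > 0]
    peak = max(terms)  # subtract the peak for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def recognize(per_cluster, weights):
    """Score every candidate across all clusters and pick the highest score."""
    scores = {cand: combined_score(s, weights) for cand, s in per_cluster.items()}
    return max(scores, key=scores.get), scores


# Hypothetical log scores: one entry per cluster-dependent model, per candidate.
toy_scores = {"yes": [-10.0, -11.0], "no": [-12.0, -9.0]}
best, all_scores = recognize(toy_scores, weights=[0.5, 0.5])
print(best)
```

Unlike the two related-art decision rules, every cluster-dependent model contributes to each candidate's score here, which may help a testing speaker who falls between two clusters.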
The training phase comprises: building an initialization model; and adjusting parameters of at least two cluster-dependent models of the initialization model by using a discriminative training method to obtain the speaker cluster model, wherein the discriminative training method uses minimum classification error as its training criterion and the discriminant function of the discriminative training method is defined in the same manner as the score function.
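For orientation, the commonly used textbook form of the minimum classification error criterion can be sketched as below: a misclassification measure compares the discriminant value of the correct class with a smoothed maximum over the competing classes, and a sigmoid turns that measure into a differentiable loss. The parameters `eta` and `gamma` and the toy discriminant values are assumptions for this example; the patent's exact formulation is not given in this section.

```python
import math


def misclassification_measure(g, correct, eta=1.0):
    """d = -g[correct] + (1/eta) * log(mean over rivals of exp(eta * g[rival]))."""
    rivals = [value for label, value in g.items() if label != correct]
    peak = max(rivals)  # factor out the peak for numerical stability
    mean_exp = sum(math.exp(eta * (v - peak)) for v in rivals) / len(rivals)
    return -g[correct] + peak + math.log(mean_exp) / eta


def mce_loss(g, correct, eta=1.0, gamma=1.0):
    """Sigmoid of the misclassification measure: near 0 if correct, near 1 if not."""
    d = misclassification_measure(g, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Hypothetical discriminant values for two candidates.
toy_g = {"yes": -10.0, "no": -9.0}
loss_if_no_is_correct = mce_loss(toy_g, "no")    # "no" has the higher value
loss_if_yes_is_correct = mce_loss(toy_g, "yes")  # "yes" is outscored
print(loss_if_no_is_correct < 0.5 < loss_if_yes_is_correct)
```

Minimizing this loss over training utterances, typically by gradient descent, adjusts model parameters to reduce recognition errors directly rather than to fit each cluster's data in isolation.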
The present invention is described in further detail below with reference to the drawings and the embodiments.
Chang Sen-Chia
Chien Shih-Chieh
Penwu Chung-Mou
Banks-Harold Marsha D.
Industrial Technology Research Institute
Lerner Martin