Speech recognition apparatus

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate (active)
Patent number: 06253180
Classification: C704S250000


BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition apparatus, and more particularly to a speech recognition apparatus which has an improved speaker adaptation function.
2. Description of the Related Art
As an example of a conventional speaker adaptation system, reference is made to the thesis entitled “Speaker Adaptation Which Makes Use of Prior Knowledge Regarding Correlation of Movement Vectors” in the Collection of Lecture Papers of the Autumn Meeting for Reading Research Papers in 1997, the Acoustic Society of Japan, Separate Volume I, pp. 23-24, September 1997.
FIG. 3 shows a speaker adaptation system of a conventional speech recognition apparatus based on a hidden Markov model (HMM), and FIG. 4 shows a prior learning system of the conventional speech recognition apparatus of FIG. 3.
Referring to FIGS. 3 and 4, upon speaker adaptation, learning is performed by an HMM learning section 33 using adaptation utterances of a new speaker stored in an adaptation utterance storage section 31 and using speaker independent HMMs (hereinafter referred to as “SI-HMMs”) stored in advance in an SI-HMM storage section 32 as initial models, and HMMs (hereinafter referred to as “BW-HMMs”) obtained as a result of the learning are stored into a BW-HMM storage section 34.
A subtraction section 35 stores finite differences between parameters of the BW-HMMs and the SI-HMMs into a first finite difference storage section 36. Into the first finite difference storage section 36, only the parameter finite differences of those HMMs which appear in the adaptation utterances are stored. For example, if the adaptation utterances include the three utterances “a”, “u” and “o”, then, since the parameters of the HMMs corresponding to “a”, “u” and “o” are learned by the HMM learning section 33, finite differences between the BW-HMMs and the SI-HMMs for them are produced.
However, since “i” and “e” do not appear in the adaptation utterances, the corresponding HMMs are not learned either; the parameters of those BW-HMMs remain the same as the parameters of the SI-HMMs, so their finite differences remain equal to 0.
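The first finite differences described above can be sketched as follows. This is an illustrative toy example, not the patent's implementation: each phoneme HMM is reduced to a single mean vector, and the difference values are made up for illustration.

```python
import numpy as np

# Hypothetical toy setup: one mean-vector parameter per phoneme HMM.
phonemes = ["a", "i", "u", "e", "o"]
si_params = {p: np.array([1.0, 2.0]) for p in phonemes}  # SI-HMM parameters

# Suppose the adaptation utterances contained only "a", "u" and "o";
# only those HMMs are re-learned (BW-HMMs), the rest stay unchanged.
bw_params = dict(si_params)
bw_params["a"] = si_params["a"] + np.array([0.3, -0.1])
bw_params["u"] = si_params["u"] + np.array([0.2, 0.4])
bw_params["o"] = si_params["o"] + np.array([-0.5, 0.1])

# First finite differences: BW - SI, exactly zero for the phonemes
# ("i", "e") that are absent from the adaptation utterances.
first_diff = {p: bw_params[p] - si_params[p] for p in phonemes}
```

The zero entries for “i” and “e” are what the interpolation section must later fill in.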
An interpolation parameter storage section 37 stores interpolation parameters determined in prior learning (which will be hereinafter described). An interpolation section 38 outputs second finite differences as linear sums of the interpolation parameters and the finite differences stored in the first finite difference storage section 36 so that the second finite differences may be stored into a second finite difference storage section 39.
The second finite differences calculated by the interpolation section 38 are finite differences between the parameters of those HMMs which have not appeared in the adaptation utterances and the parameters of the SI-HMMs. In the example described above, finite differences regarding the HMMs of “i” and “e” are calculated as second finite differences.
A re-estimation parameter storage section 41 stores re-estimation parameters determined in prior learning, which will be hereinafter described. A re-estimation section 40 receives the re-estimation parameters and the first and second finite differences as inputs thereto, calculates third finite differences for all HMM parameters, and stores the third finite differences into a third finite difference storage section 42. In the example described above, the third finite differences are finite differences for the parameters of all of the HMMs of “a”, “i”, “u”, “e” and “o”. An addition section 43 adds the parameters of the SI-HMMs and the third finite differences to determine specific speaker HMMs adapted to the new speaker and stores the specific speaker HMMs into an SD-HMM storage section 44.
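The whole adaptation pipeline (interpolation, re-estimation, addition) can be sketched as a chain of linear transforms. This is a hedged sketch under the assumption that the interpolation and re-estimation parameters are plain matrices acting on stacked parameter vectors; the matrix values and dimensions are illustrative, not those of the patent.

```python
import numpy as np

dim = 2                                   # parameters per phoneme HMM
seen, unseen = ["a", "u", "o"], ["i", "e"]

rng = np.random.default_rng(0)
si = rng.normal(size=(5, dim))            # SI-HMM parameters, rows = a,u,o,i,e
first = np.zeros_like(si)                 # first finite differences (BW - SI)
first[:3] = rng.normal(size=(3, dim))     # nonzero only for the seen HMMs

W_interp = rng.normal(size=(2 * dim, 3 * dim))   # interpolation parameters
W_reest = rng.normal(size=(5 * dim, 5 * dim))    # re-estimation parameters

S = first[:3].ravel()                     # first differences of seen HMMs
second = W_interp @ S                     # second differences (unseen HMMs)
# Re-estimation: third differences for ALL HMMs from first + second.
third = (W_reest @ np.concatenate([S, second])).reshape(5, dim)
sd = si + third                           # addition section: SD-HMMs
```

The key point is that every unseen HMM's update is a linear function of the few seen HMMs' differences, which is exactly where the accuracy problems discussed below arise.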
Upon prior learning, specific speaker HMMs (SD-HMMs) of a large number of speakers are stored into the SD-HMM storage section 44, and finite differences (referred to as “third finite differences”) between the parameters of the SD-HMMs of the individual speakers and the parameters of the SI-HMMs, calculated by the subtraction section 47, are stored into the third finite difference storage section 42. Of the third finite differences, those for the parameters of the HMMs which appear in the adaptation utterances upon speaker adaptation are represented by “S”, and the other third finite differences (those for the parameters of the HMMs which do not appear in the adaptation utterances) are referred to as “U”.
An interpolation parameter learning section 45 determines the interpolation parameters so that the square sum of the errors U−U1, where U1 denotes the linear sums of the third finite differences S and the interpolation parameters, may be minimum over the large number of speakers, and stores the determined interpolation parameters into the interpolation parameter storage section 37.
Then, the linear sums of the determined interpolation parameters and the third finite differences S are outputted as second finite differences and stored into the second finite difference storage section 39.
A re-estimation parameter learning section 46 determines the re-estimation parameters so that the square sum of the errors U−U3, where U3 denotes the linear sums of the second finite differences and the re-estimation parameters, may be minimum over the large number of speakers, and stores the re-estimation parameters into the re-estimation parameter storage section 41.
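The prior learning of the interpolation parameters is a multi-speaker least-squares problem: find a matrix W minimizing the sum over speakers k of ||U_k − W·S_k||². A minimal sketch, assuming synthetic noiseless data and illustrative dimensions (the re-estimation parameters would be learned the same way from the second and third differences):

```python
import numpy as np

rng = np.random.default_rng(1)
n_speakers, d_seen, d_unseen = 50, 6, 4

# Third finite differences per speaker: S (seen HMMs), U (unseen HMMs).
S = rng.normal(size=(n_speakers, d_seen))
W_true = rng.normal(size=(d_unseen, d_seen))   # ground truth, for illustration
U = S @ W_true.T                               # noiseless: U_k = W_true @ S_k

# Least squares over all speakers: solve S @ W.T ~= U for W.
W_hat, *_ = np.linalg.lstsq(S, U, rcond=None)
W_hat = W_hat.T                                # estimated interpolation matrix
```

With enough speakers and noiseless data the matrix is recovered exactly; with real, noisy third differences the least-squares fit only approximates the speaker-to-speaker correlations, which is the accuracy limitation the problems below build on.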
The conventional speech recognition apparatus described above, however, has the following problems.
The first problem resides in that, upon speaker adaptation, interpolation and re-estimation are performed using finite differences (first finite differences) between BW-HMMs produced using adaptation utterances of a new speaker stored in the adaptation utterance storage section and SI-HMMs, but in prior learning for determination of interpolation parameters and re-estimation parameters, only SD-HMMs of a large number of speakers are used to perform learning.
In particular, in prior learning, the first finite differences which are used upon speaker adaptation are not used; instead, third finite differences are used as a substitute. Where the number of words of adaptation utterances is sufficiently large, the SD-HMMs and the BW-HMMs substantially coincide with each other, so this substitution is a good approximation.
However, in speaker adaptation, minimizing the number of words of adaptation utterances is the most significant objective, since this reduces the utterance burden on the user.
Where the number of words of adaptation utterances is small, since parameters of the SD-HMMs and the BW-HMMs are significantly different from each other, the approximation accuracy in such substitution as described above upon prior learning (that is, substitution of the first finite differences by the third finite differences) is very low, and it is difficult to estimate interpolation parameters or re-estimation parameters with a high degree of accuracy.
The second problem resides in that, in order to perform speaker adaptation, two linear transforms of interpolation and re-estimation are performed using a single finite difference (stored in the first finite difference storage section).
Where the number of words of adaptation utterances is small, the ratio of HMMs appearing in the utterances is very small. Therefore, it is inevitable that (finite differences of) the parameters of the greater part of the HMMs be estimated by linear interpolation, that is, by a linear transform of (finite differences of) the parameters of the small number of HMMs which actually appear, and consequently, the accuracy of the second finite differences is very low.
Further, the parameters of those HMMs which have appeared in the adaptation utterances are also modified by re-estimation using the finite differences (second finite differences having a low accuracy) of the parameters of a large number of HMMs which have not appeared in the adaptation utterances.
