Combined quantized and continuous feature vector HMM...
Data processing: speech signal processing, linguistics, language – Speech signal processing – Recognition
Type: Reexamination Certificate (active)
Filed: 1997-05-28
Issued: 2002-08-13
Examiner: Šmits, Tālivaldis Ivars (Department: 2654)
U.S. Classes: C704S222000, C704S245000
Patent number: 06434522
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an HMM generator, HMM memory device, likelihood calculating device and recognizing device for a novel HMM (Hidden Markov Model) that is applicable to pattern recognition tasks such as speech recognition.
2. Related Art of the Invention
Although an HMM is generally applicable to the time-series signal processing field, for convenience of explanation it is described below in relation to speech recognition, for example. A speech recognizing device using HMM will be described first.
FIG. 1 is a block diagram of a speech recognizing device using HMM. A speech analyzing part 201 converts input sound signals to feature vectors at a constant time interval (called a frame) of, for example, 10 msec, by means of a conventional method such as a filter bank, Fourier transformation or LPC analysis. Thus, the input signals are converted to a feature vector series Y = (y(1), y(2), ..., y(T)), where T is the number of frames. A codebook 202 holds labeled representative vectors. A vector quantizing part 203 substitutes for each vector of the series Y the labeled representative vector registered in the codebook 202 that is closest to it. An HMM generating part 204 generates, from training data, an HMM for each word that constitutes the recognition vocabulary. In other words, to generate the HMM corresponding to a word v, an HMM structure (the number of states and the transition rules permitted between the states) is first designated appropriately, and then the state transition probabilities of the model and the incidence probabilities of labels occurring in accordance with the state transitions are estimated from label series obtained from a multiplicity of vocalizations of the word v, such that the incidence probability of those label series is maximized. An HMM memory part 205 stores the HMM thus obtained for each word. A likelihood calculating part 206 calculates the likelihood of each model stored in the HMM memory part 205 with respect to the label series of an unknown input. A comparison and determination part 207 determines, as the recognition result, the word corresponding to the model that gives the highest likelihood.
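As an illustration, a minimal Python sketch of this pipeline follows. The names (quantize, recognize, codebook, word_models, likelihood) are hypothetical stand-ins for parts 201 to 207, not names from the patent, and NumPy is assumed.

```python
import numpy as np

def quantize(Y, codebook):
    """Vector quantizing part 203 (sketch): replace each feature vector y(t)
    with the label of the closest representative vector in codebook 202."""
    # codebook: (M, D) array of M labeled centroids; Y: (T, D) feature series
    dists = np.linalg.norm(Y[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)          # label series O = (o(1), ..., o(T))

def recognize(Y, codebook, word_models, likelihood):
    """Parts 206-207 (sketch): score every word model against the label
    series and return the word whose model gives the highest likelihood."""
    O = quantize(Y, codebook)
    scores = {v: likelihood(model, O) for v, model in word_models.items()}
    return max(scores, key=scores.get)
```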
More specifically, recognition by HMM is performed in the following manner. When the label series obtained for an unknown input is O = (o(1), o(2), ..., o(T)), the model corresponding to a word v is λ_v, and a state series of length T generated by the model λ_v is X = (x(1), x(2), ..., x(T)), the likelihood of λ_v with respect to the label series O is defined as:
[Exact Solution]
$$L_1(v) = \sum_{X} P(O, X \mid \lambda_v) \qquad \text{[formula 1]}$$
[Approximate Solution]
$$L_2(v) = \max_{X} \left[ P(O, X \mid \lambda_v) \right] \qquad \text{[formula 2]}$$
or logarithmically as:
$$L_3(v) = \max_{X} \left[ \log P(O, X \mid \lambda_v) \right] \qquad \text{[formula 3]}$$
where P(x, y | λ_v) is the joint occurrence probability of x and y in the model λ_v.
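For illustration, the sum in formula 1 and the maximum in formula 3 are usually computed with the standard forward and Viterbi recursions rather than by enumerating every state series X. The sketch below assumes NumPy, 0-based labels, and a model without the explicit absorbing final state that appears later in formula 5 (adding the transition a_{x(T), I+1} is a one-line change); pi, A and B are assumed parameter arrays, not names from the patent.

```python
import numpy as np

def exact_likelihood(pi, A, B, O):
    """Formula 1: L1(v) = sum over all X of P(O, X | lambda_v), computed by
    the forward recursion. pi: (I,), A: (I, I), B: (I, M), O: int labels."""
    alpha = pi * B[:, O[0]]              # alpha_i(1) = pi_i * b_i(o(1))
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]    # alpha_j(t) = sum_i alpha_i(t-1) a_ij b_j(o(t))
    return alpha.sum()

def viterbi_log_likelihood(pi, A, B, O):
    """Formula 3: L3(v) = max over X of log P(O, X | lambda_v)."""
    with np.errstate(divide="ignore"):   # log 0 -> -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = log_pi + log_B[:, O[0]]
    for o in O[1:]:
        delta = (delta[:, None] + log_A).max(axis=0) + log_B[:, o]
    return delta.max()
```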
Therefore, in the following expression using formula 1, for example:
$$\hat{v} = \operatorname*{argmax}_{v} \left[ L_1(v) \right] \qquad \text{[formula 4]}$$
v̂ is the recognition result. Formulae 2 and 3 can be used in the same manner.
P(O, X | λ) can be obtained in the following manner.
When the incidence probability b_i(o) of a label o and the transition probability a_ij from a state q_i (i = 1~I) to a state q_j (j = 1~I+1) are given for each state q_i (i = 1~I) of the HMM, the joint probability that the HMM λ generates the state series X = (x(1), x(2), ..., x(T)) and the label series O = (o(1), o(2), ..., o(T)) is defined as:
$$P(O, X \mid \lambda) = \pi_{x(1)} \prod_{t=1}^{T} a_{x(t)\,x(t+1)} \prod_{t=1}^{T} b_{x(t)}(o(t)) \qquad \text{[formula 5]}$$
where π_{x(1)} is the initial probability of the state x(1). Incidentally, x(T+1) = I+1 is the final state, and it is assumed that no label is generated there.
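A direct, minimal evaluation of formula 5 for one given state series might look as follows; the names and array shapes are assumptions for illustration, with the transition into the absorbing final state q_{I+1} kept as the last column of A.

```python
import numpy as np

def joint_probability(pi, A, B, X, O):
    """Formula 5 for one given state series X and label series O (0-based
    index arrays of length T). pi: (I,) initial probabilities; A: (I, I+1)
    transition probabilities, whose last column is the transition into the
    absorbing final state q_{I+1}; B: (I, M) label incidence probabilities."""
    I = len(pi)
    X = np.asarray(X)
    X_next = np.append(X[1:], I)             # x(2), ..., x(T), x(T+1) = q_{I+1}
    return (pi[X[0]]
            * np.prod(A[X, X_next])          # prod over t of a_{x(t) x(t+1)}
            * np.prod(B[X, np.asarray(O)]))  # prod over t of b_{x(t)}(o(t))
```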
In the example above, an input feature vector y is converted to a label; alternatively, however, the feature vector y can be used directly, in which case a probability density function of the vector y is given for each state. In that case, a probability density b_i(y) of the feature vector y is used in place of the incidence probability b_i(o) of the label o in the state q_i. Hereinafter, when z is a label, b_i(z) denotes the probability of z being generated in state i, and, when z is a vector, b_i(z) denotes the probability density of z. In this case, formulae 1, 2 and 3 are expressed as:
[Exact Solution]
$$L_1'(v) = \sum_{X} P(Y, X \mid \lambda_v) \qquad \text{[formula 6]}$$
[Approximate Solution]
$$L_2'(v) = \max_{X} \left[ P(Y, X \mid \lambda_v) \right] \qquad \text{[formula 7]}$$
or logarithmically as:
$$L_3'(v) = \max_{X} \left[ \log P(Y, X \mid \lambda_v) \right] \qquad \text{[formula 8]}$$
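The patent leaves the form of the density b_i(y) open; one common concrete choice is a single multivariate Gaussian per state, sketched below under that assumption (mixtures of Gaussians are the other usual choice).

```python
import numpy as np

def gaussian_density(y, mean, cov):
    """One assumed form of b_i(y): a single full-covariance multivariate
    Gaussian attached to state i (mean and cov would be per-state params)."""
    d = len(mean)
    diff = y - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)
```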
Thus, in any of these methods, when an HMM λ_v is prepared for each word v, where v = 1~V, the final recognition result for an input sound signal Y is:
$$\hat{v} = \operatorname*{argmax}_{v} \left[ P(Y \mid \lambda_v) \right] \qquad \text{[formula 9]}$$
or
$$\hat{v} = \operatorname*{argmax}_{v} \left[ \log P(Y \mid \lambda_v) \right] \qquad \text{[formula 10]}$$
where Y is, of course, an input label series, a feature vector series, or the like, according to the respective method.
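The decision rule of formulae 9 and 10 is then a plain argmax over the word models. A minimal sketch, assuming models maps each word v to its λ_v and log_likelihood is one of the scoring functions sketched earlier:

```python
def recognize_word(Y, models, log_likelihood):
    """Formula 10: v_hat = argmax over v of log P(Y | lambda_v)."""
    return max(models, key=lambda v: log_likelihood(models[v], Y))
```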
In these conventional examples, the method of converting input feature vectors to labels is hereinafter referred to as the discrete probability distribution HMM, and the method of using the input feature vectors as they are, as the continuous probability distribution HMM. The features of each are described below.
An advantage of the discrete probability distribution HMM is that fewer calculations are needed to compute the likelihood of a model with respect to an input label series, because the incidence probability b_i(C_m) of a label in state i can be obtained simply by reading it from a memory device that prestores the incidence probabilities for the labels. Its recognition accuracy, however, is inferior because of quantization errors, which creates a problem. To mitigate this problem it is necessary to increase the number of labels (the number of clusters), but the number of learning patterns required for training the models then becomes accordingly large. If the number of learning patterns is insufficient, b_i(C_m) may frequently be 0, and a correct estimation cannot be obtained. For example, the following case may occur.
In the preparation of a codebook, speech vocalized by multiple speakers for all the words to be recognized is converted to feature vector series, the resulting set of feature vectors is clustered, and the clusters are labeled. Each cluster has a representative vector called a centroid, which is generally the expected value of the vectors classified into that cluster. A codebook is the set of centroids stored in a form retrievable by their labels.
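The patent does not fix a clustering algorithm; plain k-means is a common choice, and a minimal sketch under that assumption follows (build_codebook and its arguments are hypothetical names).

```python
import numpy as np

def build_codebook(vectors, M, iters=20, seed=0):
    """Cluster the pooled training feature vectors with plain k-means and
    return one centroid (the mean of each cluster) per label."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), M, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(
            vectors[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        for m in range(M):
            if np.any(labels == m):      # keep old centroid if a cluster empties
                centroids[m] = vectors[labels == m].mean(axis=0)
    return centroids                     # retrievable by label m = 0, ..., M-1
```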
Now, assume that the word “Osaka”, for example, is present in the recognition vocabulary and that a model corresponding to it is to be prepared. Voice samples of the word “Osaka” vocalized by multiple speakers are converted to feature vector series, each feature vector is compared with the centroids, and the label of the closest centroid is chosen as the vector-quantized value of that feature vector. In this way, the voice samples of the word “Osaka” are converted to label series. By estimating the HMM parameters from the resultant label series in such a manner that the likelihood with respect to those label series is maximized, a model corresponding to the word “Osaka” is obtained.
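As a toy illustration of the estimation problem raised above (names hypothetical): if b_i(C_m) is estimated by relative frequency, any of the M labels never observed in the training data aligned to state i receives probability exactly 0, so any test utterance containing that label is scored as impossible.

```python
import numpy as np

def estimate_label_probs(labels_in_state_i, M):
    """Relative-frequency estimate of b_i(C_m) for one state i. Labels never
    seen in training get probability 0, which zeroes the likelihood of any
    input containing them -- the sparse-data failure described above."""
    counts = np.bincount(labels_in_state_i, minlength=M).astype(float)
    return counts / counts.sum()
```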