Impulsivity estimates of mixtures of the power exponential...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S231000

active

06804648

ABSTRACT:

BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention generally relates to the technology of speech recognition and, more particularly, to a parametric family of multivariate density functions formed by mixture models from univariate functions for modeling acoustic feature vectors used in automatic recognition of speech.
BACKGROUND DESCRIPTION
Most pattern recognition problems require modeling the probability density of feature vectors in feature space. Specifically, in the problem of speech recognition, it is necessary to model the probability density of acoustic feature vectors in the space of phonetic units. Purely Gaussian densities have been known to be inadequate for this purpose due to the heavy-tailed distributions exhibited by speech feature vectors. See, for example, Frederick Jelinek, Statistical Methods for Speech Recognition, MIT Press (1997). As an intended remedy to this problem, practically all speech recognition systems attempt modeling by using a mixture model with Gaussian densities for mixture components. Variants of the standard K-means clustering algorithm are used for this purpose. The classical version of the K-means algorithm, as described by John Hartigan in Clustering Algorithms, John Wiley & Sons (1975), and Anil Jain and Richard Dubes in Algorithms for Clustering Data, Prentice Hall (1988), can also be viewed as a special case of the expectation-maximization (EM) algorithm (see A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of Royal Statistical Soc., Ser. B, vol. 39, pp. 1-38, 1977) for mixtures of Gaussians with variances tending to zero. See also Christopher M. Bishop, Neural Networks for Pattern Recognition, Cambridge University Press (1997), and F. Marroquin and J. Girosi, “Some extensions of the K-means algorithm for image segmentation and pattern classification”, MIT Artificial Intelligence Lab. A. I. Memorandum no. 1390, January 1993. The only attempt to model the phonetic units in speech with non-Gaussian mixture densities is described by H. Ney and A. Noll in “Phoneme modeling using continuous mixture densities”, Proceedings of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 437-440, 1998, where Laplacian densities were used in a heuristic-based estimation algorithm.
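The remark that K-means is a limiting case of EM for Gaussian mixtures can be illustrated numerically. The sketch below is only an illustration under stated assumptions: equal mixture weights, a shared isotropic variance, and a hypothetical helper name `responsibilities`. It shows the EM posterior weights of a point approaching a hard nearest-center assignment as the variance shrinks.

```python
import math

def responsibilities(x, centers, variance):
    """Posterior weights of equally weighted isotropic Gaussian components for point x."""
    # Squared Euclidean distance from x to every center.
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    # Subtract the smallest distance before exponentiating, for numerical stability.
    d2_min = min(d2)
    w = [math.exp(-(v - d2_min) / (2.0 * variance)) for v in d2]
    total = sum(w)
    return [v / total for v in w]

centers = [(0.0, 0.0), (3.0, 0.0)]   # illustrative values, not from the source
x = (1.0, 0.0)
for variance in (10.0, 1.0, 0.01):
    # As the variance tends to zero the weights approach a one-hot vector,
    # which is the hard assignment step of K-means.
    print(variance, responsibilities(x, centers, variance))
```

With a large variance the point is shared almost evenly between the two centers; with a small one it is assigned essentially entirely to the nearer center.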
S. Basu and C. A. Micchelli in “Parametric density estimation for the classification of acoustic feature vectors in speech recognition”, Nonlinear Modeling: Advanced Black-Box Techniques (Eds. J. A. K. Suykens and J. Vandewalle), pp. 87-118, Kluwer Academic Publishers, Boston (1998), attempted to model speech data by building probability densities from a given univariate function $h(t)$ for $t \ge 0$. Specifically, Basu and Micchelli considered mixture models from component densities of the form

$$p(x \mid \mu, \Sigma) = \rho_d \, \frac{1}{\sqrt{\det \Sigma}} \, \exp\bigl(-h(Q(x))\bigr), \quad x \in \mathbb{R}^d, \tag{1}$$

where

$$Q(x) = \gamma_d \, (x-\mu)^t \, \Sigma^{-1} (x-\mu), \quad x \in \mathbb{R}^d, \tag{2}$$

$$m_\beta = \int_{\mathbb{R}_+} t^{\beta} f(t) \, dt, \tag{3}$$

(when the integral is finite and $\mathbb{R}_+$ denotes the positive real axis)

$$\rho_d = \frac{\Gamma\!\left(\frac{d}{2}\right) \left(m_{\frac{d}{2}}\right)^{\frac{d}{2}}}{\pi^{\frac{d}{2}} \left(m_{\frac{d}{2}-1}\right)^{\frac{d}{2}+1}}, \tag{4}$$

and

$$\gamma_d = \frac{m_{\frac{d}{2}}}{d \, m_{\frac{d}{2}-1}}. \tag{5}$$
If the constants $\rho_d$ and $\gamma_d$ are positive and finite, then the vector $\mu \in \mathbb{R}^d$ and the positive definite symmetric $d \times d$ matrix $\Sigma$ are the mean and the covariance of this density. Particular attention was given to the choice $h(t)=t^{\alpha/2}$, $t>0$, $\alpha>0$; the case $\alpha=2$ corresponds to the Gaussian density, whereas the Laplacian case considered by H. Ney and A. Noll, supra, corresponds to $\alpha=1$. Smaller values of $\alpha$ correspond to more peaked distributions ($\alpha \to 0$ yields the $\delta$ function), whereas larger values of $\alpha$ correspond to distributions with flat tops ($\alpha \to \infty$ yields the uniform distribution over elliptical regions). For more details about these issues see S. Basu and C. Micchelli, supra. This particular choice of densities has been studied in the literature and referred to in various ways; e.g., $\alpha$-stable densities as well as power exponential distributions. See, for example, E. Gómez, M. A. Gómez-Villegas, and J. M. Marin, “A multivariate generalization of the power exponential family of distributions”, Comm. Stat. - Theory Meth. 17(3), pp. 589-600, 1998, and Owen Kenny, Douglas Nelson, John Bodenschatz and Heather A. McMonagle, “Separation of nonspontaneous and spontaneous speech”, Proc. ICASSP, 1998.
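As a concrete check of this family, the following sketch evaluates the univariate case ($d=1$) with $h(t)=t^{\alpha/2}$. It assumes that $f(t)=\exp(-h(t))$ in the moment integral, which gives $m_\beta = (2/\alpha)\,\Gamma(2(\beta+1)/\alpha)$ and hence $\gamma_1=\Gamma(3/\alpha)/\Gamma(1/\alpha)$ and $\rho_1=(\alpha/2)\,\Gamma(3/\alpha)^{1/2}/\Gamma(1/\alpha)^{3/2}$. The function names and the crude midpoint integrator are illustrative, not part of the source.

```python
import math

def power_exponential_pdf(x, mu, var, alpha):
    """Univariate density rho_1 * var**-0.5 * exp(-(gamma_1 * (x - mu)**2 / var)**(alpha / 2))."""
    g1 = math.gamma(1.0 / alpha)
    g3 = math.gamma(3.0 / alpha)
    gamma_1 = g3 / g1                                  # gamma_d for d = 1 (assumes f = exp(-h))
    rho_1 = 0.5 * alpha * math.sqrt(g3) / g1 ** 1.5    # rho_d for d = 1 (assumes f = exp(-h))
    q = gamma_1 * (x - mu) ** 2 / var
    return rho_1 / math.sqrt(var) * math.exp(-(q ** (alpha / 2.0)))

def integrate(f, lo, hi, n=100000):
    """Plain midpoint rule; accurate enough for a sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

for alpha in (0.5, 1.0, 2.0):
    pdf = lambda x, a=alpha: power_exponential_pdf(x, 0.0, 1.0, a)
    mass = integrate(pdf, -60.0, 60.0)
    variance = integrate(lambda x: x * x * pdf(x), -60.0, 60.0)
    # Both numbers should be close to 1 if mu and var really are the mean and variance.
    print(alpha, mass, variance)
```

For $\alpha=2$ the function reduces to the standard normal density and for $\alpha=1$ to a Laplacian with unit variance, which matches the two special cases named in the text.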
In S. Basu and C. Micchelli, supra, an iterative algorithm having the expectation-maximization (EM) flavor for estimating the parameters was obtained and used for a range of fixed values of $\alpha$ (as opposed to the choice of $\alpha=1$ in H. Ney and A. Noll, supra, and $\alpha=2$ in standard speech recognition systems). A preliminary conclusion from the study in S. Basu and C. Micchelli was that the distribution of speech feature vectors in the acoustic space is better modeled by mixture models with non-Gaussian mixture components corresponding to $\alpha<1$. As a consequence of these encouraging results, we became interested in automatically finding the “best” value of $\alpha$ directly from the data. It is this issue that is the subject of the present invention.
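To make the idea of choosing $\alpha$ from the data concrete, here is a minimal sketch for a single univariate component. It is not the patent's procedure: it fixes the mean and variance at their sample values and picks $\alpha$ from a grid by maximum log-likelihood. The density normalization assumes $f(t)=\exp(-h(t))$ with $h(t)=t^{\alpha/2}$, and the names `log_likelihood` and `best_alpha` are hypothetical.

```python
import math
import random

def log_likelihood(data, mu, var, alpha):
    """Log-likelihood of data under the univariate power exponential density."""
    lg1 = math.lgamma(1.0 / alpha)
    lg3 = math.lgamma(3.0 / alpha)
    # log of rho_1 = (alpha / 2) * Gamma(3/alpha)**0.5 / Gamma(1/alpha)**1.5
    log_rho = math.log(0.5 * alpha) + 0.5 * lg3 - 1.5 * lg1
    gamma_1 = math.exp(lg3 - lg1)
    const = log_rho - 0.5 * math.log(var)
    return sum(const - (gamma_1 * (x - mu) ** 2 / var) ** (alpha / 2.0) for x in data)

def best_alpha(data, grid):
    """Grid search for alpha with mean and variance fixed at sample estimates (a simplification)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return max(grid, key=lambda a: log_likelihood(data, mu, var, a))

random.seed(0)
grid = [0.5 + 0.1 * i for i in range(26)]          # candidate alpha values 0.5 .. 3.0
laplace = [random.choice((-1.0, 1.0)) * random.expovariate(1.0) for _ in range(5000)]
gauss = [random.gauss(0.0, 1.0) for _ in range(5000)]
print(best_alpha(laplace, grid), best_alpha(gauss, grid))
```

On synthetic Laplacian and Gaussian samples the selected values should land near 1 and 2 respectively; a full treatment would re-estimate the mean, variance and mixture weights jointly with $\alpha$.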
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a parametric family of multivariate density functions formed by mixture models from univariate functions of the type $\exp(-|x|^{\beta})$ for modeling acoustic feature vectors used in automatic recognition of speech.
According to the invention, the parameter $\beta$ is used to measure the non-Gaussian nature of the data. In the practice of the invention, $\beta$ is estimated from the data using a maximum likelihood criterion. Among other things, there is a balance between $\beta$ and the number of data points $N$ that must be satisfied for efficient estimation. The computer-implemented method for automatic machine recognition of speech iteratively refines parameter estimates of densities comprising mixtures of power exponential distributions whose parameters are means ($\mu$), variances ($\sigma$), impulsivity numbers ($\alpha$) and weights ($w$). The iterative refining process begins by predetermining initial values of the parameters $\mu$, $\sigma$ and $w$. Then $\hat{\mu}^l$ and $\hat{\sigma}^l$ are derived from the following equations
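$$\mu_i^l = \frac{\displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk} \, x_i^k}{\displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk}}$$

and

$$\sigma_i^l = \hat{\alpha}_l \, \gamma_d(\hat{\alpha}_l)^{\hat{\alpha}_l/2} \, \frac{\displaystyle\sum_{k=1}^{N} \left( \sum_{j=1}^{d} \frac{(x_j^k - \hat{\mu}_j^l)^2}{\hat{\sigma}_j^l} \right)^{\hat{\alpha}_l/2 - 1} A_{lk} \, (x_i^k - \hat{\mu}_i^l)^2}{A_l}$$
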
for $i=1,\ldots,d$ and $l=1,\ldots,m$. Then $\sigma$ is updated by assuming that $\theta=(\mu,\sigma,\alpha)$, $\hat{\theta}=(\hat{\mu},\hat{\sigma},\hat{\alpha})$ and letting $H(\mu,\sigma)=E_{\hat{\theta}}(\log f(\cdot \mid \theta))$, in which case $H$ has a unique global maximum at $\mu=\hat{\mu}$, $\sigma=\beta(\alpha,\hat{\alpha})\,\hat{\sigma}$, where
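$$\beta(\alpha, \hat{\alpha}) = \left\{ \frac{\alpha \, \Gamma\!\left(\frac{\alpha+1}{\hat{\alpha}}\right)}{\Gamma\!\left(\frac{1}{\hat{\alpha}}\right)} \right\}^{\frac{2}{\alpha}} \frac{\Gamma\!\left(\frac{3}{\alpha}\right) \Gamma\!\left(\frac{1}{\hat{\alpha}}\right)}{\Gamma\!\left(\frac{3}{\hat{\alpha}}\right) \Gamma\!\left(\frac{1}{\alpha}\right)}$$
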
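The factor $\beta(\alpha,\hat{\alpha})$ can be evaluated directly from its gamma-function form. The sketch below is an illustration only; the function name `beta` is hypothetical, and log-gamma is used purely for numerical convenience. Two properties serve as sanity checks: the factor equals 1 when $\alpha=\hat{\alpha}$, and it also equals 1 for $\alpha=2$ whatever $\hat{\alpha}$ is, which is consistent with reading $\beta$ as a rescaling of the variance $\hat{\sigma}$ (an interpretation, not something the text states outright).

```python
import math

def beta(alpha, alpha_hat):
    """beta(alpha, alpha_hat) as written above, evaluated via log-gamma."""
    lg = math.lgamma
    # log of the braced term: alpha * Gamma((alpha + 1) / alpha_hat) / Gamma(1 / alpha_hat)
    log_brace = math.log(alpha) + lg((alpha + 1.0) / alpha_hat) - lg(1.0 / alpha_hat)
    # log of Gamma(3/alpha) Gamma(1/alpha_hat) / (Gamma(3/alpha_hat) Gamma(1/alpha))
    log_ratio = lg(3.0 / alpha) + lg(1.0 / alpha_hat) - lg(3.0 / alpha_hat) - lg(1.0 / alpha)
    return math.exp((2.0 / alpha) * log_brace + log_ratio)

for a, a_hat in ((1.0, 1.0), (2.0, 0.7), (1.0, 2.0), (0.5, 2.0)):
    print(a, a_hat, beta(a, a_hat))
```

For example, `beta(1.0, 2.0)` evaluates the case of a Laplacian-shaped component ($\alpha=1$) matched against Gaussian-shaped data ($\hat{\alpha}=2$).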
The l dimension is set by $\mu_l=\hat{\mu}_l$, $\sigma$
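The re-estimation formulas for the means and variances given in the summary above can be sketched as one pass over the data. Several things here are assumptions rather than statements from the text: $A_{lk}$ is treated as a given per-component, per-sample weight, $A_l$ is taken to be $\sum_k A_{lk}$, $\gamma_d(\alpha)$ is computed as $\Gamma((d+2)/\alpha)/(d\,\Gamma(d/\alpha))$ (which follows from equation (5) if $f=\exp(-h)$ and $h(t)=t^{\alpha/2}$), and a small floor guards against raising zero to a negative power. All function names are hypothetical.

```python
import math

def gamma_d(alpha, d):
    """gamma_d for h(t) = t**(alpha / 2), assuming f = exp(-h) in the moment integral."""
    return math.exp(math.lgamma((d + 2.0) / alpha) - math.lgamma(d / alpha)) / d

def reestimate(x, A, mu_hat, sigma_hat, alpha_hat):
    """One pass of the mean and variance formulas.

    x: N samples of length d; A[l][k]: weight of sample k for component l;
    mu_hat, sigma_hat: m rows of length d; alpha_hat: m impulsivity values.
    """
    N, d, m = len(x), len(x[0]), len(mu_hat)
    mu_new, sigma_new = [], []
    for l in range(m):
        a = alpha_hat[l]
        w = []
        for k in range(N):
            s = sum((x[k][j] - mu_hat[l][j]) ** 2 / sigma_hat[l][j] for j in range(d))
            s = max(s, 1e-12)   # guard (not in the source): avoids 0 ** negative when a < 2
            w.append(s ** (a / 2.0 - 1.0) * A[l][k])
        w_sum = sum(w)
        A_l = sum(A[l])         # assumption: A_l is the sum of A_lk over k
        scale = a * gamma_d(a, d) ** (a / 2.0)
        mu_new.append([sum(w[k] * x[k][i] for k in range(N)) / w_sum for i in range(d)])
        sigma_new.append([
            scale * sum(w[k] * (x[k][i] - mu_hat[l][i]) ** 2 for k in range(N)) / A_l
            for i in range(d)
        ])
    return mu_new, sigma_new

# Illustrative data: one Gaussian-shaped component (alpha = 2) with unit weights.
x = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
mu, sigma = reestimate(x, [[1.0] * 4], [(1.0, 2.0)], [(1.0, 1.0)], [2.0])
print(mu, sigma)
```

With $\hat{\alpha}_l=2$ the weights reduce to $A_{lk}$ and the formulas collapse to the familiar weighted mean and weighted variance of a Gaussian mixture, which is a convenient way to check an implementation before trying other values of $\alpha$.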
