Penalized maximum likelihood estimation methods, the Baum...

Classification: Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Patent class: C704S231000
Type: Reexamination Certificate
Status: active
Patent number: 06374216


BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to methods of speech recognition and, more particularly, to nonparametric density estimation of high dimensional data for use in training models for speech recognition.
2. Background Description
In the present invention, we are concerned with nonparametric density estimation of high dimensional data. The invention is driven by its potential application to training speech data, where traditionally only parametric methods have been used. Parametric models typically lead to large scale optimization problems associated with a desire to maximize the likelihood of the data. In particular, mixture models of Gaussians are used for training acoustic vectors for speech recognition, and the parameters of the model are obtained by using K-means clustering and the EM algorithm, see F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, Cambridge, Mass., 1998. Here we consider the possibility of maximizing the penalized likelihood of the data as a means to identify nonparametric density estimators, see I. J. Good and R. A. Gaskins, "Nonparametric roughness penalties for probability densities," Biometrika 58, pp. 255-77, 1971. We develop various mathematical properties of this point of view, propose several algorithms for the numerical solution of the optimization problems we encounter, and report on some of our computational experience with these methods. In this regard, we integrate within our framework a technique that is central to many aspects of the statistical analysis of acoustic data, namely the Baum-Welch algorithm, which is especially important for the training of Hidden Markov Models; see again the book by F. Jelinek, cited above.
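For concreteness, this standard parametric baseline can be sketched in a few lines of Python using scikit-learn's GaussianMixture with K-means initialization; the component count, diagonal covariances, and synthetic data below are illustrative assumptions, not the patent's own training setup.

```python
# Minimal sketch of the standard parametric baseline: a Gaussian mixture
# initialized by K-means and refined by EM. Component count and data are
# illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for acoustic feature vectors (n samples in R^d, d = 39 in the text).
X = rng.standard_normal((2000, 39))

gmm = GaussianMixture(
    n_components=16,         # number of mixture components (assumed)
    covariance_type="diag",  # diagonal covariances, common in speech systems
    init_params="kmeans",    # K-means clustering supplies initial prototypes
    max_iter=100,
)
gmm.fit(X)                   # EM refines weights, means, and covariances
print(gmm.score(X))          # average log-likelihood of the data
```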
Let us recall the mechanism by which density estimation of high dimensional data arises in speech recognition. In this context, a principal task is to convert acoustic waveforms into text. The first step in the process is to isolate important features of the waveform over small time intervals (typically 25 ms). These features, represented by a vector $x \in \mathbb{R}^d$ (where d usually is 39), are then identified with context dependent sounds, for example, phonemes such as "AA", "AE", "K", "H". Strings of such basic sounds are then converted into words using a dictionary of acoustic representations of words. For example, the phonetic spelling of the word "cat" is "K AE T". In an ideal situation the feature vectors generated by the speech waveform would be converted into a string of phonemes "K . . . K AE . . . AE T . . . T", from which we can recognize the word "cat" (unfortunately, a phoneme string seldom matches the acoustic spelling exactly).
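A minimal sketch of this first step, assuming a 16 kHz sampling rate and a 10 ms frame shift (the text specifies only the 25 ms window length); each resulting frame would then be mapped to a d-dimensional feature vector:

```python
# Slice a waveform into 25 ms windows. The 10 ms shift and 16 kHz
# sampling rate are illustrative assumptions.
import numpy as np

def frames(waveform, sample_rate=16000, window_ms=25, shift_ms=10):
    win = int(sample_rate * window_ms / 1000)   # samples per 25 ms window
    hop = int(sample_rate * shift_ms / 1000)    # samples per 10 ms shift
    n = 1 + max(0, (len(waveform) - win) // hop)
    return np.stack([waveform[i * hop : i * hop + win] for i in range(n)])

signal = np.random.randn(16000)                 # one second of fake audio
print(frames(signal).shape)                     # -> (98, 400)
```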
One of the important problems associated with this process is to identify a phoneme label for an individual acoustic vector x. Training data is provided for the purpose of classifying a given acoustic vector. A standard approach for classification in speech recognition is to generate initial "prototypes" by K-means clustering and then refine them by using the EM algorithm based on mixture models of Gaussian densities, cf. F. Jelinek, cited above. Moreover, in the decoding stage of speech recognition (formation of Hidden Markov Models) the output probability density functions are most commonly assumed to be a mixture of Gaussian density functions, cf. L. E. Baum and J. A. Eagon, "An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model of ecology," Bull. Amer. Math. Soc. 73, pp. 360-63, 1967; L. A. Liporace, "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Trans. on Information Theory 28, pp. 729-34, 1982; R. A. Gopinath, "Constrained maximum likelihood modeling with gaussian distributions," Broadcast News Transcription and Understanding Workshop, 1998.
SUMMARY OF THE INVENTION
According to this invention, we adopt the commonly used approach to classification and think of the acoustic vectors for a given sound as a random variable whose density is estimated from the data. When the densities are found for all the basic sounds (this is the training stage), an acoustic vector is assigned the phoneme label corresponding to the highest scoring likelihood (probability). This information is the basis of the decoding of acoustic vectors into text.
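In code, this labeling rule is simply an argmax of log-likelihoods over the trained per-phoneme densities; the phoneme set and the model interface in the sketch below are illustrative assumptions.

```python
# Assign the phoneme whose trained density scores the acoustic vector
# highest. The models dict and its log-density interface are illustrative.
import numpy as np

def label(x, models):
    """models maps phoneme name -> callable returning log f_phoneme(x)."""
    return max(models, key=lambda p: models[p](x))

# Toy log-densities standing in for trained models of "K", "AE", "T".
models = {
    "K":  lambda x: -0.5 * np.sum((x - 1.0) ** 2),
    "AE": lambda x: -0.5 * np.sum((x + 1.0) ** 2),
    "T":  lambda x: -0.5 * np.sum(x ** 2),
}
print(label(np.zeros(39), models))  # -> "T"
```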
Since in speech recognition x is typically a high dimensional vector and each basic sound has only several thousand data vectors to model it, the training data is relatively sparse. Recent work on the classification of acoustic vectors, see S. Basu and C. A. Micchelli, "Maximum likelihood estimation for acoustic vectors in speech recognition," Advanced Black-Box Techniques for Nonlinear Modeling: Theory and Applications, Kluwer Publishers, 1998, demonstrates that mixture models with non-Gaussian mixture components are useful for parametric density estimation of speech data. We explore the use of nonparametric techniques. Specifically, we use the penalized maximum likelihood approach introduced by Good and Gaskins, cited above. We combine the penalized maximum likelihood approach with the Baum-Welch algorithm, see L. E. Baum, T. Petrie, G. Soules and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," The Annals of Mathematical Statistics 41, no. 1, pp. 164-71, 1970, and Baum and Eagon, cited above, which is often used in speech recognition for training Hidden Markov Models (HMMs). (This algorithm is a special case of the celebrated EM algorithm as described, e.g., in A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, 39, pp. 1-38, 1977.)
We begin by recalling that one of the most widely used nonparametric density estimators has the form
$$
f_n(x) = \frac{1}{nh^d} \sum_{i \in Z_n} k\!\left(\frac{x - x_i}{h}\right), \qquad x \in \mathbb{R}^d, \tag{1}
$$
where $Z_n = \{1, \ldots, n\}$, k is some specified function, and $\{x_i : i \in Z_n\}$ is a set of observations in $\mathbb{R}^d$ of some unknown random variable, cf. T. Cacoullos, "Estimates of a multivariate density," Annals of the Institute of Statistical Mathematics 18, pp. 178-89, 1966; E. Parzen, "On the estimation of a probability density function and the mode," Annals of Mathematical Statistics 33, pp. 1065-76, 1962; M. Rosenblatt, "Remarks on some nonparametric estimates of a density function," Annals of Mathematical Statistics 27, pp. 832-37, 1956. It is well known that this estimator converges almost surely to the underlying probability density function (PDF) provided that the kernel k is strictly positive on $\mathbb{R}^d$, $\int_{\mathbb{R}^d} k(x)\,dx = 1$, $h \to 0$, $nh^d \to \infty$, and $n \to \infty$. The problem of how best to choose n and h for a fixed kernel k for the estimator (1) has been thoroughly discussed in the literature, cf. L. Devroye and L. Györfi, Nonparametric Density Estimation: The L_1 View, John Wiley & Sons, New York, 1985.
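A direct transcription of estimator (1), assuming a Gaussian kernel and a hand-picked bandwidth h (both left unspecified by (1)):

```python
# Kernel density estimator (1) with a Gaussian kernel, which integrates
# to 1 and is strictly positive on R^d as the convergence result requires.
import numpy as np

def f_n(x, data, h):
    """Evaluate f_n(x) = (1/(n h^d)) * sum_i k((x - x_i)/h)."""
    n, d = data.shape
    u = (x - data) / h                                   # (x - x_i)/h for all i
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return k.sum() / (n * h**d)

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))                     # observations in R^d
print(f_n(np.zeros(2), data, h=0.5))                     # density estimate at 0
```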
In this invention, we are led, by the notion of penalized maximum likelihood estimation (PMLE), to density estimators of the form
$$
f(x) = \sum_{i \in Z_n} c_i\, k(x, x_i), \qquad x \in \mathbb{R}^d, \tag{2}
$$
where $k(x, y)$, $x, y \in \mathbb{R}^d$, is the reproducing kernel of some Hilbert space H, cf. S. Saitoh, Theory of Reproducing Kernels and its Applications, Pitman Research Notes in Mathematical Analysis 189, Longman Scientific and Technical, Essex, UK, 1988.
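To make (2) concrete, the following sketch evaluates such an expansion with a Gaussian kernel as an illustrative stand-in for the reproducing kernel of H:

```python
# Evaluate the expansion (2): f(x) = sum_i c_i k(x, x_i). The Gaussian
# kernel here is an illustrative stand-in for the reproducing kernel.
import numpy as np

def kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2, axis=-1))

def f(x, c, centers):
    """f(x) given coefficients c and the observations x_i."""
    return np.dot(c, kernel(x, centers))

rng = np.random.default_rng(0)
centers = rng.standard_normal((100, 2))     # the observations x_i
c = np.full(100, 1.0 / 100)                 # uniform coefficients on the simplex
print(f(np.zeros(2), c, centers))
```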
Among the methods we consider, the coefficients in this sum are chosen to maximize the homogeneous polynomial
$$
\Phi(Kc) := \prod_{i \in Z_n} \Bigl( \sum_{j \in Z_n} K_{ij}\, c_j \Bigr), \qquad c = (c_1, \ldots, c_n)^T, \tag{3}
$$
over the simplex
$$
S_n = \{\, c : c \in \mathbb{R}_+^n,\ e^T c = 1 \,\}, \tag{4}
$$
where $e = (1, \ldots, 1)^T \in \mathbb{R}^n$ and
$$
\mathbb{R}_+^n = \{\, c : c = (c_1, \ldots, c_n)^T,\ c_i \geq 0,\ i \in Z_n \,\}.
$$
Here $K = (k(x_i, x_j))_{i,j \in Z_n}$ is the kernel matrix of the observations, so that, by (2), $\Phi(Kc) = \prod_{i \in Z_n} f(x_i)$ is precisely the likelihood of the data.
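One natural reading of how the Baum-Welch (EM) machinery applies here, sketched under stated assumptions: the multiplicative update below ascends $\log \Phi(Kc) = \sum_{i \in Z_n} \log (Kc)_i$, which is equivalent to maximizing the product (3) but numerically safer, while keeping every iterate inside the simplex (4). The Gaussian kernel and fixed iteration count are illustrative choices, not the patent's specification.

```python
# Baum-Eagon / EM-style multiplicative update for maximizing the
# homogeneous polynomial (3) over the simplex (4). The Gaussian kernel
# and iteration count are assumptions; not the patent's exact method.
import numpy as np

def kernel_matrix(data, gamma=1.0):
    d2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)            # K_ij = k(x_i, x_j) > 0

def maximize_on_simplex(K, iters=200):
    n = K.shape[0]
    c = np.full(n, 1.0 / n)               # start at the simplex barycenter
    for _ in range(iters):
        p = K @ c                         # p_i = (Kc)_i = f(x_i)
        # Update c_j <- c_j * (1/n) * sum_i K_ij / (Kc)_i: the EM step for
        # mixture weights. It keeps c in the simplex (the new coordinates
        # sum to 1) and never decreases sum_i log (Kc)_i.
        c = c * (K.T @ (1.0 / p)) / n
    return c

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 2))
K = kernel_matrix(data)
c = maximize_on_simplex(K)
print(c.sum(), float(np.sum(np.log(K @ c))))   # simplex check, log-likelihood
```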
