Nongaussian density estimation for the classification of...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


C704S255000

active

06269334


BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to speech recognition systems and, more particularly, to the use of EM-type algorithms for the estimation of parameters of a mixture model of nongaussian densities. The present invention was motivated by two objectives: first, to study maximum likelihood density estimation methods for high-dimensional data; and second, to apply the techniques developed to large vocabulary, continuous-parameter speech recognition.
2. Background Description
Speech recognition systems require modeling the probability density of feature vectors in the acoustic space of phonetic units. Purely gaussian densities have been known to be inadequate for this purpose due to the heavy-tailed distributions observed in speech feature vectors. See, for example, Frederick Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997. As an intended remedy to this problem, practically all speech recognition systems attempt modeling by using a mixture model with gaussian densities for mixture components. Variants of the standard K-means clustering algorithm are used for this purpose. The classical version of the K-means algorithm (as described by John Hartigan in Clustering Algorithms, John Wiley & Sons, 1975, and Anil Jain and Richard Dubes in Algorithms for Clustering Data, Prentice Hall, 1988) can also be viewed as a special case of the EM algorithm (as described by A. P. Dempster, N. M. Laird and D. B. Rubin in "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Ser. B, vol. 39, pp. 1-38, 1977) in the limiting case of gaussian density estimation with variance zero. See, for example, Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995, and J. Marroquin and F. Girosi, "Some extensions of the K-means algorithm for image segmentation and pattern classification", MIT Artificial Intelligence Lab., A.I. Memorandum No. 1390, January 1993. The only known attempt to model the phonetic units in speech with nongaussian mixture densities is described by H. Ney and A. Noll in "Phoneme modeling using continuous mixture densities", Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 437-440, 1988, where laplacian densities were used in a heuristic-based estimation algorithm.
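The limiting relation noted above, between K-means and the EM algorithm with variance tending to zero, can be illustrated numerically: as a shared, fixed component variance shrinks, the EM posterior "responsibilities" for a point harden into the 0/1 nearest-mean assignment that K-means would make. The following is a minimal sketch with made-up numbers, not an algorithm from this patent.

```python
import numpy as np

# Sketch: equal-weight two-component gaussian mixture with a common,
# fixed variance. As sigma -> 0, the E-step responsibilities for the
# point x = 1.0 harden into the K-means assignment (nearest mean 0.0).
x = 1.0
means = np.array([0.0, 3.0])

def responsibilities(x, means, sigma):
    # E-step posterior over components (weights and normalizers cancel)
    logp = -((x - means) ** 2) / (2.0 * sigma ** 2)
    r = np.exp(logp - logp.max())
    return r / r.sum()

for sigma in (2.0, 0.5, 0.05):
    print(sigma, responsibilities(x, means, sigma))
```

With sigma = 2.0 the assignment is soft (roughly 0.59 vs. 0.41); with sigma = 0.05 it is effectively the hard 1/0 assignment of K-means.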
SUMMARY OF THE INVENTION
It is therefore an object of this invention to provide a new statistical modeling paradigm for automatic machine recognition of speech.
According to this invention, novel mixtures of nongaussian statistical probability densities for modeling speech subword units (e.g., phonemes) and further subcategories thereof, including the transition and output probabilities, are used in a Hidden Markov generative model of speech. We model speech data by building probability densities from functions of the form exp(−t^(α/2)) for t ≥ 0, α > 0. In this notation, the case α = 2 corresponds to the gaussian density, whereas the laplacian case considered in Ney et al. corresponds to α = 1.
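A quick numerical illustration of this family (taking t to be the squared variable, which is our reading of the formula, so that in one dimension exp(−t^(α/2)) reduces to exp(−|x|^α)):

```python
import math

# With t = x**2 (an assumption for this sketch), exp(-t**(alpha/2))
# becomes exp(-abs(x)**alpha): alpha = 2 is the unnormalized gaussian
# kernel and alpha = 1 the laplacian kernel.
def kernel(x, alpha):
    return math.exp(-abs(x) ** alpha)

# Far from the origin, smaller alpha decays more slowly: heavier tails.
for alpha in (2.0, 1.0, 0.5):
    print(alpha, kernel(4.0, alpha))
```

The printed values increase as α decreases, which is the heavy-tail behavior that motivates using α < 1 for speech features.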
Furthermore, we focus on a set of four different types of mixture components, each constructed from a different univariate function. For each of them, a mixture model is then used for a maximum likelihood model of speech data. It turns out that our iterative algorithm can be used for a range of values of α (as opposed to the fixed α = 1 in Ney et al. or α = 2 in standard speech recognition systems).
Our observation is that the distribution of speech feature vectors in the acoustic space is better modeled by mixture models with nongaussian mixture components. In particular, for speech α < 1 seems more appropriate; see FIG. 1. To wit, very similar distributions have been noted for the distribution of image gradients by Stuart Geman in "Three lectures on image understanding", The Center for Imaging Science, Washington State University, video tape, Sep. 10-12, 1997, and also by David Mumford in his lecture on pattern theory, Directors Series, IBM Yorktown Heights, Feb. 23, 1998.
The second point to be made is that, from a practical standpoint, estimation of densities in speech data is accompanied by all the difficulties characteristic of high-dimensional density estimation problems. See David W. Scott, Multivariate Density Estimation, Wiley Interscience, 1992, and James R. Thompson and Richard A. Tapia, Nonparametric Function Estimation, Modeling and Simulation, SIAM Publications, 1997. Feature vectors of dimension fifty or more are typical in speech recognition systems, and consequently the data can be considered to be highly sparse. In contrast, the literature on multivariate density estimation (see Scott, supra) puts high emphasis on "exploratory data analysis", the goal of which is to glean insight about the densities via visualization of the data. This is not feasible for dimensions of the order of fifty or more, even when projections on lower dimensional spaces are considered.
The classification/training step for speech recognition which we use can be cast in a general framework. We begin with a parametric family p(x|λ), x ∈ R^d, λ ∈ Ω ⊂ R^q, of probability densities on R^d with parameters in the manifold Ω in R^q. The method of classification used here begins with k finite subsets T_1, T_2, ..., T_k of R^d and considers the problem of deciding in which of these sets a given vector x ∈ R^d lies. The method that we employ picks k probability densities p_1 = p(·|θ_1), ..., p_k = p(·|θ_k) from our family and associates with the subset T_l the probability density p_l, l = 1, 2, ..., k. Then, we say x ∈ R^d belongs to the subset T_r if r is the least integer in the set {1, 2, ..., k} such that
p_r(x) = max{p_l(x) : 1 ≤ l ≤ k}.
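The decision rule above, namely assigning x to the subset whose associated density is largest and breaking ties in favor of the least index, can be sketched in a few lines. The two laplacian densities used here are hypothetical stand-ins for trained models:

```python
import math

def classify(x, densities):
    # Least index r attaining max{p_l(x) : 1 <= l <= k} (0-based here).
    vals = [p(x) for p in densities]
    m = max(vals)
    return min(i for i, v in enumerate(vals) if v == m)

p1 = lambda x: 0.5 * math.exp(-abs(x - 0.0))  # laplacian centered at 0
p2 = lambda x: 0.5 * math.exp(-abs(x - 3.0))  # laplacian centered at 3
print(classify(1.0, [p1, p2]))  # -> 0: x = 1.0 is nearer the first center
```

At the midpoint x = 1.5 both densities are equal, and the least-index convention picks index 0.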
To use this method we need to solve the problem of determining, for a given finite subset T ⊂ R^d, a probability density p(·|θ) in our family. This is accomplished by maximum likelihood estimation (MLE). Thus, the likelihood function for the data T is given by
L(λ|T) = ∏_{y ∈ T} p(y|λ),  λ ∈ Ω,
and a vector θ ∈ Ω is chosen which maximizes this function over all λ ∈ Ω (if possible). Generally, a maximum does not exist (see V. N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1995, p. 24), and thus typically an iterative method is used to find a stationary point of the likelihood function. As we shall see, the iteration we use takes the form of a variation of the EM algorithm described by Dempster et al., supra, and Richard A. Redner and Homer Walker, "Mixture densities, maximum likelihood and the EM algorithm", SIAM Review, vol. 26, no. 2, April 1984.
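For the gaussian case (α = 2), the iterative search for a stationary point of L(λ|T) is the familiar EM recursion, alternating posterior responsibilities with reestimation; the nongaussian variants of the invention modify this scheme. A minimal one-dimensional, two-component sketch on synthetic data, not the patent's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sample T drawn from two well-separated gaussians.
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])

w = np.array([0.5, 0.5])      # mixture weights
mu = np.array([-1.0, 1.0])    # component means
var = np.array([1.0, 1.0])    # component variances

for _ in range(50):
    # E-step: responsibilities (posterior component probabilities).
    d = data[:, None] - mu[None, :]
    logp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - d ** 2 / (2 * var)
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: maximum likelihood reestimates given the responsibilities.
    n = r.sum(axis=0)
    w = n / len(data)
    mu = (r * data[:, None]).sum(axis=0) / n
    var = (r * (data[:, None] - mu[None, :]) ** 2).sum(axis=0) / n

print(np.sort(mu))  # means recovered near -2 and 2
```

Each iteration does not decrease the likelihood, which is what makes EM a reliable route to a stationary point even when no global maximum exists.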


