Bayesian principal component analysis

Data processing: structural design, modeling, simulation, and emulation – Modeling by mathematical expression



Details

C706S016000, C706S020000

Reexamination Certificate

active

06671661

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to data modeling and analysis such as principal component analysis, and more particularly to Bayesian principal component analysis.
BACKGROUND OF THE INVENTION
Data modeling has become an important tool in solving large, complex real-world problems. Applications of data modeling include data compression, density estimation, and data visualization. A data modeling technique used for these and other applications is principal component analysis (PCA), which has proven popular in applications such as data compression, image analysis, visualization, pattern recognition, regression, and time-series prediction. Other data modeling applications in which PCA can be applied include density modeling for emission densities in speech recognition, clustering of data for data mining, and building class-conditional density models for handwriting recognition.
A common definition of PCA is that for a set $D$ of observed $d$-dimensional data vectors $\{t_n\}$, $n \in \{1, \ldots, N\}$, the $q$ principal axes $w_j$, $j \in \{1, \ldots, q\}$, are those orthonormal axes onto which the retained variance under projection is maximal. As those of ordinary skill within the art can appreciate, it can be shown that the vectors $w_j$ are given by the $q$ dominant eigenvectors (those with the largest associated eigenvalues) of the sample covariance matrix $S = \sum_n (t_n - \bar{t})(t_n - \bar{t})^T / N$, such that $S w_j = \lambda_j w_j$, where $\bar{t}$ is the sample mean. The vector $x_n = W^T(t_n - \bar{t})$, where $W = (w_1, w_2, \ldots, w_q)$, is thus a $q$-dimensional reduced representation of the observed vector $t_n$.
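For concreteness, the computation just described can be sketched in a few lines of Python using NumPy. This is an illustrative sketch, not code from the patent; all names are hypothetical.

```python
import numpy as np

def pca(T, q):
    """Project the N x d data matrix T onto its q principal axes.

    Returns the N x q reduced representation X, the d x q matrix of
    principal axes W, and the sample mean t_bar.
    """
    t_bar = T.mean(axis=0)              # sample mean
    centered = T - t_bar
    S = centered.T @ centered / len(T)  # S = sum_n (t_n - t_bar)(t_n - t_bar)^T / N
    _, eigvecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :q]         # q dominant eigenvectors w_1, ..., w_q
    X = centered @ W                    # x_n = W^T (t_n - t_bar)
    return X, W, t_bar

# Example: reduce 3-dimensional observations to a q = 2 representation.
rng = np.random.default_rng(0)
T = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])
X, W, t_bar = pca(T, q=2)
print(X.shape)  # (100, 2)
```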
A limitation of conventional PCA is that it does not define a probability distribution. However, as described in the reference M. E. Tipping and C. M. Bishop, Probabilistic Principal Component Analysis (1997), PCA can be reformulated as the maximum likelihood solution of a specific latent variable model; this solution is referred to as probabilistic PCA. As with conventional PCA, though, the model provides no mechanism for determining the value of the latent-space dimensionality $q$. For $q = d-1$ the model is equivalent to a full-covariance Gaussian distribution, while for $q < d-1$ it represents a constrained Gaussian distribution in which the variance in the remaining $d-q$ directions is modeled by a single parameter $\sigma^2$. Thus, the choice of $q$ corresponds to a problem in model complexity optimization. If data is plentiful, cross-validation to compare all possible values of $q$ offers a possible approach; however, this quickly becomes intractable for mixtures of probabilistic PCA models if each component is to have its own $q$ value.
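The cross-validation approach mentioned above can be sketched as follows, using the closed-form maximum likelihood solution from the Tipping and Bishop reference: $\sigma^2$ is the mean of the $d-q$ discarded eigenvalues, and the fitted model is a Gaussian with covariance $C = WW^T + \sigma^2 I$. This is an illustrative sketch, not the patent's method; all names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ppca_fit(T, q):
    """Closed-form ML fit of probabilistic PCA with q latent dimensions."""
    t_bar = T.mean(axis=0)
    S = np.cov(T, rowvar=False, bias=True)      # sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)        # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = eigvals[q:].mean()                 # variance of the d - q discarded directions
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    C = W @ W.T + sigma2 * np.eye(T.shape[1])   # marginal covariance W W^T + sigma^2 I
    return t_bar, C

def heldout_loglik(T_test, t_bar, C):
    """Log-likelihood of held-out data under the fitted Gaussian."""
    return multivariate_normal(mean=t_bar, cov=C).logpdf(T_test).sum()

# Compare every candidate q on a held-out split; for mixtures with a
# separate q per component this discrete search grows combinatorially.
rng = np.random.default_rng(1)
T = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
train, test = T[:150], T[150:]
scores = {q: heldout_loglik(test, *ppca_fit(train, q)) for q in range(1, 5)}
print(max(scores, key=scores.get))              # selected latent dimensionality
```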
For these and other reasons, there is a need for the present invention.
SUMMARY OF THE INVENTION
The invention relates to Bayesian principal component analysis. In one embodiment, a computer-implemented method for performing Bayesian PCA includes inputting a data model; receiving a prior distribution over the parameters of the data model; determining a posterior distribution; generating output data based on the posterior distribution (such as a data model, a plurality of principal components, and/or a distribution); and outputting the output data. In another embodiment, a computer-implemented method includes inputting a mixture of a plurality of data spaces; determining a maximum number of principal components for each of the data spaces within the mixture; and outputting the maximum number of principal components for each of the data spaces within the mixture.
Thus, the invention provides for a Bayesian treatment of PCA. A prior distribution, such as $P(\mu, W, \sigma^2)$, is received over the parameters of the inputted data model. The corresponding posterior distribution, such as $P(\mu, W, \sigma^2 \mid D)$, is then obtained, for example, by multiplying the prior distribution by the likelihood function and normalizing. In one embodiment, the output data is generated by obtaining a predictive density, by marginalizing over the parameters, so that

$$P(t \mid D) = \iiint P(t \mid \mu, W, \sigma^2)\, P(\mu, W, \sigma^2 \mid D)\, d\mu\, dW\, d\sigma^2.$$
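The triple integral above is generally intractable in closed form. One standard way to approximate it, assuming samples from the posterior $P(\mu, W, \sigma^2 \mid D)$ are available (for example via Markov chain Monte Carlo, as surveyed in the Neal reference below), is a Monte Carlo average of the probabilistic PCA likelihood. A minimal, hypothetical sketch:

```python
# Monte Carlo approximation of the predictive density:
#   P(t|D) ~= (1/S) * sum_s P(t | mu_s, W_s, sigma2_s),
# where each (mu_s, W_s, sigma2_s) is a posterior sample.
# Illustrative only; the patent does not prescribe this estimator.
import numpy as np
from scipy.stats import multivariate_normal

def predictive_density(t, posterior_samples):
    """posterior_samples: iterable of (mu, W, sigma2) drawn from P(mu, W, sigma^2 | D)."""
    vals = [
        # Under the model, t | mu, W, sigma^2 ~ N(mu, W W^T + sigma^2 I).
        multivariate_normal(mean=mu, cov=W @ W.T + sigma2 * np.eye(len(mu))).pdf(t)
        for mu, W, sigma2 in posterior_samples
    ]
    return float(np.mean(vals))
```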
To implement this framework, embodiments of the invention address two issues: the choice of prior distribution, and the formulation of a tractable algorithm. Thus, embodiments of the invention control the effective dimensionality of the latent space (corresponding to the number of retained principal components). Furthermore, embodiments of the invention avoid discrete model selection and instead utilize continuous hyper-parameters to determine automatically an appropriate effective dimensionality for the latent space as part of the process of Bayesian inference.
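One concrete realization of such continuous hyper-parameters, following the automatic relevance determination (ARD) formulation in the Bishop "Variational Principal Components" reference below, places an independent zero-mean Gaussian prior over each column $w_i$ of $W$ governed by its own precision $\alpha_i$; columns whose precision grows large contribute negligibly and are effectively pruned. The sketch below uses the standard ARD expressions, assumed here rather than quoted from the patent.

```python
# ARD-style prior over W: P(W|alpha) = prod_i N(w_i | 0, alpha_i^{-1} I_d).
# Re-estimating alpha_i drives unneeded columns of W toward zero, so the
# effective latent dimensionality emerges without discrete model selection.
# Standard formulation, not necessarily the invention's exact prior.
import numpy as np

def log_prior_W(W, alpha):
    """log P(W|alpha) = sum_i [(d/2) log(alpha_i / 2 pi) - (alpha_i / 2) ||w_i||^2]."""
    d = W.shape[0]
    norms2 = (W ** 2).sum(axis=0)   # ||w_i||^2 for each column
    return float(np.sum(0.5 * d * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * norms2))

def update_alpha(W):
    """Standard point re-estimate alpha_i = d / ||w_i||^2 (evidence-style update)."""
    d = W.shape[0]
    return d / (W ** 2).sum(axis=0)
```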


REFERENCES:
patent: 5325445 (1994-06-01), Herbert
patent: 5343537 (1994-08-01), Bellegarda et al.
patent: 5465321 (1995-11-01), Smyth
patent: 5754681 (1998-05-01), Watanabe et al.
patent: 5796924 (1998-08-01), Errico et al.
patent: 5949678 (1999-09-01), Wold et al.
patent: 5963591 (1999-10-01), O'Brien et al.
patent: 6128587 (2000-10-01), Sjolander
patent: 6212509 (2001-04-01), Pao et al.
patent: 6262730 (2001-07-01), Horvitz et al.
patent: 6263103 (2001-07-01), Freeman et al.
patent: 6336108 (2002-01-01), Thiesson et al.
patent: 6380934 (2002-04-01), Freeman et al.
Liu and Wechsler, "A Unified Bayesian Framework for Face Recognition", IEEE, 1998, pp. 151-155.
Christopher M. Bishop, "Variational Principal Components", Artificial Neural Networks, IEE, Sep. 1999, pp. 509-514.
Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Nov. 1995, ISBN 0198538642.
Michael E. Tipping, Christopher M. Bishop, Probabilistic Principal Component Analysis, Technical Report NCRG/97/010, Sep. 4, 1997, pp. 1-13.
Michael E. Tipping, Christopher M. Bishop, Mixtures of Probabilistic Principal Component Analysers, Technical Report NCRG/97/003, Jul. 11, 1998.
Radford M. Neal, Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Sep. 25, 1993.
David J C MacKay, Probable networks and plausible predictions—a review of practical Bayesian methods for supervised neural networks, Network: Computation in Neural Systems 6 (3), 469-505, 1995.
Michael E. Tipping and Christopher M. Bishop, Mixtures of Principal Component Analyzers, Fifth International Conference on Artificial Neural Networks, Jul. 7-9, 1997, IEE Conference Publication No. 440.
