Relevance vector machine
Patent number: 6,633,857
Type: Reexamination Certificate
Status: Active
Filed: 1999-09-04
Issued: 2003-10-14
Examiner: Patel, Ramesh (Department: 2121)
Classification: Data processing: artificial intelligence – Neural network – Learning task
U.S. Classes: C706S020000; C706S025000
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to data modeling and analysis, and more particularly to a relevance vector machine for such data modeling and analysis.
BACKGROUND OF THE INVENTION
Data modeling has become an important tool in solving complex and large real-world computerizable problems. Applications of data modeling include data compression, density estimation, and data visualization. A data modeling technique used for these and other applications is probabilistic modeling. It has proven to be a popular technique for data modeling applications such as speech recognition, vision, handwriting recognition, information retrieval, and intelligent interfaces. One framework for developing such applications involves the representation of probability distributions as directed acyclic graphs, which are also known as Bayesian networks, belief networks, and probabilistic independence networks, among other terms.
In probabilistic modeling, a training data set is typically given that comprises input vectors $\{\mathbf{x}_n\}_{n=1}^{N}$ along with a set of corresponding targets $\{t_n\}_{n=1}^{N}$, the latter of which can be real values (in the case of regression analysis) or class labels (in the case of classification analysis). From this training set, the aim is to infer a model of p(t|x), with the object of making accurate predictions of t for new, unlabelled examples of x. Generally, the principal challenge is to find the appropriate complexity of this model. Scoring alternative models by training-set accuracy alone is usually undesirable, since increasing the model complexity, while reducing the training-set error, can easily lead to over-fitting and poor generalization. A more robust approach is to introduce a prior distribution over models, which is used in conjunction with the information supplied by the training data to infer the prediction model. This prior distribution, also referred to simply as a prior, can be explicit, as in a Bayesian framework, or implicit, as in other approaches.
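For concreteness, the role of the prior can be stated via Bayes' rule (a standard restatement, not language taken from the patent itself): if $\mathbf{w}$ denotes the parameters of the model, then

$$p(\mathbf{w} \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t})},$$

so the posterior over models combines the likelihood of the training targets with the prior $p(\mathbf{w})$, and predictions of t for a new input x follow from this posterior.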
One method for classification, which has also been extended to regression, is known as the support vector machine (SVM). Although it does not estimate p(t|x), the SVM makes predictions based on a discriminant function of the form

$$y(\mathbf{x}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + w_0,$$
where $\{w_n\}$ are the model weights and K(·,·) is a kernel function. A feature of the SVM is that its cost function attempts to minimize the number of errors made on the training set while simultaneously maximizing the margin between the two classes in the feature space implicitly defined by the kernel. This maximum-margin principle is an appealing prior for classification, and it ultimately drives many of the weights to zero, resulting in a sparse kernel classifier in which the non-zero weights are associated with those $\mathbf{x}_n$ that either lie on the margin or fall on the wrong side of it. Model complexity is thus constrained such that only these support vectors determine the decision function. In practice, in addition to fitting the model to the training data, it is also necessary to estimate the parameter (usually denoted C) that regulates the trade-off between training errors and margin size, which may entail additional cross-validation.
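To make the discriminant concrete, the following is a minimal sketch in Python; the Gaussian (RBF) kernel, the function names, and the calling convention are illustrative assumptions, not part of the patent.

import numpy as np

def rbf_kernel(x, x_n, gamma=1.0):
    # Gaussian (RBF) kernel K(x, x_n); the kernel choice is an assumption.
    return np.exp(-gamma * np.sum((x - x_n) ** 2))

def svm_discriminant(x, X_train, weights, w0, kernel=rbf_kernel):
    # Evaluates y(x) = sum_n w_n K(x, x_n) + w_0. In a trained SVM most
    # entries of `weights` are zero, so only the support vectors
    # (non-zero w_n) contribute to the sum.
    return sum(w * kernel(x, x_n)
               for w, x_n in zip(weights, X_train) if w != 0.0) + w0

# Hypothetical usage: y = svm_discriminant(x_new, X_train, weights, w0)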
A general disadvantage of the SVM is that it utilizes many kernel functions, and it may not yield test performance as good as desired. Furthermore, the SVM requires tuning of additional parameters (i.e., those denoted C), which adds unwanted complexity to the model. For these and other reasons, there is a need for the present invention.
SUMMARY OF THE INVENTION
The invention relates to a relevance vector machine (RVM). The RVM is a probabilistic basis model of the same functional form as the SVM. Sparsity is achieved through a Bayesian treatment, in which a prior is introduced over the weights, governed by a set of what are referred to as hyperparameters: one such hyperparameter is associated with each weight, and the most probable values of the hyperparameters are iteratively estimated from the data. In practice, the posterior distribution of many of the weights is sharply peaked around zero.
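Concretely, this per-weight prior takes the form given in the cited Tipping (2001) JMLR paper, restated here for clarity:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\left(w_i \mid 0, \alpha_i^{-1}\right),$$

where each hyperparameter $\alpha_i$ is the precision of its associated weight; as $\alpha_i \to \infty$, the posterior for $w_i$ concentrates at zero and the corresponding basis function is effectively pruned from the model.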
In one embodiment, a computer-implemented method includes inputting a data set to be modeled, and determining a relevance vector learning machine to obtain a posterior distribution over the learning machine parameters given the data set (also referred to as “the posterior”). This includes determining a marginal likelihood for the hyperparameters, and iteratively re-estimating the hyperparameters to optimize the marginal likelihood. For the case of regression analysis, the marginal likelihood is determined directly. For the case of classification analysis, the marginal likelihood is approximated through the additional determination of the most probable weights for the given hyperparameters, and the Hessian at that most probable weight value. This approximation is also iteratively redetermined as the hyperparameters are updated. At least the posterior distribution for the weights given the data set is then output by the method.
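For the regression case, the iterative re-estimation can be sketched as follows. This is a minimal illustration following the standard update rules from the cited Tipping (2001) paper, not the patent's claimed implementation; the design matrix Phi (basis-function outputs), the initializations, and the convergence test are assumptions.

import numpy as np

def rvm_regression(Phi, t, n_iter=100, tol=1e-6):
    # Phi: (N, M) design matrix of basis/kernel outputs; t: (N,) real targets.
    N, M = Phi.shape
    alpha = np.ones(M)   # one precision hyperparameter per weight
    beta = 1.0           # noise precision (assumed initialization)
    for _ in range(n_iter):
        # Posterior over the weights for the current hyperparameters:
        # Sigma = (beta Phi^T Phi + diag(alpha))^-1,  mu = beta Sigma Phi^T t
        Sigma = np.linalg.inv(beta * (Phi.T @ Phi) + np.diag(alpha))
        mu = beta * (Sigma @ (Phi.T @ t))
        # Re-estimate the hyperparameters to increase the marginal likelihood:
        gamma = 1.0 - alpha * np.diag(Sigma)   # how well-determined each weight is
        alpha_new = gamma / (mu ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
        # A full implementation would prune weights whose alpha grows very
        # large, since their posterior is sharply peaked at zero.
    return mu, Sigma, alpha

The returned mu and Sigma constitute the posterior distribution over the weights given the data set; the surviving basis functions (those not pruned) correspond to the relevance vectors.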
The RVM has advantages not found in prior art approaches such as the SVM. As compared to the SVM, for example, the non-zero weights in the RVM are typically not associated with examples close to the decision boundary, but rather appear to represent more prototypical examples of the classes. These examples are termed relevance vectors. Generally, the trained RVM utilizes many fewer basis functions than the corresponding SVM, and it typically achieves superior test performance. Furthermore, no additional validation of parameters (such as C) is necessary to specify the model, save those associated with the basis.
REFERENCES:
patent: 5,855,011 (1998-12-01), Tatsuoka
patent: 6,301,571 (2001-10-01), Tatsuoka
Tipping, Michael E.; Bishop, Christopher M., "Mixtures of Principal Component Analyzers," Artificial Neural Networks, Jul. 7-9, 1997, IEEE Conference Publication No. 440, pp. 13-18.
Tipping, Michael E.; Bishop, Christopher M., "Hierarchical Models for Data Visualization," Artificial Neural Networks, Jul. 7-9, 1997, IEEE Conference Publication No. 440, pp. 70-75.
Chen, S.; Gunn, S. R.; Harris, C. J., "The Relevance Vector Machine Technique for Channel Equalization Application," IEEE Transactions on Neural Networks, vol. 12, no. 6, Nov. 2001, pp. 1529-1532.
Chen, S.; Gunn, S. R.; Harris, C. J., Errata to "The Relevance Vector Machine Technique for Channel Equalization Application," IEEE Transactions on Neural Networks, vol. 13, no. 4, Jul. 2002, p. 1024.
Quiñonero-Candela, J.; Hansen, L. K., "Time Series Prediction Based on the Relevance Vector Machine with Adaptive Kernels," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2002, pp. 985-988.
Chen, S.; Hanzo, L., "Block-Adaptive Kernel-Based CDMA Multiuser Detection," Proc. IEEE International Conference on Communications (ICC 2002), vol. 2, 2002, pp. 682-686.
Tipping, Michael E., "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research 1 (2001), pp. 211-244.
MacKay, D. J. C., "Bayesian Non-Linear Modelling for the Prediction Competitions," ASHRAE Transactions, vol. 100, pp. 1053-1062, ASHRAE, Atlanta, Georgia, 1994.
MacKay, D. J. C., "Bayesian Interpolation," Neural Computation, 4(3): 415-447, 1992.
MacKay, D. J. C., "The Evidence Framework Applied to Classification Networks," Neural Computation, 4(5): 720-736, 1992.
Neal, R. M., Bayesian Learning for Neural Networks, Lecture Notes in Statistics 118, pp. 15-17, 100-102, 113-116, 147-150, Springer, 1996.
Platt, J., "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.
Vapnik, Vladimir N., Statistical Learning Theory, Chapter 10: "The Support Vector Method for Estimating Indicator Functions," John Wiley & Sons, Inc., 1998, ISBN 0-471-03003-1.
Attorneys: Amin & Turocy, LLP
Examiners: Holmes, Michael B.; Patel, Ramesh
Assignee: Microsoft Corporation