Relevance vector machine
Patent number: 6,633,857
Type: Reexamination Certificate
Status: Active
Filed: 1999-09-04
Issued: 2003-10-14
Examiner: Patel, Ramesh (Department: 2121)
Classification: Data processing: artificial intelligence – Neural network – Learning task
U.S. Classes: C706S020000; C706S025000
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to data modeling and analysis, and more particularly to a relevance vector machine for such data modeling and analysis.
BACKGROUND OF THE INVENTION
Data modeling has become an important tool in solving complex and large real-world computerizable problems. Applications of data modeling include data compression, density estimation, and data visualization. A data modeling technique used for these and other applications is probabilistic modeling. It has proven to be a popular technique for data modeling applications such as speech recognition, vision, handwriting recognition, information retrieval, and intelligent interfaces. One framework for developing such applications involves the representation of probability distributions as directed acyclic graphs, which are also known as Bayesian networks, belief networks, and probabilistic independence networks, among other terms.
In probabilistic modeling, a training data set is typically given that comprises input vectors $\{\mathbf{x}_n\}_{n=1}^{N}$ along with a set of corresponding targets $\{t_n\}_{n=1}^{N}$, the latter of which can be real values (in the case of regression analysis) or class labels (in the case of classification analysis). From this training set, the aim is to infer a model of p(t|x), with the object of making accurate predictions of t for new, unlabelled examples of x. Generally, the principal challenge is to find the appropriate complexity of this model. Scoring alternative models by training-set accuracy alone is usually undesirable, since increasing the model complexity, while reducing the training-set error, can easily lead to over-fitting and poor generalization. A more robust approach is to introduce a prior distribution over models, which is used in conjunction with the information supplied by the training data to infer the prediction model. This prior distribution, also referred to simply as a prior, can be explicit, as in a Bayesian framework, or implicit, as in other approaches.
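For concreteness, the role of the prior can be stated via Bayes' rule (a standard restatement, not language taken from the patent itself): if $\mathbf{w}$ denotes the parameters of the model, then

$$p(\mathbf{w} \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t})},$$

so the posterior over models combines the likelihood of the training targets with the prior $p(\mathbf{w})$, and predictions of t for a new input x follow from this posterior.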
One method for classification, which has also been extended to regression, is known as the support vector machine (SVM). Although it does not estimate p(t|x), the SVM makes predictions based on a discriminant function of the form

$$y(\mathbf{x}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + w_0,$$
where $\{w_n\}$ are the model weights and K(·,·) is a kernel function. A feature of the SVM is that its cost function attempts to minimize the number of errors made on the training set while simultaneously maximizing the margin between the two classes in the feature space implicitly defined by the kernel. This maximum-margin principle is an appealing prior for classification, and it ultimately drives many of the weights to zero, resulting in a sparse kernel classifier in which the non-zero weights are associated with those $\mathbf{x}_n$ that either lie on the margin or fall on the wrong side of it. Model complexity is thus constrained such that only these support vectors determine the decision function. In practice, in addition to fitting the model to the training data, it is also necessary to estimate the parameter (usually denoted C) that regulates the trade-off between training errors and margin size, which may entail additional cross-validation.
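To make the discriminant concrete, the following is a minimal sketch in Python; the Gaussian (RBF) kernel, the function names, and the calling convention are illustrative assumptions, not part of the patent.

import numpy as np

def rbf_kernel(x, x_n, gamma=1.0):
    # Gaussian (RBF) kernel K(x, x_n); the kernel choice is an assumption.
    return np.exp(-gamma * np.sum((x - x_n) ** 2))

def svm_discriminant(x, X_train, weights, w0, kernel=rbf_kernel):
    # Evaluates y(x) = sum_n w_n K(x, x_n) + w_0. In a trained SVM most
    # entries of `weights` are zero, so only the support vectors
    # (non-zero w_n) contribute to the sum.
    return sum(w * kernel(x, x_n)
               for w, x_n in zip(weights, X_train) if w != 0.0) + w0

# Hypothetical usage: y = svm_discriminant(x_new, X_train, weights, w0)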
A general disadvantage of the SVM is that it utilizes many kernel functions, and it may not yield test performance as good as desired. Furthermore, the SVM requires tuning of additional parameters (i.e., those denoted C), which adds unwanted complexity to the model. For these and other reasons, there is a need for the present invention.
SUMMARY OF THE INVENTION
The invention relates to a relevance vector machine (RVM). The RVM is a probabilistic basis model of the same functional form as the SVM. Sparsity is achieved through a Bayesian treatment, in which a prior is introduced over the weights, governed by a set of what are referred to as hyperparameters: one such hyperparameter is associated with each weight, and the most probable values of the hyperparameters are iteratively estimated from the data. In practice, the posterior distribution of many of the weights is sharply peaked around zero.
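Concretely, this per-weight prior takes the form given in the cited Tipping (2001) JMLR paper, restated here for clarity:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\left(w_i \mid 0, \alpha_i^{-1}\right),$$

where each hyperparameter $\alpha_i$ is the precision of its associated weight; as $\alpha_i \to \infty$, the posterior for $w_i$ concentrates at zero and the corresponding basis function is effectively pruned from the model.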
In one embodiment, a computer-implemented method includes inputting a data set to be modeled, and determining a relevance vector learning machine to obtain a posterior distribution over the learning machine parameters given the data set (also referred to as “the posterior”). This includes determining a marginal likelihood for the hyperparameters, and iteratively re-estimating the hyperparameters to optimize the marginal likelihood. For the case of regression analysis, the marginal likelihood is determined directly. For the case of classification analysis, the marginal likelihood is approximated through the additional determination of the most probable weights for the given hyperparameters, and the Hessian at that most probable weight value. This approximation is also iteratively redetermined as the hyperparameters are updated. At least the posterior distribution for the weights given the data set is then output by the method.
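For the regression case, the iterative re-estimation can be sketched as follows. This is a minimal illustration following the standard update rules from the cited Tipping (2001) paper, not the patent's claimed implementation; the design matrix Phi (basis-function outputs), the initializations, and the convergence test are assumptions.

import numpy as np

def rvm_regression(Phi, t, n_iter=100, tol=1e-6):
    # Phi: (N, M) design matrix of basis/kernel outputs; t: (N,) real targets.
    N, M = Phi.shape
    alpha = np.ones(M)   # one precision hyperparameter per weight
    beta = 1.0           # noise precision (assumed initialization)
    for _ in range(n_iter):
        # Posterior over the weights for the current hyperparameters:
        # Sigma = (beta Phi^T Phi + diag(alpha))^-1,  mu = beta Sigma Phi^T t
        Sigma = np.linalg.inv(beta * (Phi.T @ Phi) + np.diag(alpha))
        mu = beta * (Sigma @ (Phi.T @ t))
        # Re-estimate the hyperparameters to increase the marginal likelihood:
        gamma = 1.0 - alpha * np.diag(Sigma)   # how well-determined each weight is
        alpha_new = gamma / (mu ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
        # A full implementation would prune weights whose alpha grows very
        # large, since their posterior is sharply peaked at zero.
    return mu, Sigma, alpha

The returned mu and Sigma constitute the posterior distribution over the weights given the data set; the surviving basis functions (those not pruned) correspond to the relevance vectors.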
The RVM has advantages not found in prior art approaches such as the SVM. As compared to the SVM, for example, the non-zero weights in the RVM are typically not associated with examples close to the decision boundary, but rather appear to represent more prototypical examples of the classes. These examples are termed relevance vectors. Generally, the trained RVM utilizes many fewer basis functions than the corresponding SVM, and it typically achieves superior test performance. Furthermore, no additional validation of parameters (such as C) is necessary to specify the model, save those associated with the basis.
REFERENCES:
patent: 5,855,011 (1998-12-01), Tatsuoka
patent: 6,301,571 (2001-10-01), Tatsuoka
Tipping, Michael E.; Bishop, Christopher M., "Mixtures of Principal Component Analyzers," Artificial Neural Networks, Jul. 7-9, 1997, IEEE Conference Publication No. 440, pp. 13-18.
Tipping, Michael E.; Bishop, Christopher M., "Hierarchical Models for Data Visualization," Artificial Neural Networks, Jul. 7-9, 1997, IEEE Conference Publication No. 440, pp. 70-75.
Chen, S.; Gunn, S. R.; Harris, C. J., "The Relevance Vector Machine Technique for Channel Equalization Application," IEEE Transactions on Neural Networks, vol. 12, no. 6, Nov. 2001, pp. 1529-1532.
Chen, S.; Gunn, S. R.; Harris, C. J., Errata to "The Relevance Vector Machine Technique for Channel Equalization Application," IEEE Transactions on Neural Networks, vol. 13, no. 4, Jul. 2002, p. 1024.
Quiñonero-Candela, J.; Hansen, L. K., "Time Series Prediction Based on the Relevance Vector Machine with Adaptive Kernels," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2002, pp. 985-988.
Chen, S.; Hanzo, L., "Block-Adaptive Kernel-Based CDMA Multiuser Detection," Proc. IEEE International Conference on Communications (ICC 2002), vol. 2, 2002, pp. 682-686.
Tipping, Michael E., "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research 1 (2001), pp. 211-244.
MacKay, D. J. C., "Bayesian Non-Linear Modelling for the Prediction Competitions," ASHRAE Transactions, vol. 100, pp. 1053-1062, ASHRAE, Atlanta, Georgia, 1994.
MacKay, D. J. C., "Bayesian Interpolation," Neural Computation, 4(3): 415-447, 1992.
MacKay, D. J. C., "The Evidence Framework Applied to Classification Networks," Neural Computation, 4(5): 720-736, 1992.
Neal, R. M., Bayesian Learning for Neural Networks, Lecture Notes in Statistics 118, pp. 15-17, 100-102, 113-116, 147-150, Springer, 1996.
Platt, J., "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.
Vapnik, Vladimir N., Statistical Learning Theory, Chapter 10: "The Support Vector Method for Estimating Indicator Functions," John Wiley & Sons, Inc., 1998, ISBN 0-471-03003-1.
Attorneys: Amin & Turocy, LLP
Examiners: Holmes, Michael B.; Patel, Ramesh
Assignee: Microsoft Corporation