Cascade boosting of predictive models

Data processing: artificial intelligence – Adaptive system

Reexamination Certificate


active

06546379


BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to methods for building predictive models from a given population of data points, and more particularly to a family of methods known as boosting or adaptive sampling, wherein multiple models are constructed and combined in an attempt to improve upon the performance obtained by building a single model. The invention resolves the interpretability problems of previous boosting methods while mitigating the fragmentation problem when applied to decision trees.
2. Background Description
Predictive modeling refers to generating a model from a given set of data points (also called “examples” or “records”), where each point is comprised of fields (also called “attributes” or “features” or “variables”), some of which are designated as target fields (also called “dependent variables”) whose values are to be predicted from the values of the others (also called “independent variables”).
There are many known methods of building predictive models from data for many different families of models, such as decision trees, decision lists, linear equations, and neural networks. The points used to build a model are known as training points. The performance of a model is judged by comparing the predicted values with the actual values of target fields in various populations of points. It is important to perform well on “new” or “unseen” points that were not used to train the model. In practical applications, the values of target fields become known at later times than the values of other fields. Actions based on accurate predictions can have many benefits. For example, a retailer can mail promotional material to those customers who are likely to respond favorably, while avoiding the costs of mailing the material to all customers. In this example, customer response is a target field. Other fields might include past responses to similar promotions, other past purchases, or demographic information.
Given some method of building predictive models, one could simply apply the method once to the available training points, or to as large a sample from that population as the underlying method can work with. Boosting refers to a family of general methods that seek to improve the performance obtained from any given underlying method of building predictive models by applying the underlying method more than once and then combining the resulting “weak” models into a single overall model that, although admittedly more complex than any of the “weak” models obtained from the underlying method, may make more accurate predictions. The term “weak”, as used in connection with boosting, is a technical term in the art; a “weak” model is simply a model with imperfect performance that one hopes to improve by somehow combining it with other “weak” models built by the same underlying method, but from different samples of the available training points. A model with good performance may still be considered “weak” in this context. Boosting is a process of adaptive resampling that builds a weak model, determines how to choose another training sample by observing the performance of the weak model(s) already built, builds another weak model, and so on.
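The adaptive resampling loop described above can be sketched as follows. This is only an illustration: the names `fit_weak` and `update` are hypothetical stand-ins for any underlying model-building method and any weighting/resampling rule, not functions defined by the patent.

```python
import numpy as np

def boost(fit_weak, update, X, y, n_rounds):
    """Skeleton of boosting by adaptive resampling (a sketch, not the
    patented method). `fit_weak(X, y, D)` builds a weak model under the
    point distribution D; `update(m, X, y, D)` returns the model's weight
    and the next distribution."""
    # Start from the trivial distribution where all points are equally probable.
    D = np.full(len(X), 1.0 / len(X))
    models, weights = [], []
    for _ in range(n_rounds):
        m = fit_weak(X, y, D)        # build a weak model from the weighted population
        w, D = update(m, X, y, D)    # weigh the model and compute the next distribution
        models.append(m)
        weights.append(w)
    return models, weights
```

The loop returns the two lists that additive boosting produces: the weak models and their averaging weights.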
FIG. 1 illustrates the end result of the common method of boosting. A list of admittedly weak models (1001, 1002, . . . , 1099) is available, and every one of those models is applied to any given point for which a prediction is wanted in 1000. The actual number of weak models could be more or less than the 99 indicated in FIG. 1; tens or hundreds of weak models are commonly used. Such an ensemble of models comprising the final weighted model is not humanly understandable. To return a single prediction, a weighted average of all the weak predictions is computed in block 1100. As commonly practiced (see, for example, Y. Freund and R. Schapire, “Experiments with a New Boosting Algorithm”, Proceedings of the International Machine Learning Conference, pp. 148-156 (1996), or R. Schapire and Y. Singer, “Improved Boosting Algorithms Using Confidence-Rated Predictions”, Proceedings of the 11th Annual Conference on Computational Learning Theory (1998)), boosting builds both a list of weak models (1001, 1002, . . . , 1099) and a corresponding list of weights for averaging the predictions of those models, as in step 1100.
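The weighted combination in block 1100 might look like the sketch below. The ±1 output convention and the example models and weights are illustrative assumptions; the patent does not fix a particular output range or combination formula beyond a weighted average.

```python
import numpy as np

def ensemble_predict(weak_models, weights, x):
    """Combine weak-model predictions by a weighted average, as in block 1100.

    `weak_models` is a list of callables each returning a prediction in
    {-1, +1} (an assumed convention for a binary target); `weights` are the
    per-model averaging weights produced by boosting."""
    preds = np.array([m(x) for m in weak_models])
    w = np.array(weights)
    # The sign of the weighted sum acts as a weighted vote between -1 and +1.
    return np.sign(np.dot(w, preds))

# Hypothetical example: three constant "weak models" voting on one point.
models = [lambda x: 1, lambda x: -1, lambda x: 1]
prediction = ensemble_predict(models, [0.5, 0.2, 0.8], x=None)  # the two +1 voters outweigh the -1 voter
```

Even in this tiny sketch, the final answer depends on all the weak models and weights at once, which is the source of the interpretability problem discussed below.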
The method of building the lists is a form of adaptive resampling, as illustrated in FIG. 2. Given a population of training points 2001 and a trivial initial probability distribution D_1 2011 where all points are equally probable, any given model-building method yields a model M_1 2012 to serve as the first weak model. If the given model-building method cannot deal directly with a large population of training points together with a probability distribution on those points, then a sample of any convenient size may be drawn from the population 2001 according to the distribution 2011. To determine the weight of the predictions of M_1 and to build the next weak model, the next probability distribution D_2 2021 is computed by observing the weighted average performance of M_1 on the entire population 2001, with the performance for each point weighted by the probability of that point according to the current distribution 2011. The function that computes the weight of M_1 rewards better performance with a higher weight for M_1, while the function that computes the next probability of each point ensures that points where M_1 performs poorly will be more likely to be chosen than points with the same current probability where M_1 performs well.
Given the same population 2001 and the new probability distribution 2021, the same given model-building method yields a model M_2 2022 to serve as the second weak model. The process of observing performance and determining both a weight and a new distribution continues for as long as desired, leading eventually to a final distribution D_99 2991 and then a final weak model M_99 2992. The last step in boosting as commonly practiced is a truncation of the repeated process: observed performance determines the weight of M_99, but the computation of a new distribution is omitted. Boosting as commonly practiced will be called “additive boosting” hereafter.
After additive boosting, the final model is of a form unlike that of the models in the list of admittedly weak models, and the list of weights is difficult to interpret. Despite considerable experimental success, additive boosting is, from the viewpoint of the end user, disturbingly like the old fable about a committee of blind men who independently examined various parts of an elephant by touch and could not pool their admittedly limited observations into a consensus about the whole animal, as illustrated in FIG. 3. Man 31 feels the elephant's leg 32 and assumes he has encountered a tree. Man 33 feels the elephant's trunk 34 and assumes he has encountered a snake. The two men are unable to conclude that they have both encountered an elephant 35.
This interpretability problem is well-known. A large complex model, such as a typical boosted model, with a whole ensemble of base models and their weights, is difficult to understand and explain. This limits the scope of practical applications. There have been attempts to mitigate the interpretability problem with visualization tools applied after models have been built (J. S. Rao and W. J. E. Potts, “Visualizing Bagged Decision Trees”, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), pp. 243-246 (August 1997)). The interpretability problem has been addressed for the special case of Naive Bayes classification as the underlying method of building models and “weight of evidence” as the desired interpretation (G. Ridgeway, D. Madigan, T. Richardson, and J. O'Kane, “Interpretable Boosted Naive Bayes Classification”, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining
