Version testing in database mining
Reexamination Certificate
1998-12-09
2001-11-13
Amsbury, Wayne (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06317752
ABSTRACT:
BACKGROUND
This invention relates generally to data mining software.
Data mining software extracts knowledge that may be suggested by a set of data. For example, data mining software can be used to maximize the return on investment in collecting marketing data, and in other applications such as credit risk assessment, fraud detection, process control, medical diagnosis and so forth. Typically, data mining software uses one or more types of modeling algorithms in combination with a set of test data to determine which characteristics are most useful in achieving a desired response rate, behavioral response or other output from a targeted group of individuals represented by the data. Generally, data mining software executes complex data modeling algorithms such as linear regression, logistic regression, back propagation neural networks, Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detection (CHAID) decision trees, as well as other types of algorithms, on a set of data.
Results obtained by executing these algorithms can be expressed in a variety of ways, for example as an RMS error, an R² value, a confusion matrix, a gains table, multiple lift charts, or a single lift chart with multiple lift curves. Based on these results the decision maker can decide which model (i.e., type of modeling algorithm and learning parameters) might be best for a particular use.
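As an illustration of one of the result formats named above, the following is a minimal sketch of how a gains table with cumulative lift might be computed from model scores and observed responses. The function name `gains_table`, its parameters, and the equal-count binning are assumptions made for this example; they are not taken from the patent.

```python
def gains_table(scores, responses, n_bins=10):
    """Rank records by descending score and summarize response per bin.

    scores:    model scores, higher meaning "more likely to respond"
    responses: 1 for a responder, 0 for a non-responder (same order as scores)
    Returns one dict per bin with the bin response rate, the cumulative share
    of all responders captured so far, and the cumulative lift over the
    overall response rate.
    """
    if not scores or len(scores) != len(responses):
        raise ValueError("scores and responses must be non-empty and equal length")

    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_responders = sum(r for _, r in ranked)
    overall_rate = total_responders / n

    rows = []
    cum_records = 0
    cum_responders = 0
    for b in range(n_bins):
        # Equal-count bins (hypothetical choice; deciles when n_bins == 10).
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        responders = sum(r for _, r in chunk)
        cum_records += len(chunk)
        cum_responders += responders
        rows.append({
            "bin": b + 1,
            "records": len(chunk),
            "response_rate": responders / len(chunk),
            "cum_capture": cum_responders / total_responders if total_responders else 0.0,
            "cum_lift": (cum_responders / cum_records) / overall_rate if overall_rate else 0.0,
        })
    return rows
```

Plotting `cum_lift` or `cum_capture` against the cumulative fraction of records contacted gives a lift curve; one curve per model on the same axes corresponds to the "single lift chart with multiple lift curves" mentioned above.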
SUMMARY
In many real world modeling problems, such as in version testing, often a single variable or set of input variables can have a significantly strong influence on predicting behavioral outcomes. The data mining software allows for execution of multiple models based on selective segmentation of data using models designed for and trained with the particular data segments. When the models operate on each of the data segments, they can produce a simple lift chart to show the performance of the model for that segment of data.
While a single lift chart may provide useful results, the single lift chart does not indicate the usefulness of the multiple model approach. A single lift chart does not indicate how the multiple models should be optimally combined and used. In addition, the performance of individual models based on data segmentation cannot be directly compared to that of a single, non-segmented model to determine whether the improvement, if any, exhibited with the multiple data segment modeling approach justifies the additional modeling expenses associated therewith.
The scores generated by these models cannot simply be sorted across different models when the a priori data distributions have been modified. This is typical in problems such as response modeling, where a class or behavior of interest represents a small fraction of the overall population (e.g., response rates are typically 1-2%). Records from the less represented class (e.g., responders to a mailing campaign) are typically oversampled relative to the other class (e.g., non-responders). While this sampling technique provides improved prediction accuracy, the model scores for many data-driven algorithms no longer map directly to probabilities of the predicted outcomes, and therefore scores from multiple models cannot simply be combined and sorted.
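To make the oversampling problem concrete, the sketch below applies the standard prior-correction (odds rescaling) formula, which maps a probability estimated on an oversampled training set back to the population scale. This is a well-known general technique and is shown only for illustration; this section does not state that it is the conversion used by the invention. The sketch assumes the model score is already a probability estimate on the oversampled sample (as with logistic regression), and the function and parameter names are hypothetical.

```python
def correct_for_oversampling(score, sample_rate, population_rate):
    """Rescale a probability-like score from an oversampled training set.

    score:           model output in [0, 1], interpreted as P(response) on the
                     oversampled training sample (an assumption of this sketch)
    sample_rate:     fraction of responders in the training sample, e.g. 0.5
    population_rate: fraction of responders in the real population, e.g. 0.02
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if not (0.0 < sample_rate < 1.0 and 0.0 < population_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")

    # Reweight each class by (true prior / sampled prior), then renormalize.
    pos = score * population_rate / sample_rate
    neg = (1.0 - score) * (1.0 - population_rate) / (1.0 - sample_rate)
    return pos / (pos + neg)
```

For example, with responders oversampled to 50% of the training data and a true response rate of 2%, a raw score of 0.5 corresponds to a response probability of about 0.02. Once scores from different models are on this common probability scale, they can be compared and sorted together.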
According to an aspect of the present invention, a method of version testing in database based marketing includes building a model for each version, based on a random test sample of potential contacts selected from a data set for each version and scoring the dataset of records of potential contacts using each version's model to produce model scores for each version. The method also includes converting the model scores for each version into response rate predictions.
According to a further aspect of the present invention, a computer program product for conducting version testing in database based marketing comprises instructions for causing a computer to build a model for each version, based on a random test sample of potential contacts selected from a data set for each version. The computer program product also includes instructions to cause a computer to score the dataset of records of potential contacts using each version's model to produce model scores for each version and convert the model scores for each version into response rate predictions.
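The step of converting model scores into response rate predictions could be sketched as follows. This is only one possible approach, assumed for illustration: the scores of a version's random test sample are grouped into equal-count bins, the observed response rate of each bin is recorded, and a new score is mapped to the rate of the bin it falls in. Because the test sample is random rather than oversampled, the observed bin rates can be read as response rate estimates. The names `fit_score_to_rate` and `predict_rate` are hypothetical.

```python
from bisect import bisect_left


def fit_score_to_rate(test_scores, test_responses, n_bins=5):
    """Learn a score-to-response-rate lookup from a random test sample.

    Returns (edges, rates): edges[i] is the highest score in bin i (ascending)
    and rates[i] is the observed response rate of that bin.
    """
    if not test_scores or len(test_scores) != len(test_responses):
        raise ValueError("inputs must be non-empty and equal length")
    ranked = sorted(zip(test_scores, test_responses))
    n = len(ranked)
    edges, rates = [], []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        edges.append(chunk[-1][0])
        rates.append(sum(r for _, r in chunk) / len(chunk))
    return edges, rates


def predict_rate(score, edges, rates):
    """Return the observed response rate of the bin that contains `score`."""
    # First bin whose upper edge is >= score; scores above every edge
    # fall into the top bin.
    idx = min(bisect_left(edges, score), len(rates) - 1)
    return rates[idx]
```

In a version test, one such lookup would be fitted per version from that version's own test sample, and every record in the full dataset would then receive one predicted response rate per version.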
One or more of the following advantages are provided by one or more aspects of the invention. A process for version testing is provided. In version testing, different versions of offers can be made to different groups of people and the outcomes of the offers can be evaluated. By testing each offer on a randomly selected subsample of the total population, the results can be modeled separately and then combined to determine the best offer to send to a prospect. Also, a decision maker can determine the optimal mailing point in a combined lift chart. The process may take into account the total number of versions, the amount of “overlap” (customers flagged to be mailed multiple versions when selected independently), how much of the overlap would be “won” by a version when comparing its profit with competing versions, and the cost structure of each version when moving between upper and lower bounds.
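The idea of combining per-version results to pick the best offer for each prospect, and of counting the "overlap" between versions, can be illustrated with the short sketch below. The profit formula (predicted response rate times revenue per response, minus cost per contact), the rule that a version is a candidate only when its expected profit is positive, and all names are assumptions made for this example; the section above does not spell out the actual computation.

```python
def assign_best_version(predicted_rates, revenue, cost):
    """Pick the most profitable version for each prospect.

    predicted_rates: {version: [predicted response rate per prospect]}
    revenue:         {version: revenue per response}   (hypothetical input)
    cost:            {version: cost per contact}       (hypothetical input)
    Returns (assignments, overlap), where assignments[i] is the winning
    version for prospect i (or None if no version is profitable) and overlap
    counts prospects for whom more than one version would be selected when
    each version is considered independently.
    """
    versions = list(predicted_rates)
    n = len(predicted_rates[versions[0]])
    if any(len(predicted_rates[v]) != n for v in versions):
        raise ValueError("every version needs one rate per prospect")

    assignments = []
    overlap = 0
    for i in range(n):
        # Assumed expected profit of sending version v to prospect i.
        profits = {v: predicted_rates[v][i] * revenue[v] - cost[v] for v in versions}
        profitable = [v for v in versions if profits[v] > 0]
        if len(profitable) > 1:
            overlap += 1  # flagged by several versions; the best one "wins"
        assignments.append(max(profitable, key=profits.get) if profitable else None)
    return assignments, overlap
```

With two versions "A" and "B", a prospect whose predicted rates make both versions profitable counts toward the overlap and is assigned whichever version has the higher expected profit.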
REFERENCES:
patent: 4853843 (1989-08-01), Ecklund
patent: 5251131 (1993-10-01), Masand et al.
patent: 5842199 (1998-11-01), Miller et al.
patent: 5842200 (1998-11-01), Agrawal et al.
patent: 5970482 (1999-10-01), Pham et al.
patent: 5991741 (1999-11-01), Speakman et al.
patent: 6012056 (2000-01-01), Menlove
patent: 6012058 (2000-01-01), Fayyad et al.
patent: 6038538 (2000-03-01), Agrawal et al.
patent: 6044366 (2000-03-01), Graffe et al.
patent: 6049861 (2000-04-01), Bird et al.
patent: 6058397 (2000-05-01), Barrus et al.
patent: 6059724 (2000-05-01), Campell et al.
patent: 6119103 (2000-09-01), Basch et al.
“Classification and Regression”, E. Brand et al., Feb. 26, 1998, pp. 1-7.
“Data Mining—A Practical Overview”, R.Burke, 1997, Indiana University, pp. 1-7.
“A Majority Rules Approach to Data Mining”, R. J. Roiger et al., IEEE 1997, pp. 100-107.
“Was ist Lotto am Samstag?”, Toto-Lotto in Bayern (Internet: www.staatliche-lotterieverwaltung.de/spiele/lotto-as.htm).
“Bias Correction in Risk Assessment When Logistic Regression is Used With An Unevenly Proportioned Sample Between Risk and Non-Risk Groups”, Lee et al., pp. 304-309.
Crites Robert
Kennedy Ruby
Lee Yuchun
Amsbury Wayne
Fish & Richardson P.C.
Pardo Thuy
Unica Technologies, Inc.