Method and system to identify which predictors are important...

Data processing: measuring – calibrating – or testing – Measurement system – Statistical measurement

Reexamination Certificate

Details

C702S127000

active

06484123

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to forecasting; and, more specifically, the invention relates to methods and systems to identify which predictors are important for making particular forecasts. The preferred embodiment of the invention may be used in any area where the collaborative filter (also called nearest neighbor) data mining technology is applied. Such areas include, but are not limited to, e-commerce, banking, manufacturing, securities trading, website personalization, mass marketing, communications, and medical diagnosis.
2. Prior Art
The collaborative filter, also called the nearest neighbor model, is a mathematical model that is used to predict the value of a target variable given a set of input variables. For example, consider a scenario where a researcher has collected data regarding a population's age, height, weight, and gender. This information is contained in the following table:
TABLE 1
Sample data containing a population's age,
height, weight, and gender information.

             Age      Height    Weight
             (years)  (inches)  (lbs)   Gender
Subject 1    53       65        165     M
Subject 2    44       54        150     M
Subject 3    32       74        175     F
Subject 4    12       50        120     M
Subject 5    9        36        90      F

In this hypothetical example, suppose that the task is to take age, height, and weight information from a new table of data and use that information to determine the gender of the individual. Thus, suppose we are given a table as follows:
TABLE 2
Sample data containing age, height, and weight
information for five new subjects. Gender is
unknown.

             Age      Height    Weight
             (years)  (inches)  (lbs)   Gender
Subject 6    13       51        121     ?
Subject 7    29       74        175     ?
Subject 8    50       70        170     ?
Subject 9    5        30        30      ?
Subject 10   10       36        90      ?

The task is to apply the nearest neighbor algorithm to determine the gender of subjects 6-10 based on the data in Tables 1 and 2.
In a simple implementation of the nearest neighbor algorithm, we compute the Euclidean distance between every subject in the test set (Table 2) and every subject in the training set (Table 1). Each subject in the test set is then assigned the gender of its nearest match in Table 1. A variation on this algorithm is to use the average value of the K nearest neighbors. In this case, we take K=1. Thus:
TABLE 3
Intersubject Euclidean distances. Cells marked
with an asterisk (*) correspond to the nearest neighbor.

             Subject 1  Subject 2  Subject 3  Subject 4  Subject 5
Subject 6    1244.0     603.7      1268.7     1.0*       400.7
Subject 7    252.3      416.7      3.0*       1296.7     3023.0
Subject 8    19.7*      230.7      121.7      1448.0     3079.0
Subject 9    7251.3     5499.0     7896.7     2849.7     1217.3*
Subject 10   2771.7     1693.3     3051.0     366.7      0.3*

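The entries of Table 3 can be reproduced with a short Python sketch. It is an illustration only: the names TRAIN, TEST, mean_squared_difference, and distance_table are hypothetical and do not come from the source. The tabulated values match the mean of the squared differences over the three predictors rather than the Euclidean distance itself. Because that quantity increases monotonically with Euclidean distance, the nearest neighbor in each row is the same under either measure.

```python
# (age in years, height in inches, weight in lbs), copied from Tables 1 and 2.
TRAIN = {
    "Subject 1": (53, 65, 165),
    "Subject 2": (44, 54, 150),
    "Subject 3": (32, 74, 175),
    "Subject 4": (12, 50, 120),
    "Subject 5": (9, 36, 90),
}
TEST = {
    "Subject 6": (13, 51, 121),
    "Subject 7": (29, 74, 175),
    "Subject 8": (50, 70, 170),
    "Subject 9": (5, 30, 30),
    "Subject 10": (10, 36, 90),
}


def mean_squared_difference(a, b):
    """Mean of the squared per-predictor differences between two subjects."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distance_table(test, train):
    """Return {test subject: {training subject: distance}} for every pair."""
    return {
        t_name: {
            r_name: mean_squared_difference(t_vals, r_vals)
            for r_name, r_vals in train.items()
        }
        for t_name, t_vals in test.items()
    }


table = distance_table(TEST, TRAIN)
for t_name, row in table.items():
    nearest = min(row, key=row.get)  # training subject with the smallest distance
    cells = "  ".join(f"{d:8.1f}" for d in row.values())
    print(f"{t_name:<11}{cells}  nearest: {nearest}")
```

No scaling of the predictors is applied here, so weight, which has the largest numeric range, contributes most to the distances.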
The lowest value in each row indicates the nearest match between the subject in the test set and the subjects in the training set. Thus, Subject 6 is closest to Subject 4. The gender for Subject 6 is thus predicted to be male. Now, we can complete Table 2:
TABLE 4
Subjects in Table 2 with Gender predictions.

             Age      Height    Weight
             (years)  (inches)  (lbs)   Gender
Subject 6    13       51        121     M
Subject 7    29       74        175     F
Subject 8    50       70        170     M
Subject 9    5        30        30      F
Subject 10   10       36        90      F

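The predictions in Table 4 can be reproduced with the following sketch. The function and variable names are hypothetical. The text describes the K-neighbor variation as an average, which is not defined for a categorical target such as gender, so this sketch assumes a majority vote among the K nearest neighbors instead. With K=1 it reduces to the nearest-match rule used above.

```python
from collections import Counter

# ((age, height, weight), gender) for the training subjects in Table 1.
TRAIN = [
    ((53, 65, 165), "M"),
    ((44, 54, 150), "M"),
    ((32, 74, 175), "F"),
    ((12, 50, 120), "M"),
    ((9, 36, 90), "F"),
]
# (age, height, weight) for the test subjects in Table 2.
TEST = {
    "Subject 6": (13, 51, 121),
    "Subject 7": (29, 74, 175),
    "Subject 8": (50, 70, 170),
    "Subject 9": (5, 30, 30),
    "Subject 10": (10, 36, 90),
}


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same way as the distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features, train, k=1):
    """Majority vote over the k nearest training subjects (assumed voting rule)."""
    nearest = sorted(train, key=lambda row: squared_distance(features, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common returns the label that was counted first,
    # i.e. the label of the closer neighbor.
    return votes.most_common(1)[0][0]


predictions = {name: predict(feats, TRAIN, k=1) for name, feats in TEST.items()}
for name, gender in predictions.items():
    print(name, gender)
```

Passing a larger k, for example `predict(TEST["Subject 8"], TRAIN, k=3)`, applies the vote over more neighbors.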
A problem that arises in using this popular algorithm is that it is not known which variables led to the predictions made. For example, for subject 9, all we know at this point is that the nearest neighbor was subject 5. We do not know whether it was the age, height or weight of the subject that led to the conclusion that the subject's gender is female. At this point, the best we can say is that the ensemble of predictors (age, height and weight) led to this conclusion.
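To make the difficulty concrete, one naive probe is to drop one predictor at a time and check whether the nearest-neighbor prediction changes. This is only an illustrative sketch and is not the procedure of the invention, which this section does not describe. All names are hypothetical.

```python
PREDICTORS = ("age", "height", "weight")

# ((age, height, weight), gender) for the training subjects in Table 1.
TRAIN = [
    ((53, 65, 165), "M"),
    ((44, 54, 150), "M"),
    ((32, 74, 175), "F"),
    ((12, 50, 120), "M"),
    ((9, 36, 90), "F"),
]


def nearest_label(features, train, keep):
    """Label of the nearest training subject, using only the predictor indices in keep."""
    def dist(row):
        return sum((features[i] - row[0][i]) ** 2 for i in keep)
    return min(train, key=dist)[1]


def leave_one_out(features, train):
    """Return the full-model prediction and the prediction with each predictor dropped."""
    indices = list(range(len(PREDICTORS)))
    baseline = nearest_label(features, train, indices)
    without = {
        name: nearest_label(features, train, [j for j in indices if j != i])
        for i, name in enumerate(PREDICTORS)
    }
    return baseline, without


for name, features in (("Subject 8", (50, 70, 170)), ("Subject 9", (5, 30, 30))):
    baseline, without = leave_one_out(features, TRAIN)
    print(name, "all predictors:", baseline, "one dropped:", without)
```

On this data the prediction for Subject 9 stays female whichever predictor is dropped, so the probe singles out no predictor, while the prediction for Subject 8 changes from male to female when age is dropped. A probe of this kind says nothing about predictors that matter only in combination.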
SUMMARY OF THE INVENTION
An object of this invention is to provide a novel method and system to identify which predictors are important for making a forecast.
Another object of the present invention is to determine which predictors, if any, are driving a particular prediction in a collaborative filter.
These and other objectives are attained with a method and system for identifying parameters that are important in predicting a target variable. The method comprises the steps of compiling training data, said training data identifying, for each of a first set of subjects, values for each of a first set of parameters; and compiling test data, said test data identifying, for each of a second set of subjects, values for each of a second set of parameters, said first and second sets of parameters having at least a plurality of common parameters.
The method comprises the further steps of using the data in the training data, and using a nearest neighbor procedure, to identify, for each of the second set of subjects, a value for a target parameter; and processing the training data and the test data, according to a predefined procedure, to determine the relative importance of at least selected ones of the first set of parameters in predicting the values for the target parameter.
Further benefits and advantages of the invention will become apparent from a consideration of the following detailed description, given with reference to the accompanying drawings, which specify and show preferred embodiments of the invention.

