Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique
Type: Reexamination Certificate
Filed: 2001-03-30
Issued: 2004-08-24
Examiner: Davis, George B. (Department: 2121)
U.S. Class: C706S021000
Status: active
Patent Number: 06782377
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of data mining, and more particularly to classifier models and supervised methods of learning such models from previous observations, for the purpose of predicting the class or category of newly encountered observations.
2. Discussion of the Prior Art
One problem in data mining is deviation detection. Deviation detection includes modeling rarely occurring phenomena or examples in data. Current work in deviation detection focuses on building classifier models for normal examples and then detecting deviations from the model.
There are many situations in which rare deviant events may occur, for example, in network intrusion detection or fraud detection. In network intrusion detection, for attacks of the remote-to-local (r2l) type, made on a computer remotely by guessing a password or opening an ftp data connection, the successful incidents of attack would be rare compared to the total number of r2l connections.
Classification is a supervised technique in data mining that learns models from labeled data (known data). Input to the classification problem is a set of observations from the real world that are recorded as a set of records, each characterized by multiple attributes. Associated with each record is a categorical attribute called a class. Given a training set of records with known class labels, one problem is to learn a model for the class in terms of other attributes.
In prior art systems, the objective has been to maximize accuracy, thus, minimizing the number of misclassifications. However, for rare classes, a mere accuracy metric is not sufficient. Consider evaluation data in which 1% of the examples belong to a rare target class; if the classifier predicts that every example belongs to a class other than the target class, then it achieves an accuracy of 99%, which is acceptable for many existing techniques. However, from the perspective of predicting rare target class examples, its performance is unacceptable because it may not have predicted any rare target examples correctly. Existing methods of learning rule-based models may fail to learn complete and precise models for rare target classes.
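The accuracy paradox described above can be illustrated with a small sketch (the 1000-example dataset with a 1% rare class is hypothetical; the classifier trivially predicts the majority class for every example):

```python
# Illustration of the accuracy paradox for a rare class (hypothetical data).
# 1% of 1000 examples belong to the rare target class; a trivial classifier
# predicts the majority (non-target) class for every example.
labels = [1] * 10 + [0] * 990          # 1 = rare target class
predictions = [0] * 1000               # always predict "not target"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- no rare example is ever found
```

The 99% accuracy hides the fact that the model is useless for the rare class, which is why a per-class measure such as recall is needed.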
Disjunctive Normal Form (DNF) is the most general form of a rule-based model. The model for a given class C is of the form R: D1 OR D2 . . . OR Dn → C, where each Di is called a disjunct, and is formed by a conjunction of multiple conditions on attributes of the record, such as D1: (a1 = P) AND (a2 < 10), where a1 and a2 are attributes of unordered and ordered types, respectively, and P is one of the specific values taken by a1. There are two broad categories of methods of learning a DNF model: specific-to-general and general-to-specific. Specific-to-general techniques start with each record as the most specific rule, and progressively generalize the rule-set by merging or removing individual conditions and rules. However, these techniques are not usually suitable for problems with large high-dimensional data-sets because their time complexity scales poorly (e.g., quadratic in training set size and cubic in number of attributes).
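A DNF model of this form is straightforward to evaluate: the rule fires for a record if any disjunct holds, and a disjunct holds if all of its conditions hold. A minimal sketch, using the example disjunct D1: (a1 = P) AND (a2 < 10) plus a second hypothetical disjunct:

```python
# Minimal sketch of evaluating a DNF rule-based model.
# A disjunct is a conjunction of conditions on attributes; the rule for
# class C fires if any disjunct is satisfied by the record.
def d1(record):
    # D1: (a1 = "P") AND (a2 < 10), as in the example above
    return record["a1"] == "P" and record["a2"] < 10

def d2(record):
    # D2: (a3 >= 5), a second hypothetical disjunct for illustration
    return record["a3"] >= 5

def predict_class_c(record, disjuncts=(d1, d2)):
    # R: D1 OR D2 -> C
    return any(d(record) for d in disjuncts)

print(predict_class_c({"a1": "P", "a2": 3, "a3": 0}))   # True (D1 fires)
print(predict_class_c({"a1": "Q", "a2": 3, "a3": 0}))   # False (no disjunct fires)
```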
General-to-specific techniques begin with the most general rule, an empty rule, and progressively add specific conditions to it. General-to-specific techniques have found widespread use because of their competitive performance and computational tractability in practice. They use a method of sequential covering that iteratively discovers multiple disjuncts (D1, then D2, then D3, and so on). Every time a rule is learned, the examples supported by it are removed before the next iteration. Thus, after D1 is discovered, records where its antecedent is true are removed.
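The sequential-covering loop can be sketched as follows. This is a toy stand-in, not the method of any particular prior-art system: `learn_one_rule` here simply picks the single (attribute, value) condition covering the most remaining target-class examples, whereas real learners refine rules condition by condition.

```python
# Sketch of sequential covering with hypothetical one-condition rules.
def learn_one_rule(records):
    """Pick the (attribute, value) condition covering the most remaining
    target-class ("C") examples -- a stand-in for real rule refinement."""
    best = None
    for attr in ("a1", "a2"):
        for rec in records:
            if rec["class"] != "C":
                continue
            covered = sum(1 for r in records
                          if r[attr] == rec[attr] and r["class"] == "C")
            if best is None or covered > best[1]:
                best = ((attr, rec[attr]), covered)
    return best[0] if best else None

def sequential_covering(records):
    rules, remaining = [], list(records)
    while any(r["class"] == "C" for r in remaining):
        attr, value = learn_one_rule(remaining)
        rules.append((attr, value))
        # Remove all records where the rule's antecedent is true,
        # as described above, before the next iteration.
        remaining = [r for r in remaining if r[attr] != value]
    return rules

data = [
    {"a1": "x", "a2": 1, "class": "C"},
    {"a1": "x", "a2": 2, "class": "C"},
    {"a1": "y", "a2": 2, "class": "D"},
    {"a1": "z", "a2": 3, "class": "C"},
]
print(sequential_covering(data))  # first rule covers both a1="x" examples
```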
Existing techniques strive for the accuracy of each disjunct with respect to the predicted target class. Each disjunct is expected to cover a disjoint signature of the target class. If all signatures of the target class are pure, such that each signature covers very few negative examples (i.e. examples not of the target class), then this approach may produce satisfactory results.
However, there are several situations in which a sequential covering method of discovering high accuracy disjuncts may fail. For example, the method may fail when the target class signature is composed of two components, the presence of the target class and the absence of the non-target class, and the latter component is not correctly or completely learned. This can happen, especially for rare classes, when a signature for the presence of the class is inherently impure by itself. For example, in intrusion detection, for a rare attack type r2l, a signature for the presence of an example might be connection_type=ftp. However, this will also cover ftp connections made to flood the computer in a denial-of-service (dos) attack. Thus, the rule for r2l has to be refined by detecting signatures for the absence of dos attacks.
In many existing techniques, accuracy constraints cause each rule to be refined immediately by adding more conjunctive conditions that detect a second component. However, individual rules may not cover a sufficient number of negative examples needed for learning desirable signatures, leading to splintered false positives.
Another problem in existing sequential-covering methods is small disjuncts: rules that cover a small number of target class examples, typically less than about 1 or 2 percent of the target class, are more prone to generalization error than rules covering a larger number of such examples. For prevalent classes, such rules usually arise in the later iterations of the method, because the remainder of the dataset includes only a small number of target class examples to learn from. However, for rare classes, such rules may appear very early in the method because the total number of target class examples is relatively small.
Small disjuncts are manifested in existing techniques because of the constraints on the accuracy of each rule, which result in low overall support for the rule. From a statistical point of view, decisions made with small evidential support are often unreliable.
Therefore, a need exists for a system and method that overcome splintered false positives and small disjuncts, and that can learn complete and precise classifier models for target classes including rare examples.
SUMMARY OF THE INVENTION
According to an embodiment of the present invention, a program storage device is provided readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for predicting a target class within a dataset. The method includes predicting the presence of a plurality of examples of the target class, and predicting the absence of the target class among the plurality of examples predicted to have presence of the target class.
The method further includes weighing an effect of each absence prediction on each presence prediction.
In one embodiment, the target class includes less than three percent of the examples in the dataset; in another, less than two percent.
The method includes evaluating the presence prediction and the absence prediction according to the equation:
F = 2 * recall * precision / (recall + precision)
wherein F is a balance between recall and precision. The recall is the fraction of correctly predicted target class examples among all target class examples. The precision is the fraction of correctly predicted target class examples among all predicted examples.
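The F-measure is the harmonic mean of recall and precision, so it is high only when both are high; a minimal sketch:

```python
def f_measure(recall, precision):
    # F = 2 * recall * precision / (recall + precision): the harmonic mean
    # of recall and precision, high only when both are high.
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

print(f_measure(0.5, 0.5))    # 0.5
print(f_measure(0.99, 0.01))  # ~0.0198 -- high recall alone scores poorly
```

This is why F is a better objective than raw accuracy for rare classes: the trivial majority-class predictor, with recall 0, scores F = 0.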
According to an embodiment of the present invention, a method is provided for learning a classifier model which determines examples of a target class in a dataset. The method includes learning a plurality of positive rules supporting a plurality of examples of the target class, learning a plurality of negative rules removing a plurality of false positive examples among the examples supported by the positive rules, and weighing an effect of each negative rule on each positive rule.
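The two-stage structure described above, positive rules that predict presence, negative rules that remove false positives, and a weight on each negative rule's effect on each positive rule, can be sketched as follows. The specific rules, weight values, and multiplicative scoring scheme here are hypothetical illustrations, not the patented method itself:

```python
# Hypothetical sketch of two-stage prediction: positive rules predict
# presence, negative rules predict absence, and a per-pair weight table
# controls how much each negative rule discounts each positive rule.
def predict(record, p_rules, n_rules, weight):
    """Score the record with the first positive rule that fires,
    discounted by every negative rule that also fires."""
    for i, p in enumerate(p_rules):
        if p(record):
            score = 1.0
            for j, n in enumerate(n_rules):
                if n(record):
                    score *= 1.0 - weight[(i, j)]
            return score
    return 0.0

# Toy rules echoing the intrusion-detection example in the Background:
p_rules = [lambda r: r["conn"] == "ftp"]   # presence signature for r2l
n_rules = [lambda r: r["rate"] > 100]      # absence signature: a dos flood
weight = {(0, 0): 0.9}                     # hypothetical learned weight

print(predict({"conn": "ftp", "rate": 5},   p_rules, n_rules, weight))  # 1.0
print(predict({"conn": "ftp", "rate": 500}, p_rules, n_rules, weight))  # ~0.1
```

The weight table lets a negative rule that is decisive for one positive rule leave another positive rule untouched, which is the point of weighing each absence prediction against each presence prediction.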
The presence prediction achieves at least a pre
Inventors: Agarwal, Ramesh; Joshi, Mahesh V.
Examiner: Davis, George B.
Attorney: F. Chau & Associates, LLC
Assignee: International Business Machines Corporation