Data processing: artificial intelligence – Knowledge processing system
Reexamination Certificate
1999-11-18
2003-11-04
Voeltz, Emanuel Todd (Department: 2121)
C706S012000, C706S925000
active
06643629
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to data sets, and more particularly to a method for identifying particular data points of interest in a large data set.
BACKGROUND OF THE INVENTION
The ability to identify particular data points in a data set that are dissimilar from the remaining points in the set has useful applications in the scientific and financial fields. For example, detecting such dissimilar points, which are commonly referred to as outliers, can reveal abnormal usage patterns for a credit card and thereby expose a stolen card. The points in the abnormal usage pattern associated with the unauthorized use of the stolen card are deemed outliers with respect to the normal usage pattern of the cardholder.
Conventional methods for identifying outliers typically use an algorithm that relies on a distance-based definition of outliers, in which a point p in a data set is an outlier if no more than k points in the data set are at a distance of d or less from p. The distance can be measured using any conventional metric.
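The conventional distance-based definition can be illustrated with a minimal Python sketch. The function name `is_distance_outlier`, the sample points, and the choice of Euclidean distance are assumptions made for illustration; the text allows any conventional metric, and it does not say whether p itself is counted, so this sketch excludes it.

```python
import math

def is_distance_outlier(p, points, k, d):
    """True if no more than k other points lie within distance d of p.

    Assumptions: Euclidean metric, and p itself is not counted.
    """
    neighbors = sum(1 for q in points if q is not p and math.dist(p, q) <= d)
    return neighbors <= k

# Hypothetical data: three points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

# With k=0 and d=1.0, a point is an outlier only if nothing else is within 1.0.
outliers = [p for p in points if is_distance_outlier(p, points, k=0, d=1.0)]
print(outliers)  # [(5.0, 5.0)]
```

Note that this test needs both k and d to be chosen in advance, and it examines every point against every other point, which is the cost the following paragraph refers to.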
Although methods that employ the aforementioned conventional distance-based definition of outliers can be used to identify such points in large data sets, they suffer from a significant drawback. Specifically, they are computationally expensive because they identify all outliers rather than ranking them and identifying only the particular outliers that are of interest. In addition, as the size of a data set increases, conventional methods require increasing amounts of time and hardware to identify the outliers.
SUMMARY OF THE INVENTION
A new method is provided for identifying a predetermined number of outliers of interest in a large data set. The method uses a new definition of outliers in which such points are ranked in relation to their neighboring points. The method also employs new partition-based detection algorithms that partition the data points and then compute upper and lower bounds for each partition. These bounds are then used to identify and eliminate those partitions that cannot possibly contain the predetermined number of outliers of interest. Outliers are then computed from the remaining points residing in the partitions that were not eliminated. The present method eliminates a significant number of data points from consideration as outliers, thereby resulting in substantial savings in computational expense compared to conventional methods employed to identify such points.
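The partition-and-prune idea summarized above can be sketched in Python. This is only an illustration under stated assumptions, not the patented algorithm: the summary does not say how points are ranked or how the bounds are computed, so the sketch assumes that points are ranked by the distance to their k-th nearest neighbor, that partitions are fixed-size grid cells, and that bounds come from the bounding boxes of the cells. All function names and the sample data are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def kth_nn_dist(p, points, k):
    """Assumed ranking score: distance from p to its k-th nearest neighbor."""
    dists = sorted(math.dist(p, q) for q in points)
    return dists[k]  # dists[0] is p's zero distance to itself

def partition(points, cell):
    """Assumed partitioning: group points into grid cells of side `cell`."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(c / cell) for c in p)].append(p)
    return list(cells.values())

def mbr(part):
    """Minimum bounding rectangle of a partition as (low corner, high corner)."""
    return (tuple(min(c) for c in zip(*part)), tuple(max(c) for c in zip(*part)))

def min_dist(a, b):
    """Smallest possible distance between a point in box a and a point in box b."""
    return math.sqrt(sum(max(0.0, bl - ah, al - bh) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def max_dist(a, b):
    """Largest possible distance between a point in box a and a point in box b."""
    return math.sqrt(sum(max(ah - bl, bh - al) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def bound(i, boxes, sizes, k, dist_fn):
    """Bound on the score of every point in partition i.

    With min_dist this is a lower bound, with max_dist an upper bound:
    boxes are visited in order of distance until they hold k other points.
    """
    seen = 0
    for j in sorted(range(len(boxes)), key=lambda j: dist_fn(boxes[i], boxes[j])):
        seen += sizes[j] - (1 if j == i else 0)
        if seen >= k:
            return dist_fn(boxes[i], boxes[j])
    return math.inf

def top_n_outliers(points, k, n, cell):
    parts = partition(points, cell)
    boxes = [mbr(part) for part in parts]
    sizes = [len(part) for part in parts]
    idx = range(len(parts))
    lower = [bound(i, boxes, sizes, k, min_dist) for i in idx]
    upper = [bound(i, boxes, sizes, k, max_dist) for i in idx]
    # At least n points are guaranteed a score >= threshold.
    threshold, covered = 0.0, 0
    for i in sorted(idx, key=lambda i: -lower[i]):
        covered += sizes[i]
        if covered >= n:
            threshold = lower[i]
            break
    # Partitions whose upper bound is below the threshold cannot hold a top-n point.
    candidates = [p for i in idx if upper[i] >= threshold for p in parts[i]]
    ranked = heapq.nlargest(n, candidates, key=lambda p: kth_nn_dist(p, points, k))
    return ranked, len(candidates)

# Hypothetical data: a dense 5x5 cluster plus two isolated points.
points = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
points += [(10.0, 10.0), (20.0, 0.0)]
ranked, examined = top_n_outliers(points, k=3, n=2, cell=1.0)
print(ranked, examined)
```

In this toy data set the dense cell is discarded by the bound test, so only the two isolated points are scored exactly, while the 25 clustered points never have their neighbor distances computed.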
REFERENCES:
patent: 6003029 (1999-12-01), Agrawal et al.
patent: 6049797 (2000-04-01), Guha et al.
patent: 6092072 (2000-07-01), Guha et al.
Sumit Sen et al.; Clustering of Relational Data Containing Noise and Outliers; IEEE; 1998; 0-7803-4863-X/98; pp. 1411-1416.
Ramaswamy Sridhar
Rastogi Rajeev
Shim Kyuseok
Hirl Joseph P.
Lucent Technologies - Inc.
Todd Voeltz Emanuel
Method for identifying outliers in large data sets