Method for identifying outliers in large data sets

Data processing: artificial intelligence – Knowledge processing system

Reexamination Certificate


Filing date: 1999-11-18
Issue date: 2003-11-04
Examiner: Voeltz, Emanuel Todd (Department: 2121)
Classification: C706S012000, C706S925000
Status: active
Patent number: 06643629

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to data sets, and more particularly to a method for identifying particular data points of interest in a large data set.
BACKGROUND OF THE INVENTION
The ability to identify particular data points in a data set that are dissimilar from the remaining points in the set has useful applications in the scientific and financial fields. For example, such dissimilar points, which are commonly referred to as outliers, can reveal abnormal usage patterns for a credit card and thereby help detect a stolen card. The points in the abnormal usage pattern associated with the unauthorized use of the stolen card are deemed outliers with respect to the normal usage pattern of the cardholder.
Conventional methods for identifying outliers typically rely on a distance-based definition in which a point p in a data set is an outlier if no more than k points in the data set are at a distance of d or less from p. The distance can be measured using any conventional metric.
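The conventional distance-based definition can be illustrated with a minimal Python sketch. It assumes Euclidean distance and assumes that the point p itself is not counted among its own neighbors; the text fixes neither choice. The function names are hypothetical.

```python
import math

def is_distance_based_outlier(p, points, k, d):
    """True if no more than k other points lie within distance d of p.

    Assumptions: Euclidean metric; p is not counted as its own neighbor.
    """
    near = sum(1 for q in points if q is not p and math.dist(p, q) <= d)
    return near <= k

def all_outliers(points, k, d):
    """Return every point satisfying the definition (one full scan per point)."""
    return [p for p in points if is_distance_based_outlier(p, points, k, d)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(all_outliers(points, k=0, d=2.0))
```

Both k and d must be chosen in advance, and the result is an unranked set containing every point that meets the definition.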
Although methods that employ the aforementioned conventional distance-based definition of outliers can be used to identify such points in large data sets, they suffer from significant drawbacks. Specifically, they are computationally expensive because they identify all outliers rather than ranking the points and identifying only the particular outliers that are of interest. In addition, as the size of a data set increases, conventional methods require increasing amounts of time and hardware to identify the outliers.
SUMMARY OF THE INVENTION
The present invention provides a new method for identifying a predetermined number of outliers of interest in a large data set. The method uses a new definition of outliers in which such points are ranked in relation to their neighboring points. The method also employs new partition-based detection algorithms, which partition the data points and then compute upper and lower bounds for each partition. These bounds are then used to identify and eliminate those partitions that cannot possibly contain the predetermined number of outliers of interest. Outliers are then computed from the remaining points residing in the partitions that were not eliminated. The present method eliminates a significant number of data points from consideration as outliers, thereby resulting in substantial savings in computational expense compared to conventional methods employed to identify such points.
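The summary can be illustrated with a Python sketch built on several assumptions that the text above does not state: points are ranked by the distance to their k-th nearest neighbor, partitions are fixed-size grid cells, and each partition's bounds are derived from the minimum and maximum distances between bounding boxes. All function names are hypothetical, and this is not presented as the claimed algorithm.

```python
import math
from collections import defaultdict

def kth_nn_dist(i, points, k):
    """Distance from points[i] to its k-th nearest other point (assumed ranking score)."""
    return sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)[k - 1]

def top_n_brute_force(points, k, n):
    """Score every point and keep the n largest scores."""
    return sorted(((kth_nn_dist(i, points, k), i) for i in range(len(points))), reverse=True)[:n]

def grid_partitions(points, cell):
    """Group point indices by grid cell (a stand-in for any partitioning scheme)."""
    parts = defaultdict(list)
    for i, p in enumerate(points):
        parts[tuple(math.floor(c / cell) for c in p)].append(i)
    return list(parts.values())

def bounding_box(idxs, points):
    dims = range(len(points[0]))
    return (tuple(min(points[i][a] for i in idxs) for a in dims),
            tuple(max(points[i][a] for i in idxs) for a in dims))

def min_max_dist(box_a, box_b):
    """Smallest and largest possible distance between a point in box_a and one in box_b."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    mn = mx = 0.0
    for a0, a1, b0, b1 in zip(alo, ahi, blo, bhi):
        gap = max(b0 - a1, a0 - b1, 0.0)
        span = max(a1 - b0, b1 - a0)
        mn += gap * gap
        mx += span * span
    return math.sqrt(mn), math.sqrt(mx)

def kth_smallest_weighted(pairs, k):
    """k-th smallest value when each (value, count) pair contributes `count` copies."""
    seen = 0
    for value, count in sorted(pairs):
        seen += count
        if seen >= k:
            return value
    raise ValueError("the data set must contain more than k points")

def top_n_partitioned(points, k, n, cell):
    parts = grid_partitions(points, cell)
    boxes = [bounding_box(ix, points) for ix in parts]
    bounds = []  # per partition: (lower, upper) bound on the score of any point inside it
    for a, box_a in enumerate(boxes):
        lows, highs = [], []
        for b, box_b in enumerate(boxes):
            mn, mx = min_max_dist(box_a, box_b)
            count = len(parts[b]) - (1 if a == b else 0)  # a point is not its own neighbor
            lows.append((mn, count))
            highs.append((mx, count))
        bounds.append((kth_smallest_weighted(lows, k), kth_smallest_weighted(highs, k)))
    # Lower bound on the n-th largest score: take partitions by descending lower
    # bound until together they hold at least n points.
    covered, threshold = 0, 0.0
    by_lower = sorted(((lo, len(parts[a])) for a, (lo, _) in enumerate(bounds)), reverse=True)
    for lower, size in by_lower:
        covered += size
        if covered >= n:
            threshold = lower
            break
    # A partition whose upper bound falls below the threshold cannot hold a top-n point.
    survivors = [i for a, ix in enumerate(parts) if bounds[a][1] >= threshold for i in ix]
    scored = sorted(((kth_nn_dist(i, points, k), i) for i in survivors), reverse=True)
    return scored[:n], len(survivors)
```

Calling `top_n_partitioned(points, k, n, cell)` returns the n highest-scoring (score, index) pairs together with the number of points that survived pruning. Exact scores are computed only for the survivors, although each of those computations still scans the full data set.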

REFERENCES:
patent: 6003029 (1999-12-01), Agrawal et al.
patent: 6049797 (2000-04-01), Guha et al.
patent: 6092072 (2000-07-01), Guha et al.
Sumit Sen et al.; Clustering of Relational Data Containing Noise and Outliers; IEEE; 1998; 0-7803-4863-X/98; pp. 1411-1416.

Affiliated with

Ramaswamy Sridhar

Inventor


Rastogi Rajeev

Inventor


Shim Kyuseok

Inventor


Also associated with

Hirl Joseph P.

Examiner


Lucent Technologies - Inc.

Corporate Assignee


Voeltz Emanuel Todd

Examiner

