Data processing: database and file management or data structures – Database design – Data structure types
Patent
1998-04-07
2000-07-18
Alam, Hosain T.
707/100, G06F 17/30
active
060920726
ABSTRACT:
The present invention relates to a computer method, apparatus and programmed medium for clustering large databases. The present invention represents each cluster to be merged by a constant number of well scattered points that capture the shape and extent of the cluster. The chosen scattered points are shrunk towards the mean of the cluster by a shrinking fraction to form a representative set of data points that efficiently represent the cluster. The clusters with the closest pair of representative points are merged to form a new cluster. The use of an efficient representation of the clusters allows the present invention to obtain improved clustering while efficiently eliminating outliers.
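The steps named in the abstract (choose well-scattered points, shrink them toward the cluster mean, merge the clusters whose representative points are closest) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the farthest-point selection heuristic, the default values `c=4` and `alpha=0.3`, and the brute-force pair search are all assumptions made for illustration, and the outlier elimination the abstract mentions is omitted.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def scattered_points(points, c):
    """Pick up to c well-scattered points (assumed farthest-point heuristic)."""
    centroid = mean(points)
    chosen = []
    for _ in range(min(c, len(points))):
        if not chosen:
            # Start with the point farthest from the cluster mean.
            best = max(points, key=lambda p: math.dist(p, centroid))
        else:
            # Then repeatedly take the point farthest from those already chosen.
            best = max(points, key=lambda p: min(math.dist(p, q) for q in chosen))
        chosen.append(best)
    return chosen


def representatives(points, c, alpha):
    """Shrink the scattered points toward the mean by the fraction alpha."""
    centroid = mean(points)
    return [
        tuple(x + alpha * (m - x) for x, m in zip(p, centroid))
        for p in scattered_points(points, c)
    ]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: their closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cluster(points, k, c=4, alpha=0.3):
    """Merge clusters bottom-up until k remain (brute force, for clarity only)."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        reps = [representatives(cl, c, alpha) for cl in clusters]
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ab: cluster_distance(reps[ab[0]], reps[ab[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [cl for n, cl in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for group in cluster(data, k=2):
        print(sorted(group))
```

Recomputing every representative set on each pass keeps the sketch short but is far too slow for a large database; it is meant only to make the representation and merge rule concrete.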
REFERENCES:
patent: 4945549 (1990-07-01), Simon et al.
patent: 5040133 (1991-08-01), Feintuch et al.
patent: 5263120 (1993-11-01), Bickel
patent: 5325466 (1994-06-01), Kornacker
patent: 5452371 (1995-09-01), Bozinovic et al.
patent: 5555196 (1996-09-01), Asano
patent: 5675791 (1997-10-01), Bhide et al.
patent: 5696877 (1997-12-01), Iso
patent: 5706503 (1998-01-01), Poppen et al.
patent: 5710915 (1998-01-01), McElhiney
patent: 5784283 (1998-06-01), Pingali et al.
patent: 5796924 (1998-08-01), Errico et al.
patent: 5832182 (1998-11-01), Zhang et al.
patent: 5940832 (1999-08-01), Hamada et al.
patent: 5983224 (1998-08-01), Singh et al.
patent: 6012058 (2000-01-01), Fayyad et al.
Tian Zhang, et al. BIRCH: An Efficient Data Clustering Method For Very Large Databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 103-114, Montreal, Canada, Jun. 1996.
Eui-Hong Han, et al. Clustering Based On Association Rule Hypergraphs. Technical report, 1997 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Jun. 1997.
Martin Ester, et al. A Database Interface For Clustering In Large Spatial Databases. In International Conference on Knowledge Discovery in Databases and Data Mining (KDD-95), Montreal, Canada, Aug. 1995.
Raymond T. Ng, et al. Efficient And Effective Clustering Methods For Spatial Data Mining. In Proc. of the VLDB Conference, Santiago, Chile, Sep. 1994.
Jeffrey Scott Vitter. Random Sampling With A Reservoir. ACM Transactions on Mathematical Software, 11(1):37-57, 1985.
Martin Ester, et al. A Density-Based Algorithm For Discovering Clusters In Large Spatial Databases With Noise. In International Conference on Knowledge Discovery in Databases and Data Mining (KDD-96), Montreal, Canada, Aug. 1996.
Guha Sudipto
Rastogi Rajeev
Shim Kyuseok
Alam Hosain T.
Colbert Ella
Lucent Technologies, Inc.