Data processing: database and file management or data structures – Database design – Data structure types
Patent
1998-03-04
2000-09-05
Breene, John E.
Data processing: database and file management or data structures
Database design
Data structure types
707 3, 707 4, 707 5, G06F 1730
Patent
active
061157082
ABSTRACT:
As an optimization problem, clustering data (unsupervised learning) is known to be a difficult problem. Most practical approaches use a heuristic, typically gradient-descent, algorithm to search for a solution in the huge space of possible solutions. Such methods are by definition sensitive to starting points. It has been well-known that clustering algorithms are extremely sensitive to initial conditions. Most methods for guessing an initial solution simply make random guesses. In this paper we present a method that takes an initial condition and efficiently produces a refined starting condition. The method is applicable to a wide class of clustering algorithms for discrete and continuous data. In this paper we demonstrate how this method is applied to the popular K-means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. The technique can be used as an initializer for other clustering solutions. The method is based on an efficient technique for estimating the modes of a distribution and runs in time guaranteed to be less than overall clustering time for large data sets. The method is also scalable and hence can be efficiently used on huge databases to refine starting points for scalable clustering algorithms in data mining applications.
REFERENCES:
patent: 5442778 (1995-08-01), Pedersen et al.
patent: 5758147 (1998-05-01), Chen et al.
patent: 5787422 (1998-07-01), Tukey et al.
patent: 5832182 (1998-11-01), Zhang et al.
patent: 5884305 (1999-03-01), Kleinberg et al.
patent: 5920856 (1999-07-01), Syeda-Mahmood
patent: 5930784 (1999-07-01), Hendrickson
patent: 5983224 (1999-11-01), Singh et al.
patent: 6012058 (2000-01-01), Fayyad et al.
J. Banfield and A. Patery, "Model-based Gaussian and Non-Gaussian Clustering", Biometrics, vol. 39: 803-821,pp 15-34, (1993).
P.S. Bradley, O.I. Mangasarian, and W.N. Street. 1997. Clustering via Concave Minimization, in Advances in Neural Information Processing systems 9, M.C. Mozer, M.I. Jordan, and T. Petsche (Eds.) pp. 368-374, MIT Press, (1997).
P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass): Theory and Results", in [FPSU96], pp. 153-180. MIT Press, (1996).
A. P. Dempster, N.M. Laird, and D. Rubin, "Maximum Likelihood From Incomplete Data Via the EM Algorithm". Journal of the Royal Statistical Society, Series B, 39(1):pp. 1-38, (1977).
U. Fayyad, D. Haussler, and p. Stolorz. "Mining Science Data." Communications of the ACM 39(11), (1996).
D. Fisher. "Knowledge Acquisition via Incremental Conceptual Clustering". Machine Learning, 2:139-172, (1987).
E. Forgy, "Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications", Biometrics 21:768. (1965).
Jones, "A Note on Sampling From a Tape File". Communications of the ACM, vol. 5, (1962).
S. Z. Selim and M.A. Ismail, "K-Means-Type Algorithms: A Generalized Convergence Therorem and Characterization of Local Optimality." IEEE Trans. on Pattern Analysis and machine Intelligence, vol. PAMI-6, No. 1, (1984).
T. Zhang, R. Ramakrishnan, and M. Livny. "BIRCH: A new Data Clustering Algorithm and its Applications", Data Mining and knowledge Discovery, vol. 1, No. 2, (1997).
Radford M. Neal and Geoffrey E. Hinton, A View of the Em Algorithm That Justifies Incremental, Sparse and Other Variants, (date unknown).
Bo Thiesson, Christopher Meek, David Maxwell Chickering and David Heckerman, "Learning Mixtures of DAG Models" Technical Report MSR-TR-97-30 Dec. (1997), Revised May 1998).
Bradley Paul S.
Fayyad Usama
Breene John E.
Microsoft Corporation
Robinson Greta L.
LandOfFree
Method for refining the initial conditions for clustering with a does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for refining the initial conditions for clustering with a, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for refining the initial conditions for clustering with a will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2222824