Method for refining the initial conditions for clustering with a

Data processing: database and file management or data structures – Database design – Data structure types

Patent

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

707 3, 707 4, 707 5, G06F 1730

Patent

active

061157082

ABSTRACT:
As an optimization problem, clustering data (unsupervised learning) is known to be a difficult problem. Most practical approaches use a heuristic, typically gradient-descent, algorithm to search for a solution in the huge space of possible solutions. Such methods are by definition sensitive to starting points. It has been well-known that clustering algorithms are extremely sensitive to initial conditions. Most methods for guessing an initial solution simply make random guesses. In this paper we present a method that takes an initial condition and efficiently produces a refined starting condition. The method is applicable to a wide class of clustering algorithms for discrete and continuous data. In this paper we demonstrate how this method is applied to the popular K-means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. The technique can be used as an initializer for other clustering solutions. The method is based on an efficient technique for estimating the modes of a distribution and runs in time guaranteed to be less than overall clustering time for large data sets. The method is also scalable and hence can be efficiently used on huge databases to refine starting points for scalable clustering algorithms in data mining applications.

REFERENCES:
patent: 5442778 (1995-08-01), Pedersen et al.
patent: 5758147 (1998-05-01), Chen et al.
patent: 5787422 (1998-07-01), Tukey et al.
patent: 5832182 (1998-11-01), Zhang et al.
patent: 5884305 (1999-03-01), Kleinberg et al.
patent: 5920856 (1999-07-01), Syeda-Mahmood
patent: 5930784 (1999-07-01), Hendrickson
patent: 5983224 (1999-11-01), Singh et al.
patent: 6012058 (2000-01-01), Fayyad et al.
J. Banfield and A. Patery, "Model-based Gaussian and Non-Gaussian Clustering", Biometrics, vol. 39: 803-821,pp 15-34, (1993).
P.S. Bradley, O.I. Mangasarian, and W.N. Street. 1997. Clustering via Concave Minimization, in Advances in Neural Information Processing systems 9, M.C. Mozer, M.I. Jordan, and T. Petsche (Eds.) pp. 368-374, MIT Press, (1997).
P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass): Theory and Results", in [FPSU96], pp. 153-180. MIT Press, (1996).
A. P. Dempster, N.M. Laird, and D. Rubin, "Maximum Likelihood From Incomplete Data Via the EM Algorithm". Journal of the Royal Statistical Society, Series B, 39(1):pp. 1-38, (1977).
U. Fayyad, D. Haussler, and p. Stolorz. "Mining Science Data." Communications of the ACM 39(11), (1996).
D. Fisher. "Knowledge Acquisition via Incremental Conceptual Clustering". Machine Learning, 2:139-172, (1987).
E. Forgy, "Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications", Biometrics 21:768. (1965).
Jones, "A Note on Sampling From a Tape File". Communications of the ACM, vol. 5, (1962).
S. Z. Selim and M.A. Ismail, "K-Means-Type Algorithms: A Generalized Convergence Therorem and Characterization of Local Optimality." IEEE Trans. on Pattern Analysis and machine Intelligence, vol. PAMI-6, No. 1, (1984).
T. Zhang, R. Ramakrishnan, and M. Livny. "BIRCH: A new Data Clustering Algorithm and its Applications", Data Mining and knowledge Discovery, vol. 1, No. 2, (1997).
Radford M. Neal and Geoffrey E. Hinton, A View of the Em Algorithm That Justifies Incremental, Sparse and Other Variants, (date unknown).
Bo Thiesson, Christopher Meek, David Maxwell Chickering and David Heckerman, "Learning Mixtures of DAG Models" Technical Report MSR-TR-97-30 Dec. (1997), Revised May 1998).

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for refining the initial conditions for clustering with a does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method for refining the initial conditions for clustering with a, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for refining the initial conditions for clustering with a will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2222824

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.