Data processing: database and file management or data structures – Database design – Data structure types
Patent
1999-02-05
2000-10-24
Breene, John E.
707/6, 707/7, 707/103, 707/10, 709/215, 709/252, G06F 17/30
active
061381152
ABSTRACT:
A method and system are disclosed for generating a decision-tree classifier in parallel in a multi-processor system, from a training set of records. The method comprises the steps of: partitioning the records among the processors, each processor generating an attribute list for each attribute, and the processors cooperatively generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, each processor determines its best split test and, along with other processors, selects the best overall split for the records at that node. Preferably, the gini-index and class histograms are used in determining the best splits. Also, each processor builds a hash table using the attribute list of the split attribute and shares it with other processors. The hash tables are used for splitting the remaining attribute lists. The created tree is then pruned based on the MDL principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.
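The split-selection and list-splitting steps in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation: the function names, the (value, class label, record id) tuple layout, and the sample records are assumptions made for illustration, and the cross-processor exchange of histograms and hash tables is omitted. It scans one sorted numeric attribute list, maintains class histograms below and above each candidate split point, scores each candidate with the gini index, and then uses a record-id hash table built from the split attribute to divide a second attribute list.

```python
from collections import Counter

def gini(hist, n):
    """Gini index of a class histogram that covers n records."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())

def best_numeric_split(attr_list):
    """attr_list: (value, class_label, record_id) tuples sorted by value.
    Returns (weighted_gini, threshold) for the best test `value <= threshold`."""
    n = len(attr_list)
    below = Counter()                                    # classes already scanned
    above = Counter(label for _, label, _ in attr_list)  # classes still ahead
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label, _ = attr_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) \
              + (n_above / n) * gini(above, n_above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best

def split_by_hash_table(split_list, threshold, other_list):
    """Build a record-id -> side table from the split attribute's list,
    then use it to partition another attribute list, preserving its order."""
    goes_left = {rid: value <= threshold for value, _, rid in split_list}
    left = [entry for entry in other_list if goes_left[entry[2]]]
    right = [entry for entry in other_list if not goes_left[entry[2]]]
    return left, right

# Hypothetical training records: a numeric "age" list (pre-sorted) and a
# categorical "car type" list, both keyed by record id.
age_list = [(23, "B", 0), (30, "B", 1), (40, "A", 2), (55, "A", 3)]
car_list = [("family", "B", 0), ("sports", "A", 2),
            ("truck", "A", 3), ("family", "B", 1)]

score, threshold = best_numeric_split(age_list)
left_cars, right_cars = split_by_hash_table(age_list, threshold, car_list)
print(score, threshold)
print(left_cars, right_cars)
```

In a multi-processor setting, each processor would presumably run the scan over its own portion of each attribute list and combine histograms and hash tables with the others; how that exchange is done is not specified in this abstract, so the sketch leaves it out.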
REFERENCES:
patent: 5870735 (1999-02-01), Agrawal et al.
R. Agrawal et al., An Interval Classifier for Database Mining Applications, Proceedings of the 18th VLDB Conference Vancouver, British Columbia, Aug. 1992.
R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, vol. 5, No. 6, pp. 914-925, Special Issue on Learning and Discovery in Knowledge-Based Databases, Dec. 1993.
L. Breiman (Univ. of CA-Berkeley) et al. Classification and Regression Trees (Book) Chapter 2. Introduction to Tree Classification pp. 18-58, Wadsworth International Group, Belmont, CA 1984.
J. Catlett, Megainduction: Machine Learning on Very Large Databases, PhD thesis, Univ. of Sydney, Jun./Dec. 1991.
P. K. Chan et al., Experiments on Multistrategy Learning by Meta-learning. In Proc. Second Intl. Conf. on Info. and Knowledge Mgmt., pp. 314-323, 1993.
D. J. DeWitt, J. F. Naughton and D. A. Schneider, Parallel Sorting on Shared-Nothing Architecture Using Probabilistic Splitting, In Proc. of the 1st Int'l Conf. on Parallel and Distributed Information Systems, pp. 280-291, Dec. 1991.
U. Fayyad et al., The Attribute Selection Problem in Decision Tree Generation. In 10th Nat'l Conf. on AI AAAI-92, Learning: Inductive 1992.
M. James, Classification Algorithms (book), Chapters 1-3, QA278.65, J281 Wiley-Interscience Pub., 1985.
M. Mehta et al., MDL-based Decision Tree Pruning. Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD-95) Montreal, Canada, pp. 216-221, Aug. 1995.
J. R. Quinlan et al., Inferring Decision Trees Using Minimum Description Length Principle, Information and Computation 80, pp. 227-248, 1989. (0890-5401/89 Academic Press, Inc.).
Wallace et al., Coding Decision Trees, Machine Learning, 11, pp. 7-22, 1993. (Kluwer Academic Pub., Boston. Mfg. in the Netherlands.).
S. M. Weiss et al., Computer Systems that Learn, Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, pp. 113-143, 1991. Q325.5, W432, C2, Morgan Kaufmann Pub. Inc., San Mateo, CA.
MPI: A Message-Passing Interface Standard, Message Passing Interface Forum May 5, 1994.
M. Mehta, R. Agrawal & J. Rissanen, SLIQ: Fast Scalable Classifier for Data Mining, In EDBT 96, Avignon, France, Mar. 1996.
R. P. Lippmann, An Introduction to Computing with Neural Nets, IEEE ASSP Magazine, pp. 4-22, 0740-7467/87/0400, Apr. 1987.
D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Chapter 6, Intro. to Genetics Based Machine Learning, pp. 218-257, (Book), 1989.
D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. Hsiao & R. Rasmussen, The Gamma Database Machine Project, IEEE Transactions on Knowledge and Data Eng. vol. 2, No. 1, pp. 44-62, Mar. 1990.
No. 08/500,717, filed Jul. 11, 1995, for System and Method for Parallel Mining of Association Rules in Databases, Pat. No. 5,842,200.
No. 08/541,665, filed Oct. 10, 1995, for Method and System for Mining Generalized Sequential Patterns in a Large Database, Pat. No. 5,742,811.
No. 08/564,694, filed Nov. 29, 1995, for Method and System for Generating a Decision-tree Classifier for Data Records, Pat. No. 5,787,274.
Agrawal Rakesh
Mehta Manish
Shafer John Christopher
Breene John E.
International Business Machines Corporation
Lewis Cheryl
Tran Khanh Q.