Method and system for generating a decision-tree classifier inde

Data processing: database and file management or data structures – Database design – Data structure types

Patent

Details

3642253, 3642254, 3642823, 364974, 364DIG1, 364DIG2, 711100, 382 36, G06F 1730

active

057993114

ABSTRACT:
A method and system are disclosed for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method comprises the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records, while the class histograms reflect the class distribution of the records at the node. As the attribute list of the split attribute is divided among the child nodes, a hash table is built; this table is then used to split the remaining attribute lists of the node. The created tree is then pruned based on the Minimum Description Length (MDL) principle: the tree and split tests are encoded in an MDL-based code, and whether and how to prune each node is determined from the code length of the node.
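The attribute-list and hash-table steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_attribute_list`, `best_numeric_split`, `split_lists`), the toy records, the midpoint choice of threshold, and the weighted-gini score (each side's gini index weighted by its share of the records, as is conventional for binary splits) are all assumptions made for illustration, and everything is held in memory for brevity.

```python
from collections import Counter

def gini(hist, n):
    """Gini index of a class histogram that covers n records."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())

def build_attribute_list(records, attr):
    """One (value, class, record id) entry per record, sorted by value."""
    entries = [(rec[attr], rec["cls"], rid) for rid, rec in enumerate(records)]
    return sorted(entries, key=lambda e: e[0])

def best_numeric_split(attr_list):
    """Scan a sorted attribute list once; return (weighted gini, threshold)."""
    n = len(attr_list)
    below = Counter()                                # classes already scanned
    above = Counter(cls for _, cls, _ in attr_list)  # classes not yet scanned
    best = (float("inf"), None)
    for i in range(n - 1):
        value, cls, _ = attr_list[i]
        below[cls] += 1
        above[cls] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no threshold fits between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below) + n_above * gini(above, n_above)) / n
        if score < best[0]:
            # Assumption: use the midpoint between neighbouring values.
            best = (score, (value + next_value) / 2)
    return best

def split_lists(attr_lists, split_attr, threshold):
    """Divide the split attribute's list, then route the other lists by record id."""
    # Hash table built while dividing the split attribute's list: rid -> child.
    side = {rid: "left" if value <= threshold else "right"
            for value, _, rid in attr_lists[split_attr]}
    children = {"left": {}, "right": {}}
    for attr, entries in attr_lists.items():
        for child in children:
            # Filtering keeps each child list in sorted order; no re-sort needed.
            children[child][attr] = [e for e in entries if side[e[2]] == child]
    return children

# Hypothetical training records with two numeric attributes and a class label.
records = [
    {"age": 23, "salary": 15, "cls": "high"},
    {"age": 17, "salary": 40, "cls": "high"},
    {"age": 43, "salary": 60, "cls": "low"},
    {"age": 68, "salary": 50, "cls": "low"},
    {"age": 32, "salary": 75, "cls": "low"},
    {"age": 20, "salary": 65, "cls": "high"},
]
lists = {attr: build_attribute_list(records, attr) for attr in ("age", "salary")}
scores = {attr: best_numeric_split(lst) for attr, lst in lists.items()}
split_attr = min(scores, key=lambda a: scores[a][0])
children = split_lists(lists, split_attr, scores[split_attr][1])
print(split_attr, scores[split_attr])
```

In this sketch the two running histograms are updated incrementally, so each sorted list is read only once per node, and the record-id table lets the non-split lists be partitioned without being sorted again.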
