Rule induction on large noisy data sets

Patent


Details

395/20, 395/21, 395/22, 395/23, 395/50, 395/77, G06F 17/00, G06F 15/00

Type: Patent

Status: active

Number: 057196923

ABSTRACT:
Efficient techniques for inducing rules used in classifying data items in a noisy data set. The prior-art IREP technique, which produces a set of classification rules by inducing each rule, pruning it, and continuing in this fashion until a stopping condition is reached, is improved with a new rule-value metric for stopping pruning and with a stopping condition which depends on the description length of the rule set. The rule set which results from the improved IREP technique is then optimized by pruning rules from the set to minimize the description length, and further optimized by making a replacement rule and a modified rule for each rule and using the description length to determine whether to use the replacement rule, the modified rule, or the original rule in the rule set. Further improvement is achieved by inducing rules for data items not covered by the original set and then pruning these rules. Still further improvement is gained by repeating the steps of inducing rules for uncovered data items, pruning the rules, optimizing the rules, and pruning again, for a fixed number of iterations. The fully-developed technique has the O(n log² n) running time characteristic of IREP, but produces rule sets which do a substantially better job of classification than those produced by IREP.
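
The abstract describes the algorithm only at a high level; the sketch below is a minimal, illustrative Python rendering of the core covering loop it mentions (grow a rule on a growing split, prune it on a pruning split, stop when the rule set's description length grows too far past the best length seen so far). It is not the patented implementation: the rule-value metric, the crude description-length estimate, the 64-bit stopping margin, and all function names are assumptions made for illustration, and the later optimization and re-induction passes described in the abstract are omitted.

```python
import math
import random


def rule_value(p, n):
    """Rule-value metric used to guide pruning: (p - n) / (p + n), where p and
    n are the positive and negative pruning examples the rule still covers.
    (Assumed form; the patent abstract only says a new rule-value metric is used.)"""
    return (p - n) / (p + n) if (p + n) else -1.0


def covers(rule, example):
    """A rule is a conjunction of (attribute, value) equality tests."""
    return all(example.get(attr) == val for attr, val in rule)


def grow_rule(pos, neg, attributes):
    """Greedily add the test that keeps some positives while excluding the most
    negatives, until no negatives are covered (illustrative growth heuristic)."""
    rule = []
    while neg:
        best = None
        for attr in attributes:
            for val in {ex[attr] for ex in pos if attr in ex}:
                candidate = rule + [(attr, val)]
                p = sum(covers(candidate, ex) for ex in pos)
                n = sum(covers(candidate, ex) for ex in neg)
                if p and (best is None or n < best[0]):
                    best = (n, candidate)
        if best is None:
            break
        rule = best[1]
        pos = [ex for ex in pos if covers(rule, ex)]
        neg = [ex for ex in neg if covers(rule, ex)]
    return rule


def prune_rule(rule, pos, neg):
    """Delete trailing tests while doing so does not lower the rule value on
    the pruning split (a simplified form of reduced error pruning)."""
    def value(r):
        p = sum(covers(r, ex) for ex in pos)
        n = sum(covers(r, ex) for ex in neg)
        return rule_value(p, n)

    while len(rule) > 1 and value(rule[:-1]) >= value(rule):
        rule = rule[:-1]
    return rule


def description_length(rules, pos, neg):
    """Crude stand-in for the MDL computation: bits to encode the tests in the
    rules plus bits to encode the examples the rule set misclassifies."""
    errors = sum(not any(covers(r, ex) for r in rules) for ex in pos)
    errors += sum(any(covers(r, ex) for r in rules) for ex in neg)
    return 4 * sum(len(r) for r in rules) + 8 * errors


def induce_rule_set(pos, neg, attributes, dl_margin=64, seed=0):
    """IREP-style covering loop with a description-length stopping condition:
    keep inducing and pruning rules for the uncovered positives, and stop once
    the rule set's description length exceeds the best length seen so far by
    more than dl_margin bits (the margin value is an assumption)."""
    rng = random.Random(seed)
    rules, best_dl = [], math.inf
    while pos:
        grow_p, grow_n = list(pos), list(neg)
        rng.shuffle(grow_p)
        rng.shuffle(grow_n)
        cut_p, cut_n = 2 * len(grow_p) // 3, 2 * len(grow_n) // 3
        rule = grow_rule(grow_p[:cut_p], grow_n[:cut_n], attributes)
        rule = prune_rule(rule, grow_p[cut_p:], grow_n[cut_n:])
        if not rule:
            break
        rules.append(rule)
        dl = description_length(rules, pos, neg)
        best_dl = min(best_dl, dl)
        if dl > best_dl + dl_margin:
            rules.pop()  # the last rule made the theory too expensive; stop
            break
        pos = [ex for ex in pos if not covers(rule, ex)]  # remove covered positives
    return rules
```

As a usage example, calling induce_rule_set(pos, neg, attributes=["shape", "color"]) on lists of dictionary-valued examples returns a list of rules, each a list of (attribute, value) tests; the optimization passes described in the abstract (replacement and revised rules, repeated re-induction over uncovered items) would operate on that returned list.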

REFERENCES:
patent: 5222197 (1993-06-01), Teng et al.
patent: 5265192 (1993-11-01), McCormack
patent: 5373486 (1994-12-01), Dowla et al.
patent: 5444796 (1995-08-01), Ornstein et al.
patent: 5481650 (1996-01-01), Cohen
patent: 5504840 (1996-04-01), Hiji et al.
patent: 5588091 (1996-12-01), Alkon et al.
patent: 5590218 (1996-12-01), Ornstein
J. Furnkranz et al., "Incremental Reduced Error Pruning," Machine Learning: Proceedings of the Eleventh International Conference, Jul. 10-13, 1994, New Brunswick, NJ, USA, pp. 70-77.
J. R. Quinlan et al., "Induction of Logic Programs: FOIL and Related Systems," New Generation Computing, vol. 13, no. 3-4, 1995, pp. 287-312.
J. Furnkranz, "A Tight Integration of Pruning and Learning," Machine Learning: ECML-95, Apr. 25-27, 1995, pp. 291-294.
J. R. Quinlan, "MDL and Categorical Theories," Machine Learning: Proceedings of the 12th International Conference on Machine Learning, Jul. 9-12, 1995, Tahoe City, CA, USA, pp. 464-470.
J. Furnkranz, "FOSSIL: A Robust Relational Learner," Machine Learning: ECML-94, Apr. 6-8, 1994, Catania, IT, pp. 122-137.
J. R. Quinlan et al., "FOIL: A Midterm Report," Machine Learning: ECML-93, Apr. 5-7, 1993, Vienna, AT, pp. 3-20.
P. Clark and R. Boswell, "Rule Induction with CN2: Some Recent Improvements," Machine Learning: Proceedings of the Fifth European Working Session on Learning (EWSL-91), Springer-Verlag, 1991, pp. 151-163.
W. W. Cohen, "Efficient Pruning Methods for Separate-and-Conquer Rule Learning Systems," Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993, pp. 988-994.
J. R. Quinlan, "C4.5: Programs for Machine Learning," Chapter 5, "From Trees to Rules," Morgan Kaufmann, pp. 43-53, 1994.
J. Furnkranz and G. Widmer, "Incremental Reduced Error Pruning," Machine Learning: Proceedings of the Eleventh Annual Conference, Morgan Kaufmann, 1994.

Profile ID: LFUS-PAI-O-1788341
