Finding the most interesting patterns in a database quickly...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C702S083000, C714S037000, C714S049000

Reexamination Certificate

active

07136844

ABSTRACT:
Many discovery problems, e.g., subgroup or association rule discovery, can naturally be cast as n-best hypotheses problems where the goal is to find the n hypotheses from a given hypothesis space that score best according to a certain utility function. We present a sampling algorithm that solves this problem by issuing a small number of database queries while guaranteeing precise bounds on confidence and quality of solutions. Known sampling approaches have treated single hypothesis selection problems, assuming that the utility be the average (over the examples) of some function—which is not the case for many frequently used utility functions. We show that our algorithm works for all utilities that can be estimated with bounded error. We provide these error bounds and resulting worst-case sample bounds for some of the most frequently used utilities, and prove that there is no sampling algorithm for a popular class of utility functions that cannot be estimated with bounded error. The algorithm is sequential in the sense that it starts to return (or discard) hypotheses that already seem to be particularly good (or bad) after a few examples. Thus, the algorithm is almost always faster than its worst-case bounds.

REFERENCES:
patent: 5675786 (1997-10-01), McKee et al.
patent: 6278989 (2001-08-01), Chaudhuri et al.
patent: 6282570 (2001-08-01), Leung et al.
patent: 6532458 (2003-03-01), Chaudhuri et al.
patent: 6542886 (2003-04-01), Chaudhuri et al.
“A Sequential Sampling Algorithm for a General Class of Utility Criteria” by Tobias Scheffer and Stefan Wrobel, University of Magdeburg, FIN/IWS, P.O. Box 4120, 39016 Magdeburg, Germany.
“An Algorithm for Multi-Relational Discovery of Subgroups”, in Principles of Data Mining and Knowledge Discovery: First European Symposium; proceedings/PKDD '97, J. Komorowski J. Zytkow (eds.), pp. 78-87, Springer Verlag, Berlin, New York, 1997.
“PALO: A Probabilistic Hill-Climbing Algoirthm”, Russell Greiner, Siemens Corporate Research, Princeton, NJ. This paper expands the short article, “Probablistic Hill-Climbing: Theory and Applications” that was awarded the “Artificial Intelligence Journal Best Paper Award” at the Ninth Canadian Conference on Artificial Intelligence (CSCSI-92), in Vancouver, May 1992.
“Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation” by Oded Maron and Andrew W. Moore.
“Adaptive Sampling Method for Scaling Up Knowledge Discovery Algorithms”, by Carlos Domingo, Ricard Gavalda, and Osamu Watanabe, Technical Reports on Mathematical and Computing Sciences: TR-C131, 1999.
“Sampling Large Databases for Association Rules” by Hannu Toivonen, University of Helsiniki, Department of Computer Sciences, FIN-00014 University of Helsinki, Finland.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Finding the most interesting patterns in a database quickly... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Finding the most interesting patterns in a database quickly..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Finding the most interesting patterns in a database quickly... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3673867

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.