Database aggregation query result estimator

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C707S793000, C704S200000, C714S026000

active

10873569

ABSTRACT:
Aggregation queries are answered approximately by first identifying outlier values, aggregating the outlier values, and then sampling the data that remains after the outliers are pruned. The aggregate of the sample is extrapolated and added to the aggregate of the outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting the sliding window over the data that has the lowest variance; values that fall outside this window are treated as outliers. An index is created for the outlier values. The outlier data is removed from the data set and aggregated separately. The remaining data is then sampled, using any of several known techniques, to obtain a statistically representative sample, which is aggregated and extrapolated to estimate the aggregate over the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire data set. Further methods use weighted sampling and weighted selection of outlier values for low-selectivity queries and for queries with a GROUP BY clause.
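The outlier-pruning estimator described in the abstract can be sketched for a SUM query as follows. This is a minimal illustration under stated assumptions, not the patented method: the outlier budget k, uniform sampling without replacement, and the names find_outliers, estimate_sum, k, and sample_size are all hypothetical. The sketch sorts the values, slides a window of n - k consecutive values across them, keeps the window with the lowest variance, and treats everything outside it as outliers. The outliers are summed in full, while the remaining values are sampled and the sample sum is scaled up by |remaining| / |sample|. The outlier index and the weighted variants for low-selectivity and GROUP BY queries are not modeled.

```python
import random


def find_outliers(values, k):
    """Split values into (outliers, remaining) using a lowest-variance window.

    Hypothetical helper: k is an assumed outlier budget. The window holds
    len(values) - k consecutive sorted values; values outside it are outliers.
    """
    data = sorted(values)
    n = len(data)
    if k <= 0:
        return [], data
    if k >= n:
        return data, []
    w = n - k
    # Running sum and sum of squares give each window's variance in O(1).
    s = sum(data[:w])
    sq = sum(x * x for x in data[:w])
    best_start, best_var = 0, sq / w - (s / w) ** 2
    for start in range(1, k + 1):
        leaving, entering = data[start - 1], data[start + w - 1]
        s += entering - leaving
        sq += entering * entering - leaving * leaving
        var = sq / w - (s / w) ** 2
        if var < best_var:  # strict: ties keep the leftmost window
            best_start, best_var = start, var
    remaining = data[best_start:best_start + w]
    outliers = data[:best_start] + data[best_start + w:]
    return outliers, remaining


def estimate_sum(values, k, sample_size, seed=None):
    """Estimate SUM(values): exact outlier sum plus a scaled-up sample sum."""
    outliers, remaining = find_outliers(values, k)
    outlier_sum = sum(outliers)
    if not remaining:
        return float(outlier_sum)
    rng = random.Random(seed)
    m = min(sample_size, len(remaining))
    sample = rng.sample(remaining, m)  # uniform, without replacement
    # Each sampled value stands in for len(remaining) / m values.
    return outlier_sum + sum(sample) * len(remaining) / m
```

For example, `find_outliers([1, 2, 3, 4, 5, 1000], 1)` isolates `1000` as the single outlier, so a skewed value that would dominate a small uniform sample is instead counted exactly. The running sums keep the window scan linear after sorting; a production system would more likely precompute outliers and samples offline rather than per query.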
