Sampling for aggregation queries

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06842753

ABSTRACT:
Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve weighted sampling and weighted selection of outlier values for low-selectivity queries or queries with a GROUP BY clause.
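
The steps in the abstract can be illustrated with a minimal Python sketch for a SUM query. It is not the patented implementation. The function names (`split_outliers`, `estimate_sum`), the fixed outlier budget `num_outliers`, and the use of uniform random sampling for the non-outlier data are all assumptions made for illustration; the abstract only says the remaining data is sampled "in one of many known ways".

```python
import random


def split_outliers(values, num_outliers):
    """Split values into (outliers, rest).

    The values are sorted, and a window of len(values) - num_outliers
    consecutive values is slid across them. The window with the lowest
    variance is kept as `rest`; everything outside it is an outlier.
    """
    ordered = sorted(values)
    n = len(ordered)
    window = n - num_outliers
    if window <= 0:
        return ordered, []
    # Prefix sums of x and x^2 give each window's variance in O(1).
    # (Assumption: fine for a sketch; numerically fragile for huge values.)
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in ordered:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    best_start, best_var = 0, float("inf")
    for start in range(num_outliers + 1):
        s = prefix[start + window] - prefix[start]
        sq = prefix_sq[start + window] - prefix_sq[start]
        var = sq / window - (s / window) ** 2
        if var < best_var:
            best_start, best_var = start, var
    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng=random):
    """Estimate SUM(values): exact outlier sum + extrapolated sample sum."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    outliers, rest = split_outliers(values, num_outliers)
    exact_part = sum(outliers)          # outliers are aggregated exactly
    if not rest:
        return exact_part
    k = min(sample_size, len(rest))
    sample = rng.sample(rest, k)        # uniform sample without replacement
    extrapolated = sum(sample) * len(rest) / k
    return exact_part + extrapolated
```

For example, with 98 values equal to 1 and two large values (1000 and 2000), the two large values fall outside the lowest-variance window and are summed exactly, so any uniform sample of the remaining data extrapolates to 98 and the estimate matches the true sum of 3098.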

REFERENCES:
patent: 5263120 (1993-11-01), Bickel
patent: 5483636 (1996-01-01), Saxena
patent: 5799301 (1998-08-01), Castelli et al.
patent: 5809491 (1998-09-01), Kayalioglu et al.
patent: 5946692 (1999-08-01), Faloutsos et al.
patent: 6094651 (2000-07-01), Agrawal et al.
patent: 20020123979 (2002-09-01), Chaudhuri et al.
patent: 20020127529 (2002-09-01), Cassuto et al.
S. Acharya, P.B. Gibbons, V. Poosala, “Congressional Samples for Approximate Answering of Group-By Queries”, ACM SIGMOD 2000, May 2000, Dallas, Texas.
S. Acharya, P.B. Gibbons, V. Poosala and S. Ramaswamy, “Join Synopses for Approximate Query Answering”, SIGMOD 1999, Philadelphia, PA.
P.B. Gibbons and Y. Matias, “New Sampling-Based Summary Statistics for Improving Approximate Query Answers”, SIGMOD 1998, Seattle, WA.
P.J. Haas and J.M. Hellerstein, “Ripple Joins for Online Aggregation”, SIGMOD 1999, Philadelphia, PA.
S. Chaudhuri, R. Motwani and V. Narasayya, “On Random Sampling Over Joins”, SIGMOD 1999, Philadelphia, PA.
G.S. Manku, S. Rajagopalan, B.G. Lindsay, “Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets”, SIGMOD 1999, Philadelphia, PA.
S. Ganguly, P.B. Gibbons, Y. Matias and A. Silberschatz, “Bifocal Sampling for Skew-Resistant Join Size Estimation”, pp. 271-281, SIGMOD '96, Jun. 1996, Montreal, Canada.
G.K. Zipf, PhD, “Human Behavior and The Principle of Least Effort”, 1949, Addison-Wesley Press, Inc.
P.J. Haas, J.F. Naughton and A.N. Swami, “On the Relative Cost of Sampling for Join Selectivity Estimation”, pp. 14-24, SIGMOD/PODS '94, May 1994, Minneapolis, Minnesota, USA.
J.M. Hellerstein, P.J. Haas and H.J. Wang, “Online Aggregation”, pp. 171-182, SIGMOD '97 AZ, USA.
R.J. Lipton, Jeffrey F. Naughton, D.A. Schneider and S. Seshadri, “Efficient Sampling Strategies for Relational Database Operations”, Theoretical Computer Science 116 (1993) 195-226.
Wen-Chi Hou, G. Ozsoyoglu and E. Dogdu, “Error-Constrained COUNT Query Evaluation in Relational Databases”, pp. 278-287, May 29-31, 1991, Denver, Colorado, Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data.
F. Olken and D. Rotem, “Random Sampling from Databases—A Survey”, Abstract and pp. 1-55, Mar. 22, 1994.
R. Motwani and P. Raghavan, “Randomized Algorithms”, 1995.
F. Olken, “Simple Random Sampling from Relational Databases”, Proceedings from Twelfth International Conference on Very Large Databases, Kyoto, Aug. 1986, pp. 160-169.
G. Piatetsky-Shapiro and C. Connell, “Accurate Estimation of the Number of Tuples Satisfying a Condition”, pp. 256-276, 1984 ACM.
S. Chaudhuri, R. Motwani and V. Narasayya, “Random Sampling for Histogram Construction: How much is enough”, 12 pages.
G.S. Manku, S. Rajagopalan and B.G. Lindsay, “Approximate Medians and other Quantiles in One Pass and with Limited Memory”, pp. 426-435, SIGMOD '98, Seattle Washington.
F. Olken, “Random Sampling from Databases”, 1993, pp. 1-158.
S. Chaudhuri and V. Narasayya, Program for TPC-D Data Generation with Skew. ftp.research.microsoft.com/pub/users/viveknar/tpcdskew.
H. Jagadish, N. Koudas and S. Muthukrishnan, “Mining Deviants in a Time Series Database”, In Proceedings of 25th International Conference Very Large Data Bases, pp. 102-113, 1999.
E. Knorr and R. Ng, “Algorithms for Mining Distance-Based Outliers in Large Datasets”, In Proceedings of 24th International Conference Very Large Data Bases, pp. 392-403, 1998.
J.F. Naughton and S. Seshadri, “On Estimating the Size of Projections”, In Proceedings Third International Conference on Database Theory, pp. 499-513, 1990.
W.G. Cochran, “Sampling Techniques”, John Wiley & Sons, New York, third edition, 1977, Chapter 3, pp. 50-71.
Y. Ioannidis and V. Poosala, “Histogram Based Approximations of Set-Valued Query Answers”, In Proceedings of 25th International Conference Very Large Data Bases, pp. 174-185, 1999.
V. Ganti, M.L. Lee and R. Ramakrishnan, “ICICLES: Self-Tuning Samples for Approximate Query Answering”, In Proc. 26th VLDB, 2000.
Barnett et al., “Outliers in Statistical Data,” John Wiley, 3rd Edition, Table of Contents (1994).
Chatfield, “The Analysis of Time Series,” Chapman and Hall, Table of Contents (1984).
Hawkins, “Identification of Outliers,” Chapman and Hall, Table of Contents (1980).
Etzion et al., “Temporal Databases: Research and Practice, LNCS 1399,” Springer Verlag, Table of Contents (1998).
Gray et al., “Transaction Processing: Concepts and Techniques,” Morgan Kaufmann, Table of Contents (1993).
