Reexamination Certificate
2007-03-13
2007-03-13
Alam, Shahid (Department: 2162)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C704S200000, C714S026000
active
10873569
ABSTRACT:
Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting the values that fall outside a sliding window of data chosen to have the lowest variance. An index is created for the outlier values. The outlier data is removed from the data and aggregated separately. The remaining data, without the outliers, is then sampled in one of many known ways to provide a statistically relevant sample, which is aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve weighted sampling and weighted selection of outlier values for low-selectivity queries or queries with a GROUP BY clause.
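The steps in the abstract can be illustrated with a minimal Python sketch for a SUM query. It is not the patented implementation. The function names `split_outliers` and `estimate_sum`, the parameters `num_outliers` and `sample_size`, and the choice of uniform random sampling without replacement are all assumptions made for illustration. The abstract only says the remaining data is sampled "in one of many known ways", and the weighted variants it mentions are not shown.

```python
import random


def split_outliers(values, num_outliers):
    """Split values into (outliers, rest).

    Assumption: 'rest' is the contiguous window of the sorted values,
    of size len(values) - num_outliers, that has the lowest variance.
    Everything outside that window is treated as an outlier.
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    if window <= 0:
        raise ValueError("num_outliers must be smaller than the number of values")

    # Running sum and sum of squares give each window's spread in O(1).
    # spread = window * variance, so minimizing it minimizes the variance.
    s = sum(ordered[:window])
    sq = sum(v * v for v in ordered[:window])
    best_start, best_spread = 0, sq - s * s / window

    for start in range(1, num_outliers + 1):
        leaving = ordered[start - 1]
        entering = ordered[start + window - 1]
        s += entering - leaving
        sq += entering * entering - leaving * leaving
        spread = sq - s * s / window
        if spread < best_spread:
            best_start, best_spread = start, spread

    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng=None):
    """Estimate sum(values): exact outlier total plus an extrapolated sample."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = rng or random.Random()

    outliers, rest = split_outliers(values, num_outliers)
    outlier_total = sum(outliers)  # outliers are aggregated exactly

    # Uniform sample of the non-outlier data (hypothetical choice of sampler).
    k = min(sample_size, len(rest))
    sample = rng.sample(rest, k)

    # Scale the sample sum up to the size of the non-outlier data.
    extrapolated = sum(sample) * len(rest) / k
    return outlier_total + extrapolated
```

In this sketch the few extreme values never enter the sample, so they cannot inflate or deflate the extrapolated part of the estimate. Only the low-variance remainder is subject to sampling error.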
REFERENCES:
patent: 5263120 (1993-11-01), Bickel
patent: 5483636 (1996-01-01), Saxena
patent: 5606164 (1997-02-01), Price et al.
patent: 5799301 (1998-08-01), Castelli et al.
patent: 5809491 (1998-09-01), Kayalioglu et al.
patent: 5909678 (1999-06-01), Bergman et al.
patent: 5946692 (1999-08-01), Faloutsos et al.
patent: 6094651 (2000-07-01), Agrawal et al.
patent: 6185512 (2001-02-01), Lambrecht
patent: 2002/0123979 (2002-09-01), Chaudhuri et al.
patent: 2002/0127529 (2002-09-01), Cassuto et al.
S. Chaudhuri and V. Narasayya, Program for TPC-D Data Generation with Skew. ftp.research.microsoft.com/pub/users/viveknar.tpcdskew.
H. Jagadish, N. Koudas and S. Muthukrishnan, “Mining Deviants in a Time Series Database”, In Proceedings of 25th International Conference on Very Large Data Bases, pp. 102-113, 1999.
E. Knorr and R. Ng, “Algorithms for Mining Distance-Based Outliers in Large Datasets”, In Proceedings of 24th International Conference on Very Large Data Bases, pp. 392-403, 1998.
J.F. Naughton and S. Seshadri, “On Estimating the Size of Projections”, In Proceedings Third International Conference on Database Theory, pp. 499-513, 1990.
W.G. Cochran, “Sampling Techniques”, John Wiley & Sons, New York, third edition, 1977, Chapter 3, pp. 50-71.
Y. Ioannidis and V. Poosala, “Histogram Based Approximations of Set-Valued Query Answers”, In Proceedings of 25th International Conference on Very Large Data Bases, pp. 174-185, 1999.
V. Ganti, M.L. Lee and R. Ramakrishnan, “ICICLES: Self-Tuning Samples for Approximate Query Answering”, In Proc. 26th VLDB, 2000.
S. Acharya, P.B. Gibbons, V. Poosala, “Congressional Samples for Approximate Answering of Group-By Queries”, ACM SIGMOD 2000, May 2000, Dallas, Texas.
S. Acharya, P.B. Gibbons, V. Poosala and S. Ramaswamy, “Join Synopses for Approximate Query Answering”, SIGMOD 1999, Philadelphia, PA.
P.B. Gibbons and Y. Matias, “New Sampling-Based Summary Statistics for Improving Approximate Query Answers”, SIGMOD 1998, Seattle, WA.
P.J. Haas and J.M. Hellerstein, “Ripple Joins for Online Aggregation”, SIGMOD 1999, Philadelphia, PA.
S. Chaudhuri, R. Motwani and V. Narasayya, “On Random Sampling Over Joins”, SIGMOD 1999, Philadelphia, PA.
G.S. Manku, S. Rajagopalan, B.G. Lindsay, “Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets”, SIGMOD 1999, Philadelphia, PA.
S. Ganguly, P.B. Gibbons, Y. Matias and A. Silberschatz, “Bifocal Sampling for Skew-Resistant Join Size Estimation”, pp. 271-281, SIGMOD '96, Jun. 1996, Montreal, Canada.
G.K. Zipf, PhD, “Human Behavior and The Principle of Least Effort”, 1949, Addison-Wesley Press, Inc.
P.J. Haas, J.F. Naughton and A.N. Swami, “On the Relative Cost of Sampling for Join Selectivity Estimation”, pp. 14-24, SIGMOD/PODS '94, May 1994, Minneapolis, Minnesota, USA.
J.M. Hellerstein, P.J. Haas and H.J. Wang, “Online Aggregation”, pp. 171-182, SIGMOD '97 AZ, USA.
R.J. Lipton, Jeffrey F. Naughton, D.A. Schneider and S. Seshadri, “Efficient Sampling Strategies for Relational Database Operations”, Theoretical Computer Science 116 (1993) 195-226.
Wen-Chi Hou, G. Ozsoyoglu and E. Dogdu, “Error-Constrained COUNT Query Evaluation in Relational Databases”, pp. 278-287, May 29-31, 1991, Denver, Colorado, Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data.
F. Olken and D. Rotem, “Random Sampling from Databases—A Survey”, Abstract and pp. 1-55, Mar. 22, 1994.
R. Motwani and P. Raghavan, “Randomized Algorithms”, 1995.
F. Olken, “Simple Random Sampling from Relational Databases”, Proceedings from Twelfth International Conference on Very Large Databases, Kyoto, Aug. 1986, p. 160-169.
G. Piatetsky-Shapiro and C. Connell, “Accurate Estimation of the Number of Tuples Satisfying a Condition”, pp. 256-276, 1984 ACM.
S. Chaudhuri, R. Motwani and V. Narasayya, “Random Sampling for Histogram Construction: How much is enough”, 12 pages.
G.S. Manku, S. Rajagopalan and B.G. Lindsay, “Approximate Medians and other Quantiles in One Pass and with Limited”, pp. 426-435, SIGMOD '98, Seattle, Washington.
F. Olken, “Random Sampling from Databases”, 1993, pp. 1-158.
Barnett et al., Table of Contents, “Outliers in Statistical Data,” John Wiley, 3rd Edition (1994).
Chatfield, Table of Contents, “The Analysis of Time Series,” Chapman and Hall (1984).
Hawkins, Table of Contents, “Identification of Outliers,” Chapman and Hall (1980).
Etzion et al., Table of Contents, “Temporal Databases: Research and Practice LNCS 1399,” Springer Verlag (1988).
Gray et al., Table of Contents, “Transaction Processing: Concepts and Techniques,” Morgan Kaufmann (1993).
Chaudhuri Surajit
Datar Mayur D.
Motwani Rajeev
Narasayya Vivek R.
Alam Shahid
Microsoft Corporation
Database aggregation query result estimator