Method for estimating the number of distinct values in a...

Data processing: database and file management or data structures – Database and file access – Query optimization

Reexamination Certificate

Details

C707S600000, C707S698000, C707S719000, C707S747000

active

07987177

ABSTRACT:
The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
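The abstract does not spell out the form of the synopses, so the following is only a minimal sketch of one well-known technique from the cited literature: a "k minimum values" (KMV) synopsis, which keeps the k smallest hash values seen in a partition. It is an illustrative assumption, not the patented method. It shows how a synopsis for the union of two partitions can be built from the two input synopses alone, without revisiting the base data. The function names (`kmv_synopsis`, `kmv_union`, `kmv_estimate`), the SHA-256 based hash, and the estimator `(k - 1) / U_(k)` are choices made for this sketch; intersection, difference, and deletions are not covered.

```python
import hashlib


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values of a partition, sorted."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return sorted({_hash01(v) for v in values})[:k]


def kmv_union(syn_a, syn_b, k):
    """Combine two synopses into the synopsis of the union of their partitions."""
    return sorted(set(syn_a) | set(syn_b))[:k]


def kmv_estimate(synopsis, k):
    """Estimate the number of distinct values from a KMV synopsis."""
    if len(synopsis) < k:
        # Fewer than k distinct hashes were seen, so the count is exact
        # (barring hash collisions).
        return float(len(synopsis))
    # U_(k) is the k-th smallest hash; (k - 1) / U_(k) is the usual KMV estimator.
    return (k - 1) / synopsis[k - 1]


if __name__ == "__main__":
    k = 1024
    part_a = range(0, 6000)
    part_b = range(4000, 10000)  # overlaps part_a on 4000..5999
    combined = kmv_union(kmv_synopsis(part_a, k), kmv_synopsis(part_b, k), k)
    print(kmv_estimate(combined, k))  # estimate of the 10000 distinct values
```

Because the k smallest hashes of a union are always contained in the k smallest hashes of each input, the combined synopsis is identical to one built directly over both partitions. This is the property that allows per-partition synopses to be created in parallel and merged later.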

REFERENCES:
patent: 5530883 (1996-06-01), Baum et al.
patent: 5542089 (1996-07-01), Lindsay et al.
patent: 5727197 (1998-03-01), Burgess et al.
patent: 5802521 (1998-09-01), Ziauddin et al.
patent: 5832475 (1998-11-01), Agrawal et al.
patent: 5950185 (1999-09-01), Alon et al.
patent: 5999928 (1999-12-01), Yan
patent: 6061676 (2000-05-01), Srivastava et al.
patent: 6226629 (2001-05-01), Cossock
patent: 6732110 (2004-05-01), Rjaibi et al.
patent: 6738762 (2004-05-01), Chen et al.
patent: 6865567 (2005-03-01), Oommen et al.
patent: 7047230 (2006-05-01), Gibbons
patent: 7124146 (2006-10-01), Rjaibi et al.
patent: 2002/0083033 (2002-06-01), Abdo et al.
patent: 2002/0198867 (2002-12-01), Lohman et al.
patent: 2003/0208488 (2003-11-01), Perrizo
patent: 2004/0049492 (2004-03-01), Gibbons
patent: 2004/0059743 (2004-03-01), Burger
patent: 2004/0133567 (2004-07-01), Witkowski et al.
patent: 2005/0097072 (2005-05-01), Brown et al.
patent: 2005/0147240 (2005-07-01), Agrawal et al.
patent: 2005/0147246 (2005-07-01), Agrawal et al.
patent: 2006/0047683 (2006-03-01), Lakshminarayan et al.
patent: 2006/0218123 (2006-09-01), Chowdhuri et al.
patent: 2008/0120274 (2008-05-01), Cruanes et al.
patent: 2010/0010989 (2010-01-01), Li et al.
patent: WO 2007/134407 (2007-11-01), None
patent: WO 2010/104902 (2010-09-01), None
Phillip B. Gibbons, “Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports”, Proceedings of the 27th VLDB, 2001, 10 pages.
Kevin Beyer et al., “On Synopses for Distinct-Value Estimation Under Multiset Operations”, SIGMOD'07, Jun. 12-14, 2007, pp. 199-.
Neoklis Polyzotis, “Selectivity-Based Partitioning: A Divide-and-Union Paradigm for Effective Query Optimization”, CIKM'05, Oct. 31-Nov. 5, 2005.
Abdelkader Hameurlain et al., “CPU and incremental memory allocation in dynamic parallelization of SQL queries”, Parallel Computing 28 (2002) 525-556.
Damianos Chatziantoniou et al., “Partitioned optimization of complex queries”, Information Systems 32 (2007) 248-282.
