Estimation of column cardinality in a partitioned relational...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000

Reexamination Certificate

active

06732110

ABSTRACT:

FIELD OF THE INVENTION
The present invention is directed to an improvement in computing systems and in particular to an improved system for the estimation of column cardinality in a partitioned relational database.
BACKGROUND OF THE INVENTION
In relational database management systems (RDBMS) it is common to utilize query optimizers to improve the efficiency of processing of queries to be carried out on a relational database. One of the most commonly required statistics for such query optimisation is the column cardinality for a table in a relational database. Column cardinality is the number of distinct values contained in a column in the table of the database. In a serial database, column cardinality may be calculated relatively easily if an index is defined on the column being considered. Where there is no index on a column for which the cardinality is to be calculated, a sort of the values in the column is required to provide an exact measurement. Such sorting is an expensive operation and is not desirable in a database system. Because query optimisers do not require exact statistics to function effectively, a good approximation of column cardinality is sufficient in most cases to obtain a good query plan.
There are a number of techniques known in the prior art to obtain approximations for column cardinality without requiring the sorting of column values in a database table. Examples of such prior art techniques include sample counting, linear counting, and logarithmic counting. These techniques are described in Morton M. Astrahan, Mario Schkolnick, and Kyu-Young Whang, “Counting Unique Values of an Attribute Without Sorting,” Information Systems 12, 1(1987).
In a partitioned RDBMS, such as a share-nothing parallel database management system, tables may be partitioned across several nodes which do not share data. In such an environment it is potentially difficult to calculate column cardinality. The same value may occur in multiple nodes and therefore it is not possible to simply sum the column cardinality values for each node to obtain a table's overall column cardinality value for the different nodes in the parallel database. One approach is used in the DB2 universal database (UDB) (trade-mark) in the parallel database environment. This approach relies on statistics for column cardinality being calculated on a single node. The node used will be treated as being representative of the data in the column across the different nodes in the partitioned database. In fact, the node may or may not be representative of the data as a whole. As a query is optimised, the overall column cardinality (across all nodes) is estimated using a known probabilistic formula. The column cardinality for the representative node, the number of table rows in that node, and the number of nodes across which the table is partitioned are used to estimate the overall column cardinality. There is overhead involved in such an approach, and the approach is also limited where the node used to represent the data as a whole is in some way atypical of the data value distribution. As a result the estimated overall column cardinality using this approach may vary considerably from the actual value.
It is therefore desirable to have a technique for estimating the cardinality of a column in a partitioned relational database table which is efficient and which provides a reliable estimate of the column cardinality across all nodes in which the table data is stored.
SUMMARY OF THE INVENTION
A system, method and computer readable medium for estimating a column cardinality value for a column in a partitioned table stored in a plurality of nodes in a relational database is disclosed. According to one embodiment of the present invention, a plurality of column values for the partitioned table stored in each node are hashed, and a hash data set for each node is generated. Each of the hash data sets from each node is transferred to a coordinator node designated from the plurality of nodes. The hash data sets are merged into a merged data set, and an estimated column cardinality value for the table is calculated from the merged data set.
Advantages of the invention include an efficient technique for providing a reliable estimate of column cardinality in a partitioned relational database.


REFERENCES:
patent: 5469568 (1995-11-01), Schiefer et al.
patent: 5542073 (1996-07-01), Schiefer et al.
patent: 5761653 (1998-06-01), Schiefer et al.
patent: 5765146 (1998-06-01), Wolf et al.
patent: 5797000 (1998-08-01), Bhattacharya et al.
patent: 5802521 (1998-09-01), Ziauddin et al.
patent: 5899986 (1999-05-01), Ziauddin
patent: 5918225 (1999-06-01), White et al.
patent: 6029163 (2000-02-01), Ziauddin
patent: 6226629 (2001-05-01), Cossock
patent: 6405198 (2002-06-01), Bitar et al.
patent: 6421687 (2002-07-01), Klostermann
patent: 6477523 (2002-11-01), Chiang
Deen et al “Multi-join on parallel processors”, IEEE 1990, pp. 92-102.*
Poosala et al “Improved histograms for selectivity estimation of range predictes”, ACM 1996, pp. 294-305.*
Astrahan, Morton M., “Approximating the Number of Unique Values of an Attribute Without Sorting,” Inform. Systems, 1987, vol. 12, No. 1, pp. 11-15.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Estimation of column cardinality in a partitioned relational... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Estimation of column cardinality in a partitioned relational..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Estimation of column cardinality in a partitioned relational... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3252223

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.