Efficient groupby aggregation in tournament tree sort

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000, C707S793000

Reexamination Certificate

active

06282541

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to efficient groupby aggregation in tournament tree sort.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semi-permanent storage.
The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO). The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data.
An example query, typical of many decision support database applications, expressed in SQL is:
SELECT DEPT, AVG(SALARY) FROM EMP GROUPBY DEPT
To execute the above SELECT query, the RDBMS may sort records by DEPT and compute the SALARY averages for the group of records for each DEPT. In particular, when sorting a large number of rows, there may be many parallel sorts occurring separately. These sorts may be responding to the same or different queries. Each sort takes up system resources. In particular, each sort reads a number of records, sorts these records, and places the sorted records into a temporary data set. For a large number of records, there could be many temporary data sets, which may take up the entire memory, even before the sorting is completed.
It has been suggested that special treatment be given to aggregation in a RDBMS in A. Klug, “Access Path in the Abell Statistical Query Facility”, Proc. ACM SIGMOD, 1982, pp. 161-172, which is incorporated by reference herein. One technique, not suggested in the Klug article, but suggested in D. J. Haderle and E. J. Lynch, “Evaluation of Column Function on Grouped Data During Data Ordering”, IBM Technical Disclosure Bulletin, 10, March 1990, p385-386, which is incorporated by reference herein, is to perform the operations related to the aggregation within a sort.
One problem pertains to doing aggregation within a sort using a tournament tree. A tournament tree is a binary tree that has been a favored sort technique because it is based on a technique called replacement/selection, which generates sort runs that are typically twice the size of main memory devoted to the sort, thereby possibly reducing the merge activity. The tournament tree sort is also favored because special hardware assists are known to speed up tournament tree sorts.
In one technique for aggregation within a sort, known as “aggregation push-down into sort”, consecutive winners (e.g., for data elements, such as keys, a wilmer may be the data element having the lowest value) of tournaments are compared, and, if the winners are identical, the aggregation operations (e.g., sums and counts) are accumulated for these two keys. The technique proceeds recursively, with the key of the aggregate being the key against which the next winner is compared. A group break is detected when a non-matching key emerges as the new winner. Then, a new group begins, with the initialization of a new accumulator and count variables. This approach is referred to as “winner aggregation”.
The problem with the winner aggregation approach is that several runs of different groups could be formed, even if the number of groups is less than the number of leaf nodes in the tournament tree. Once runs are formed, each run has to be inserted into a separate, temporary data set and fetched later for merging, thus causing CPU overhead. Temporary data sets occupy main memory, and when their memory requirements exceed memory, they cause input/output (I/O) interruptions.
Therefore, there is a need in the art for an improved technique for groupby aggregation in tournament tree sort.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer implemented groupby aggregation system.
In accordance with the present invention, groupby aggregation is performed when a statement is executed in a database stored on a data storage device, which is connected to a computer. Initially, data elements are input into a tournament tree from an input sequence. Then, a winner of a tournament in the tournament tree is identified. The winner is recycled into the tournament tree.
An object of the invention is to provide a technique for efficient groupby aggregation in a tournament tree sort. Another object of the invention is to minimize the need to write data to temporary data sets.


REFERENCES:
patent: 5287494 (1994-02-01), Garcia et al.
patent: 5619693 (1997-04-01), Troisi
patent: 5619713 (1997-04-01), Baum et al.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Efficient groupby aggregation in tournament tree sort does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Efficient groupby aggregation in tournament tree sort, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Efficient groupby aggregation in tournament tree sort will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2502115

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.