Optimizing an aggregate join query

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000

Reexamination Certificate

active

06732096

ABSTRACT:

BACKGROUND
Structured Query Language (SQL) is a standardized language for accessing and updating relational databases. SQL provides a mechanism for creating and maintaining tables, each of which contains rows and columns of information that are collectively assembled into a database. Ideally, the tables are “normalized” in that the structure of the tables avoids data redundancy and allows the resulting data model to be mapped to many different physical database designs. In order to avoid redundancy, yet still be able to display data from multiple tables, SQL provides a mechanism called a “join.”
In a join, rows from one table are combined with rows from another table. The rows selected to be combined are determined by a “join condition,” which causes a comparison of one column from each of the tables being joined. In a massively parallel processing system in which the rows of each of the tables are distributed among a large number of data storage facilities, the database system “redistributes” one of the tables among the data storage facilities prior to performing the join. The redistribution involves retrieving the table from the data storage facilities and then distributing the rows of the table on the data storage facilities using one side of the join condition as the basis for the distribution. If the other table in the join is distributed using the other side of the join condition as the basis for its distribution, the join for each row will be performed using a single data storage facility.
In one particular type of join query, called an “aggregate join query,” data in a column of one of the tables being joined is aggregated (i.e., summed, averaged, counted, etc.). Performing such a query normally requires all the joins to be performed in order for the aggregation processing to be done on the join results.
SUMMARY
In general, in one aspect, the invention features a method for optimizing aggregate join queries to a database. The query specifies one or more data-to-be-aggregated columns (DAC), one or more explicit aggregation keys (EAK) and one or more join conditions. The query involves table T
1
. T
1
includes the one or more data-to-be-aggregated columns and an aggregation key. All join conditions that involve T
1
join the aggregation key of T
1
to the unique primary index of another table that is involved in the query. T
1
is joined to a table T
2
in accordance with a join condition of the aggregate join query. The joining process includes sorting T
1
on a sort key that is the aggregation key. Aggregation processing is combined into the step of sorting if T
1
is a significant table. Aggregation processing is also combined into the step of sorting if T
2
is a significant table and the first join condition is between the aggregation key of T
1
and the unique primary index of T
2
.
Implementations of the invention may include one or more of the following. T
1
can be an intermediate join result IJR. T
1
can also be a database table specified in the query. The aggregation key of T
1
can be the EAK of a significant table. The aggregation key of T
1
can also be a new aggregation key derived from a significant table. Determining if a table is a significant table can include checking if only one table includes the one or more EAKs. If so, then that table is a significant table. Determining if a table is a significant table can include checking if the set of tables TS that includes all tables that contain at least one of the one or more EAKs, has a member table for which the query includes a join condition between that member table and the unique primary index of every other table in TS. If so, then that member table is a significant table.
In general, in another aspect, the invention features a database system in which an aggregate join query can be optimized. The query specifies one or more data-to-be-aggregated columns (DAC), one or more explicit aggregation keys (EAK) and one or more join conditions. The query involves table T
1
. T
1
includes the one or more data-to-be-aggregated columns and an aggregation key. All join conditions that involve T
1
join the aggregation key of T
1
to the unique primary index of another table that is involved in the query. The database system includes a massively parallel processing system. That massively parallel processing system includes one or more nodes and a plurality of CPUs, with each of the one or more nodes providing access to the one or more CPUs. Also included are a plurality of processes each of the one or more CPUs providing access to one or more virtual processes with each process configured to manage data stored in one of a plurality of data-storage facilities. The rows of each of the tables specified in the query are distributed among the plurality of data-storage facilities based on respective primary indices for each table. The system also includes a database-management component configured to execute the query in a manner where T
1
is joined to a table T
2
in accordance with a join condition of the aggregate join query. The joining process includes sorting T
1
on a sort key that is the aggregation key. Aggregation processing is combined into the step of sorting if T
1
is a significant table. Aggregation processing is also combined into the step of sorting if T
2
is a significant table and the first join condition is between the aggregation key of T
1
and the unique primary index of T
2
.
In general, in another aspect, the invention features a computer program, stored on a tangible storage medium, for use in optimizing aggregate join queries to a database. The query specifies one or more data-to-be-aggregated columns (DAC), one or more explicit aggregation keys (EAK) and one or more join conditions. The query involves table T
1
. T
1
includes the one or more data-to-be-aggregated columns and an aggregation key. All join conditions that involve T
1
join the aggregation key of T
1
to the unique primary index of another table that is involved in the query. The computer program includes executable instructions that cause the computer to join T
1
and a table T
2
in accordance with a join condition of the aggregate join query. The joining process includes sorting T
1
on a sort key that is the aggregation key. The instructions cause the computer to combine aggregation processing into the step of sorting if T
1
is a significant table. The instructions also cause the computer to combine aggregation processing into the step of sorting if T
2
is a significant table and the first join condition is between the aggregation key of T
1
and the unique primary index of T
2
.


REFERENCES:
patent: 6496819 (2002-12-01), Bello et al.
D. Tanjar et al., Aggregate-Join Query Processing in Parallel Database Systems, Feb., 2000, IEEE, pp. 824-829.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Optimizing an aggregate join query does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Optimizing an aggregate join query, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Optimizing an aggregate join query will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3189695

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.