Initial ordering of tables for database queries

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06377943

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to computer database systems and more particularly to processing queries that specify a join operation.
BACKGROUND OF THE INVENTION
Relational databases store information in collections of tables, in which each table is organized into rows and columns. FIG.
5
(
a
) illustrates an exemplary database
500
containing four tables, useful in a sales application. Order table
502
records information about orders, identified by an order identifier (OID), that are submitted by customers, identified by a customer identifier (CID). For example, the first row of order table
502
indicates that customer #2 submitted order #1. Customer table
504
records information about customers, identified by a CID, including the ZIP code in which the customer is located. For example, the first row of customer table
04
indicates that customer #2 is located in ZIP code 19555. Assignment table
506
lists which sales associates, identified by a sales associate identifier (SID), are responsible for which customers, identified by the CID. In the example, the first and second rows of assignment table
506
indicate that customer #2 is assigned to sales associates #1 and #2. Finally, sales associate table
508
records information about each sales associates, identified by the SID, including the name of the associates. For example, the first two rows of sales associate table
508
indicate that “Smith” is the name for sales associate #1 and “Jones” is the name for sales associate #2.
A database user retrieves information from the tables of a relational database by entering input that is converted to queries by a database application. The database application, in turn, submits the queries to a database server. In response to receiving a query, the database server accesses the tables specified in the query to determine which information within the tables satisfies the query. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately presented to the user. Database servers are also capable of “joining” information contained in many of the tables into a single result in response to a query. For example, one query for the exemplary database is to list all the orders of customers in ZIP code 19555 for sales associate Smith.
For any given database application, the queries must conform to the rules of a particular query language. Most query languages provide users with a variety of ways to specify information to be retrieved. For example, in the Structured Query Language (SQL), the following query requests the retrieval of all the orders of customers in ZIP code 19555 for sales associate Smith:
[QUERY 1]
select OID, NAME
from Order 0, Customer C, Assignment A, Sales_Associate S
where O.CID =C.CID and C.ZIP=19555 and
C.CID=A.CID and A.SID=S.SID and
S.NAME=“Smith”;
This query performs a “join” operation on the order table
602
, customer table
604
, assignment table
606
, and sales associate table
608
. A join operation combines rows from two or more relational database objects, such as tables, views, or snapshots, that meet a specified criterion, called the “join conditions.” A join operation is performed whenever multiple tables appear in the FROM clause of query. In this example, four tables are listed. The SELECT list of the query can reference any of the columns from any of the base objects listed in the FROM clause. The join conditions are specified by the predicates in the WHERE clause. A join condition can relate to a single table, for example C.ZIP=19555, or to multiple tables, for example, O.CID=C.CID. A result of the query is depicted in FIG.
5
(
b
). Specifically, query result
510
includes a tuple of order #1 and Smith's name.
When the join operation is commutative and associative, as in this example, the tables may be joined in any order without affecting the final result. However, while not affecting the final result, the order in which the tables are joined or “join order” may have a significant effect on the performance of the database system. For example, joining the order table
502
with the customer table
504
first results in a single tuple, which helps to constrain the amount of input/output in later join operations, but joining customer table
504
with sales associate table
508
results in a Cartesian product of twelve tuples, larger than any of the tables and very expensive to process. Therefore, it is desirable to find a join order that has good performance, because an arbitrary join order, for example, the order listed in the FROM clause, may be prohibitively expensive to process. Techniques have been developed in an attempt to discover a good join order for a database query, but many of these techniques have proven to be inadequate.
For example, one technique, called a “heuristic approach,” examines the join conditions and statistical information about the tables. Various rules of thumb or “heuristics” are applied to this information to select a good join order as the desired join order in executing the query. There is much controversy about which heuristic is superior to another, and it appears that the best heuristic may vary from query to query. Thus, a heuristic that is appropriate from some queries may produce a join order that is too expensive for other kinds of queries and miss a good join order.
Another technique is called an “exhaustive search” wherein every possible join order is evaluated for a performance cost and the best cost join order is selected. The possible join orders are produced from an initial join ordering by an enumeration function that permutes the order of the tables in a predetermined, and typically recursive, sequence. A common sequence is a depth-first search sequence, wherein the tail of the initial join ordering is permuted before the head of the initial join ordering is permuted. The performance cost is typically based on the estimated amount input/output that would occur if the query was executed according to each join order. A major difficulty with finding a good join order according to an exhaustive search is apparent when there are many tables to be joined, for example, more than eight. For n tables to be joined, there are n-factorial (n!) number of possible join orders to consider. For example, with nine tables there are 9!=362,880 possible join orders.
Consequently, pruning techniques have been applied to limit the number of join order permutations that are considered. For example, with a depth-first search enumeration function, if the cost of performing the first join operation in a given generated join order is more than the cost of the best join order generated so far, then all join orders with that join operation first in the join order can be safely disregarded. Typically, however, pruning techniques do not disregard a sufficient number of join order permutations to make an exhaustive search feasible, especially for queries with a very large number of joined tables. Consequently, it is not uncommon for database systems to spend ten times as much time determining a good join order for a query than executing the query.
A “cut-off search” is a refinement of an exhaustive search. Using a “cut-off” search, the best cost estimate for executing the query (the estimated cost of executing the query using the join order that is currently considered “best”) is continually compared with a cost estimate for performing the search for the best join order. When the search cost exceeds the best cost estimate, then the search is terminated and the current best cost join order is chosen for processing the query. Consequently, the database system employing a cut-off search technique will rarely spent more time determining a good join order than executing the query.
The efficiency of a cut-off search is likely to be sensitive to the initial ordering that is the starting point of the search. The earlier

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Initial ordering of tables for database queries does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Initial ordering of tables for database queries, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Initial ordering of tables for database queries will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2894120

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.