Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-07-05
2002-12-10
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06493701
ABSTRACT:
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to information processing environments and, more particularly, to join operations in a data processing system, such as a Database Management System (DBMS).
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., “An Introduction to Database Systems,” Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC “client” systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these “client/server” systems include Powersoft® clients connected to one or more Sybase( Adaptive Server® database servers. Both Powersoft® and Sybase® Adaptive Server® (formerly Sybase® SQL Server®) are available from Sybase, Inc. of Emeryville, Calif.
“Join” is a common operation in an RDBMS. Nested Loop Join (NLJ), sort merge join, and hash join are the three well-known join methods. Optimization and execution of queries involving joins have been extensively discussed in the literature. See, e.g., Selinger, Patricia G., et. al., “Access Path Selection in a Relational Database Management System,” ACM SIGMOD Conference, pp. 23 -34, 1979 which deals with finding optimal join orders and join methods to use. See, e.g., Shapiro, Leonard D., “Join Processing in Database Systems with Large Main Memories,” TODS 11(3), pp. 239 -264, 1986 and Graefe, Goetz, et. al., “Hash Joins and Hash Teams in Microsoft SQL Server,” VLDB, pp. 86 -97, 1998 which deal with merge joins and hash joins. See, e.g., Chen, Ming-Syan, et. al., “On Applying Hash Filters to Improving the Execution of Multi-Join Queries,” VLDB Journal 6(2), pp. 121 -131, 1997 and Roussopoulos, Nick and Kang, Hyunchul, “Pipeline N-way Join Algorithm Based on the 2 -way Semijoin” which deal with use of semijoin based approaches to process multi-join queries efficiently. See, e.g., O'Neil, Patrick E. and Graefe, Goetz, “Multi-Table Joins Through Bitmapped Join Indices,” SIGMOD Record 24(3), pp. 8 -11, 1995 which deals with using bitmapped join indices to process multi-table joins more efficiently. See, e.g., Dewitt, David J., et. al., “Nested Loops Revisited,” PDIS, pp. 230 -242, 1993 which deals with parallelization of joins. The disclosures of the foregoing are hereby incorporated by reference.
What is needed is a technique that can be used to process some n-ary NLJs more efficiently, for n>2, as queries with multiple joins are common in decision support and OLAP. The present invention fulfills this and other needs.
SUMMARY OF THE INVENTION
A database system implementing a methodology or technique that can be used to optimize processing of nested loop joins of three or more tables (n-ary NLJs for n>2) more efficiently is described. The methodology is straightforward to implement and has been prototyped in a commercial RDBMS.
Operation of the methodology may be summarized as follows. First, a query is received (e.g., SQL-based query from a client database application) specifying a join of three or more tables, where at least one join condition exists between an inner table (in the join order) and an outer table that is not the immediately or directly preceding table (in the join order). The join order itself specifies the particular sequence or order that tables (or index) are accessed for retrieving rows (for examination) during query execution.
Query execution proceeds as follows. A loop is established to retrieve rows from successive tables (per the join order). The method determines whether a condition is being tested (i.e., value being compared) that refers back to a more-outer table that is not a directly preceding table. Consider the following example. query:
select [. . . ] from T
1
, T
2
, T
3
where T
1
.C
1
=T
2
.C
2
and T
2
.C
2
=T
3
.C
3
and T
1
.C
5
=T
3
.C
4
In the example, the join condition of T
1
.C
5
=T
3
.C
4
requires a current row under examination from the third table in the join order (i.e., Table T
3
) to match the join condition (equality, in that example) specified for the first table (i.e., Table T
1
, which is not an immediately preceding table). If that condition is not met, then query execution (method) proceeds to fetch the next row (if any) from that outer table (whose just-tested condition failed).
Otherwise (i.e., the just-tested condition succeeds), the method continues down the join order to examine any remaining/subsequent tables in the join order (if any), applying any subsequent query conditions (if any) that must be met in order to qualify for the query. In the instance that a set of rows under examination meets the query condition(s), those rows are qualified (as having met the query). If any further rows/tables remain to be .examined, the method loops back to examine those rows/tables.
In implementation, upon encountering a failure condition (i.e., a given join condition does not hold true) from a join operator (scan child), context information (about the failure) is returned (to the n-ary nested-loop join operator) for indicating exactly which condition (i.e., join condition) failed. This information is tracked in a scan descriptor, which includes a “fail sarg” data field indicating exactly which particular search argument (“sarg” ) failed. Based on this information, the system (operating through the n-ary nested-loop join operator) knows exactly which scan child to return back to (i.e., how far back to go in the join order to fetch the next row). In this manner, the methodology optimizes processing of n-ary nested loop joins by eliminating comparisons that will not hold true for the corresponding join condition (for which the comparisons were to be tested).
REFERENCES:
patent: 5590324 (1996-12-01), Leung et al.
patent: 5600831 (1997-02-01), Levy et al.
patent: 5991754 (1999-11-01), Raitto et al.
patent: 6134546 (2000-10-01), Bestgen et al.
patent: 6374235 (2002-04-01), Chen et al.
patent: 6397204 (2002-05-01), Liu et al.
Graefe, Goetz et al., Hash Joins and Hash Teams in Microsoft SQL Server, VLDB 1998, pp. 86-97, 1998.
Chen, Ming-Syan et al., On Applying Hash Filters to Improving the Exec
Mizrahi Diane D.
Smart John A.
Sybase Inc.
Wu Yicun
LandOfFree
Database system with methodogy providing faster N-ary nested... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Database system with methodogy providing faster N-ary nested..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Database system with methodogy providing faster N-ary nested... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2958247