System and method for hash loops join of data using outer...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06253197

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer database systems, and more particularly to systems and methods for executing multi-table joins in response to a query to a computer database for information.
2. Description of the Related Art
When a user requests data from a database by submitting a query to the database, it is often necessary that the database management system (dbms) that is associated with the database combine, i.e., join, data from two or more tables that contain data. For example, it might be desired to know the names and salaries of employees in a company, by department. Using the database query language known as “SQL”, two tables could be created as follows:
“create table DEPT (DNO, NAME, BUDGET)”, where DNO is an integer representing the department number, NAME is a variable character alpha-numeric name of the department, and BUDGET is a floating point variable representing the department's budget; and
“create table EMP (DNO integer, NAME varchar, SALARY float)”, where DNO is an individual employee's department number, NAME is the employee's name, and SALARY is the employee's salary.
Thus, in the above example, the relevant data is arranged in a “department” table and in an “employee” table. In the “department” table, each row has the above-listed three department entries, and in the “employee” table, each row has entries corresponding to a particular employee's department number, name, and salary. To return the employee names, individual salaries, and department names requires joining at least one column from the “department” table with at least one column from the “employee” table.
In the particular example discussed above, the department number entry from the “department” table is commonly referred to as a key (it defines a department), whereas the individual employees' department numbers are referred to as foreign keys (they define the respective departments in which the employees work). The columns DEPT.DNO and EMP.DNO establish a join key, and a row in the “department” table is joined to a row in the “employee” table only if the rows match a query-defined join predicate that in turn essentially defines the join key, that is, only if DEPT.DNO=EMP.DNO for that row. The above-described join can be undertaken to associate employees by department using the following join query syntax:
select DEPT.NAME // department name
EMP.NAME // employee name
EMP.SALARY // employee salary
from DEPT, EMP // the two tables discussed above
where DEPT.DNO=EMP.DNO//such that the employee department number equals a department number in the “department” table
The result of the above join is a third table in which employee names and salaries have been associated with department names by joining a row in the department table with a corresponding row in the employee table.
A method known to those skilled in the art for effecting the above-described join is referred to as “hash loops join”. Using a hash loops join, one of the of two tables to be joined is designated a build table, and the other table is designated a probe table. Either table can be a base table, i.e., a table stored in the database, or a derived table, i.e., a table that is the result of a previous, intermediate join, or other query operations. In any case, if the entire build table can be stored in the main memory of the dbms computer, it is written into main memory. Each row is entered into a hash table on a hash of the join columns. The probe table is then hashed on its join columns and checked against the build table for matches on the join key. The hash is undertaken on the join key by mathematically operating on the value of the join columns at each of the rows of the build/probe table in accordance with a hash function to yield respective hash values.
Sometimes, however, the entire build table is too large to be written into the main memory of the dbms computer. Under these circumstances, as much of the build table that can fit into main memory is written into memory and hashed. The probe table is processed against the chunk of the build table in memory. After processing the chunk of the build table in memory, the next build table chunk is written into memory, hashed, and the probe table rows are again checked against the build table rows as summarized above, and so on until the entire build table has been processed and the probe phase completed. When the probe phase is complete, the table join has been executed in response to the query.
As recognized by the present invention, however, it is sometimes possible that, while not all rows in the probe table have a corresponding build table match and, thus, will not be output as part of the join, it might nevertheless be desirable to output the unmatched probe table rows. For example, if one or more employees in the employee table happen to be unassigned to a department, it might nevertheless be desired to output their names and salaries with an “unassigned” department name for completeness of a salary report. This operation of preserving rows of a table that do not join to or match rows in the other table is known as a left (or right) outer join. Hash loops join as described above does not support left (or right) outer join, particularly when the build table does not fit in memory in its entirety.
Moreover, the present invention recognizes that for some queries, it is not necessary to output a complete list of all matching table rows. Consequently, processing and outputting all matches under such circumstances unnecessarily prolongs query processing. As an example, suppose it is desired to know just the names of departments that have one or more employees. In this case, the name of each department having one or more employees need be joined and output only once. This operation, known as early-out join, is usefull in many circumstances to improve the efficiency of query processing, see U.S. Pat. No. 5,548,754, incorporated herein by reference. Unfortunately, conventional hash loops join does not support early-out query processing when the build table is too large for memory. The present invention, however, understands that it is possible to use hash loops join in combination with an early-out join and/or a left (or right) outer join.
SUMMARY OF THE INVENTION
A computer-implemented method is disclosed for integrating at least one of: an early-out table join defining a “match once” table, and a left or right outer table join defining a “preserved” table, in a hash loops procedure when the size of a build table of the hash loops procedure exceeds the main memory size. In accordance with the present invention, a probe table of the hash loops procedure is established by the “match once” table or the “preserved” table. The method includes reversing the roles of the build table and probe table when the size of the build table exceeds the main memory size, and flagging rows of the probe table that match rows of the build table. In a left or right outer join in which the probe table is established by the “preserved” table, rows of the probe table that are not flagged after the probe phase are joined to one or more fields having NULL values and output. In contrast, in the early-out join in which the probe table is established by the “match once” table, rows of the probe table that are joined to respective rows of the build table are flagged “ignore” on a first match, such that rows already flagged “ignore” are not joined to any more build table rows.
In another aspect, the invention is a general purpose computer system programmed according to the inventive steps herein to efficiently join a build table to a probe table in response to a user query for data using a hash loops left or right outer join and/or a hash loops early-out join. A left or right outer join involves outputting all rows of a table referred to as a “preserved” table, and in the present invention the “preserved” table always establishes the probe table of a hash loops join. Similarly, an early-out join

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for hash loops join of data using outer... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for hash loops join of data using outer..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for hash loops join of data using outer... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2534010

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.