Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-07-31
2001-02-06
Black, Thomas G. (Department: 2771)
Data processing: database and file management or data structures
Database design
Data structure types
C002S002120, C002S005000
Reexamination Certificate
active
06185557
ABSTRACT:
FIELD OF THE INVENTION
The present invention is related to database systems and in particular to equijoin operations in relational database systems.
BACKGROUND OF THE INVENTION
In a relational database, records are stored as rows (tuples) within tables (relations), with each data field in a record represented as a column in the table. The table data is commonly stored in a balanced tree structure (B-tree), and can be associated with other B-tree structures used to sort the data on selected columns of the row.
Primary key B-trees, which are also called “base tables,” are B-trees sorted according to a “primary key” column. For any given table of data, one or more columns of the row is selected as the primary key. The primary key must contain data unique to that row. For example, in a row defined to contain a person's name, social security number, and data of birth, the social security number could be selected as the primary key since this will contain unique data for each row. In a primary key B-tree structure, the top (root) and intermediate nodes of the B-tree are index pages which contain index records that determine how the tree is to be traversed to find the requested data row(s). Each index record contains a primary key value and a page number. The page number indicates the page in the next level of the B-tree that contains records with a primary key value which is less than or equal to the primary key value in the current index record. The bottom (leaf) nodes, or “data pages,” contain the data rows. In some B-tree data structures, the data pages are linked together in primary key order so that the data structure can be traversed by following the links from one page to the next.
The user may also decide that it is valuable to create a B-tree on a non-unique “secondary index” column of the table data. The resulting B-tree structure is called a “secondary index B-tree”. Although only a single primary key B-tree exists for a given table, multiple secondary index B-trees may be created as the user deems necessary. Like a primary key B-tree, a secondary index B-tree contains root and intermediate nodes that indicate how the secondary index B-tree is to be traversed to find the requested secondary index value. In a common implementation of a secondary index B-tree, the secondary index B-tree does not contain data rows at the leaf nodes. Instead, the leaf nodes contain records which correlate secondary index values to records in the primary key B-tree. The records of the secondary index B-tree records store the primary key values of one or more rows that have the designated secondary key values. The primary key B-tree is searched using these primary key values to retrieve the associated data rows.
A join is a query operation that selects data from one or more tables using a single “select” operation that specifies values for certain column or columns. The specified values are often referred to as the “join attributes” and a column containing a join attribute is a “join column.” The rows of data which result from the join operation are created by forming the Cartesian product of the specified tables and eliminating rows that do not meet the specified join selection criteria. For example, assume three tables exist, each containing different information about various people. A join operation using a common join column “name” could be performed on the three tables to combine all information about one person from the three tables into a single row when the person's name is found in all three tables. Joins can also be performed using multiple join columns. For example, a first table could be joined to a second table using a first join column, and a third table could be joined to the second table using a second join column.
A join operation to retrieve rows from multiple tables is the most frequently used operation in a relational database system. It is also the most expensive in terms of computer processing and file input/output (I/O) time, especially when large tables are involved. Join functions are most efficient when the join column is a key for a primary or secondary index B-tree. When all of the tables in the join are not keyed on the join column, the join operation requires full scans of the tables in order to find rows that satisfy the selection criteria, a process which requires a large number of central processor cycles and file I/O's. Additionally, depending on the exact nature of the join operations, full table scans may have to be performed multiple times for a single query.
Join operations fall into two basic categories: an “equijoin” operation in which the selection criteria contains an equality condition, such as requiring a match on one or more join columns common to two tables, as opposed to other join operations specify either less-than or greater-than conditions. In either case, the algorithm that performs the join operation must compare each row in one table with each row in every other table designated in the join operation. In a computer system with insufficient memory to hold all the tables in memory, a join operation becomes quite costly in terms of file I/O time. Equijoin operations are of particular focus for reducing processing costs as they are the most prevalent type of query issued against a relational database in a transactional environment. In addition, equijoins involving only two tables are often employed as the starting point for more complex join operations. Therefore, increasing the speed of equijoins on two tables can dramatically improve the performance of relational database retrievals.
One method used to join rows of two tables is the “sort merge” join. A common sort merge join scans both tables in the order of the join columns. If there are no efficient indices on the join columns to provide the ordering, a sort on either table or both is required. When a row of one of the tables (the “first” table) matches a row in the other table (the “second” table) on the join columns, the sort merge join merges the two rows and returns the combined result. Then the sort merge join reads another row in the second table that might match the same row of the first table. It continues reading rows in the second table as long as there is a match. When there is no longer a match, another row from the first table is read. If the next row in the first table has the same value in the join column as the previous row in the first table, sort merge scans the rows in the second table again and returns the group of rows that matched the previous row in the first table.
If the next row in the first table has a different value in the join column than the previous row, the sort merge join searches the second table until it finds either (a) an row that matches the row in the first table, or (b) an row that has a join column value greater than the join column value of the row in the first table. If (a) occurs, the duplicates matching process described above is performed. If (b) occurs, the sort merge join discards the row in the first table and begins the searching process again for the row that is next in sequence in the first table.
Thus, the sort merge join must scan a group of duplicate rows as many times are there are matching rows in the outer table. Furthermore, because B-tree index pages only indicate the highest and lowest join column values which potentially could occur on a data page, additional searching of a data page is required to determine if a particular join column value does actually exist on the page. Additional scanning and searching means additional file input/output (I/O) operations, which lengthens the processing time for database retrievals.
Therefore, there is a need for an equijoin process that reduces the number of file I/O operations in order to increase the performance of database queries.
SUMMARY OF THE INVENTION
A merge join process combines rows from two tables when one table is indexed on a data column that is common to both tables. The table that is indexed is designated an “inner” table, the other table is designated an “ou
Black Thomas G.
Johnson Charles A.
Jung David
Starr Mark T.
Unisys Corporation
LandOfFree
Merge join process does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Merge join process, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Merge join process will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2574823