Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-12-23
2002-03-05
Black, Thomas (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06353821
ABSTRACT:
FIELD OF THE INVENTION
The present invention generally relates to database processing systems, and more specifically to optimizing SQL database queries to optimize processing of multi-column database indexes.
BACKGROUND OF THE INVENTION
Relational database optimizers have the task of converting the “what is desired” that a user expresses in the database interface language SQL into the “how to accomplish” specifics entailed in a database access plan.
In order to more efficiently access (especially relational) databases, indexes are typically utilized. A given database may have multiple indexes. Each index contains essentially a pair consisting of a key and a link to a database row or record. The pairs are then effectively organized in sorted key order. One well known method of implementing indexes is the use of B-Trees. A database row or record can then be rapidly accessed utilizing the row or record “key” to identify the corresponding row or record. In the case of a relational database, the index “keys” are typically the column values for one or more columns for each row or record in the database. In the case of a multiple column index, the index “key” can be viewed as the corresponding column values for each row concatenated together.
Indexes are also often utilized when processing database queries. A database optimizer is typically utilized to transform a SQL database query into an optimal set of database operations. One typical optimization is to utilize indexes to a database in order to minimize the necessity of reading entire database rows. That is because the requisite column values for the rows have already been extracted from the rows during index creation or update before the query is processed. Thus, it is often very efficient to eliminate all rows or records from a query that don't match search key values in an index or indexes before any actual rows are retrieved for processing. Thus, database optimizers often group query components that reference indexed columns together. The following disclosure is primarily concerned with this situation.
A typical SQL query against a database has the form shown in the following example:
SELECT FIRST_NAME, LAST_NAME, PHONE_NUMBER FROM TELEPHONE_TABLE WHERE LAST_NAME=“GRAY”
This query returns the names and phone numbers of people listed in the TELEPHONE TABLE that have a last name of “GRAY”.
A good access plan for this simple example would be accessing the table via an index on the LAST_NAME column. An index search using the value “GRAY” is a fast method of returning the correct answers. Refer to “An Introduction to Database Systems” by C. J. Date for further background on relational databases and optimizers.
In current commercial relational database optimizers, there are deficiencies in producing the best access plan in all cases. In particular, when a multi-column index is defined, and a range of values is needed to satisfy the query, then the index may not be used in the most efficient manner.
This deficiency is the result of an “impedance mismatch” between the need for SQL to specify search conditions on column values and the methods needed to specify range conditions on multi-column indexes. This is because SQL by its design goals has to be independent of the current index definitions on the database tables (i.e. not making use of index names). For further information, refer to explanations of the ANSI SQL standard such as “SQL Instant Reference” by M. Gruber or “Understanding the New SQL: A Complete Guide” by J. Melton and A. Simon. This use of the term “impedance mismatch” is not to be confused with that of using SQL embedded in programming languages such as COBOL. When used in that context, the term refers to SQL's set oriented nature as opposed to the row at a time capability of the language/database interface and its use of cursors.
It would be advantageous to be able to efficiently translate SQL queries into a proper access plan when there is a set of conditions on the columns of a multi-column index that represent a range of values on that index. This would provide significantly faster processing of multi-column indexes by selecting only qualifying key values from the index where there is a set of conditions on the columns of a multi-column index that represent a range of values on that index.
For example, assume that a table DATE_TABLE has an index defined on the following three columns whose combined value represents a date:
YEAR, MONTH, DAY
Suppose the result of a query is to be those rows from the table that have dates in the following range:
March 15, 1994 to June 23, 1996
Assume that the table's MONTH values are encoded as 1 to 12 for January through December. The SQL required to express this range of index values can be given in positive (inclusive) form as follows:
1.
SELECT *
2.
FROM DATE_TABLE
3.
WHERE ((YEAR = 1994 AND MONTH = 3 AND DAY >= 15)
4.
OR (YEAR = 1994 AND MONTH > 3)
5.
OR (YEAR > 1994))
6.
AND ((YEAR = 1996 AND MONTH = 6 AND DAY <= 23)
7.
OR (YEAR = 1996 AND MONTH < 6)
8.
OR (YEAR < 1996))
FIG. 1
is a diagram illustrating the logical operation of that SQL statement in positive (inclusive) form. The FIG. consists of single and double headed time line segments. The double-headed line segments represent time intervals bounded at both ends, whereas the single-headed line segments represent time intervals bounded at one end, but not the other. Table T-1 contains the correspondence between the above SQL and the reference numbers in the FIG. The final result or selection (March 15, 1994 to June 23, 1996) is shown as
66
.
TABLE T-1
Ref#
Operation
53
Line 3 (L3)
54
Line 4 (L4)
55
Line 5 (L5)
56
Line 6 (L6)
57
Line 7 (L7)
58
Line 8 (L8)
62
L3 | L4 | L5
64
L6 | L7 | L8
66
(L3 | L4 | L5) & (L6 | L7 | L8)
This same multi-column index range condition can also be expressed in a negative (exclusive) form. This is often as recommended by vendors and/or database specialists (reference DB2 Design & Development Guide by G. Wiorkowski and D. Kull, section 9.3 Cursors and Repositioning, especially page 253).
The following is an example of the recommended negative form of the same conditions:
1.
SELECT*
2.
FROM DATE_TABLE
3.
WHERE YEAR >= 1994
4.
AND NOT (YEAR = 1994 AND MONTH <3)
5.
AND NOT (YEAR = 1994 AND MONTH = 3 AND DAY < 15)
6.
AND
YEAR <= 1996
7.
AND NOT (YEAR = 1996 AND MONTH > 6)
8.
AND NOT (YEAR = 1996 AND MONTH = 6 AND DAY > 23)
This is equivalent to specifying the total range of 1994 to 1996 and then removing the pieces that don't fit as specified by the remaining NOT conditions.
FIG. 2
is a diagram that illustrates this negative (exclusive) form as marked by the line numbers from the example above on which the conditions occur. The correspondence between the reference numbers in the FIG. and the line numbers in the above negative form SQL is shown in Table T-2. The final result or selection (March 15, 1994 to June 23, 1996) is shown as
86
.
TABLE T-2
Ref#
Operation
73
Line 3 (L3)
73′
Line 3 negated (L3′)
74
Line 4 (L4)
74′
Line 4 negated (L4′)
75
Line 5 (L5)
75′
Line 5 negated (L5′)
76
Line 6 (L6)
76′
Line 6 negated (L6′)
77
Line 7 (L7)
78
Line 8 (L8)
82
L7 & L8
84
82 & ({circumflex over ( )}L6′) & ({circumflex over ( )}L4′)
86
84 & ({circumflex over ( )}L5′) & ({circumflex over ( )}L3′)
One problem with most current relational database systems is that when either form of the index range condition is supplied in the SQL, only the first column of the index is used in the access plan. For the negated form, this means that only the following conditions are used in accessing the index:
YEA
Black Thomas
Bull HN Information Systems Inc.
Le Uyen
Phillips J. H.
Solakian J. S.
LandOfFree
Method and data processing system for detecting patterns in... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and data processing system for detecting patterns in..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and data processing system for detecting patterns in... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2827268