Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-05-08
2003-12-09
Choules, Jack M. (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06662175
ABSTRACT:
BACKGROUND
Query optimization is important in relational database systems that deal with complex queries against large volumes of data. Unlike earlier navigational databases, a query on a relational database specifies what data is to be retrieved from the database but not how to retrieve it. Optimizing a query against a relational database is not as important in transaction-oriented databases where only a few rows are accessed either because the query is well specified by virtue of the application or because the query causes the data to be accessed using a highly selective index. In decision support and data mining applications, where the space of possible solutions is large and the penalty for selecting a bad query is high, optimizing a query to reduce overall resource utilization can provide orders of magnitude of overall performance improvement.
One existing query optimization technique is to rewrite the user-specified query. The query is transformed into a logically equivalent query that costs less, i.e. requires less time, to execute. The existing techniques for query transformation include syntactic and semantic techniques. Syntactic or algebraic transformations use the properties of the query operators and their mapping to rewrite the query. Some forms of magic set transformation, most forms of predicate push down, and transitive closures are techniques that fall under this category. Semantic query transformations use declarative structural constraints and the semantics of an application's specific knowledge, declared as part of the database, to rewrite the query. Semantic query transformation based rewrites are called semantic query optimization or SQO.
The basic purpose of a query rewrite is to reduce the number of rows processed during the query. Existing techniques for query rewrite are focused primarily on structural constraints of the database and knowledge of the domain of the database. For example, structural constraint based semantic optimizations use functional dependencies, key dependencies, value constraints, and referential constraints that are defined on relations in the database. Other existing query optimizers use constraints called implication integrity constraints and subset integrity constrains.
SUMMARY
In general, in one aspect, the invention features a method for optimizing queries to a database. The database includes a first table (T
1
) having a primary key (PK) column and a first correlated value column (CV
1
) and a second table (T
2
) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV
2
). The method includes joining T
1
to the T
2
using PK=FK as the join condition to produce a join result having rows. Each row includes a value from CV
1
and a value from CV
2
. The method further includes creating an initial running constraint (RC). The initial running constraint includes a null range. The method further includes producing a derived constraint rule (DCR) having the following form:
(
PK=FK
)→
CV
2
+C
1
≦CV
1
≦CV
2
+C
2
where C
1
and C
2
are constants, and “→” means “implies,” by performing the following processing for each row in the join result: computing a new constraint (NEW), having a range, and modifying RC by merging the range of NEW with the range of RC.
Implementations of the invention may include one or more of the following. The join result produced by joining may include a virtual join result. Creating the initial RC may include applying the following equation:
CV
2
+C
1
≦CV
1
≦CV
2
+C
2
C
1
=C
2
.
Computing a new constraint may include solving for C
1
and C
2
in the following equation:
CV
2
+C
1
≦CV
1
≦CV
2
+C
2
;
where CV
1
and CV
1
are from the same row in the join result. The range of RC may be specified by the following equation:
CV
2
+C
1
≦CV
1
≦CV
2
+C
2
and modifying RC may include adjusting C
1
and C
2
so that the range of the modified RC covers the range of the unmodified RC and the range of NEW.
The method may further include determining the usefulness of the DCR. Determining the usefulness of the DCR may include comparing the range of the DCR to the range of one of CV
1
or CV
2
. Determining the usefulness of the DCR may include computing usefulness using the following equation:
usefulness
=
C
1
-
C
2
SIZE
where SIZE is the range of one of CV
1
or CV
2
. The method may further include discarding the DCR if its usefulness is greater than a threshold.
The method may further include maintaining the DCR in view of changes to T
1
or T
2
. Maintaining may include doing nothing if a row is inserted in T
1
. Maintaining may include doing nothing if a column other than PK, FK, CV
1
, or CV
2
is updated in T
1
or T
2
. Maintaining may include doing nothing if a row is deleted from either T
1
or T
2
. Maintaining may include recomputing the DCR after a predetermined number of rows have been deleted from T
1
or T
2
. Maintaining may include joining a new row added to T
2
with T
1
, finding the constraint associated with the new row, and merging the new constraint with the DCR to form a new DCR.
The method may include storing the DCR and applying the right-hand side of the DCR to a query if the left hand side of the DCR is present in a conjunction in the query and at least one of CV, or CV
2
is referenced in the conjunction.
CV
1
and CV
2
may be date columns.
In general, in another aspect, the invention features a method for optimizing queries to a database. The database includes a first table (T
1
) having a primary key (PK) column and a first correlated value column (CV
1
) and a second table (T
2
) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV
2
). The method includes joining T
1
to the T
2
using PK=FK as the join condition to produce a join result having rows. Each row includes a value from CV
1
and a value from CV
2
. The method further includes creating an initial running constraint (RC). The initial running constraint includes a null range. The method includes producing a derived constraint rule (DCR) having the following form:
(
PK
=
FK
)
→
⁢
(
CV
2
+
C
1
⁢
A
≤
CV
1
≤
CV
2
+
C
2
⁢
A
)
⋃
⁢
(
CV
2
+
C
1
⁢
B
≤
CV
1
≤
CV
2
+
C
2
⁢
B
)
⋃
⁢
…
⁢
(
CV
2
+
C
1
⁢
N
≤
CV
1
≤
CV
2
+
C
2
⁢
N
)
;
where C
1A
, C
2A
, C
1B
, C
2B
, . . . C
1N
and C
2N
are constants, “→” means “implies,” and each parenthesized term on the right-hand side of the above equation represents an interval, by performing the following processing for each row in the join result: computing a new constraint (NEW), having a range, and modifying RC by forming a union of the range of NEW with the range of RC.
Implementations of the invention may include one or more of the following. The method may include merging two intervals if the number of intervals N exceeds a predetermined maximum number of intervals, K. Merging two intervals may include merging the two intervals that are closest to each other.
The method may include computing the usefulness of the DCR using the following equation:
usefulness
=
C
1
⁢
A
-
C
2
⁢
A
SIZE
+
C
1
⁢
B
-
C
2
⁢
B
SIZE
+
…
+
C
1
⁢
N
-
C
2
⁢
N
SIZE
where SIZE is the range of one of CV
1
or CV
2
.
In general, in another aspect, the invention features a computer program, stored on a tangible storage medium, for use in optimizing queries to a database. The database includes a first table (T
1
) having a primary key (PK) column and a first correlated value column (CV
1
) and a second table (T
2
) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV
2
). The program includes executable instructions that cause a computer to join T
1
to the T
2
using PK=FK as the join condition to produce a join result having rows. Each row includes a value fr
Ghazal Ahmad Said
Sinclair Paul Laurence
Baker & Botts LLP
Choules Jack M.
NCR Corporation
LandOfFree
Semantic query optimization using value correlation does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Semantic query optimization using value correlation, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Semantic query optimization using value correlation will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3164006