Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-11-08
2004-01-27
Breene, John (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C709S241000
Reexamination Certificate
active
06684203
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to optimizing execution of queries on a database system, and in particular, transforming queries so that they can be executed more efficiently.
BACKGROUND OF THE INVENTION
In relational database systems, a star schema is distinguished by the presence of one or more relatively large tables and several relatively smaller tables. Rather than duplicating the information contained in the smaller tables, the large tables contain references (foreign key values) to rows stored in the smaller tables. The larger tables within a star schema are referred to as “fact tables”, while the smaller tables are referred to as “dimension tables”.
FIG. 1
illustrates an exemplary star schema with two dimensions.
Referring to
FIG. 1
, it illustrates a database system
100
that includes tables
102
,
104
and
106
. Table
102
is named “stores” and contains information about each of the stores in which sales may occur. Each row in store table
102
contains a unique store-id and information about the particular store that corresponds to the store-id. Table
104
is named “products” and contains information about each type of product that may be sold in any of the stores. Each row in product table
104
contains a unique product-id and information about the particular product.
Table
106
is named “sales” and contains information about each sale in each of the stores represented in the store table
102
. Each row in sale table
106
includes a unique sale-id, a store-id to indicate the store at which the sale was made, a product-id to indicate the product sold in the sale, and the date of the sale. Typically, the number of sales will be vastly greater than both the number of stores at which the sales are made and the number of products carried by the stores. Detailed information about the store and product involved in a sale transaction does not have to be stored in the rows of table
106
because such detailed information is available in tables
102
and
104
, respectively. Instead, the rows of table
106
simply contain values (store-ids and product-ids) that reference information stored in the other tables
102
and
104
. Therefore, tables
102
,
104
and
106
constitute a star schema in which table
106
is the fact table and tables
102
and
104
are dimension tables.
Queries that operate on data stored in tables that belong to a star schema are referred to as star queries. The format of a star query is described below.
STAR QUERY FORMAT
Star queries typically contain clauses which (1) identify the fact table and dimension tables, (2) specify the correspondence between the fact table columns and the dimension table columns, and (3) specify the constraints on the dimension table columns. For example, star queries in SQL typically have the form:
QUERY 1
select . . . from fact, dim1, dim2, . . . , dimn
where fact.key1 = dim1.key
and fact.key2 = dim2.key
and . . . and fact.keyn = dimn.key
and <dimension table constraints>
In Query 1, the fact table is named “fact” and the dimension tables are named “dim
1
” through “dimn”. The “where” clause includes expressions that equate columns in the fact table with corresponding columns in the dimension tables. Such expressions are referred to herein as “join predicates”. For example, the expression “fact.key
1
=dim
1
.key” is a join predicate that specifies that the column “key
1
” in the fact table corresponds to the column “key” in the dimension table that is named “dim
1
”.
Constraints are predicates referring to tables that restrict the query to a subset of the rows in the table referenced by the constraint. A table referenced by a constraint is referred to herein as a constrained table. The <dimension table constraints>are constraints referring to some or all of the dimension tables but not the fact table. For example, the query:
QUERY 2
select sales.date from sales, stores, products
where sales.store-id = stores.store-id
and sales.product-id = products.product-id
and stores.city = ‘San Jose’
and products.cost > $1,000
includes the dimension table constraints “stores.city=San Jose” and “products.cost>$1,000”.
STAR QUERY TRANSFORMATION
To execute a star query more efficiently, a star query may be rewritten. The process of rewriting queries is referred to herein as transformation. A method for transforming a star query is described in U.S. Pat. No. 5,848,408, issued to Hakan Jakobsson, et al., on Dec. 8, 1998.
Specifically, a star query (or the internal representation of a star query) is transformed by adding to the WHERE clause of the star query additional subqueries. These additional subqueries are constructed from the join predicates and dimension table constraints contained in the original star query. Specifically, for each constrained dimension k, an IN subquery may be generated of the form:
fact.keyk IN (select dimk.key from dimk where <dimk constraints>)
In the above expression, fact.keyk is the column of the fact table that a join predicate correlates with the “key” column of “dimk”. For example, consider the query:
QUERY 3
select * from fact, dim1, dim2, dim3
where fact.key1 = dim1.key
and fact.key2 = dim2.key
and fact.key3 = dim3.key
and dim1.c3 > 4 and (dim1.c6 = 10 or dim1.c6 = 11)
and dim3.c5 between 100 and 1000
Query 3 contains a dimension table constraint “dim
1
.c
3
>4 and (dim
1
.c
6
=10 or dim
1
.c
6
=11)” for dim
1
. The query contains a join predicate “fact.key
1
=dim
1
.key” which specifies a correlation between the “key” column of dim
1
and the “key
1
” column of the fact table. Based on these expressions, the following subquery would be generated:
fact.key
1
in (select dim
1
.key from dim
1
where dim
1
.c
3
>4 and (dim
1
.c
6
=10 or dim
1
.c
6
=11))
Query 3 also specifies a dimension table constraint “dim
3
.c
5
between 100 and 1000” for dim
3
. The query contains a join predicate “fact.key
3
=dim
3
.key” which specifies a correlation between the “key” column of dim
3
and the “key
3
” column of the fact table. Based on these expressions, the following subquery would be generated:
fact.key
3
in (select dim
3
.key from dim
3
where dim
3
.c
5
between 100 and 1000)
The subqueries generated in this manner are ANDed to the other predicates that constrain the dimension table dimk. In the present example, the resulting “transformed” star query is:
QUERY 5
select * from fact, dim1, dim2, dim3
where fact.key1 = dim1.key
and fact.key2 = dim2.key
and fact.key3 = dim3.key
and dim1.c3 > 4 and (dim1.c6 = 10 or dim1.c6 = 11)
and dim3.c5 between 100 and 1000
and fact.key1 in (select dim1.key from dim1
where dim1.c3 > 4 and (dim1.c6 = 10 or dim1.c6 = 11))
and fact.key3 in (select dim3.key from dim3
where dim3.c5 between 100 and 1000)
PARALLELIZING QUERY EXECUTION
Database systems may run on multi-processing systems. Multi-processing systems are typically partitioned into nodes, where each node may contain multiple processors executing multiple concurrent processes. To fully utilize the computing power of a multi-processing system, a database system may divide a large processing task required by a query into smaller work granules which may then be distributed to processes running on one or more processing nodes. Because the various work granules are being performed in parallel, the processing required by the query can be completed much faster than if the processing were performed on a single node by a single process.
To parallelize the execution of a star query, the work to be done on the fact table is dynamically partitioned by, for example, rowid range to create “work granules”. The work granules are than assigned to a set of processes. These processes, referred to herein as parallel query slaves, execute their assigned work granules in parallel with the other parallel query slaves.
Each parallel query slave accesses the part of the fact table specified by the r
Amor Angela
Depledge Michael Charles
Esq.
Ozbutun Cetin
Waddington William H.
Ali Mohammad
Bingham Marcel K.
Breene John
Esq.
Hickman Palermo & Truong & Becker LLP
LandOfFree
Using global temporary tables to transform queries does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Using global temporary tables to transform queries, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Using global temporary tables to transform queries will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3209112