Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-05-21
2004-08-10
Channavajjala, Srirama (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06775662
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to query optimization, and in particular, optimizing queries by rewriting the queries.
BACKGROUND OF THE INVENTION
Storing and retrieving large amounts of data are some of the most important functions of computers in today's society. To carry out these functions, database systems are typically used to retrieve and store data in databases. Database systems developed so far have performed these functions very successfully, creating for society the ability to retrieve data at speeds and quantities previously unimagined, and giving society an unprecedented level of access to information. The success of database systems has unleashed an insatiable demand for even faster and more efficient database systems that process even greater quantities of data.
In a database management system (DBMS), data is stored in one or more data containers, such as tables. The term table is used to refer to any set of data that is processed as a set of records, each record being organized into one or more fields. In relational database systems, the records are referred to as rows, and the fields are referred to as columns. A table may be a data container with rows and columns, or the results of a query, including subqueries. A subquery is a query within a query, and shall be described later in further detail.
In object-oriented databases, the data containers correspond to object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
Systems that implement the present invention are not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms row and column shall be used herein to refer respectively to record and field.
Database systems retrieve information in response to receiving queries that specify the information to retrieve. In order for a database system to understand the query, the query should conform to a database language recognized by the database system, such as the ANSI Structured Query Language (SQL). To illustrate the structure of a query, the following query E
1
is provided.
SELECT T.a FROM T WHERE T.b=1 and T.c=1
Query E
1
, when executed by a database system that supports SQL, retrieves column T.a of table T from rows containing the value 1 in column T.b of table T. The above query includes a SELECT clause (i.e. “SELECT T.a”), a FROM clause (i.e. “FROM T”), and a WHERE clause (i.e. “T.b =1 and T.c=1”). The FROM clause references one or more tables from which to retrieve values. The tables in the FROM clause are collectively referred to as the FROM list. The SELECT clause specifies one or more columns in the items in the FROM list from which to retrieve values. The one or more columns in the SELECT clause are collectively referred to as the SELECT list.
The WHERE clause specifies the rows from which the values are retrieved. Specifically, the WHERE clause contains one or more conditions defining criterion that must be met by a column(s) of the rows from which values are retrieved. The conditions in the WHERE clause are referred to as predicates. Query E
1
contains the predicates “T.b=1” and “T.c=1”. The term “expression” is used to refer to a logical expression based on one or more predicates. Query E
1
contains the expression “T.b=1 and T.c=1”. The criteria specified by the WHERE clause is referred to herein collectively as filtering criteria.
When a database system executes a query, it generates results in the form of a table. Such a table is referred to herein as a result set. The result set has the columns specified in the SELECT list of the query.
One of the most important functions for data generation and retrieval performed by a database system is the generation of aggregated information. Aggregated information is information derived by applying an aggregate function to the values in a column of a subset of rows in a “base table”. Examples of aggregate functions are functions that sum values, calculate averages, and determine minimum and maximum values. The column that contains the values to which an aggregate function is applied is referred to as the measure.
The subsets of rows to which an aggregate function is applied are determined by values in a “group-by” column. The aggregate information generated by a database system is presented as a result set having the group-by column and the measure column. In particular, the result set has one row for each unique value in the group-by column. Each row in the result set corresponds to the group of rows in the base table containing the value for the group-by column of the row. The measure column in the row contains the output of the aggregate function applied to the values in the measure column of the group of rows.
Aggregate information is generated by a database system in response to receiving an aggregate query. An aggregate query specifies a group-by column, the measure column, and the aggregate function to apply to the measure values. The following query E
2
is provided as an illustration.
Select d, sum(s) sum_s From t
Group by d
Table t contains data representing the sales of an organization. Each row represents a particular sales transaction. For a particular row in table t, column d contains the date of the sales transaction, and s contains the sale amount.
The SELECT clause in query E
2
contains “sum(s)”, which specifies that the aggregate function “sum” is to be applied to the values in column s of table t. The query E
2
also includes the group-by clause “Group by d”, which denotes column d as the group-by column.
Execution of query E
2
generates a result set with a column for d and sum (s). A particular row in the result set represents the total sales for all sale transactions in a given day. Specifically, for a particular row in the result set, d contains a unique date value from table t for column d. Column sum_s contains the sum of the sales amount values in column s for the group of rows from t that have the unique date value in column d.
It is often useful to generate aggregate information grouped by multiple columns. For example, table t may also contain column r, a column containing values representing regions.
It is may be useful to generate a result set that summarizes sales by region, and for each region, sales date. Such a result set may be generated by referencing column r and d in the group-by clause, as illustrated by the following query E
3
.
Select d, r sum (s) from t
group by r, d
Another useful way to provide aggregate information is to generate one result set that groups data by various combinations of columns. For example, a result set may contain a set of rows grouped by region and date, and a set of rows grouped only by region. Such a result set may be generated by submitting a query that includes multiple subqueries operated upon by the union operator.
The union operator is used to specify that the result set of a subquery is to be merged with the result set of another subquery. The following query E
3
is used to illustrate how the union operator may be used to specify a result set that groups by multiple combinations of group-by columns.
select a, b, sum(m)from t
group by a,b
union
select a, null, sum(m)from t
group by a
union
select null, b, sum(m)from t
group by b
union
select null, null, sum(m)
from t
Query E
3
specifies four subqueries, each with a group-by clause specifying a different combination of columns by which to group rows. A particular group of columns by which to group rows is referred to herein after as a grouping. The first subquery in query E
3
contains the group-by clause “group by a, b”, thus specifying a grouping of (a,b). The remaining subqueries respectively specify the groupings (a), (b), and ( ). “( )” denotes the grand total, specifying no group-by column.
The union operator has t
Bozkaya Tolga
Witkowski Andrew
Bingham Marcel K.
Channavajjala Srirama
Hickman Palermo & Truong & Becker LLP
Lu Kuen S
Oracle International Corporation
LandOfFree
Group pruning from cube, rollup, and grouping sets does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Group pruning from cube, rollup, and grouping sets, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Group pruning from cube, rollup, and grouping sets will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3347755