Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2002-02-26
2004-08-10
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06775682
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to database management systems and, more specifically, to techniques for efficiently evaluating database queries including rollups and distinct aggregates.
BACKGROUND OF THE INVENTION
In a database management system (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database management systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
Database management systems retrieve information in response to receiving queries that specify the information to retrieve. In order for a database management system to understand the query, the query should conform to a database language recognized by the database management system, such as the Structured Query Language (SQL).
In an OLAP (on-line analytical processing) environment or a data warehousing environment, data is often organized into a star schema. A star schema is distinguished by the presence of one or more relatively large tables and several relatively smaller tables. Rather than duplicating the information contained in the smaller tables, the large tables contain references (foreign key values) to rows stored in the smaller tables. The larger tables within a star schema are referred to as “fact tables”, while the smaller tables are referred to as “dimension tables”. Typically, each dimension has “levels” which correspond to columns of the dimension table, which are organized in a hierarchical manner. For example, a TIME dimension might consist of the levels year, quarter, month, and day, corresponding to columns of the TIME table. The hierarchical organization is such that years consist of quarters, quarters consist of months, and months consist of days.
AGGREGATE FUNCTION
An important function for data generation and retrieval performed by a database management system is the generation of aggregated information. Aggregated information is information derived by applying an aggregate function to the values in a column of a subset of rows in a table or on the result of a join of two or more tables. Examples of aggregate functions are functions that sum values, calculate averages, and determine minimum and maximum values. The column that contains the values to which an aggregate function is applied is referred to as the measure.
The subsets of rows to which an aggregate function is applied are determined by values in “group-by” columns. The aggregate information generated by a database management system is presented as a result set having the group-by column(s) and the aggregated measure column. In particular, the result set has one row for each unique value in the group-by column. Each row in the result set corresponds to the group of rows in the base table containing the value for the group-by column of the row. The measure column in the row contains the output of the aggregate function applied to the values in the measure column of the group of rows.
Aggregate information is generated by a database management system in response to receiving an aggregate query. An aggregate query specifies a group-by column, the aggregate measure column, and the aggregate function to apply to the measure values. The following query is provided as an illustration.
SELECT d, SUM(s) sum_s
FROM t
GROUP BY d
Table t contains data representing the sales of an organization. Each row represents a particular sales transaction. For a particular row in table t, column d contains the date of the sales transaction, and s contains the sale amount.
The SELECT clause contains “SUM(s)”, which specifies that the aggregate function “sum” is to be applied to the values in column s (aggregate measure) of table t. The query also includes the group-by clause “GROUP BY d”, which denotes column d as the group-by column.
Execution of this query generates a result set with a column for d and a column for sum (s). A particular row in the result set represents the total sales (s) for all sale transactions in a given day (d). Specifically, for a particular row in the result set, d contains a unique date value from table t for column d. Column sum_s contains the sum of the sales amount values in column s for the group of rows from t that have the unique date value in column d.
It is often useful to generate aggregate information grouped by multiple columns. For example, table t may also contain column r, a column containing values representing regions. It is may be useful to generate a result set that summarizes sales by region, and for each region, sales date. Such a result set may be generated by referencing column r and d in the group-by clause, as illustrated by the following query.
SELECT d, r SUM (s)
FROM t
GROUP BY r, d
DISTINCT AGGREGATE FUNCTION
Another type of aggregate function exists: a distinct aggregate. The following illustrates a query including a distinct aggregate function.
SELECT mgr, deptno, count(DISTINCT job)
FROM emp
GROUP BY mgr, deptno
Execution of a query that includes a distinct aggregate function, such as the preceding query, removes duplicate records (since the aggregate function specifies “DISTINCT” records), and applies the aggregate function to the resulting records. The subset of rows that are included in the final query result set are obtained from the GROUP BY columns (i.e., mgr and deptno) and the measure (job) of the distinct aggregate function.
ROLLUP OPERATOR
Another useful way to provide aggregate information is to generate one result set that groups data by various combinations of columns. For example, a result set may contain a set of rows grouped by region and date, and a set of rows grouped only by region. Such a result set may be generated by submitting a query that includes multiple subqueries operated upon by the union operator. While union queries may be used to generate a result set with multiple groupings, they can be very tedious to write. The programmer of the subquery must write a subquery for each desired grouping, which may become extraordinarily burdensome when the number of groupings desired is relatively large. Furthermore, such queries are very inefficient to execute, as some tables are accessed multiple times.
To avoid this burden, SQL defines extended group-by operators. Extended group-by operators include cube, rollup, and grouping sets. The group-by operators are used to specify groupings that include various combinations of the columns specified as arguments to the operators. For example, using the rollup operator, a query may be written as follows.
SELECT year, quarter, month, SUM(sales)
FROM fact.time
WHERE fact.tkey=time.tkey
GROUP BY rollup(year, quarter, month).
According to techniques that employ sort-based algorithms, execution of this query sorts data produced by the FROM and WHERE clauses on three columns (year, quarter, and month) specified in the GROUP BY clause, in order to logically group the records for efficient aggregation on the column (sales) specified in the aggregate function (SUM). The rollup operator aggregates data across levels specified as the keys (or columns) of the rollup operator, specified in the GROUP BY line. For example, “GROUP BY rollup(year, quarter, month)” produces aggregated results on the following groups:
(year, quarter, month);
(year, quarter);
(year); and
( ).
Note that a rollup on n columns produces n+1 groups. The grand total (referred to as the highest or coarsest) group is the () grouping, and the base (referred to as the lowest or finest) group is the (year, quarter, month) grouping.
ROLLUP OPERATOR WITH DISTINCT AGGREGATE FUNCTION
Processing distinct aggregates typically involves two steps, that is, eliminating duplicate rows and aggregating the resulting data. Computin
Ballamkonda Srikanth
Gupta Abhinav
Witkowski Andrew
Henkhaus John D.
Hickman Palermo & Truong & Becker LLP
Mizrahi Diane D.
Oracle International Corporation
LandOfFree
Evaluation of rollups with distinct aggregates by using... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Evaluation of rollups with distinct aggregates by using..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Evaluation of rollups with distinct aggregates by using... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3358265