Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-05-22
2001-03-20
Amsbury, Wayne (Department: 2771)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06205451
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to databases and, more specifically, to methods and apparatus for updating data in data summary tables.
BACKGROUND OF THE INVENTION
Computer database systems that are used for data warehousing frequently store pre-computed summary information in summary tables in order to speed up query processing. The data from which the summary tables are generated are referred to as base data. The tables that contain the base data are referred to as base tables.
Summary tables typically store aggregated information, such as “sum of PRODUCT_SALES, by region, by month.” Other examples of aggregated information include counts of tally totals, minimum values, maximum values, and average calculations. As new data is periodically added to the base tables, the summary information needs to be updated (i.e., refreshed) to reflect the new base data.
One approach to refreshing summary tables is referred to as the “total refresh” or “full refresh” approach. According to the total refresh approach, the values in summary tables are recalculated based on all of the base data every time new base data is supplied. Systems that employ full refresh approach have the disadvantage that the recreation process is a relatively lengthy operation due to the size and number of tables from which the summary information is derived. For example, when ten new rows are added to a particular base table that contains a million rows, a total refresh operation would have to process all one million and ten rows of the base table to regenerate the summary tables derived using the base table.
The process of updating summary information may be improved by performing incremental refresh, where rather than generating a new set of summary information based on calculations that use all of the base data, the summary information is updated based on previous summary values and the new base data.
One difficulty associated with performing incremental refresh is that a single summary table may contain summarized data derived from multiple base tables. For example, assume that a database includes the three base tables illustrated in FIG.
1
. Referring to
FIG. 1
, a PRODUCT_SALES table
106
contains information about specific sales made by a company. A LOCATION table
104
contains information about where the sales took place. A TIME table
102
contains information about the times at which sales were made.
FIG. 2
illustrates a SALESREPORT table
202
that stores the pre-computed result of an aggregate query based on the PRODUCT_SALES table
106
, LOCATION table
104
and TIME table
102
. In the specific example given, SALESREPORT table
202
stores the sum of all sales made in each city during each month.
Information that defines a summary table is referred to herein as a “summary definition”. A summary definition indicates (1) the location of the base data that is used to derive the summary table, and (2) how the base data from the base tables should be aggregated to derive the summarized data. In many systems, summary definitions take the form of queries. For example, the query for the SALESREPORT table
202
may be expressed in a relational database system using the following SQL language statement which joins the three base tables:
SELECT time.month, location.city, SUM (product_sales.total)
sumsales
FROM time t1, location t2, PRODUCT_SALES t3
WHERE time.time_id=product_sales.time_id AND
location.loc_id=product_sales.loc_id
GROUP BY month, city
The “SELECT” line indicates that columns that are to be used to generate the summary table. The “FROM” line indicates the tables that have those columns. The “WHERE” line indicates the criteria for joining values from one table with corresponding values from the other tables. For example, values in row
108
of TIME table
102
, row
110
of LOCATION table
104
, and row
112
of PRODUCT_SALES table
106
would be joined together because the time_id value in row
108
matches the time_id value in row
112
, and the loc_id value in row
110
matches the loc_id value in row 112.
The “GROUP BY” line indicates that the rows retrieved by the select statement should be put into groups for each month/city combination. The “sum” function in the “SELECT” line indicates that for each of those groups, the values from the “total” columns are to be summed. The resulting summary table will thus have one row for every month/city combination, and that row will have a column called “sumsales” that contains the sum of the “total” column values for that month/city combination.
The SALESREPORT table 202 illustrated in
FIG. 2
is an example of the summary table that would be generated in response to the summary table definition specified above. The SALESREPORT table 202 has the three rows specified in the SELECT line of the summary table definition: month, city, and sumsales.
If the system that maintains the SALESREPORT table 202 does not support incremental refresh, the query listed above must be run against the base tables each time data is added to any of the base tables. Rather than re-compute the entire contents of SALESREPORT when new data is added to the system, it would be more efficient to re-compute just the changes to the existing SALESREPORT that result from the new data that was added to the base tables.
Prior mechanisms for incremental refresh have relied upon the re-computing of aggregates as data is being loaded into base tables. Thus, as one set of rows is added to a base table, the summary table is updated based on the data in the new rows. As a second set of rows is added to a base table, the summary table is again updated based on the data in the second set of rows. This is not a practical nor efficient solution for two reasons: first, the database user may have business reasons to not re-compute the summary each and every time data is loaded into base tables; and second, combining aggregation with data loading degrades the performance of the data-loading procedure.
Based on the foregoing, it is clearly desirable to provide a practical and efficient mechanism for incremental refresh of summaries that allows refresh to be deferred until after base data has been loaded into the data warehouse. It is further desirable that the mechanism correctly and efficiently refresh the summary regardless how many times new data has been added to the base table since the last time the summary table was refreshed.
In addition, it is desirable to provide an incremental refresh mechanism produces the correct results even when multiple base table updates have been loaded since the last refresh operation. It is further desirable that the mechanism perform correctly if a particular base table has had new data loaded into it multiple times since the last refresh of the summary table. The mechanism for incremental refresh should also perform multiple operations in parallel in order to perform the incremental refresh more quickly.
SUMMARY OF THE INVENTION
A method and apparatus are provided for performing deferred incremental refresh of summary tables that are derived from two or more base tables. Incremental refresh is performed by assigning a hypothetical load sequence to the base tables of the summary table. For each base table that contains new data that affects the summary table, a join is performed between (1) the new data in the base table, (2) the pre-update state of base tables that follow the base table in the hypothetical load sequence, and (3) the post-update state of base tables the precede the base table in the hypothetical load sequence. The results of the join are then merged with the existing summary table to refresh the summary table. According to one aspect of the invention, efficiency is improved by performing the joins for the various base tables in parallel.
REFERENCES:
patent: 5444842 (1995-08-01), Bentson et al.
patent: 5848405 (1998-12-01), Norcott
patent: 5873093 (1999-02-01), Williamson et al.
patent: 5960426 (1999-09-01), Pirahesh et al.
patent: 5974416 (1999-10-01), Anand et al.
patent: 6006216 (1999-12-01), Griffin et al.
pat
Finnerty James
Norcott William D.
Amsbury Wayne
Hickman Brian D.
Hickman Palermo & Truong & Becker LLP
Oracle Corporation
Pardo Thuy
LandOfFree
Method and apparatus for incremental refresh of summary... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for incremental refresh of summary..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for incremental refresh of summary... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2437984