Techniques for handling function-defined hierarchical...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06493708

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to database systems and, more particularly, to techniques for handling function-defined hierarchical dimensions.
BACKGROUND OF THE INVENTION
In a database management system (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
Systems that implement the present invention are not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
Computer database systems that are used for data warehousing frequently store pre-computed summary information in summary tables in order to speed up query processing. The data from which the summary tables are generated are referred to as base data. The tables that contain the base data are referred to as base tables. Summary tables typically store aggregated information, such as “sum of PRODUCT_SALES, by region, by month.” Other examples of aggregated information include counts of tally totals, minimum values, maximum values, and average calculations.
In the context of database systems, a “dimension” is a list of values that provide categories for data. A dimension acts as an index for identifying values of a variable. For example, if sales data has a separate sales figure for each month, then the data has a MONTH dimension. That is, the data is organized by month.
Dimensions may have many levels of granularity, where a hierarchical relationship exists between members of the various levels. For example, one dimension may be “geography”, where geography has the following levels: city, state, region, country. Of these levels, “city” has the finest granularity, while “country” has the coarsest. Each level of a hierarchical dimension is associated with a set of values. For example, the “city” level of the geography dimension may be associated with values “NYC”, “LA”, and “SF”, each of which represents a city. The values associated with a given hierarchical level are referred to as the “granules” of that level. The values associated with the level of finest granularity are referred to as base granules.
In analytical applications, the data stored in tables of a data warehouse is generally aggregated along dimensions and hierarchies. A typical example is the summation of dollar sales by time, where time is a dimension which includes different levels such as day, week, month, quarter, and year. The time granules occur at different levels of data abstraction and form a hierarchy.
Typically, an analytical application asks for an aggregated result up to some level of data abstraction. For example, the application may ask for sum-of-sales by week, where all sale amounts for each week are summed into a single aggregated value. Other examples are sum-of-sales by month, sum-of-sales by year, etc.
Because of the hierarchical relationship, if sum-of-sales by month is already computed, then sum-of-sales by year can be simply computed by summing the monthly sum-of-sales. The technique of deriving values for a coarser level in the hierarchy based on values associated with a finer level in the hierarchy is known as “rolling up” the values. For example, the monthly sum-of-sales may be rolled up into yearly sum-of-sales.
In a data warehouse, a dimension such as “time”, is usually broken out into different granules, where a different table column is used to store the granules for each of the hierarchical levels. The granules are hierarchically related to each other. For example, cal_day → cal_month → cal_quarter → cal_year represents a calendar hierarchy in a time dimension. Another example, store → city → state → region → country represents a geographic hierarchy in a store dimension.
A hierarchy, such as the calendar hierarchy, can be declared as follows:
CREATE DIMENSION time_dim
LEVEL cal_day
IS time_tab.day
LEVEL cal_month
IS time_tab.month
LEVEL cal_quarter
Is time_tab.quarter
LEVEL cal_year
IS time_tab.year
HIERARCHY calendar _rollup (
cal_day
CHILD OF
cal_month
CHILD OF
cal_quarter
CHILD OF cal_year );
This declaration includes various LEVEL statements that identify the levels of the hierarchy. From the perspective of the database server, the LEVEL statements identify the various columns that should be created for the time_dim dimension table. The declaration further includes a HIERARCHY section that identifies the hierarchical relationships between the various the levels. Once the hierarchy “calendar _rollup” is declared, a database server can use this information to determine which levels can be rolled up to other levels. This information is used by the server when it rewrites queries in terms of materialized views.
A materialized view is a table where the pre-computed data corresponding to a materialized view definition is stored. For example, a materialized view “mv
1
” may be defined as follows:
CREATE MATERIALIZED VIEW mv
1
AS
SELECT t.month, t.quarter, t.year,
sum(f.dollar_sales) as month_sales
FROM fact_tab f, time_tab t
WHERE f.time_key=3D t.time_key
GROUP BY t.month, t.quarter, t.year;
Once a materialized view has been created, a database server may answer some queries that are issued against the base table with data from the materialized view. For example, a user may submit the following query (Q
1
) that asks for sum-of-sales by month:
Query Q
1
:
SELECT t.month, SUM(f.dollar_sales) as sum_sales
FROM fact_tab f, time_tab t
WHERE f.time_key=3D t.time_key
GROUP BY t.month;
Query Q
1
requests a join to be performed between two base tables: fact_tab and time_tab. Scanning the based tables and performing the join operation may involve a significant amount of overhead. To avoid this overhead, the server can rewrite query Q
1
to access data from mv
1
, instead of performing a join of fact_tab and time_tab. In the present example, query Q
1
maybe rewritten as:
Rewritten query Q
1
′:
SELECT mv
1
.month, mv
1
.month_sales as sum_sales
FROM mv
1
;
The rewritten query Q
1
′ produces the same results as the original query Q
1
, but requests the retrieval of data from mv
1
rather than from a join between fact_tab and time_tab. Consequently, the amount of overhead required to process the rewritten query Q
1
′ may be several orders of magnitude less than the overhead required to process the original query Q
1
.
In the example given above, query Q
1
′ requires sales to be summed by month. Conveniently, materialized view mv
1
contains data that has already been summed by month. Consequently, rewritten query Q
1
′ does not involve any additional aggregation. However, when the hierarchical relationship between the levels of a dimension are known, queries that require aggregation at a relatively coarser level of a dimension can be rewritten to access a materialized view that stores data that has been aggregated at a relatively finer level of that dimension.
For example, suppose the user submits a query (Q
2
) that asks for sum-of-sales by year. Knowing the calendar _rollup hierarchy declared in time_dim, the server can still rewrite Q
2
to access data from mv
1
instead of performing a join of fact_tab and time_tab.
Query Q
2
:
SELECT t.year, SUM(f.dollar_sales) as sum_sales
FROM fact_tab f, time_tab t
WHERE f.time_key=
3
D t.time_key
GROUP BY t.year;
To access mv
1
, query Q
2
may be rewritten as:
Rewritten query Q
2
′:
SELECT mv
1
.year, SUM(mv
1

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Techniques for handling function-defined hierarchical... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Techniques for handling function-defined hierarchical..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Techniques for handling function-defined hierarchical... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2968594

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.