Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-06-25
2002-09-03
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06446063
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method, system, and program for performing a join operation on a multi-column table and satellite tables with a join condition and, in particular, joining multiple dimension tables with a fact table in a star join.
2. Description of the Related Art
Data records in a computer database are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. One or more indexes may be associated with each table. An index is an ordered set of pointers to data records in the table based on the data in one or more columns of the table. In some cases, all the information needed by a query may be found in the index, making it unnecessary to search the actual table. An index is comprised of rows or index entries which include an index key and a pointer to a database record in the table having the key column values of the index entry key. An index key is comprised of key columns that provide an ordering to records in a table. The index key columns are comprised of the columns of the table, and may include any of the values that are possible for that particular column. Columns that are used frequently to access a table may be used as key columns.
Organizations may archive data in a data warehouse, which is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. One data warehouse design implementation is known as star schema or multidimensional modeling. The basic premise of star schemas is that information is classified into two groups, facts and dimensions. A fact table comprises the main data base records concerning the organization's key transactions, such as sales data, purchase data, investment returns, etc. Dimensions are tables that maintain attributes about the data in the fact table. Each dimension table has a primary key column that corresponds to a foreign key column in the fact table. Typically, the fact table is much larger than the related dimension tables.
The fact table typically comprises numerical facts, such as the date of a sale, cost, type of product sold, location, site of sale, etc. The dimension table usually provides descriptive textual information providing attributes on one of the fact table columns. For instance, a time dimension table can provide attributes on the date column in the fact table describing the date of sale. The time dimension table may provide various weather conditions or events that occurred on particular dates. Thus, the time dimension table provides attributes on the time, i.e., weather, important events, etc., about data columns in the fact table.
The star schema provides a view of the database on dimension attributes that are useful for analysis purposes. This allows users to query on attributes in the dimension tables to locate records in the fact table. A query would qualify rows in the dimension tables that satisfy certain attributes or join conditions. The qualifying rows of the dimension tables have primary keys that correspond to foreign keys in the fact table. A join operation, such as an equijoin or natural join, is then performed on the qualifying rows of the joined dimension tables and the fact table. This join results in returning fact table entries that match the rows of the joined dimension tables, i.e., fact table entries that satisfy the search criteria on the dimension tables. Thus, join operations are used to query a fact table on dimension table attributes.
A join operation combines or concatenates the rows from the different dimension tables according to a condition or predicate to determine values to apply against the fact table. This is distinguished from a Cartesian product which concatenates every row from one table with every other row from another table without regard to a condition or predicate to exclude rows from the result. Rows from the tables involved in a join operation that do not satisfy the predicate or condition are excluded from the join result. Because the dimension tables are unrelated, the rows of the dimension tables that satisfy the join condition are then concatenated in every possible combination.
The Cartesian product of the rows of the dimension tables provides a data view of the entire space, i.e., Cartesian space, of every possible combination of the possible dimension table values. The join result, on the other hand, is a subset of the Cartesian space that is limited to those Cartesian space points that satisfy the join or search condition. One common type of join operation is an equijoin. An equijoin combines two rows from different tables that are equal according to some attribute. Once the combination of all dimension table rows that satisfy the search criteria is produced, the resulting rows are then applied to the Fact table in an equijoin operation to locate those rows in the fact table that have the same values as the rows resulting from the join on the dimension tables. Typically, the primary key columns of the dimension tables in the join result are compared against the corresponding foreign key columns in the Fact table to produce the equijoin results.
In multi-dimensional analysis, it is often desirable to form a query on the attributes specified in the dimension tables and then locate all records in the fact table that satisfy the criteria on the dimension table attributes. To perform such a query, the query engine joins the dimension tables on the conditions specified in the search criteria. The query engine then equijoins the dimension tables with the fact table to produce join results that satisfy the join condition.
The above query technique using join operations is very inefficient because the results of the join operation on the dimension tables may produce numerous concatenations that do not exist in the fact table. In fact, it has been found that on average only 1% or less of the concatenated results of the join operation on the dimension tables have corresponding matching entries in the Fact table that would concatenate in an equijoin operation. Nonetheless, prior art techniques would attempt to join all of the join results from the dimension tables to the fact table even though many of these attempted joins would not produce results as less than 1% of the concatenated results of the dimension table joins have corresponding matches in the fact table. Thus, numerous join operations are performed for which there will be no join result, thereby needlessly consuming I/O operations to perform the non-matching join operations.
FIG. 1
illustrates an example of a star schema
2
with multiple dimension tables
4
,
6
, and
8
and a fact table
10
. The fact table
10
includes sales data, wherein each record includes information on the amount sold in the AMOUNT column
12
; the time of sale in the TID column
14
, which includes a time identifier; the product sold in the PID column
16
which is a product identifier; and the location of the sale, e.g., store location, in the GID column
18
, which is a geographic identifier. The dimension tables
4
,
6
, and
8
provide attributes on the TID
14
, PID
16
, and GID
18
columns in the fact table.
The primary key columns of each of the dimension tables
4
,
6
,
8
are the TID column
20
, PID column
28
, and GlD column
36
, respectively. The columns
14
,
16
, and
18
in the fact table
10
are foreign keys that correspond to primary keys
20
,
28
, and
36
of the dimension tables
4
,
6
,
8
that provide attributes on the data in the fact table
10
. For instance dimension table
4
provides attributes for each possible TID value, including month information in column
22
, quarter of the TID in the quarter column
24
, and the year of the TID in the year column
26
. Dimension table
6
provides product attributes for each PID value, including the product item i
Chen Stephen Yao Ching
Hoa Peter
Kyu William Y.
Lin Fen-Ling
Sampatrai Desai Paramesh
Konrad Raynes Victor & Mann
Mizrahi Diane D.
Victor David W.
LandOfFree
Method, system, and program for performing a join operation... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method, system, and program for performing a join operation..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method, system, and program for performing a join operation... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2890162