Method, system, and program for determining the join...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06397204

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method, system, and program for joining a multi-column table and at least two satellite tables and, in particular, determining the order of joining the satellite tables and multi-column table.
2. Description of the Related Art
Data records in a relational database management system (RDBMS) in a computer are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. One or more indexes may be associated with each table. An index an ordered set of pointers to data records in the table based on the data in one or more columns of the table. In some cases, all the information needed by a query may be found in the index, making it unnecessary to search the actual table. An index is comprised of rows or index entries which include an index key and a pointer to a database record in the table having the key column values of the index entry key. An index key is comprised of key columns that provide an ordering to records in a table. The index key columns are comprised of the columns of the table, and may include any of the values that are possible for that particular column. Columns that are used frequently to access a table may be used as key columns.
Organizations may archive data in a data warehouse, which is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. One data warehouse design implementation is known as star schema or multidimensional modeling. The basic premise of star schemas is that information is classified into two groups, facts and dimensions. A fact table comprises the main data base records concerning the organization's key transactions, such as sales data, purchase data, investment returns, etc. Dimensions are tables that maintain attributes about the data in the fact table. Each dimension table has a primary key column that corresponds to a foreign key column in the fact table. Typically, the fact table is much larger than the related dimension tables.
The fact table typically comprises numerical facts, such as the date of a sale ,cost, type of product sold, location, site of sale, etc. The dimension table usually provides descriptive textual information providing attributes on one of the fact table columns. For instance, a time dimension table can provide attributes on the date column in the fact table describing the date of sale. The time dimension table may provide various weather conditions or events that occurred on particular dates. Thus, the time dimension table provides attributes on the time, i.e., weather, important events, etc., about data columns in the fact table.
The star schema provides a view of the database on dimension attributes that are useful for analysis purposes. This allows users to query on attributes in the dimension tables to locate records in the fact table. A query would qualify rows in the dimension tables that satisfy certain attributes or join conditions. The qualifying rows of the dimension tables have primary keys that correspond to foreign keys in the fact table. A join operation, such as an equijoin or natural join, is then specified to qualify rows of the fact table. Typically, the primary key columns of the dimension tables in the join result are compared against the corresponding foreign key columns in the Fact table to produce the equijoin results.
FIG. 1
illustrates an example of a star schema
2
with multiple dimension tables
4
,
6
, and
8
and a fact table
10
. The fact table
10
includes sales data, wherein each record includes information on the amount sold in the AMOUNT column
12
; the time of sale in the TID column
14
, which includes a time identifier; the product sold in the PID column
16
which is a product identifier; and the location of the sale, e.g., store location, in the GID column
18
, which is a geographic identifier. The dimension tables
4
,
6
, and
8
provide attributes on the TID
14
, PID
16
, and GID
18
columns in the fact table.
The primary key columns of each of the dimension tables
4
,
6
,
8
are the TID column
20
, PID column
28
, and GID column
36
, respectively. The columns
14
,
16
, and
18
in the fact table
10
are foreign keys that correspond to primary keys
20
,
28
, and
36
of the dimension tables
4
,
6
,
8
that provide attributes on the data in the fact table
10
. For instance dimension table
4
provides attributes for each possible TID value, including month information in column
22
, quarter of the TID in the quarter column
24
, and the year of the TID in the year column
26
. Dimension table
6
provides product attributes for each PID value, including the product item in item column
30
, the class of the product in the class column
32
, and the inventory location of the product in inventory column
34
. The dimension table
8
provides attributes for each possible GID value, including the city of the GID in the city column
38
, the geographical region in the region column
40
, and the country in the country column
42
.
Much effort has been expended in developing optimization techniques to select the best possible join ordering for queries in relational database systems. The order in which the joins are performed has a substantial impact on query performance. Each possible plan for executing an SQL statement is an access plan. The choice of an access plan among the many possible such plans has a substantial effect on performance during execution of the query and joining of tables. The number of possible joins to consider grows exponentially as tables are added to the query. Star schemas involve a large number of dimension tables in the join. Thus, a query optimizer would have to consider perhaps millions of possible permutations from which to select the optimal join order. Further, if a database program has many different join algorithms, then the query optimizer will have to analyze performance not only for every possible join permutation, but also for every possible join algorithm with every possible join permutation.
Most optimizers are cost based because they operate by generating a list of access plans, comparing their costs, and then selecting a least cost plan. Current cost based query evaluation techniques experience significant difficulties when used to evaluate a query involving numerous tables because the number of permutations or orderings to consider expands exponentially as the number of tables involved in the query increases. Many of these query evaluation techniques require significant processing time and memory usage to determine the optimal search plan.
One common query evaluation plan is to use dynamic programming algorithms, which often are difficult to infeasible or extremely consuming to process if many tables, e.g., ten tables or more, are involved in the join operation. The article entitled “Optimization of Large Join Queries: Combining Heuristics and Combinatorial techniques,” by Arun Swami, in the ACM SIGMOD Record Vol. 18, No. 2, pgs. 367-376 by the Association for Computing Machinery (ACM Copyright 1989), discusses problems with dynamic programming query evaluation techniques as the number of tables involved in the query exceeds ten. This article is incorporated herein by reference in its entirety. The commonly assigned U.S. Pat. No. 5,301,317, entitled “System for Adapting Query Optimization Effort to Expected Execution Time,” which is incorporated herein by reference in its entirety includes further discussion of dynamic programming query evaluation plans and their computational complexity and performance problems.
Other query evaluation techniques employ heuristic approaches to limit the search space when selecting an optimal search. Further, certain approaches use global optimization strategies to select a strategy that matches certain predefined cr

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method, system, and program for determining the join... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method, system, and program for determining the join..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method, system, and program for determining the join... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2895652

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.