Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-03-18
2002-02-26
Mizrahi, Diane (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C002S004000, C002S005000, C002S006300
Reexamination Certificate
active
06351742
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to optimization in a database system.
2. Background
In a database system, optimization is the process of choosing an efficient way to execute a database query or manipulation action. Examples of such query or manipulation actions include searching, retrieving, modifying, organizing, adding, and/or deleting information from the database. These database query/manipulation actions are normally initiated by submitting commands to a database server in a database query language. One popular database query language is known as the Structured Query Language (“SQL”). For the purposes of explanation only, and not by way of limitation, the following description is made with particular reference to database statements involving SQL.
To execute a database query language statement (e.g., a SQL statement), the database system may have to perform steps involving the retrieval or manipulation of data from various database structures, such as tables and indexes. Often, there exists many alternate ways to execute the SQL statement. For example, a single SQL statement can be executed in different ways by varying the order in which tables and indexes are accessed to execute the statement. The exact combination and order of steps taken to execute the SQL statement can drastically change the efficiency or speed of execution for the statement. The combination and order of steps that are used to execute a SQL statement is referred to as an “execution plan.”
As an example, consider the following SQL statement, which queries for the name of all employees having a salary equal to 100 from a database table “emp_table”:
SELECT employee_name
FROM emp_table
WHERE salary=100
A first execution plan could include the step of performing a full table scan of emp_table to execute the query. This first execution plan would retrieve every row from emp_table to identify particular rows that match the WHERE clause. Alternatively, if an index exists for the “salary” column of emp_table, then a second execution plan could involve accessing the index to identify rows that match the WHERE clause, and thereafter retrieving only those identified rows from the table. The index is considered an alternate “access path” to the data sought by the SQL statement.
Each execution plan has a “cost” that is associated with its execution. The cost of an execution plan can be expressed in terms of the resources that are consumed to execute the SQL statement using that execution plan. For example, the cost of an execution plan can be expressed in units of I/O usage, CPU usage, network usage, memory usage, or a single numerical value that combines several of these units.
An “optimizer” is used by a database system to choose what is believed to be the most efficient execution plan for a SQL statement. A “cost-based” optimizer bases its decision upon the costs of each execution plan. The cost-based optimizer typically generates a set of potential execution plans for the SQL statement based upon available access paths for the data sought to be operated upon by that statement. The cost is then estimated for each execution plan based upon, for example, data distribution and storage characteristics for database structures holding relevant data for the SQL statement. The optimizer then compares relative costs of the execution plans to choose the one with the smallest cost.
The cost-based optimizer may use statistics to estimate the cost of the execution plans. Statistics are used to quantify the data distribution and/or storage characteristics of data in database structures. For example, with reference to the SQL statement example set forth above, statistics may be kept for the distribution of values in the “salary” column of the table “emp_table.” Selectivity estimates can be performed by taking into account the data skew of data values. Selectivity is normally calculated with reference to the statistics, and can be stated as the percentage of entries within a schema object that satisfies a given predicate.
The cost of an execution plan can be estimated based upon the statistics and selectivity associated with terms within the SQL statement predicate. As an example, consider if an index exists upon the “salary” column for the above SQL statement example. If so, then the following is an example of a cost calculation that can be used to estimate the cost of an execution plan that uses an index to execute the above SQL statement:
COST=(cost of access for a single row)*(selectivity)*(number of rows in table)+(cost of index access)
An example of a cost calculation for an execution plan that performs a full table scan is expressed as follows:
COST=(cost of access for a single row of table)*(number of rows in table)
Based upon such cost calculations, an optimizer can make a determination as to which of these execution plans is relatively less costly.
Optimization of a database statement normally occurs at compilation time. If a predicate in the database statement contains one or more arguments, then the values of the arguments must be known at compile-time for optimizers to make an effective estimation of selectivity for the predicate, and thus the cost for the database statement. Unfortunately, if the values of one or more arguments are not known at compile-time, then these values cannot be passed to the optimizer. Without this information, conventional optimizers have great difficulty generating accurate predicate selectivity or cost for the database statement.
Consider the following SQL statement:
SELECT *
FROM emp_table
WHERE emp_number<:x
This database statement queries for all entries from table emp_table in which the emp_number column is less than the value of a bind variable “:x”. The value of a bind variable is not necessarily known at compile-time, but instead becomes known at run-time. Since optimization occurs at compile-time, the optimizer thus does not always know the value of the bind variable :x. Therefore, it becomes difficult to estimate a selectivity value for emp-number<:x, which greatly affects the optimizer's ability to select an optimal execution plan (e.g., one that uses either an index scan or table scan).
One approach that can be taken to estimate a selectivity value is to employ a default selectivity value if a bind variable is encountered in a database statement. Typically, the default selectivity value is used regardless of exact contents or operator of the database statement. The optimizer would use the default selectivity value to compute a cost estimate for an execution plan involving that database statement. However, using a default value to calculate the cost of an execution plan may result in what is at best a gross approximation of the true cost of the execution plan. Furthermore, such default values do not take into account any parameters that may be passed to a database statement predicate. This may result in the selection of an execution plan that has a significantly higher true cost than other execution plans that could have been chosen.
An alternate approach is to heuristically guess a selectivity value. For example, a small value for the selectivity of a database statement can be assumed if the involved column is indexed. This assumption is typically employed whenever a bind variable is used as a boundary value in a condition with one of the following operators:<,>,<=, or >=. The optimizer would heuristically guess a small selectivity value for indexed columns to favor the use of indexes. The drawback to this approach is that the assumptions used in the heuristic guess do not always match up to real world conditions. For example, the assumption that an execution plan involving an index access is always more efficient than alternate execution plans may be grossly incorrect for a given query of an indexed column, depending upon the statistics for that column and the exact values of the arguments involved.
Therefore, there is a need for a method and mechanism that can
Agarwal Nipun
Das Dinesh
Srinivasan Jagannathan
Mizrahi Diane
Oracle Corporation
LandOfFree
Method and mechanism for database statement optimization does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and mechanism for database statement optimization, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and mechanism for database statement optimization will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2939458