Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-04-20
2001-07-24
Choules, Jack (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06266658
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to the field of databases, and in particular to an index tuner given a workload of queries to a database.
COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright© 2000, Microsoft Corporation, All Rights Reserved.
BACKGROUND
Databases have grown very large in size. When a user desires information, they issue a query, which asks for information, such as give me the names of people from Bozeman, Mont. with income greater than a certain amount. There may be a table containing a row for each person in the United States. Each row may have several columns, including income, social security number, street address, city, state, telephone number, names of dependents, occupation, etc. By searching this large table, a computer can find all the people in Bozeman, Mont., and then determine from that list of people, who has income greater than the specified amount. To do this search, the computer has to deal with the entire large table, retrieving it from storage into memory, and then searching row by row for the desired information.
One way to make this type of search more efficient, is to build indexes. An index makes it more efficient to retrieve a subset of data from a table required to answer a query. It helps identify subsets of rows from which further data may be required to answer the query. An index is a subset of a table, which typically contains fewer columns than the table itself. Indexes are sometimes created prior to a user query being made. Some indexes are arranged in a tree like structure, such as a B
+
tree, which makes finding information even faster. If an index exists that contains the data that the user desires, it is much easier to simply search the index to provide the information.
In the case of the above query, an index could be generated based on State, and then people in Montana could be quickly identified. This index would be much smaller than the table, and the answer could be provided much more quickly because the entire table does not need to be searched. An index with income can also be generated to quickly identify people with income greater than the specified amount. An index on city could also be created to quickly identify people in the city of Bozeman. In each of these cases, the table would have to be consulted absent further relevant columns in the index.
A covering index for a query may also be used. A covering index for a query is an index that is a subset of the large table, yet has at least all the columns from a table needed to answer the query. For the example query, an index having city, state, people and income columns would be a covering index because it has all the columns needed to answer the query without resort to the table. Use of this covering index would be the fastest way to answer the query. However, it is likely larger than the other indexes.
A workload is a set of queries and updates which are run against a given database. A configuration is a set of indexes used to improve performance for the workload. Given a workload of multiple queries, the decision as to which indexes to include in the configuration is very complex and time consuming. Since there is overhead associated with generating, maintaining and storing the indexes, this must be offset against the benefit obtained by using them.
A query optimizer is used to obtain information about indexes given a set of queries. Optimizers are database tools that can take a representative workload, and return information about a plan of execution of the queries in the workload, as well as information about the cost of executing the plan. The optimizer provides detailed information in the form of a tree which has nodes corresponding to a plan of execution of the query. The nodes provide information on the indexes used, and the cost of using the index. From this cost information, a user can try to determine which indexes should be built or used to enable the workload to be processed in an efficient manner.
One way of determining which set of indexes provides the most benefit given a workload is to actually propose a set of indexes, build them, and then run the workload. The total cost of the workload is then calculated. Several different sets of indexes are measured in this manner to find the best set. This is a very expensive proposition, especially when it would need to be done several times for different sets of indexes. In other prior database systems, optimizers have been run for each proposed configuration or set of indexes to try and find the best set for a given workload and database system.
This is also a costly way to try to determine which set of indexes provides the best performance of the database system. As the workload changes, the entire process must be repeated to arrive at a new set of indexes.
A further approach takes semantic information such as uniqueness, reference constraints and rudimentary statistics (“small” vs. “big” tables) and produces a database design. Such designs may perform poorly because they ignore valuable workload information. A further approach uses an expert system like approach, where the knowledge of “good” designs are encoded as rules and are used to come up with a design. This approach can take workload information into account, but is disconnected from the query optimizer. This has adverse ramifications for two reasons. First, a selection of indexes is only as good as the optimizer that uses it. In other words, if the optimizer does not consider a particular index for a query, then its presence in the database does not benefit that query. Second, the expert system operates on its own model of the query optimizer's index usage. While making an accurate model is itself hard, ensuring consistency between the tool and an evolving optimizer is a software-engineering nightmare.
In yet a further approach, an index selection tool uses the optimizer's cost estimates to compare goodness of alternative hypothetical designs. This approach avoids asynchrony between the index selection tool and the query optimizer. It further reduces the amount of indexes to consider by eliminating a large number of indexes from consideration that provide little or no benefit for any query in the workload. It also searches subsets of candidate indexes and picks a configuration with a low total cost. Multi column indexes are also generated for consideration in this approach.
There is a need for a system that can more quickly determine a set of beneficial indexes given a workload. There is a further need for a system that does not need to repeatedly call an optimizer to determine a set of beneficial indexes.
SUMMARY OF THE INVENTION
An index tuning wizard produces a fast and reasonable recommendation identifying database indexes to use given a specified workload for a given database and database server. A query optimizer is used to determine the expected usefulness of potential indexes for the specified workload by taking the cost of queries in that workload into account. A cost based pruning of indexes is then performed to provide an intermediate set of proposed indexes. The optimizer is then used again, and further pruning is done based on the expected improvement in the cost of the workload by having the index. An index is not recommended unless it has a significant impact on the workload.
A number of indexes are proposed based on the parsing of queries in the workload. A potential for improvement for a query is captured as a weight. Weights for each index are generated for corresponding queries by taking into account the cost of each query from an execution plan
Adya Atul
Agrawal Sanjay
Chaudhuri Surajit
Narasayya Vivek R.
Choules Jack
Lewis Cheryl
Microsoft Corporation
Watts, Hoffmann, Fisher & Heinke Co. L.P.A.
LandOfFree
Index tuner for given workload does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Index tuner for given workload, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Index tuner for given workload will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2442742