Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-06-04
2002-07-16
Mizrahi, Diane (Department: 2771)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06421662
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to database management systems and more particularly to generating and implementing indexes based on criteria set forth in queries.
BACKGROUND OF THE INVENTION
Relational databases store information in indexed tables that are organized into rows and columns. Each row in the table represents an individual record. For example, a table that stores demographic records may contain columns that store: social security number, age, gender, marital status, number of children, income range, etc. Each row in the table stores the same column information for each individual identified by social security number. A user retrieves information from the tables by entering a request that is converted to queries by a database application, which then submits the queries to a database server. In response to the queries, the database server accesses the tables specified by the query to determine which information within the tables satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user.
For any given database server, the queries must conform to the rules of a particular query language. One popular query language is known as the Structured Query Language (“SQL”). SQL provides the user with the ability to generate complex queries that can be used to retrieve specific information from the tables.
In a typical database system, data is stored in a table in an unordered form. As records, i.e., rows are entered into a table, they are inserted into the next available location. Such a location can often be, relative to the location of the previous record, at a non-contiguous storage sector of a persistent storage device, such as a fixed disk drive or optical device. Over time, as records are added or dropped, the physical arrangement of the data in the persistent storage device usually does not correspond to the order, if any, of the rows in the table. The data for consecutive rows may appear to be randomly spread over a number of blocks in the persistent storage device. Consequently, it is not always possible to directly access or retrieve the record or range of records that satisfy a given set of search criteria. For example, in order to locate all rows in a table that have a given value in a column A, every row of the table must be fetched and column A of each row examined. Even when a row with the target value in column A is found, the remainder rows in the table must be fetched and column A examined, unless the values in column A are unique.
Another problem associated with data retrieval is that, typically, data for a particular row is stored in one or more units of contiguous blocks in a persistent storage device. A block is the smallest quantity of data that can be read from a persistent store into dynamic memory. If a database system requires any information stored in a particular block, the database system must read the entire block into memory. To retrieve values for a target column of a table, the database system must read the entire block or all the blocks that have any data from that column of the table, rather than reading only that portion of the block or blocks that contain values from the target column of the table. Since values for the target column may be present in all or almost all the blocks of a table, the entire table or significant portion thereof must be read into memory in order to retrieve the column values. In such a case, the database server, in response to a query, performs a “full table scan” by fetching every row of the table and examining the column or columns referenced in the search criteria specified in the query. This retrieval can be very costly because, if the amount of the data for the columns not used in the query is very large, then the fill table scan methodology becomes very inefficient due to the unnecessary amount of disk input/output.
Accordingly, in one approach to improving the efficiency of data retrieval, database systems provide indexes to increase the speed of the data retrieval process. A database index is conceptually similar to a normal index found at the end of a book, in that both kinds of indexes comprise an ordered list of information accompanied with the location of the information. Values in one or more columns of a table are stored in an index, which is maintained separately from the actual database table (the underlying base table). The ordered list of information in an index allows for quick scanning to find a target value or range of values. Moreover, since a conventional index stores only the values from one or more columns that serve as the key to the index, the amount of data being read into memory is significantly reduced as compared to a full table scan.
The above approach to improving the efficiency of data retrieval works well for simple queries, for example, queries that specify search criteria based on only one or two columns of a table. Such queries can be answered using a small number of indexes built on the appropriate columns. However, to process a complex query where the search criterion is a complex expression that involves values from many columns of a table, it may be necessary to maintain multiple indexes, one for each of the columns referenced in the query, or in the alternative, a large complicated index.
Referring to
FIG. 1A
, an exemplary Table
100
, consisting of 100,000 rows, contains data on a set of individuals. The first column
102
in
FIG. 1A
is “Citizen ID” and contains unique key values. The Citizen ID numbers range from 0 to 99,999. Thus each row can be identified by its corresponding unique Citizen ID value. Apart from the unique key value column, the other columns are State
104
, County
106
, name
108
, age
110
, hair color
112
, gender
114
, marital status
116
, number of offspring
118
, and annual salary
120
indicators. In order to most efficiently process a query that specifies the search criteria of “all men over 30 years of age, with brown hair, who are married and who have 5 children”, indexes would be maintained for the column values in the gender, age, hair color and number of children columns. The disadvantage of maintaining multiple indexes is that each time a row of data is added, deleted, or changed in the base table, the associated indexes have to be updated. The cost of maintenance of the indexes can become expensive. The alternative to maintaining multiple indexes that will satisfy the search criteria of a complex query, such as the example above, is to maintain a single complicated composite index built on the specific combination of columns referenced in the complex query. It is unlikely that a composite index will be built on the exact combination of columns that is targeted by a given query. Moreover, multiple composite indexes built on different combinations of columns in the table may have to be maintained in anticipation of the search criteria of a given query. Again, the cost of maintenance of such an index is significant.
There exists a need to process such a complex query more efficiently. There also exists a need for reducing the amount of data read from a persistent storage device when retrieving information from a table.
SUMMARY OF THE INVENTION
There is a need for improving the efficiency of retrieving values from a column or columns of a table in a database. There also exists a need for reducing the amount of data read from a persistent storage device when retrieving information from a table.
These and other needs are addressed by the present invention, which generates one or more indexes for use in processing a query by selecting values as unique keys from one or more columns in a table in a relational database system. Each key corresponds to a row in the table. The condition of whether a row in a table satisfies a search criterion, is represented by a bit in a bit sequence or bitstring. The unique keys and corresponding bitstring are then stored in an index.
Still other objects and advantage
Hickman Brian D.
Hickman Palermo & Truong & Becker LLP
Mizrahi Diane
Oracle Corporation
Tan Carina M.
LandOfFree
Generating and implementing indexes based on criteria set... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Generating and implementing indexes based on criteria set..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Generating and implementing indexes based on criteria set... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2886452