Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-08-30
2003-12-23
Breene, John (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C709S226000
Reexamination Certificate
active
06668263
ABSTRACT:
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent Office patent files or records, but otherwise reserves all copyrights whatsoever.
FIELD OF THE INVENTION
The present invention relates to database management systems and more particularly to a method and system for searching for free space in a table of a relational database management system, where the table has a clustering index defined on at least one or more of its columns.
BACKGROUND OF THE INVENTION
A database management system (DBMS) includes the combination of an appropriate computer, direct access storage devices (DASD) or disk drives, and database management software. A relational database management system is a DBMS that uses relational techniques for storing and retrieving information. The relational database management system or RDBMS comprises computerized information storage and retrieval systems in which data is stored on disk drives or DASDs for semi-permanent storage. The data is stored in the form of tables which comprise rows and columns. Each row or tuple has one or more columns.
A typical database management system includes both database files and index files. The database files store data in the rows and columns of tables stored on data pages. In such a table, the rows may correspond to individual records while the columns of the table represent attributes of the records. For example, in a customer information table of a database management system, each row might represent a different customer, while each column represents different attributes of the customers such as the name of each customer, the amount owed by each customer and the cash receipts received from each customer.
Instead of providing for direct sorting and searching of the records in the tables, the database management system relies on the index files, which contains information or pointers about the location of the records in the tables stored in the database files. The index can be searched and sorted (scanned) much more rapidly than can the database files. An index is scanned through transactions in which criteria are stipulated for selecting records from a table. These criteria include keys, which are the attributes by which the database finds the desired record or records using the index.
An index for a table is said to be “clustered” when the rows of the underlying table are physically stored so that each data page contains all rows, and only those rows, of a key interval of the index. Referring to
FIG. 22
, there is illustrated an index on unclustered data.
FIG. 23
, shows a B-Tree index for the same data; however, in
FIG. 23
the data has been clustered. The differences between clustered and unclustered indexes can be seen by focusing on the key value “N”. In both the B-Tree indexes of
FIG. 22 and 23
, the leaf nodes of the B-Tree indexes point to four records in the underlying tables; these records are identified by the record identifiers of four keys. The record identifiers are rid
1
, rid
2
, rid
3
and rid
4
. In the unclustered B-Tree index of
FIG. 22
, the key interval <N,rid
1
> to <N,rid
2
> is stored on two different data pages—specifically data pages
1
and
2
. Similarly, the key interval <N,rid
3
> to <N,rid
4
> are stored on data pages
2
and
3
. This contravenes the definition of a clustered index that the rows of the underlying table be physically stored so that each data page contains all rows, and only those rows of a key interval of the index. Turning to the B-Tree index and underlying data pages of
FIG. 23
, the same data is shown clustered. Specifically, the key interval <N,rid
1
> to <N,rid
2
> points to rows that are stored on data page
2
, and the key interval <N,rid
3
> to <N,rid
4
> points to data that is stored on data page
3
. Note that the records need not be adjacently located on the same data page. While the records underlying <N,rid
3
> and <N,rid
4
> both lie on data page
3
, they are separated by a record having an attribute value equal to “P”.
Even if an index is clustered to begin with, the index and table may become disordered as a result of rows being added or deleted from the table. Specifically, if a new data row is to be added to a table, then this data would preferably be added on the data page for the key interval of the index in which the data row falls. However, there may be no or insufficient free space on this data page to accommodate the new record, and the data row may, as a result, have to be stored on another data page on which sufficient free space is available, or appended to the end of the table. Accordingly, even if an index is clustered to begin with, it may become unclustered, at least to some extent, as additional information is added. The extent to which a clustered index has become unclustered is measured by the cluster factor; the cluster factor is measured on a scale of 0 to 1, where 1 means the index is fully clustered.
A clustering index is an index whose cluster factor is, as much as possible, maintained or improved dynamically as data is inserted into the associated table. The cluster factor is maintained or improved by inserting new rows as physically close to the rows for which the key values of this index are in the same range, as is possible. Having a high cluster factor—a cluster factor close to 1—increases data retrieval efficiency when such retrieval involves accessing sequential ranges of that index's values.
When inserting a row into a table in a relational database management system (RDBMS), a search for free space must be done to find space for the row being inserted. Free space maps are used in the art to keep track of the free space that is available in a table. A free space map is a map or directory of the pages in a table, together with an approximation of the amount of the free space on each of these pages. This enables the pages that might have enough space for a record to be quickly looked for, without requiring the page to actually be fixed.
A free space map spans a plurality of free space map pages when it is too large to accommodate on a single page. A free space map page is one of the plurality of free space map pages containing a portion of the map. Each free space map page includes an array of free space entries for a series of pages that form part of the table. By fixing one free space map page, it is possible to scan through space information from many pages in the actual data files, in order to find a page that potentially has enough space for a record to be inserted. Each entry in the free space map page indicates the approximate amount of free space available in the page index in the array. When a row is to be inserted, the free space map is searched for a data page with sufficient free space to hold the record. The search usually starts at the beginning of the free space map, or from the position where space was last found in this free space map. In either case, the entire free space map is usually searched until a data page with sufficient free space is found or until the entire free space map has been searched. If a page with free space is found, the row is inserted on to the page. If no free space is found, the row is appended to the table.
A record, which is being inserted into a table having a clustering index defined on A one or more of its columns, should preferably be inserted on the target page in order to maintain or improve the cluster factor. The target page is the ideal page on which to insert a new data record into a table that has a clustering index, as this page contains the same or next key value in the table as that being inserted. However, this may not always be possible as the target page may have insufficient free space to accommodate the record. If there is insufficient space for the record on the target page, then the record must
Cranston Leslie A.
Hing Nelson Hop
Huras Matthew A.
Lindsay Bruce G.
Winer Michael J.
Ali Mohammad
Breene John
International Business Machines - Corporation
Konrad Raynes & Victor & Mann LLP
Victor David W.
LandOfFree
Method and system for efficiently searching for free space... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for efficiently searching for free space..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for efficiently searching for free space... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3128870