Apparatus and method for determining clustering factor in a...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06785684

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
This invention generally relates to computer systems and more specifically relates to databases in computer systems.
2. Background Art
Since the dawn of the computer age, computers have evolved and become more and more powerful. In our present day, computers have become indispensable in many fields of human endeavor including engineering design, machine and process control, and information storage and access. One of the primary uses of computers is for information storage and retrieval.
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database.
Databases generally contain one or more indexes that make searching the database much more efficient than performing a full database search for every query. The performance of a database system depends on the performance of a paged memory system that swaps pages from disk into a buffer. If the order of keys in a particular index is close to the physical order of the keys in the database table, the memory paging system performs better when this index is used, because many accesses will likely be satisfied from the page buffer without page swaps. A statistical measure of the correlation between a column in the database and the corresponding data in physical storage is known as the “clustering factor”. The clustering factor indicates the degree to which the data is clustered (i.e., close together) in physical storage.
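As a rough illustration of this idea, the following Python sketch counts how often a scan in key order has to move to a different page. It is not the measure defined by the patent. The function name page_switches, the fixed rows_per_page layout, and the sample data are assumptions made only for this example.

```python
def page_switches(keys, rows_per_page):
    """Count how often a scan in key order moves to a different page.

    Assumption: row i of the table is stored on page i // rows_per_page.
    """
    # Row positions visited in index (key) order.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    switches = 0
    prev_page = None
    for row in order:
        page = row // rows_per_page
        if page != prev_page:
            switches += 1
            prev_page = page
    return switches


# Keys listed in physical row order, four rows per page (hypothetical data).
clustered = [1, 2, 3, 4, 5, 6, 7, 8]   # key order matches physical order
scattered = [1, 3, 5, 7, 2, 4, 6, 8]   # neighboring keys sit on different pages

print(page_switches(clustered, 4))  # 2: each page is visited once
print(page_switches(scattered, 4))  # 8: every access changes page
```

A low count relative to the number of rows suggests well-clustered data. A count close to the number of rows suggests that nearly every access through the index touches a different page.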
Clustering factor in the prior art is typically computed as a function of the size of a memory page and the size of the page buffer. Making this computation is relatively straightforward when the size of the page buffer is known. The size of a page buffer is generally known for virtual memory systems that specify a virtual size for the buffer. However, some computer platforms, such as the IBM iSeries 400, do not have a virtual memory system that provides a fixed-size page buffer, but instead have a single-level store. With a single-level store, the address space of the processor must be shared among the operating system and all applications. For this reason, it is impossible to set a fixed size for the page buffer, because the size can vary and even change dynamically as system requirements change. Without an apparatus and method for determining clustering factor in a database that has a variable-sized page buffer, the clustering factor for indexes will be unavailable on some types of computer platforms, making it difficult to optimize database performance based on clustering factor.
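To see why the buffer size matters for such a computation, the sketch below simulates a page buffer and counts page swaps for the same access sequence at different buffer sizes. The least-recently-used replacement policy, the function name page_faults, and the access sequences are assumptions for illustration. The text above does not state which replacement policy the prior art assumes.

```python
from collections import OrderedDict


def page_faults(page_sequence, buffer_pages):
    """Count page swaps for a page access sequence.

    Assumption: the buffer holds buffer_pages pages and evicts the
    least recently used page when it is full.
    """
    buffer = OrderedDict()
    faults = 0
    for page in page_sequence:
        if page in buffer:
            # Hit: mark the page as most recently used.
            buffer.move_to_end(page)
        else:
            # Miss: load the page, evicting the oldest one if needed.
            faults += 1
            buffer[page] = True
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)
    return faults


# Pages touched by a scan in key order (hypothetical sequence).
accesses = [0, 1, 0, 1, 0, 1, 0, 1]
print(page_faults(accesses, 1))  # 8: every access swaps a page in
print(page_faults(accesses, 2))  # 2: both pages stay resident
```

The same access pattern yields very different swap counts depending on the buffer size, which is why a computation of this kind is hard to carry out when the buffer size is not fixed.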
DISCLOSURE OF INVENTION
According to the preferred embodiments, an apparatus and method perform block-level sampling on a database, process the data to generate one or more matrices, and process the one or more matrices to generate a clustering factor for a selected index. In addition, the apparatus and method of the preferred embodiments allow the distribution of the clustering factor to be determined across a range, thereby allowing the identification of ranges where the clustering factor is high and ranges where the clustering factor is low. The clustering factor distribution can then be used to predict the memory paging performance of a search that uses an existing or a potential index that corresponds to the sampled data, and can therefore be used to predict the performance of searching the database using an existing or potential index for a particular database query.
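The excerpt does not say what the matrices contain or how they are processed, so the following Python sketch is only one hypothetical reading of the three steps: sample whole blocks (pages), build a matrix of counts by key range and page, and derive a per-range figure from it. All names (sample_blocks, range_page_matrix, rows_per_page_by_range), the range boundaries, and the "rows per touched page" figure are assumptions, not the patented method.

```python
import random


def sample_blocks(table_pages, n_blocks, seed=0):
    """Pick n_blocks whole pages at random (block-level sampling)."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(table_pages)), n_blocks))
    return {p: table_pages[p] for p in chosen}


def range_page_matrix(sampled, boundaries):
    """Build matrix[r][c]: sampled rows in key range r on sampled page c.

    boundaries are ascending inclusive upper bounds of the key ranges;
    keys above the last boundary are ignored.
    """
    pages = sorted(sampled)
    matrix = [[0] * len(pages) for _ in boundaries]
    for col, page in enumerate(pages):
        for key in sampled[page]:
            for r, upper in enumerate(boundaries):
                if key <= upper:
                    matrix[r][col] += 1
                    break
    return matrix


def rows_per_page_by_range(matrix):
    """Average sampled rows per touched page, for each key range."""
    result = []
    for row in matrix:
        touched = sum(1 for count in row if count > 0)
        result.append(sum(row) / touched if touched else 0.0)
    return result


# Hypothetical table: each inner list holds the keys stored on one page.
table = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
sampled = sample_blocks(table, 4)            # sample every page here
matrix = range_page_matrix(sampled, [4, 8, 12, 16])
print(rows_per_page_by_range(matrix))        # [4.0, 4.0, 2.0, 2.0]
```

In this toy data the first two key ranges each sit on a single page, while the last two are spread over two pages, so the per-range figure drops from 4.0 to 2.0. A distribution of this kind is what would let an optimizer distinguish well-clustered ranges from poorly clustered ones.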
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings.

