Method and data processing system for hashing database...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000, C707S793000

active

06449613

ABSTRACT:

FIELD OF THE INVENTION
The present invention generally relates to the art of data storage and retrieval, and more specifically to hashing to a physical address on a storage medium during such storage and retrieval.
BACKGROUND OF THE INVENTION
One well-known data organization and access method for rapidly accessing data stored in main memory or in a file is “Hashing”. It is used heavily in database systems in particular, to access records efficiently. It operates by extracting one or more fields, usually from the record, to form a “Hash Key”. A function (the “Hash Function”) is then applied to the Hash Key to identify a “Hash Bucket”. If “K” represents the Hash Key, “F” represents the Hash Function, and “B” represents a Hash Bucket, then B=F(K).
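The relation B=F(K) can be illustrated with a minimal sketch. The record layout, the choice of the "id" field as the Hash Key, and the modulo-based Hash Function are assumptions made for illustration only; they are not taken from this disclosure.

```python
def hash_key(record):
    # Assumption: the Hash Key K is a single integer field named "id".
    return record["id"]


def hash_function(key, num_buckets):
    # Assumption: F is a simple remainder; any function mapping K to a
    # bucket number in the range 0..num_buckets-1 would serve here.
    return key % num_buckets


record = {"id": 1234, "name": "example"}
K = hash_key(record)
B = hash_function(K, 7)  # B = F(K)
print(B)
```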
The physical disk sectors of Database files are typically grouped together in Pages. Physical and logical references to these files are made by accessing these Pages. Databases which are accessed via a Hash Function have zero, one, or more Hash Buckets associated with each Page which holds data. The methods proposed by this invention can be extended to cover cases other than one Hash Bucket per Page.
At its simplest, especially when hashing in main memory, a contiguous series of Hash Buckets or Hash Entries forms a Hash Array. The Hash Bucket number or Hash Table Index, computed by applying the Hash Function to the Hash Key, is used to index into the Hash Array.
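A minimal in-memory sketch of such a Hash Array follows. The table size, the function names, and the remainder-based Hash Function are hypothetical, and Collisions are deliberately ignored here (a later entry simply overwrites an earlier one in the same slot).

```python
TABLE_SIZE = 8  # assumed size of the Hash Array

# A contiguous series of Hash Entries, initially empty.
hash_array = [None] * TABLE_SIZE


def table_index(key):
    # Hash Table Index: the Hash Function applied to the Hash Key.
    return key % TABLE_SIZE


def put(key, value):
    # Store the entry at the computed index (no Collision handling).
    hash_array[table_index(key)] = (key, value)


def get(key):
    # Index directly into the Hash Array and verify the stored key.
    entry = hash_array[table_index(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put(10, "a")
print(get(10), get(18))
```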
The problem is a little more complex when dealing with databases and hash files. In those instances, each Hash Bucket can typically contain multiple records, and Hash Buckets are organized into a Hash Table. In order to determine whether a record with a specified Hash Key exists in a database or hash structure, the corresponding Hash Bucket can be computed by applying the Hash Function to the Hash Key for the record. Then, the Hash Bucket is searched for the record containing that Hash Key.
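The lookup described above, computing the Hash Bucket and then searching within it, might be sketched as follows. The record layout (a "key" field plus a payload), the number of buckets, and the remainder-based Hash Function are assumptions for illustration.

```python
NUM_BUCKETS = 5  # assumed number of Hash Buckets in the Hash Table

# Each Hash Bucket can hold multiple records.
hash_table = [[] for _ in range(NUM_BUCKETS)]


def bucket_for(key):
    # Apply the Hash Function to the Hash Key.
    return key % NUM_BUCKETS


def insert(record):
    hash_table[bucket_for(record["key"])].append(record)


def find(key):
    # Search only the Hash Bucket that the key hashes to.
    for record in hash_table[bucket_for(key)]:
        if record["key"] == key:
            return record
    return None


insert({"key": 12, "payload": "a"})
insert({"key": 7, "payload": "b"})  # same bucket as key 12
print(find(7), find(17))
```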
A problem arises, however, when an entry in a Hash Array or a Hash Bucket fills up. When a subsequent record hashes to the same Hash Entry or Hash Bucket, the result is termed a “Collision”. A number of different algorithms have been developed to address this Collision problem. For example, overflow pages or blocks can be chained to the Hash Bucket. Alternatively, the record can be stored in the next available Hash Bucket that has room. Then, when searching for a record with a given Hash Key, the search starts at the Hash Entry or Hash Bucket addressed by the Hash Function applied to the relevant Hash Key. The search progresses through the Hash Table, and ends in failure when the record is not found in a Hash Entry or Hash Bucket that is not full.
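The "next available Hash Bucket" approach can be sketched as below. The bucket capacity, the table size, and the wrap-around at the end of the table are assumptions chosen for illustration; the text above does not specify them.

```python
BUCKET_CAPACITY = 2  # assumed number of records per Hash Bucket
NUM_BUCKETS = 4      # assumed number of Hash Buckets


def make_table():
    return [[] for _ in range(NUM_BUCKETS)]


def insert(table, key):
    # Start at the home bucket and move on while buckets are full.
    start = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        if len(table[b]) < BUCKET_CAPACITY:
            table[b].append(key)
            return b
    raise RuntimeError("hash table is full")


def search(table, key):
    # Start at the home bucket; fail at the first bucket that is not
    # full and does not contain the key.
    start = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        if key in table[b]:
            return b
        if len(table[b]) < BUCKET_CAPACITY:
            return None
    return None


table = make_table()
placed = [insert(table, k) for k in (1, 5, 9)]  # all hash to bucket 1
print(placed, search(table, 9), search(table, 13))
```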
This latter method works well in situations, such as compiler symbol tables, where there are insertions into a Hash Table but no deletions. It fails, however, when there are deletions, as in the typical database, since deletions create holes, which can prematurely terminate the search for a matching Hash Key in failure.
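The failure mode can be reproduced with a small sketch that uses one key per Hash Entry. The table size and the naive deletion (simply emptying the slot) are assumptions made to demonstrate the problem, not a recommended design.

```python
SIZE = 4  # assumed number of Hash Entries, one key per entry


def insert(table, key):
    # Place the key in the first empty entry at or after its home slot.
    for step in range(SIZE):
        i = (key + step) % SIZE
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")


def search(table, key):
    # Stop in failure at the first empty entry.
    for step in range(SIZE):
        i = (key + step) % SIZE
        if table[i] is None:
            return None
        if table[i] == key:
            return i
    return None


table = [None] * SIZE
insert(table, 1)                 # home slot 1
insert(table, 5)                 # home slot 1 is taken, lands in slot 2
found_before = search(table, 5)  # found in slot 2

table[1] = None                  # naive deletion of key 1 leaves a hole
found_after = search(table, 5)   # search stops at the hole in slot 1

print(found_before, found_after, 5 in table)
```

After the deletion, key 5 is still physically present in the table, yet the search reports it as missing because it terminates at the hole.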
Another problem that arises in Hashing when dealing with databases and files is when a Hash Table contains discontiguous blocks of Hash Buckets. There may be some pages uniformly dispersed throughout the hash table that may not be capable of containing data records or Hash Buckets. This can happen when, for example, space control information for managing the file content is located on pages spread uniformly through the file. These pages will be referred to as Space Control Pages for the purposes of this disclosure. This application is related to our copending patent applications assigned to the assignee hereof.
There is a related problem when the Hash Buckets do not start at the beginning of the space used for hashing. The problem is that Hash Functions typically generate a continuous range of Hash Bucket numbers. For example, if the Hash Function involves dividing the Hash Key by a specified prime number, and using the remainder as the Hash Bucket number, then the resulting Hash Bucket numbers will typically comprise all of the integers between 0 and Prime−1. However, information containers such as files typically contain header information which takes pages at the beginning of the file.
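A short sketch makes the mismatch concrete. The prime 7, the two header pages, and the naive assumption that a Hash Bucket number is used directly as a page number are all hypothetical values chosen for illustration.

```python
PRIME = 7          # assumed divisor for the division-remainder hash
HEADER_PAGES = 2   # assumed: pages 0 and 1 of the file hold header data


def bucket_number(key):
    # Remainder after dividing the Hash Key by the prime.
    return key % PRIME


# Over enough keys, every integer from 0 to PRIME-1 appears.
buckets_seen = sorted({bucket_number(k) for k in range(100)})

# If bucket numbers were used directly as page numbers, these would
# land on header pages that cannot hold data records.
on_header_pages = [b for b in buckets_seen if b < HEADER_PAGES]

print(buckets_seen, on_header_pages)
```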
The problem that arises here is that when the Hash Table contains holes (and thus discontiguous blocks of Hash Buckets), including a hole at the beginning, the situation in which a Hash Key hashes to one of the holes must be handled. The typical overflow procedure of going to the next open Hash Bucket is suboptimal and reduces performance. One reason is that the next Hash Bucket after each hole becomes overloaded with hash entries, since that Hash Bucket must contain not only the records whose Hash Keys hash to that Hash Bucket, but also all of those that hash to the preceding hole. The bigger the “hole” in the Hash Table, the worse the problem. For example, if a Space Control Page is the same size as a Hash Bucket, then the next Hash Bucket after such a Space Control Page would fill up twice as fast as the other Hash Buckets using such an overflow procedure. Similarly, if a Space Control Page is twice the size of a Hash Bucket, then the next Hash Bucket after such a Space Control Page would fill up three times as fast as the other Hash Buckets using such an overflow procedure.
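The overloading effect can be checked with a small simulation. The page count, the placement of the hole at the start of the table, the uniformly spread keys, and the function name are assumptions; bucket capacity is ignored so that only the relative load is visible.

```python
from collections import Counter


def load_per_page(num_pages, hole_pages, keys):
    # Count records per page when a key that hashes to a hole is moved
    # to the next page that is not a hole (naive overflow procedure).
    # Assumption: at least one page is not a hole.
    loads = Counter()
    for key in keys:
        page = key % num_pages
        while page in hole_pages:
            page = (page + 1) % num_pages
        loads[page] += 1
    return loads


# Hole the size of one Hash Bucket (page 0).
one_page_hole = load_per_page(10, {0}, range(1000))

# Hole the size of two Hash Buckets (pages 0 and 1).
two_page_hole = load_per_page(10, {0, 1}, range(1000))

print(one_page_hole[1], one_page_hole[2])
print(two_page_hole[2], two_page_hole[3])
```

With 1000 evenly spread keys and 10 pages, an ordinary Hash Bucket receives 100 records, while the bucket after a one-page hole receives 200 and the bucket after a two-page hole receives 300, matching the twofold and threefold fill rates described above.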
One solution to these problems is found in U.S. Pat. No. 5,579,501, incorporated by reference herein. The solution embodied in that patent allows the direct computation of the table page which contains the Hash Bucket. It performs this computation using integer arithmetic and involves several computational steps. The present invention involves fewer computational steps in all cases.


REFERENCES:
patent: 5579501 (1996-11-01), Lipton et al.
patent: 5893086 (1999-04-01), Schmuck et al.
patent: 5940838 (1999-08-01), Schmuck et al.
patent: 6023706 (2000-02-01), Schmuck et al.
“Algorithms”, 12th Printing. Author: Cormen, Leiserson & Rivest, c1994 McGraw-Hill Book Co.
“Fundamentals of Database Systems”, 2nd Edition. Author: Elmasri/Navathe, c1994 The Benjamin/Cummings Publishing Co.
