Method and system for dynamically managing hash pool data...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06748401

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to dynamically managing hash pool data structures. More particularly, the present invention relates to dynamically adding index and contention records to hash data pools in response to data load and data distribution.
BACKGROUND OF THE INVENTION
Hashing is a technique for storing and retrieving data records. Hash functions provide quick access to individual data cells in a group, via a hash table, without linear searching. Therefore, hashing is especially beneficial for computational problems that grow into a very large scale in terms of data volume. A variety of hash functions can be implemented in order to minimize collisions between key values and to minimize processing time for a particular group of data key values. The decision to use a particular hash function is implementation dependent. A collision between key values occurs when the output of the hash function is the same for two or more records. When a collision occurs, one of the key values might, based on bucket size, be stored in another location and not the location indicated by the hash function. Collision resolution management refers to the process of determining where the other location is and tying the key value into the hash pool data structure in a manner that is accessible at a later point in time.
If the hash table is very large, it may not fit into available memory. A large hash table may be kept on a disk storage device such as DASD (Direct Access Storage Device). When the hash table is kept on disk storage the hash function or the operating system, in response to a request from the hash function, reads a portion of the entire hash table and must manage the process of paging the hash table. When this happens, the speed of processing can be diminished because of the extra time for the Input/Output (I/O) operations and the extra processing overhead required to manage the hash table. When the hash records are stored on external storage devices such as DASD, it is important to minimize the number of accesses to the hash records. When the hash records are stored on DASD the hash function may be compute intensive with the goal of minimizing the number of I/O operations required to insert or retrieve a record from the hash pool data structure. Hash technique exists that keep a small piece of information in memory to quickly find the hash table entry. If the hash table is very large even these techniques exhaust main memory so this associated information must also be placed on DASD which increases the I/O required.
An index can be used to reduce the required hash table space. This option requires at least one extra I/O operation. The benefit is a reduction in the number of required hash table records that must be pre-allocated. The number of “slots” that can be contained in a hash table record determines how much the hash table space can be reduced. If a four-thousand and ninety-six byte (4K) record size is used, and an allowance is made to use ninety-six bytes for system overhead, the 4K record could hold five-hundred eight byte storage addresses. In this scenario, the use of a one level index could reduce the required number of pre-allocated hash table records by a factor of five hundred. A second level index could reduce the number of pre-allocated hash table records by another factor of five-hundred, resulting in a reduction of two-hundred and fifty thousand. Similarly, a third level index reduces the requirement by a factor of one-hundred and twenty-five million but adds three required I/O operations in retrieving a data record if the index records are stored on DASD.
Contention of keys to the same hash slot can be handled with record chaining. A look-up routine can detect that there is more than one record and can examine the contents of the record for an exact match. The average number of I/O operations required in chaining can be expressed by the equation “(k+1)/2” where “k” is the length of the chain. The chain can be implemented using a linked list that connects the records together. After a hash function is performed on a key value, a sequential search is performed on the linked list until either the key value is found or the spot to insert the key value is located. The linked list could be sorted in terms of frequency of access or alphabetically to expedite the search.
Contention of names can also be handled with contention records. A look-up routine can detect that the record is a contention record and that the record contains a table of key values and record addresses. The look-up routine can go through the record looking for a match of the key value, and if found, can use the record address that corresponds to the key value to locate a user data record. Any of the above methods: indexes, contention records and chaining; can be utilized when contention is encountered in a hash pool data structure. Which one is selected depends on balancing factors that can include the number of I/O operations, the required storage space, the number of hash table entries, and the required speed of data access.
SUMMARY OF THE INVENTION
An exemplary embodiment of the present invention is a method for dynamically managing a hash pool data structure. A request to insert a new key value into a hash pool data structure that includes at least one index level is received. An insertion location is calculated for the new key value in response to the new key value and to existing key values in the hash pool data structure. The insertion location includes an index level. A new index level is added at the insertion location if the index level is not the maximum number of index levels in the hash pool data structure; if the insertion location contains a chain of existing key values with a length equal to the maximum chain length; and if the new index record locations of the new key value and the existing key values are dispersed. The insertion location is updated in response to adding a new index record and the new key value is inserted into the insertion location. An additional embodiment includes a storage medium for dynamically managing a hash pool data structure.


REFERENCES:
patent: 5276878 (1994-01-01), Sutton et al.
patent: 5410663 (1995-04-01), Blackburn et al.
patent: 5604879 (1997-02-01), Beavers et al.
patent: 5809494 (1998-09-01), Nguyen
patent: 5893086 (1999-04-01), Schmuck et al.
patent: 5911144 (1999-06-01), Schwartz et al.
patent: 6014733 (2000-01-01), Bennett
patent: 6067547 (2000-05-01), Douceur
patent: 6115802 (2000-09-01), Tock et al.
patent: 6122375 (2000-09-01), Takaragi et al.
patent: 6212525 (2001-04-01), Guha
patent: 6216214 (2001-04-01), Bryg et al.
patent: 6226629 (2001-05-01), Cossock
patent: 6226639 (2001-05-01), Lindsay et al.
patent: 6230231 (2001-05-01), DeLong et al.
patent: 6516320 (2003-02-01), Odom et al.
patent: 6630946 (2003-10-01), Elliott et al.
patent: 0 066 766 (1982-12-01), None
patent: 0 066 766 (1988-08-01), None
Title: “Extendible Chained Bucket Hashing for Main Memory Databases”; Published by: International Journal of Applied Software Technology, vol. 1, No. 2, pp. 123-125; Author: Pyung-Chul Kim, Kee-Wook Rim, Jin-Pyo Hong.
Title: “Linear Spiral Hashing for Expansible Files”, Published by: IEEE Transactions on Knowledge and Data Engineering, vol. 11, No. 6, pp. 969-984, Nov./Dec. 1999, Authors: Ye-In Chang, Chien I Lee, Wann-Bay ChangLiaw.
Title: “Climbing Hashing for Expansible Files”, Published by: Information Sciences, vol. 86, No. 1-3, pp. 77-99, Sep. 1995, Authors: Ye-In Chang, Chien-I Lee.
Title: “Multikey, Extensible Hashing for Relational Database”, Published by: IEEE Software, vol. 5, No. 4, pp. 77-85, Jul. 1988, Authors: Keith L. Kelley and Marek Rusinkiewicz.
Title: “Linear Hashing with Separators- A Dynamic Hashing Scheme Achieving One-Access Retrieval”, Published by: ACM Transactions on Database Systems, vol. 13, No. 3, pp. 366-388, Sep. 1988, Author: Per-Ake Larson.
Title: “Linear Hashing with Overflow-Handling by Linear Probing”, Published by; ACM Transactions on Database Systems, vol. 10, No. 1, pp

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and system for dynamically managing hash pool data... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and system for dynamically managing hash pool data..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for dynamically managing hash pool data... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3295816

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.