Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-09-08
2003-05-20
Rones, Charles (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C711S216000
Reexamination Certificate
active
06567817
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to data processing systems and, more particularly, to decreasing time to access data linked to hash tables.
BACKGROUND OF THE INVENTION
As processors have become faster, memory access has become a major bottleneck to overall increased performance. To improve performance, memory caching schemes have been adopted to lessen the effect of the memory access bottleneck. Direct access tables, sorted tables, and tree structured tables are conventional techniques to reduce the blockage. Also, hashing schemes are efficient in accessing stored information. Hashing schemes are useful because they provide direct access to data by mapping a large range of values into a smaller range of values. Hashing provides an efficient method of accessing stored data. Conventional hashing mechanisms utilize a data structure known as a hash table to provide access to the stored data.
One example of a conventional hashing mechanism
10
is depicted in FIG.
1
. The hashing mechanism
10
comprises a key
12
, a hashing function
14
, a hashing index
16
, and a hash table
18
. The hash table
18
contains M buckets
20
numbered 0 to M−1. Each bucket
20
contains data, such as a pointer. To access the hash table
18
, the key
12
is input into hashing function
14
which generates an index
16
that indicates a specific bucket
20
. The hashing function
14
is a very important part of the hashing mechanism. There are two principle criteria in selecting a hashing function. The hashing function should be easy and quick to compute, and it should achieve a generally even distribution of the indices across the range of indices. Numerous hashing functions meet this criteria to some extent. For example, direct, mid-square, division-remainder, folding, digit extraction, and non-numeric key hashing all meet the above criteria. With the exception of direct hashing, none of these methods use one-to-one mapping. This means that each time a new key is hashed to indicate a bucket, a collision may occur. A collision occurs when a first key and a second key both cause the hashing function to generate the same identical index. The conventional hashing mechanisms take steps to reduce the number of collisions.
A method for resolving collisions is called chaining. As compared with linear probing and secondary hashing, this method has the advantage that it can, if necessary, accommodate more pointers than there are buckets in the table
18
, though with a progressive degradation of performance as the number of pointers and keys chained to each bucket grows longer and as the linear search down such chains comes to occupy a larger fraction of the running time. Chaining requires that for each of the M buckets in the hash table, there are pointers corresponding to all of the keys that have hashed to that bucket. For example,
FIG. 2
depicts a hash table
22
that is subject to collisions. The hash table
22
contains M buckets (0 to M−1), and each bucket contains two linkages. The linkages are pointers to objects
24
and
26
that contain the first and last pointers and keys placed into the corresponding bucket. If a bucket is empty, then both the first and last linkages are nil. However, when a key is hashed into a bucket
23
, such as the bucket
21
, the first and last linkage values are adjusted to point to an object that contains a pointer and a copy of the corresponding key. If there is a collision, and accordingly there is more than one entry in a bucket, such as the bucket
25
, then the linkages are adjusted so that two objects
24
and
26
, each containing a pointer and a key, are linked to the bucket
25
. Bucket
21
contains a linkage to the object
23
, indicating the key K
3
has been hashed into the index
3
which points to that bucket. Bucket
25
is linked to the objects
24
and
26
, indicating that the keys K
4
and K
5
have been hashed into the index
9
which points to that bucket. When a key hashes into the index value
9
, the computer searches all the linked objects associated with the given bucket
25
, checking the keys, to access the proper pointer. As more collisions occur and the hash buckets become more populated, the benefits of using a hash table start to diminish. The processor must serially search the objects linked to buckets to locate the correct pointer, and this increases processor time.
The hashing function itself plays a key role in reducing the number of collisions and the length of the linkages that must be searched. A hashing function that distributes the indices relatively uniformly over the hash table reduces the probability of collisions. In the prior art, typically the index values are not uniformly distributed over their respective ranges. This non-uniformity of index values has the property of increasing the probability of collisions, thereby increasing the access time and decreasing processor performance. This undesirable property of prior art hash functions is illustrated in FIG.
7
.
FIG. 7
shows a hash table represented as a ring
92
, with the bucket
0
positioned as the uppermost entry in the ring, and with lines radiating from the perimeter of the circle representing assigned bucket linkages. The “a” indicates a relatively small distance between many nearby bucket linkages in the hash table. From the
FIG. 7
, it can be seen that the bucket linkages frequently tend to be bunched together and are non-uniformly distributed throughout the hash table. X, Y and Z indicate larger distances between groups of such object linkages, such that X, Y and Z are not equal. This nonuniform arrangement results in excess collisions and increased serial searching time. Therefore, the conventional hash mechanism must take further steps to reduce the number of collisions because of the non-uniformity of the hash function.
SUMMARY OF THE INVENTION
Briefly, the present invention comprises one embodiment for an operating system in a computer system. A virtual file management system for a computer, which generates an index to access data stored in hash table comprising: at least one device coupled to a processor, wherein each device has at least one memory unit which has been assigned an origin value; at least one allocated storage wherein each allocated storage corresponds to a particular device and each allocated storage has an identifier; a buffer hash table comprising a plurality of buckets wherein at least one bucket contains a linkage to a buffer cache; a program execution for generating an index into the buffer hash table using the identifier.
In a further aspect of the present invention, in an operating system for a computer system, a virtual file management system that allocates, whenever a mounted device requests a block of memory, an allocated storage area for each device, each allocated storage area having a unique identifier and each storage area containing device meta-data associated with a particular device, and where the operating system has at least one buffer hash array which links to a predetermined number of the allocated storage areas, a method for generating, for each the allocated storage area, an index into the buffer hash table by means of the linkages, to the corresponding allocated storage area, the method comprising the steps of: allocating an origin value for each mounted device, wherein the origin is a value representing a fractional value and the size of hash table, the fractional value has the property of being uniformly distributed over a range; operating on the allocated storage identifier with the origin value in order to produce the allocated storage unique identifier; generating an index into the buffer hash table, the index is computed by XOR the unique identifier with a block sequence number; and establishing a linkage to a bucket in the buffer hash table at the index location in the buffer hash table, if the bucket is having first been assigned a particular index value, or establish linkage chain to the correspondingly named bucket and to all other data structures having
Hewlett--Packard Development Company, L.P.
Rones Charles
Solaiman Shireen I
LandOfFree
Cache management system using hashing does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Cache management system using hashing, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Cache management system using hashing will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3009402