Scaleable hash table for shared-memory multiprocessor system

Patent number: 06578131
Type: Reexamination Certificate (active)
Filed: 1999-04-27
Issued: 2003-06-10
Examiner: Ellis, Kevin L. (Department: 2188)
Classification: Electrical computers and digital processing systems: memory – Address formation – Hashing (C711S170000)
TECHNICAL FIELD
This invention relates to memory systems and hash tables used in the memory systems. More particularly, this invention relates to shared-memory multiprocessor systems that utilize hash tables to facilitate data access.
BACKGROUND
Hashing is an efficient and popular technique for fast lookup of items based on a key value. Most research on hash tables has focused on programs with a single thread of control. However, many modern applications are multithreaded and run on multiprocessor systems. Server applications such as web servers, database servers, and directory servers are typical examples. Server applications often make use of one or more software caches to speed up access to frequently used items. The number of items in a cache may vary greatly, both over time and among installations.
Hashing is often used to provide fast lookup of items in a cache. In this context, the hash table becomes a shared, global data structure that should grow and shrink automatically with the number of items and that must be able to handle a high rate of concurrent operations (insert, lookup, and delete), all without wasting memory.
The inventors have developed a hashing scheme designed to meet these requirements.
Some programmers still believe that hash tables have to be of fixed size. That is, the table size has to be determined in advance and stays fixed thereafter. In the late 1970s, several researchers proposed schemes that allowed hash files to grow and shrink gradually in concert with the number of records in the file. Two methods—linear hashing and spiral storage—were subsequently adapted for main-memory hash tables. The system developed by the inventors uses hashing techniques that are based on linear hashing. Accordingly, spiral storage is not addressed in this disclosure.
Linear Hashing
A higher load on a hash table increases the cost of all basic operations: insertion, retrieval, and deletion. If performance is to remain acceptable when the number of records increases, additional storage must somehow be allocated to the table. The traditional solution is to create a new, larger hash table and rehash all the records into the new table. Typically, the new hash table is twice the size of the old one.
In contrast to the traditional solution, linear hashing allows a smooth growth in the table size. The table grows gradually, one bucket at a time, rather than doubling in size. When a new bucket is added to the address space, a limited local reorganization is performed. Linear hashing was developed by W. Litwin for external files (see Linear Hashing: A new tool for file and table addressing, Proceedings of the 6th Conference on Very Large Databases (VLDB '80), 1980, pgs. 212-223) and adapted to in-memory hash tables by P.-Å. Larson (see Dynamic Hash Tables, Communications of the ACM, Vol. 31, No. 4, 1988, pgs. 446-457).
To briefly describe linear hashing, consider a hash table consisting of N buckets with addresses 0, 1, . . . , N−1. Linear hashing increases the address space gradually by splitting the buckets in a fixed order: first bucket 0, then bucket 1, and so on, up to and including bucket N−1. When a bucket is split, about half of its records are moved to a new bucket at the end of the table.
FIG. 1 illustrates the splitting process in linear hashing for an example table 20 with five buckets (N=5). A pointer p keeps track of the next bucket to be split. FIG. 1 shows the table 20 at four different growth stages: A, B, C, and D.
At stage A, the first bucket 0 is split, with part of its records being transferred to new bucket 5. At stage B, the second bucket 1 is split and some of the records are moved to new bucket 6. Stage C shows splitting the fifth and last of the original buckets, bucket 4 (i.e., N−1), and migrating some of the records to new bucket 9.
When all N buckets have been split and the table size has doubled to 2N, the pointer p is reset to zero and the splitting process starts over again, as shown in stage D. This time, the pointer travels from 0 to 2N−1, doubling the table size to 4N. This expansion process can continue as long as is required.
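The excerpt contains no source code; purely as an illustration of the pointer bookkeeping just described, the following minimal C sketch (all identifiers, such as lh_state and lh_after_split, are invented here) advances p after each split and wraps it when the table size has doubled:

#include <stdio.h>

/* Hypothetical bookkeeping for the split pointer, assuming a minimum
 * table size of N buckets and L completed doublings, as in the text. */
struct lh_state {
    unsigned n;  /* minimum number of buckets, N               */
    unsigned l;  /* number of times the table size has doubled */
    unsigned p;  /* next bucket to be split, p < N * 2^L       */
};

/* Advance the state after one bucket has been split. */
static void lh_after_split(struct lh_state *s)
{
    s->p++;
    if (s->p == (s->n << s->l)) {  /* all N * 2^L buckets of this round split */
        s->p = 0;                  /* start the next round at bucket 0        */
        s->l++;                    /* table size has doubled once more        */
    }
}

int main(void)
{
    struct lh_state s = { 5, 0, 0 };   /* the five-bucket example table */
    for (int i = 0; i < 7; i++) {      /* perform a few splits          */
        lh_after_split(&s);
        printf("after split %d: p=%u, L=%u, buckets=%u\n",
               i + 1, s.p, s.l, (s.n << s.l) + s.p);  /* count = N * 2^L + p */
    }
    return 0;
}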
FIG. 2 illustrates how each bucket is split. In this example, bucket 0 of the exemplary five-bucket hash table 20 is split. The hash table is illustrated both before and after expansion to a sixth bucket.
An entry in the hash table 20 contains a single pointer 24, which is the head of a linked list 26 connecting all records that hashed to that address. When the table is of size five (i.e., five buckets), all records are hashed by the function h_0(K) = K mod 5. Once the table size has doubled to ten (i.e., ten buckets), all records will be addressed by the function h_1(K) = K mod 10. However, as illustrated in FIG. 1, linear hashing allows the table to expand one bucket at a time rather than doubling the table size immediately.
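The excerpt does not specify a concrete data layout; one plausible C rendering of a table whose entries are single pointers heading linked lists of records is the following sketch (struct and function names are invented here):

#include <stdlib.h>

/* One record in a chain; key is the hash key K referred to in the text. */
struct lh_record {
    unsigned long     key;
    struct lh_record *next;      /* next record hashing to the same bucket */
};

/* The hash table: each entry is a single pointer, the head of the linked
 * list of all records currently hashing to that address. */
struct lh_table {
    struct lh_record **buckets;  /* array of list heads          */
    unsigned           n;        /* minimum number of buckets, N */
    unsigned           l;        /* completed doublings, L       */
    unsigned           p;        /* next bucket to be split      */
};

/* Allocate a table with its minimum number of buckets; L = 0, p = 0. */
static struct lh_table *lh_create(unsigned n)
{
    struct lh_table *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->buckets = calloc(n, sizeof *t->buckets);
    if (!t->buckets) { free(t); return NULL; }
    t->n = n;
    t->l = 0;
    t->p = 0;
    return t;
}

With n = 5 this corresponds to the initial five-bucket table 20 of FIG. 2, where bucket 0 heads the chain of records whose keys end in 0 or 5.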
For this example, consider keys that hash to bucket 0. Under the hashing function h_0(K) = K mod 5, the last digit of the key must be either 0 or 5. The linked list 26 shows records that end in either 0 or 5. Under the hashing function h_1(K) = K mod 10, keys with the last digit equal to 0 still hash to bucket 0, while those with the last digit equal to 5 hash to bucket 5. None of the keys hashing to buckets 1, 2, 3, or 4 under function h_0 can possibly hash to bucket 5 under function h_1.
To expand the table, a new bucket (with address 5) is allocated at the end of the table and the pointer p is incremented by one. The process scans through the records of bucket 0 and relocates to the new bucket 5 all records that hash to 5 under h_1(K) = K mod 10. In this example, records "345" and "605" are transferred from bucket 0 to new bucket 5. The records in buckets 1-4 remain unaffected.
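As a hedged illustration rather than the patented implementation, the relocation step just described might look like the C sketch below: it walks the chain of the bucket being split and moves every record whose address under h_(L+1) differs. The structures mirror the earlier sketch, and realloc-based growth of the bucket array is an assumption made here.

#include <stdlib.h>

struct lh_record { unsigned long key; struct lh_record *next; };  /* as before */
struct lh_table  { struct lh_record **buckets; unsigned n, l, p; };

/* h_i(K) = K mod (N * 2^i): the simple modulo family of the FIG. 2 example
 * (the more general g(K)-based family appears later in the text). */
static unsigned lh_hash(const struct lh_table *t, unsigned long key, unsigned i)
{
    return (unsigned)(key % ((unsigned long)t->n << i));
}

/* Split the bucket referenced by p: append a new bucket at the end of the
 * table and move over every record that h_(L+1) sends there. */
static int lh_split(struct lh_table *t)
{
    unsigned old       = t->p;
    unsigned new_count = (t->n << t->l) + t->p + 1;   /* buckets after the split */

    struct lh_record **grown = realloc(t->buckets, new_count * sizeof *grown);
    if (!grown) return -1;
    t->buckets = grown;
    t->buckets[new_count - 1] = NULL;                 /* the new, empty bucket */

    /* Walk the chain of the old bucket; keep or relocate each record. */
    struct lh_record **link = &t->buckets[old];
    while (*link) {
        struct lh_record *r = *link;
        unsigned addr = lh_hash(t, r->key, t->l + 1);
        if (addr != old) {            /* record now belongs in the new bucket */
            *link  = r->next;
            r->next = t->buckets[addr];
            t->buckets[addr] = r;
        } else {
            link = &r->next;
        }
    }

    /* Advance the split pointer, wrapping when the table has doubled. */
    if (++t->p == (t->n << t->l)) { t->p = 0; t->l++; }
    return 0;
}

Applied to the FIG. 2 state (N = 5, L = 0, p = 0), this moves the records 345 and 605 to the new bucket 5 and leaves buckets 1 through 4 untouched.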
As the table size changes, record addresses for various records affected by the expansion (or contraction) also change. The current address of a record can, however, be computed quickly. Given a key K, a value for h_0(K) is computed. If h_0(K) is less than the current value of p, the corresponding bucket has already been split; otherwise, the corresponding bucket has not been split. If the bucket has been split, the correct address of the record is given by h_1(K). When all original buckets 0-4 have been split and the table size has increased to ten, all records are addressed by h_1(K), which becomes the new h_0(K) for N=10.
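For instance, in the FIG. 2 example after bucket 0 has been split (so p = 1), key 345 gives h_0(345) = 345 mod 5 = 0, which is less than p, so its bucket has already been split and its correct address is h_1(345) = 345 mod 10 = 5. A key such as 601 (chosen here only for illustration) gives h_0(601) = 1, which is not less than p, so address 1 is already correct.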
The address computation can be implemented in several ways, but the following solution appears to be the simplest. Let g be a normal hashing function producing addresses in some interval [0, M], where M is sufficiently large, say, M > 2^20. To compute the address of a record, the following sequence of hashing functions is employed:
h_i(K) = g(K) mod (N × 2^i),  i = 0, 1, . . .
where N is the minimum size of the hash table. (It is noted that if N is a power of two, the modulo operation reduces to extracting the last bits of g(K).) The hashing function g(K) can be implemented in several ways. Functions of the type g(K) = (cK) mod M, where c is a constant and M is a large prime, have experimentally been found to perform well. Different hashing functions are easily obtained by choosing different values for c and M.
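No code for g or h_i is given in the excerpt; a minimal C sketch consistent with the description might be the following, where the constants c and M are placeholders chosen only to satisfy the stated conditions (M is a well-known prime larger than 2^20), not values taken from the patent:

#include <stdint.h>
#include <stdio.h>

/* g(K) = (c * K) mod M, with c a constant and M a large prime, M > 2^20.
 * Both constants here are illustrative placeholders. */
#define LH_C 2654435761u
#define LH_M 16777619u           /* a well-known prime above 2^20 */

static uint32_t lh_g(uint32_t key)
{
    return (uint32_t)(((uint64_t)LH_C * key) % LH_M);
}

/* h_i(K) = g(K) mod (N * 2^i), i = 0, 1, ...; N is the minimum table size.
 * When N is a power of two, the modulo reduces to masking low bits of g(K). */
static uint32_t lh_h(uint32_t key, uint32_t n, unsigned i)
{
    return lh_g(key) % (n << i);
}

int main(void)
{
    /* Addresses of an arbitrary key in a table with minimum size N = 8,
     * before and after one doubling of the table. */
    uint32_t key = 12345u;
    printf("h_0 = %u, h_1 = %u\n",
           (unsigned)lh_h(key, 8, 0), (unsigned)lh_h(key, 8, 1));
    return 0;
}

Choosing different values for LH_C and LH_M yields different hashing functions, as noted above.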
The current state of the hash table is tracked by two variables:
L = number of times the table size has doubled (from its minimum size, N); and
p = pointer to the next bucket to be split, p < N × 2^L.
Given a key K, the current address of the corresponding record can be computed as follows:
1. addr = h_L(K)
2. if (addr < p) then addr = h_(L+1)(K)
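In C, the two-step computation above might be rendered as follows; this is a sketch under the assumption that g(K) = K, with all names invented here:

#include <stdint.h>
#include <stdio.h>

struct lh_state { uint32_t n, l, p; };      /* N, L, and p as defined above */

/* h_i(K) = K mod (N * 2^i); g(K) = K is assumed for brevity. */
static uint32_t lh_h(uint32_t key, uint32_t n, uint32_t i)
{
    return key % (n << i);
}

/* Steps 1 and 2 from the text: the current bucket address of key K. */
static uint32_t lh_address(const struct lh_state *s, uint32_t key)
{
    uint32_t addr = lh_h(key, s->n, s->l);     /* 1. addr = h_L(K)      */
    if (addr < s->p)                           /* bucket already split? */
        addr = lh_h(key, s->n, s->l + 1);      /* 2. addr = h_(L+1)(K)  */
    return addr;
}

int main(void)
{
    /* The FIG. 2 state: N = 5, bucket 0 already split, so L = 0, p = 1. */
    struct lh_state s = { 5, 0, 1 };
    printf("345 -> %u, 605 -> %u\n",
           (unsigned)lh_address(&s, 345), (unsigned)lh_address(&s, 605));
    return 0;
}

With this state, keys 345 and 605 from FIG. 2 resolve to bucket 5, while keys ending in 0 remain at address 0.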
Contracting the table by one bucket is exactly the inverse of expanding it by one bucket. First, the state variables are updated and then all records of the last bucket are moved to the bucket referenced by pointer p, and the last bucket is freed.
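A corresponding C sketch of the contraction step, again illustrative rather than the patented implementation (the guard against shrinking below N buckets is an assumption added here), might be:

#include <stdlib.h>

struct lh_record { unsigned long key; struct lh_record *next; };  /* as before */
struct lh_table  { struct lh_record **buckets; unsigned n, l, p; };

/* Shrink the table by one bucket: update the state variables first, then
 * merge the chain of the last bucket into bucket p and release the slot. */
static void lh_contract(struct lh_table *t)
{
    unsigned last = (t->n << t->l) + t->p;   /* current number of buckets */
    if (last <= t->n) return;                /* never shrink below N      */
    last--;                                  /* index of the last bucket  */

    /* 1. Update the state variables (inverse of the expansion step). */
    if (t->p > 0) {
        t->p--;
    } else {
        t->l--;
        t->p = (t->n << t->l) - 1;
    }

    /* 2. Move all records of the last bucket to the bucket now
     *    referenced by p. */
    struct lh_record *r = t->buckets[last];
    while (r) {
        struct lh_record *next = r->next;
        r->next = t->buckets[t->p];
        t->buckets[t->p] = r;
        r = next;
    }

    /* 3. Release the last slot of the bucket array; if a shrinking
     *    realloc fails, simply keep the old, larger array. */
    struct lh_record **shrunk = realloc(t->buckets, last * sizeof *shrunk);
    if (shrunk) t->buckets = shrunk;
}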
The discussion thus far has focu
Inventors: Krishnan, Murali R.; Larson, Per-Ake; Reilly, George V.
Assignee: Microsoft Corporation
Attorney, Agent, or Firm: Lee & Hayes PLLC