Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-06-30
2001-12-11
Von Buhr, Maria N. (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C711S216000
Reexamination Certificate
active
06330557
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer systems, and more specifically to the storage of data using a hash table.
2. Related Art
Since the creation of computer systems, engineers have struggled to maintain an optimal balance between efficient storage and fast location of data: Storage needs to be efficient because memory is finite and costly; Data needs to be quickly located to optimize system performance.
A traditional method for storing and accessing data employs a hash table, which is a type of database. A computer system employing such a methodology is shown in FIG.
1
. The illustrated system
100
includes the standard array of components: a central processing unit
102
; a user interface
104
, which typically consists of a keyboard, a mouse, and a monitor; a primary memory
106
, such as a random access memory, for program execution; and a secondary memory
108
, such as a disk, for storing programs and data that are not immediately needed for execution. Contained within these memories
106
,
108
are: an operating system
110
that manages the computer's resources; a database
112
that contains a variety of data objects
114
, each containing a key
116
and associated data
118
; and a traditional hash program
120
.
The workings of the traditional hash program
120
are well-known to those skilled in the art, but will be explained here for the lay reader. Basically, the program
120
uses keys
116
to locate and store objects
114
in the database
112
: each key
116
is unique and therefore can be used to identify the key's associated data
118
. Typically, keys
116
are potentially large semi-random numbers drawn from a set of such numbers.
For example, consider a business that desires to track its clients' accounts. A database
112
would be created to hold the tracked data
118
. In this case, possible keys
116
include the customers' credit card or social security numbers; the associated data
118
could be billing information, such as available credit and billing addresses.
No business would have an account for every social security number or a VISA account holder, and consequently it is unnecessary and wasteful to size a database large to house data for all such individuals. To more efficiently utilize memory, the near infinite set of social security or VISA card numbers is mapped to a smaller, more manageable set. This is done using a hash function, which is contained in the hash program
120
. For example, a hash function might map
100
billion possible credit card numbers to a set of 150,000 possible numbers, using mathematical operations to translate the key to a smaller number. The resulting number is called a “hash address” or a “hash value” and identifies an entry of the database
112
for the key
118
. In
FIG. 1
, the dotted boxes
122
represent the hash address at which each respective data object
114
is stored in the database
112
.
Since the hash function greatly reduces the amount of information from the key
116
when translating it to the hash address
122
, the function may translate two keys to the same address. This occurrence is called a “collision” and must be resolved by the hash program
120
. That is, the program must have a method for storing two objects whose keys hash to the same address. One such method is known as “bucket chaining.” “Bucket” refers to the database entry for a given hash address; “chain” refers to a linked list of data objects that hash to the same address. In
FIG. 1
, a “bucket chain”
124
is shown linking two such data objects
114
, data object i and data object j. Other types of collision resolution are possible and are well-known those skilled in the art. Thus, they need not be described here.
When resolving collisions, by bucket chaining or similar methods, the traditional hash program
120
stores each object's key
116
along with the associated data
118
in the data object
114
. When the program
120
later attempts to access a given object
114
, it will again run that object's key
116
through the hash function to produce the key's hash address
120
. The program then compares the key used to produce the hash address
122
with the key
116
stored in the data object
114
at the hash address. If these keys are identical, the object has been found. If not, the program
120
continues its search along the chain
124
. If no stored key equals the newly produced key, there is no database entry for the key, so a new entry is created.
The traditional hashing methodology provides a reliable means for storing and locating data. But because memory costs are considerable, improvements to the scheme can represent significant cost savings. This is so because reducing the amount of memory needed to store a given set of objects frees memory for other resources. And when each object conserves memory, the overall cost of the computer system is reduced because less memory is required to achieve storage goals. Any reduction in the amount of memory needed to store and access data is therefore valuable.
INVENTION SUMMARY
Embodiments of the present invention provide methods and systems that allow for more efficient storing of data than was before possible. A hash function is utilized to map keys to addresses of a database. These addresses are then used to identify entries of the database in which data associated with the keys is stored. A second function is used to create a specification value for the key. This value, when compared with the hash address, uniquely defines the key for which the associated data belongs. That is, where s
i
, s
j
and h
i
, h
j
, represent respective specification and hash values for keys, k
i
, and k
j
, k
i
must equal k
j
whenever the two keys' hash and specification values are equal. Alternatively,
IF
(
s
i
=s
j
)&(
h
i
=h
j
)
THEN
(
k
i
=k
j
)
By designing a specification function that adheres to the above constraint, the invention significantly reduces the amount of memory needed for storage when the key's specification value is smaller than the key itself. This is because the embodiments of the invention store the specification values in the database instead of the keys. The embodiments conclusively identify the data associated with any key by examining the specification values stored at the key's hash address. More specifically, the embodiments rerun the key through the specification function and compare the resulting specification value with the specification values of objects that are hashed to the key's hash address. If the two specification values are the same, the object has been found because no two keys can have identical hash and specification values.
In one embodiment, the invention stores data using
25
percent less of memory than that needed by traditional systems. This embodiment uses IP addresses as keys to reference four bytes of associated data and uses the hash function:
H
(Key)=Key mod CONSTANT,
In this embodiment, the Key is an IP address and the CONSTANT is 65537. The function therefore becomes:
H
(
IP
)=
IP
mod 65537.
The embodiment employs the following specification function, which is designed to satisfy the constraint discussed above:
S
(
IP
)=Key div CONSTANT=
IP
div 65537.
The embodiment runs the IP addresses through these functions and stores the resulting two-byte specification values—as opposed to the four-byte IP addresses—in the database entries specified by the hash addresses.
To locate stored data for a particular IP address, that address is run through the hash function to discern the respective hash address. The specification value for the IP address is similarly produced by running the IP address through the specification function. The specification value stored at the hash address is retrieved and compared to the produced specification value. If the compared values are the same, the data stored at the hash address is conclusively determined to belong to the key.
RE
Carpenter, Esq. John R.
Sun Microsystems Inc.
Von Buhr Maria N.
LandOfFree
Method and system for storing data in a hash table that... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for storing data in a hash table that..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for storing data in a hash table that... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2581461