Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-01-04
2003-10-14
Alam, Shahid Al (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06633879
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to computer systems and more particularly to a method and system for optimizing the combination of a direct table and trees for storing data.
BACKGROUND OF THE INVENTION
Currently, direct tables and decision trees are utilized for searching for and accessing information, or items, in computer systems. For example, a large number of items may be desired to be stored. The items may be identified by a key, which is typically a binary number string of a particular length. Based on the key, information is desired to be obtained from the table. For example, the key might indicate a particular piece of data or that a specific action is to be taken.
If a direct table is used alone, without a tree, the direct table would typically have an entry for each unique value of the key. However, the key is typically several digits long. For example, one conventional key might be one hundred and ninety-two bits long. In order to ensure that a unique entry exists for every possible value of the key, the direct table would need to contain 2
192
entries. Such a direct table is too large to be practical.
Thus, a smaller direct table utilizing a shorter key, which will be termed a hash value, is typically used. The smaller key is termed a hash value because the key is typically hashed in order to obtain the shorter key. Thus, in order to obtain the hash value, the key is operated on by a hash function. The hash function combines or otherwise operates on the digits of a particular value of the key in order to obtain a hash value which may be unique to that value of the key. In general, it is desired that the hash value be a random hash of the key in order to avoid tying the hash value to closely to specific information related to the key. However, no specific hash function is required.
The direct table typically has an entry for each unique hash value. Because the hash value can be significantly shorter than the key, the direct table can be significantly smaller than a direct table for the key. For example, a key which is ninety-two bits long may be hashed to hash values which are on the order of ten bits long. A direct table for a hash value having a ten-bit length need only have 2
10
, or 1024, entries. The width of such a direct table is defined by the length of the hash value. Thus, a direct table using a ten-bit hash value is ten bits wide.
There can be more items to be stored in the direct table using the hash value or key than there are entries in the direct table. This is true for a direct table having a width, W, for which 2
W
is less than the number of items to be stored or if the same hash value is selected for different keys. Consequently, a plurality of items may reside in the same entry and be described by the same hash value. If a search is carried out based on the hash value only, the computer system may be unable to provide a unique decision or item for a hash. value. Such a situation is known as a collision. A double collision in which two items correspond to the same hash value, a triple collision in which three items correspond to the same hash value, or a higher order collision may occur. In order to resolve these collisions, a tree is utilized in conjunction with the direct table. Note that the same is true when a key is used in lieu of a hash value and 2
W
is less than the number of items to be stored or if more than one item can correspond to the same key.
A tree provides a mechanism for obtaining additional information and making decisions based on the information in order to uniquely identify the item desired to be found. For example, the tree might utilize additional bits from the key. Based on the additional information, the tree makes a decision at a node between two paths. Each piece of additional information utilized corresponds to a node. The items which collide are stored at the leaves of the tree at the end of a particular decision path through the tree. Thus, the combination of the direct table and the tree can be used to search for each of the items.
FIG. 1
, for example depicts a direct table
10
and corresponding decision trees. However, for the purposes of clarity, only decision trees
50
and
60
are labeled and discussed. In addition, the number and size of the corresponding trees for the direct table
10
is for explanation only and not meant to represent a particular preferred distribution. The direct table
10
is four bits wide, corresponding to a hash value or key of 4 bits. The direct table
10
thus has sixteen entries
12
,
14
,
16
,
18
,
20
,
22
,
24
,
26
,
28
,
30
,
32
,
34
,
36
,
38
,
40
and
42
. Suppose, however, that up to twenty-four items are desired to be found using the direct table
10
and trees
50
and
60
. Because there are more items than entries
12
,
14
,
16
,
18
,
20
,
22
,
24
,
26
,
28
,
30
,
32
,
34
,
36
,
38
,
40
and
42
, at least one of the entries
12
,
14
,
16
,
18
,
20
,
22
,
24
,
26
,
28
,
30
,
32
,
34
,
36
,
38
,
40
and
42
corresponds to more than one item. For example, entries
16
,
20
,
26
,
36
and
40
correspond to no items. Entries
12
,
22
,
32
and
34
correspond to a single item. Entries
14
and
28
correspond to two items. Entries
24
,
30
and
42
correspond to three items. Entry
18
and entry
38
correspond to four items. Thus, there is a quadruple collision for the entry
18
.
Where an entry corresponds to more than one item, a decision tree is used, for example, in order to uniquely identify an item corresponding to the entry
18
and the entry
38
, a tree
50
and
60
, respectively, will be used. Note that only a single tree corresponds to a particular collision. Because four items correspond to the entry
18
and the entry
38
, the trees
50
and
50
each have four leaves
54
,
55
,
56
,
57
and
64
,
65
,
66
, and
67
, respectively, which correspond to the four items. Depending upon the additional information for the four items used by the trees
50
and
60
, the items may be more difficult to identify. For example, the tree
50
makes a decision at node
51
corresponding to a first piece of information. The decision at node
51
leads to node
52
or
53
. Another piece of information obtained for either node
52
or node
53
leads to one of the leaves
54
,
55
,
56
, or
57
and, therefore, to the item desired. Thus, two decisions, one at node
51
and another at either node
52
or node
53
, lead to the item. In contrast, for the tree
60
that corresponds to the four items stored in the entry
38
, up to three pieces of information may be required to uniquely identify the item corresponding to the leaf
67
. Thus, a first piece of information will result either in the item corresponding to node
64
being found or in a decision requiring a second piece of information at node
62
. The second piece of information will lead either to the item corresponding to node
65
being found or to a decision requiring a third piece of information at the node
63
. The third piece of information will lead either to the item corresponding to the leaf
66
or to the item corresponding to the leaf
67
. Thus, the tree
50
and the tree
60
can be used to resolve the collision at the entry
18
and the entry
38
, respectively.
Thus, using the direction table
10
and the corresponding trees, such as the trees
50
or
60
, items can be uniquely identified. A wider direct table contains more entries and, therefore, can be used to uniquely identify a larger number of items. In other words, for a given number of items, a wider direct table will result in fewer collisions. Where more than one item can correspond to the same entry in a direct table, however, trees are utilized to resolve collisions.
Direct tables and trees each have different costs associated with them. For example, a direct table may require a first amount of memory for an entry. A tree may require a different amount of memory for a node. Similarly, it may take a different amount of time to find an item stored in an entry of
Al Alam Shahid
Sawyer Law Group
LandOfFree
Method and system for optimizing direct tables and trees does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for optimizing direct tables and trees, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for optimizing direct tables and trees will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3171904