Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-08-01
2003-05-20
Choules, Jack (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06567815
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a technique for efficient storage of recursive data structures in computer memory. More particularly this invention relates to improvements in the arrangement of binary tree elements in a cache computer memory and improved performance of binary tree operations.
2. Description of the Related Art
Binary trees are widely employed data structures in practical computer applications, as they enable the rapid access of keyed data sets. Search binary trees, an example of which is seen in
FIG. 1
, are particularly useful for rapid localization of data. A search binary tree
10
satisfies the following conditions:
1. Every node, for example node
12
, has a unique key
14
.
2. All keys in the left subtree, indicated by the dotted line
16
are smaller than the key
18
in the root
20
.
3. All keys in the right subtree, indicated by the dotted line
22
, are larger than the key
18
in the root
20
.
4. The left and right subtrees are also binary search trees.
Known common operations on binary trees include the insert operation, which inserts a new key into the set, the delete operation, which deletes a key from the set, and the traversal of the tree. The traversal operation outputs the keys in a specific order. Forms of traversal are the inorder traversal, preorder traversal, and postorder traversal. Another important and relevant operation is the membership operation, which checks if a given key belongs to the set.
Frequently a binary tree is a dynamic data structure, in which nodes are allocated during runtime using the heap-allocation mechanism found in languages like C and C++. In its classical implementation, each node of the binary tree contains a key, and two pointers to descendant nodes. Often, two additional pointers are kept in each node: a pointer to the data associated with the key, particularly if the data is large, and a pointer to the ancestor node.
There are two main drawbacks to such an implementation. First, the memory space occupied by the binary tree can considerably exceed the original memory space needed for the data set itself. For example, if the size of each key and pointer is eight bytes, then the size of each node is 24 bytes—three times as large as the original key. It will be evident that eliminating the pointers saves more than 50% of the memory store for the binary tree.
Secondly, as a result of using the heap to allocate and deallocate the nodes, the nodes can become scattered across memory. This is especially true when the user performs a large number of insertions and deletions, because each operation usually allocates or deallocates memory. The result is a functionally inefficient layout, and an increased number of cache misses and page faults during traversals of the binary tree. Analysis of pointer-based algorithms has indicated that as hardware performance has improved over time, cache performance increasingly outweighs instruction count as a determinant of overall performance. This is partly due to an increasing cache-miss penalty in modern machines, compared to older computers, and partly due to a trend of decreasing processor cycle time relative to memory access time. An example of such analysis is found in the paper The Influence of Caches on the Performance of Heaps, LaMarca Anthony, and Ladner, Richard E., The ACM Journal of Experimental Algorithmics, Jan. 6, 1997.
In dynamic binary trees, gaps between nodes in memory layout can be exploited during insertions and deletions. Improved performance in these dynamic operations tends to offset slow tree traversal due to increased memory latency. However in search trees in which insertion and deletions are uncommon, there is no offsetting benefit, and poor spatial locality of data in memory is especially undesirable.
Prior art approaches to improving global performance of algorithms having poorly localized data layout have involved reorganizing the computation while leaving the data layout intact. This is difficult, and in practice not too effective in the case of recursive data structures such as binary trees.
Hardware optimizations such as prefetching have resulted in better performance, however they have not afforded a completely satisfactory solution.
Another alternative to reducing cache misses is to maintain the entire binary tree in the cache during the program lifetime. Cache misses would only occur the first time the tree is accessed. Generally, however, this is not practical. If the tree is large, it will not fit into the cache. Furthermore the availability of the cache to other processes would be reduced, which might decrease the performance of the program itself, and generally degrade the performance of other processes executing on the computer.
SUMMARY OF THE INVENTION
It is therefore a primary object of some aspects of the present invention to improve the performance of computer algorithms involving pointer-based data structures.
It is another object of some aspects of the present invention to speed up binary tree traversals.
It is a further object of some aspects of the present invention to improve the performance of cache memory by increasing spatial locality in memory of recursive data structures.
It is still another object of some aspects of the present invention to decrease the memory space required by recursive data structures.
These and other objects of the present invention are attained by arranging a tree structure in a memory by the steps of: defining a tree data structure, wherein a parent node in the tree has a predetermined number of child nodes; defining an indexed array of data elements for storage thereof in a memory, wherein each element of the array holds a node of the tree; associating a parent node of the tree with a first index of the array, and associating each child node of the parent node with second indices of the array, wherein predefined individual functional relationships exist between the first index and each of the second indices; and mapping the nodes indexed by the first index and by the second indices to a predefined area of a memory which can be efficiently accessed. The arrangement is such that each node of the tree is assigned a unique position or index inside the array. From the functional arrangements existing therebetween, and given the position of any node it is easy to calculate the position of a parent or a child node. In some preferred embodiments of the invention, triplet-tiles formed by a parent and its children are positioned consecutively in the array.
According to an aspect of the invention the memory is a cache memory, and the predefined area of the memory is a cache line.
According to another aspect of the invention the predefined area of the memory is a page.
According to yet another aspect of the invention the tree is traversed by accessing one of the nodes in the predefined area of the memory according to the index associated therewith, calculating an index of a parent node or a child node of the accessed node according to the functional relationship therebetween, and then accessing the node that is associated with the calculated index.
The invention provides a method of arranging a binary tree structure in a memory, which is performed by: defining a binary tree data structure which has a plurality of nodes, including at least a parent node, a first child node and a second child node; and defining an indexed array of data elements for storage thereof in a memory, wherein each data element holds a node of the tree; associating a parent node of the tree with a first index of the array; associating a first child node of the parent node with a second index of the array; and associating a second child node of the parent node with a third index of the array. Predefined functional relationships exist between the first index and the second index, and between the first index and the third index. The nodes associated with the first index, the second index and the third index are mapped to a predefined area of a memory which can be efficiently accessed.
Rubin Shai
Zaks Ayal
Choules Jack
Darby & Darby
LandOfFree
Technique of clustering and compaction of binary trees does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Technique of clustering and compaction of binary trees, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Technique of clustering and compaction of binary trees will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3070061