Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-06-04
2004-08-31
Rones, Charles (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C341S050000
Reexamination Certificate
active
06785687
ABSTRACT:
TECHNICAL FIELD
The present invention relates generally to the field of data structures, and more particularly to a dynamic data structure which dynamically adapts storage allocated to describe a population to the size of the population.
BACKGROUND
Computer processors and associated memory components continue to increase in speed. As hardware approaches physical speed limitations, however, other methods for generating appreciable decreases in data access times are required. Even when such limitations are not a factor, maximizing software efficiency maximizes the efficiency of the hardware platform, extending the capabilities of the hardware/software system as a whole. One method of increasing system efficiency is by providing effective data management, achieved by the appropriate choice of data structure and related storage and retrieval algorithms. For example, various prior art data structures and related storage and retrieval algorithms have been developed for data management including arrays, hashing, binary trees, AVL trees (height-balanced binary trees), b-trees, and skiplists. In each of these prior art data structures and related storage and retrieval algorithms an inherent trade-off has existed between providing faster access times and providing lower memory overhead. For example, an array allows for fast indexing through the calculation of the address of a single array element but requires the pre-allocation of the entire array in memory before a single value is stored, and unused intervals of the array waste memory resources. Alternatively, binary trees, AVL trees, b-trees and skiplists do not require the pre-allocation of memory for the data structure and attempt to minimize allocation of unused memory but exhibit an access time which increases as the population increases.
An array is a prior art data structure which has a simplified structure and allows for rapid access of the stored data. However, memory must be allocated for the entire array and the structure is inflexible. An array value is looked up “positionally”, or “digitally”, by multiplying the index by the size (e.g., number of bytes) allocated to each element of the array and adding the offset of the base address of the array. Typically, a single Central Processing Unit (CPU) cache line fill is required to access the array element and value stored therein. As described and typically implemented, the array is memory inefficient and relatively inflexible. Access, however, is provided as O(
1
), i.e., independent of the size of the array (ignoring disk swapping).
Alternatively, other data structures previously mentioned including binary trees, b-trees, skiplists and hash tables, are available which are more memory efficient but include undesirable features. For example, hashing is used to convert sparse, possibly multi-word indexes (such as strings) into array indexes. The typical hash table is a fixed-size array, and each index into it is the result of a hashing algorithm performed on the original index. However, in order for hashing to be efficient, the hash algorithm must be matched to the indexes which are to be stored. Hash tables also require every data node to contain a copy of (or a pointer to) the original index (key) so you can distinguish nodes in each synonym chain (or other type of list). Like an array, use of hashing requires some preallocation of memory, but it is normally a fraction of the memory which must be allocated for a flat array, if well designed, i.e., the characteristics of the data to be stored are well known, behaved and matched to the hashing algorithm, collision resolution technique and storage structure implemented.
In particular, digital trees, or tries, provide rapid access to data, but are generally memory inefficient. Memory efficiency may be enhanced for handling sparse index sets by keeping tree branches narrow, resulting in a deeper tree and an increase in the average number of memory references, indirections, and cache line fills, all resulting in slower access to data. This latter factor, i.e., maximizing cache efficiency, is often ignored when such structures are discussed yet may be a dominant factor affecting system performance. A tree is a tree of smaller arrays, or branches, where each branch decodes one or more bits of the index. Prior art digital trees have branch nodes that are arrays of simple pointers or addresses. Typically, the size of the pointers or addresses are minimized to improve the memory efficiency of the digital tree.
At the “bottom” of the digital tree, the last branch decodes the last bits of the index, and the element points to some storage specific to the index. The “leaves” of the tree are these memory chunks for specific indexes, which have application-specific structures.
Digital trees have many advantages including not requiring memory to be allocated to branches which have no indexes or zero population (also called an empty subexpanse). In this case the pointer which points to the empty subexpanse is given a unique value and is called a null pointer indicating that it represents an empty range of indexes. Additionally, the indexes which are stored in a digital tree are accessible in sorted order which allows identification of neighbors. An “expanse” of a digital tree as used herein is the range of values which could be stored within the digital tree, while the population of the digital tree is the set of values that are actually stored within the tree. Similarly, the expanse of a branch of a digital tree is the range of indexes which could be stored within the branch, and the population of a branch is the number of values (e.g., count) which are actually stored within the branch. (As used herein, the term “population” refers to either the set of indexes or the count of those indexes, the meaning of the term being apparent to those skilled in the art from the context in which the term is used.)
“Adaptive Algorithms for Cache-Efficient Tree Search” by Acharya, Zhu and Shen (1999), the disclosure of which is hereby incorporated herein by reference, describes cache-efficient algorithms for tree search. Each of the algorithms use different data structures, including a partitioned-array, B-tree, hashtable, and vectors, to represent different nodes in a trie. The data structure selected depends on cache characteristics as well as the fanout of the node. The algorithms further adapt to changes in the fanout at a node by dynamically switching the data structure used to represent the node. Finally, the size and the layout of individual data structures is determined based on the size of the symbols in the alphabet as well as characteristics of the cache(s). The publication further includes an evaluation of the performance of the algorithms on real and simulated memory hierarchies.
Other publications known and available to those skilled in the art describing data structures include
Fundamentals of Data Structures in Pascal
, 4th Edition; Horowitz and Sahni; pp. 582-594
; The Art of Computer Programming
, Volume 3; Knuth; pp. 490-492; Algorithms in C; Sedgewick; pp. 245-256, 265-271; “Fast Algorithms for Sorting and Searching Strings”; Bentley, Sedgewick; “Ternary Search Trees”; 5871926, INSPEC Abstract Number: C9805-6120-003; Dr Dobb's Journal; “Algorithms for Trie Compaction”, ACM Transactions on Database Systems, 9(2):243-63, 1984; “Routing on longest-matching prefixes”; 5217324, INSPEC Abstract Number: B9605-6150M-005, C9605-5640-006; “Some results on tries with adaptive branching”; 6845525, INSPEC Abstract Number: C2001-03-6120-024; “Fixed-bucket binary storage trees”; 01998027, INSPEC Abstract Number: C83009879; “DISCS and other related data structures”; 03730613, INSPEC Abstract Number: C90064501; and “Dynamical sources in information theory: a general analysis of trie structures”; 6841374, INSPEC Abstract Number: B2001-03-6110-014, C2001-03-6120-023, the disclosures of which are hereby incorporated herein by reference.
An enhanced storage structure is described in U.S. patent application Ser. No. 09/457,164 filed Dec. 8, 1999, current
Baskins Douglas L.
Silverstein Alan
Hewlett--Packard Development Company, L.P.
Rones Charles
LandOfFree
System for and method of efficient, expandable storage and... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System for and method of efficient, expandable storage and..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System for and method of efficient, expandable storage and... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3356712