Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-12-30
2004-01-06
Amsbury, Wayne (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06675173
ABSTRACT:
FIELD OF THE INVENTION
The present invention is in the general field of databases and database management systems.
BACKGROUND OF THE INVENTION
Using trees as a database structure for accessing data records is very common, and indeed, tree schemes that serve to this end are known in literature. When considering a large amount of data, it is of particular importance to maintain a so-called balanced structure of the tree, in order to avoid long paths for accessing a given data record from the root node to the leaf node that is associated with the sought data record. In order to cope with these shortcomings, various tree structures, such as the known Btree of 2- 3-tree, confer in inherent balanced tree structure, even after the tree has undergone modification, such as the insertion of a new data record, the deletion of an existing data record and/or the updating of the value of a given data record in the tree. The inherent balance (or essentially balanced) structure is accomplished, however, at the penalty of inflating the contents of the nodes in the tree and, consequently, unduly increasing the file size that holds the tree, particularly insofar as large trees which hold multitude of data records are concerned. The large volume of the files adversely affects the performance of the data management system in terms of accessing time to a sought data record, which is obviously undesired.
There are trees available in the art which are more efficient in terms of the volume of data that is held in entry nodes, e.g. the tri-S tree and, consequently, the file size of tri-S-tree, which holds the same number of data records, is significantly smaller than the counterpart size of an inherently balanced tree, e.g.-2-3-tree or Btree. However, the tri-S-tree is inherently unbalanced which, as explained above, adversely the affects the performance in terms of access time to data records, and whilst there are proposed techniques which render this tree balanced, the application thereof in real life scenarios is practically infeasible.
There is a accordingly a need in the art to provide a generic technique which will enable to essentially balance trees which are inherently susceptible to an unbalanced structure, and which will not interfere with the intrinsic search scheme that is associated with the new balanced tree.
Realization of data dictionaries which provide information as to the type of stored data, definition of data fields etc. is well known in the literature, and there are multitute techniques that serve to this end. There is however a need in the art to provide a a data dictionary structure that is incorporated with the digital tree structure. Reflection of the data model (such as, Hierarchy, Relational, Object Oriented, Object Relational) and reflection of several data models simultaneously from within the data elements and the embodiment of the data relations would allow higher efficiency in DBMS mechanism.
Detailed information on Tri-S (tries) can be found at—Donald Knuth, The Art of Computer Programming, Vol. III, Sorting and Searching, Third Edition, pages 481-490, 493-494, 499-502, 505. A specific form of tries is a compressed form of tires called Patricia tries—Donald Knuth, The Art of Computer Programming, Vol. III, Sorting and Searching, Third Edition, pages 490-493, 497-499, 501-504. A Patricia trie is an example of a sparse trie that differs from a standard trie in that nodes with one child are compressed into their parent node, so that all nodes have at least two children. An example of a Patricia trie, is shown in
FIG. 3A
, where the nodes are labeled with their depth: the position in the key represented by the node (in the example of
FIG. 3A
, the node represent nibble position in the key). Because not every character of the key is examined during the search, the record that is ultimately found must be checked against the search key. For example, if we search for record g (A333444) in
FIG. 3A
, we will follow nodes with the values 3 and 7 in block
60
and the node with the value 9 in block
61
to reach the g record by the link labeled
4
. We now need to compare the search key with the key of record g hence a search for (A333445) would lead to record g as well. The size of the Patricia trie does not depend on the length of inserted keys. Rather, each new key adds at most a single link and node to the index regardless of the actual key length. Furthermore, the unlike B-trees, Patricia tries grow slowly even as large numbers of strings are inserted because of the aggressive (lossy) compression inherent in the structure.
Although researchers have long known about Patricia tries, such structures have rarely been used to manage large amounts of data, especially disk-based data, because they are unbalanced and best suited for usage in main memory. There is a need in the art for a structure that has the graceful scaling properties of Patricia tries, but that is balanced and optimized for disk-based access like B-trees.
SUMMARY OF THE INVENTION
The technique of the invention allows for a structure of the kind specified (applied for tries and sparse tries, not only to Patricia tries). It adds extra index layers to allow an update or search to proceed directly to the needed portion of the index. Every update and query accesses about the same number of layers, providing balanced access to the index. The extra layers constitute a horizontal index (referred to as horizontal oriented digital tree structure) that includes the vertical structure of the original index (in the example of FIG.
3
A—a Patricia trie), referred to as vertical oriented digital tree structure.
The present invention provides for A method for obtaining balanced digital tree structure; the digital tree structure including a first vertical oriented digital tree structure that is susceptible to unbalanced structure of blocks due to modify transactions; the first digital tree including blocks, each, accommodating a plurality of nodes and links originating from said nodes; the method including the steps of:
constructing i (i>=1) vertical oriented digital tree structure levels which, along with said first digital tree structure, constitute i+1 vertical oriented digital tree structure levels,; said first digital tree constituting the lower vertical oriented tree; the i trees are arranged such that from blocks of the j
th
tree from among said i trees, it is possible to access horizontally all the blocks of the (j+1)
th
, lower level, digital tree structure, according to a common key value of the accessed block, whereby an essentially balanced horizontal oriented digital tree structure is obtained.
Still further the invention provides for a method for obtaining a balanced digital tree structure; the tree including blocks each accommodating a plurality of nodes and links originated from said nodes; leaf nodes from among said nodes are associated with data records; the method comprising executing the following steps as many times as required:
(I) replacing a block, constituting a replaced block, with at least two split blocks, constituted by a splitting block and at least one split block, such that few from among the nodes of said split block are accommodated within said splitting block and the remaining nodes from among the nodes of said split block are accommodated within the al least one split block; the said few nodes including a splitting node associated with at least one split link and the remaining nodes including at least one split node associated with said at least one split link;
(II) in the case that said splitting block is not a child block,
(a) constructing a father block;
(b) coping at least the splitting node to the father block, thereby constituting at least one duplicate splitting node;
(c) linking at least one duplicate splitting node to the splitting block by means of a direct pointer);
(d) linking, by far link, at least one duplicate splitting node to the at least one split block; the far link(s) having the value of said split link(s);
(e)
(III) in the case that said splitting bl
Amsbury Wayne
Oliff & Berridg,e PLC
Ori Software Development Ltd.
LandOfFree
Database apparatus does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Database apparatus, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Database apparatus will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3225211