Streaming metatree data structure for indexing information...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06721723

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to storing information and, more particularly, to a tree configuration by which an indexing data base is stored in streaming memory and accessed.
2. The Prior Art
In many computer applications, large amounts of information must be stored and accessed. Generally, during the process of deciding how this information is to be stored, a tradeoff must be made between time and memory. The time variable includes the amount of time necessary to store information, to locate a particular piece of information, and to recreate the information once located. The memory variable includes the amount of memory necessary to store the information and to store and execute the software necessary to store, locate, and recreate the information.
There are actually two time/memory issues related to storing information, the first issue being how the information itself is stored in an information data base and the second issue being how a particular item of information is found within the information data base. The simplest way to store information is linearly, that is, information is stored in data memory as it is received and is not modified or compressed in any way. In such a system, a given amount of information occupies a proportional amount of data memory. The main advantage of such a system is that the amount of time needed store the information is minimized. The main disadvantage is that data memory requirements and the time needed to retrieve the information grow in direct proportion to the amount of information stored.
The simplest way to find a particular item of information is to linearly search the entire information data base for the item until it is found. This method is advantageous in that it is simple to implement, but the amount of time needed to find particular information is unpredictable in the extreme and the average time to find a particular piece of information can be unduly great.
An alternate method for finding information is to use a keyword data base, also called an index. The index is stored in memory separate from the information data base. Each keyword of the index includes a set of pointers that points to one or more locations in the information data base that correspond to that keyword. Thus, rather than searching a large information data base for particular items of data, an index is searched for keywords, typically greatly reducing the search time.
The simplest index structure is an alphabetic list of the data items, with each item followed by the appropriate pointers. The disadvantage of such a structure is that, in order to find any particular data item, the list must be searched from the beginning, leading to a potentially long search time. There are ways of decreasing the search time, such as by fixing the size of each index entry or by creating another list of pointers to each item in the index. Each increases the amount of memory needed, requiring a time/memory tradeoff.
An alternate structure for decreasing search time of an index data base is a tree structure, which consists of a group of related nodes, each node containing a subset of the stored data items, where the relationship between the nodes defines the data items. Each unique data item is stored as a set of linked nodes. In one tree structure, such as that described in U.S. Pat. Nos. 5,488,717 and 5,737,732, common parts of data items are combined into single nodes, followed by nodes containing the unique parts of the data items. The node containing the first part of the data item is called the root node and is generally common to more than one data item. The node containing the last part of the item is called the leaf node and is unique for each data item. The data base is searched from the root node for a known data item. When the search reaches the leaf node for that data item, a pointer or other identifier in the leaf node is used to locate the data item in the information data base.
The memory in which the index data base is stored has two forms, primary storage and secondary storage. Primary storage is typically the local random-access memory, or RAM. Secondary storage is typically a disk drive or other mass storage device. The significant differences between the two are that primary storage is much smaller and much faster than secondary storage. For example, current personal computers typically have 64 Mbytes of RAM and 10 Gbytes of disk storage space, a factor of 200 difference. Further, the time it takes to access the RAM is, on average, more than 100,000 times faster than to access the disk.
The typical index data base that is so large that linear storage is out of the question is also far too large to fit completely into primary storage. Consequently, most of the data base resides on disk, which means that the vast majority of search time is taken up by reading data from disk, not by the processing time needed to find the object of the search.
Additionally, most tree structures require that new data items be added individually, which means that the vast majority of inversion time, the process of adding data to a tree, is taken up by writing the updated tree to disk following each data item inversion. Typically, a trade-off must be made between inversion time and search time. When selecting a tree structure of the prior art, one must decide whether inversion time or search time is to be minimized because none of the tree structures of the prior art provide both fast inversion time and fast search time.
Thus, there continues to be a need for a data structure for indexes that provides for heavy concentration of data, rapid and predictable information location times, rapid inversion times, and that is easily adapted to the physical structure of secondary storage media.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a data base structure that reduces secondary storage accesses during both the process of adding data items and searching for data items.
Another object is to provide a data base structure that minimizes secondary storage accesses while maximizing storage usage.
The essence of the streaming metatree (SMTree) of the present invention is its data storage structure. A typical prior art data structure uses randomly located nodes that point to the next succeeding node of the data item. Thus, nodes are followed by pointers to find a particular data item, jumping randomly between primary and secondary storage. On the other hand, the SMTree structure of the present invention stores nodes in a logically linear fashion, hence the term “streaming”. Pointers are used in the present invention, although not in the same way as in the data structures of the prior art. Physical computer memory is composed of fixed-size blocks, throughout which the SMTree is distributed. Since the blocks can be randomly located in a physical sense, pointers are required. However, the data structure is such that a memory block will only be traversed at most one time during a search. Information stored within the node indicates its relationship to the other nodes and within the tree hierarchy. The SMTree structure is particularly suited for indexing-type structures.
There are two basic embodiments of the SMTree, a “horizontal” embodiment and a “vertical” embodiment, and a hybrid embodiment that uses components of both. The horizontal embodiment is most preferred because it is more efficient for the vast majority of applications.
Logically, the SMTree is composed of lists of alternative nodes. The root alternate list is a list of all data units that begin data items in the SMTree. When searching for a data item in the SMTree, the appropriate data unit from the root alternate tree is found and followed to the next lower level alternate list. This continues until the leaf node for the data item being search is reached. Following the leaf node is at least one identifier that references external objects.
Note that every data unit is considered to be part of an alternate list, and that many data units are in alternate

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Streaming metatree data structure for indexing information... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Streaming metatree data structure for indexing information..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Streaming metatree data structure for indexing information... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3228037

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.