Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-01-04
2003-07-15
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06594655
ABSTRACT:
FIELD AND BACKGROUND OF THE INVENTION
The present invention relates to radix-tree search structures and, in particular, it concerns radix-tree search structures having the ability to handle wildcards in any location within an entry.
A large number of computing or networking tasks require recognizing keys from a database, such as a lookup table in a computer network or dictionaries in general. After establishing a match between the search key and a data sequence from the database, either the information linked to this search key is retrieved or a program driven by this search key is executed.
In communication networks consisting of a number of interconnected users or nodes, data can be sent from one node to any other. Specialized nodes called routers are responsible for delivering or “forwarding” the data to their destination. By analogy, the routers act as post offices. As letters, any of the data sent through a communication network contains information about the destination address, generally as part of a so-called header.
Each router compares this information or at least part of it with a list of addresses stored internally. If a match between stored addresses and the destination address is found, the router establishes a path leading to the destination node. Depending on the size of the network and its structure, the data are either directly forwarded to their destination or sent to another (intermediate) router, very much the same way a letter is passed through several post offices until reaching its final address (if ever).
The explosive growth of the Internet has forced a review of address assignment policies. The traditional uses of general purpose networks have been modified to achieve better use of IP's 32-bit address space. Classless Inter Domain Routing (CIDR) is a method currently being deployed in the Internet backbones to achieve this added efficiency. CIDR depends on deploying and routing to arbitrarily sized networks. In this model, hosts and routers make no assumptions about the use of addressing in the internet. The Class D (IP Multicast) and Class E (Experimental) address spaces are preserved, although this is primarily an assignment policy.
By definition, CIDR comprises three elements:
topologically significant address assignment,
routing protocols that are capable of aggregating network layer reachability information, and
consistent forwarding algorithm (“longest match”).
The use of networks and subnets is now historical, although the language used to describe them remains in current use. They have been replaced by the more tractable concept of a network prefix. A network prefix is, by definition, a contiguous set of bits at the more significant end of the address that defines a set of systems; host numbers select among those systems. There is no requirement that all the internet use network prefixes uniformly. To collapse routing information, it is useful to divide the internet into addressing domains. Within such a domain, detailed information is available about constituent networks; outside it, only the common network prefix is advertised.
The classical IP addressing architecture used addresses and subnet masks to discriminate the host number from the network prefix. With network prefixes, it is sufficient to indicate the number of bits in the prefix. Both representations are in common use. Architecturally correct subnet masks are capable of being represented using the prefix length description.
An effect of the use of CIDR is that the set of destinations associated with address prefixes in the routing table may exhibit subset relationship. A route describing a smaller set of destinations (a longer prefix) is said to be more specific than a route describing a larger set of destinations (a shorter prefix); similarly, a route describing a larger set of destinations (a shorter prefix) is said to be less specific than a route describing a smaller set of destinations (a longer prefix). Routers must use the most specific matching route (the longest matching network prefix) when forwarding traffic.
Thus, the CIDR routing method provides an excellent example of the necessity of having partial or prefix matching ability within a data structure for retrieving keys.
The data structure of the invention is a database organized as tree. A tree consists of a number of nodes, each of which possibly branch or point to other nodes.
A class of trees is known in which the key is stored in the node. A decision whether and to which node to branch implies a comparison between the search key and those stored in the node. The result of this comparison determines the choice of the following subtree. Typically, binary search trees belong to this class. The main disadvantage of this class of trees and its corresponding search method is that all bits of the key are in the worst case compared k times, k denoting the maximum number of levels within the tree.
In another approach, termed radix-search methods, the search proceeds by examining the search keys one small piece at a time, rather than using full comparisons between keys at each step. Radix-search methods are particularly suited to handling variable-length keys. They also provide reasonable worst-case performance. Two important disadvantages, noted by R. Sedgewick (“Algorithms in C++”, Addison Wesley, 1998), are that such search methods can make inefficient use of space, and that performance can suffer if efficient access to the bytes of the keys is not available.
In another class of trees known as “tries”, the keywords or data belonging to these keywords are stored in the terminal nodes or leaves of the tree. When a key is searched or inserted, the bits of the key determine the path to be followed down to a leaf. Trees of this class are called “trie” in the technical field—a term introduced as an allusion to the term “retrieval”. If a key, for example, is a sequence of n bits (either 0 or 1), the decision to branch to the left (represented by 0) or to the right (1) at the kth level of the tree is made by using the kth bit of the key sequence, with k being a number between 1 and n. In a data structure belonging to this class, each bit of the key is compared only once.
This data structure is, however, storage consuming: many of these internal nodes may have one empty subtree, and, thus, are traversed without a gain of information.
Such unnecessary internal nodes are eliminated in a compact data structure known as Patricia trie. The Patricia trie is characterized by having a minimum number of internal nodes, or, equally, pointers or key-bit inspections. Each node within a Patricia trie contains information about the bits which characterize the compacted path, e.g., the number of bits to skip before making another bit comparison to decide the direction to branch to. An example of a Patricia trie can be found in: D. E. Knuth, “The Art of Computer Programming”, Vol. 3: Sorting and Searching, 1973, pp. 490-493.
Patricia tries handle keys of varying lengths and are characterized by a deterministic structure. Additional characteristics of Patricia tries are summarized by R. Sedgewick (“Algorithms in C++”, Addison Wesley, 1998), pp. 637-645. Typical applications include locating strings containing names of different lengths, searching for IP addresses, etc.
To date, however, Patricia tries have been fundamentally incapable of handling keys containing wildcards, with the exception of wildcards at the suffix. When a key is stored in a leaf of a Patricia trie, and the path to the leaf examined is no more than the k
th
bit out of n bits (n>k), the bits that follow the k
th
bit have no effect on the tree, such that a plurality of keys that are identical over the first k bits can be stored in a single leaf. Thus, the plurality of keys may be allowed to contain one or more wildcards in the remaining (n−k) bits. It must be emphasized that such wildcards are not part of the tree
ode structure, but are contained in the leaves.
The use of strings containing wildcards in the node structure of radix-search tr
Gabbay Itzchak
Rachamim Shifra
Tal Alex
Tiger Roostam
Ezchip Technologies Ltd.
Friedman Mark M.
Mizrahi Diane D.
Wu Yicun
LandOfFree
Wildcards in radix- search tree structures does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Wildcards in radix- search tree structures, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Wildcards in radix- search tree structures will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3040354