System and method for using a compressed trie to estimate...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06829602

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to the field of database management, and, more specifically, to using a compressed data structure to estimate the amount of data processed by a query.
2. Description of the Prior Art
Prior to executing a query, a database management system (DBMS) may determine a “plan” for executing the query in the most efficient manner. To determine the plan, the DBMS estimates the amount of data that will be processed by the query at each stage of the execution. To make this estimate, the DBMS may use a data structure referred to as a “trie.” The trie is a model of a set of strings stored in a collection of data such as, for example, a relational data table. The trie enables the DBMS to quickly determine the number of strings in the collection of data that match a like predicate in a query.
An exemplary conventional trie is shown in FIG. 1. The exemplary trie of FIG. 1 includes the following strings: apple, apply, applying, seated, and seating. As shown, the top node 110 in trie 100, which may be referred to as the “root” node, is empty. The remaining bottom nodes each include a single character. A square node identifies the last letter in each string. Tracing a path from the root node to a corresponding square node and concatenating the characters stored in the rightmost nodes at each level of the path forms each string.
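As an illustration only, a conventional one-character-per-node trie holding the five example strings might be sketched as follows. The names TrieNode, insert, strings, and count_nodes are hypothetical and do not come from the patent; the is_end flag stands in for the square nodes of FIG. 1.

```python
class TrieNode:
    """One node of a conventional trie: a single character per node."""

    def __init__(self, char=""):
        self.char = char        # empty string for the root node
        self.children = {}      # maps a character to the child TrieNode
        self.is_end = False     # True where a stored string ends ("square" node)


def insert(root, word):
    """Walk down from the root, creating one node per character as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode(ch))
    node.is_end = True


def strings(node, prefix=""):
    """Rebuild every stored string by concatenating characters along each path."""
    found = [prefix] if node.is_end else []
    for ch in sorted(node.children):
        found.extend(strings(node.children[ch], prefix + ch))
    return found


def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = TrieNode()
for w in ["apple", "apply", "applying", "seated", "seating"]:
    insert(root, w)

print(strings(root))      # the five strings, in sorted order
print(count_nodes(root))  # 19 nodes, including the empty root
```

Even for five short strings this sketch allocates 19 nodes, and the characters "i", "n", "g" are stored twice, once under "apply" and once under "seat".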
A conventional trie such as trie 100 of FIG. 1 has several drawbacks. Because each node in the trie includes only a single character, the trie may include a large number of nodes that occupy a large amount of memory. Furthermore, character-by-character matching may require a lot of time to perform, thereby delaying query execution. Another drawback is that repetitive suffixes such as “ing”, which is a suffix in both “applying” and “seating”, are identified in the trie multiple times. Such suffix repetition increases the amount of memory required to store the trie and increases the time required to perform matching. Thus, there is a need in the art for a “compressed” trie, in which multiple characters may be stored in a single node. Furthermore, it is desired that repetitive suffixes be identified and eliminated from such a compressed trie.
SUMMARY OF THE INVENTION
Accordingly, systems and methods for using a compressed trie to estimate like predicates are described. A compressed trie in accordance with the present invention has nodes including multiple character sub-strings. Such multiple character storage reduces the number of nodes in the trie, thereby reducing the amount of memory required for storing the trie and reducing the amount of time required to perform matching. Furthermore, in such a compressed trie, sub-strings are stored in a single character string. Each node references its corresponding sub-string by the sub-string's starting position and length in the character string. Multiple nodes may reference a single sub-string. Thus, referencing rather than storing sub-strings in corresponding nodes eliminates repetitive sub-string storage, thereby reducing the amount of memory required for storing the trie.
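A minimal sketch of the referencing idea, under stated assumptions: the class name CharPool and its methods ref and text are hypothetical, and reusing an existing occurrence via a plain substring search is one simple way to share sub-strings, not necessarily the approach used in the patent.

```python
class CharPool:
    """A single character string that nodes reference by (start, length)."""

    def __init__(self):
        self.chars = ""

    def ref(self, sub):
        """Return (start, length) for sub, appending it only if it is not already present."""
        start = self.chars.find(sub)   # assumption: reuse any existing occurrence
        if start == -1:
            start = len(self.chars)
            self.chars += sub
        return start, len(sub)

    def text(self, start, length):
        """Recover the sub-string that a node's (start, length) pair refers to."""
        return self.chars[start:start + length]


pool = CharPool()
labels = ["appl", "e", "y", "ing", "seat", "ed", "ing"]
refs = [pool.ref(label) for label in labels]

print(pool.chars)          # appleyingseated
print(refs[3], refs[6])    # both "ing" nodes share the reference (6, 3)
```

The seven node labels total 18 characters, but the shared string holds only 15 because the second "ing" reuses the first.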
An exemplary embodiment of the present invention enables a string to be inserted into the trie. The string is assigned to one or more nodes in the trie by dividing the string into one or more sub-strings and assigning each sub-string to a corresponding node. Each sub-string is then added to a character string, in which each sub-string is preferably identified by a starting position and a length. The starting position and length of each sub-string are then stored at the corresponding node.
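The insertion described above could be sketched roughly as follows. This is an assumption-laden illustration, not the patented procedure: Node, CompressedTrie, and all method names are hypothetical, the count field is assumed to record how many inserted strings end at a node, and splitting a node at the longest common prefix is a standard radix-tree technique swapped in for whatever division rule the patent actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int = 0      # starting position of the sub-string in the shared string
    length: int = 0     # length of the sub-string
    count: int = 0      # assumption: number of inserted strings ending here
    children: list = field(default_factory=list)


class CompressedTrie:
    def __init__(self):
        self.chars = ""     # the single shared character string
        self.root = Node()  # empty root node

    def label(self, node):
        return self.chars[node.start:node.start + node.length]

    def _reference(self, sub):
        """Reuse an existing occurrence of sub, or append it to the shared string."""
        start = self.chars.find(sub)
        if start == -1:
            start = len(self.chars)
            self.chars += sub
        return start, len(sub)

    def insert(self, word):
        node, rest = self.root, word
        while rest:
            for child in node.children:
                label = self.label(child)
                k = 0   # length of the common prefix of label and rest
                while k < min(len(label), len(rest)) and label[k] == rest[k]:
                    k += 1
                if k == 0:
                    continue
                if k < len(label):
                    # Split the child: it keeps the first k characters, and a new
                    # tail node references the remainder of the same stored text.
                    tail = Node(child.start + k, child.length - k,
                                child.count, child.children)
                    child.length, child.count, child.children = k, 0, [tail]
                node, rest = child, rest[k:]
                break
            else:
                # No child shares a prefix: the remainder becomes one new node.
                start, length = self._reference(rest)
                leaf = Node(start, length)
                node.children.append(leaf)
                node, rest = leaf, ""
        node.count += 1

    def words(self, node=None, prefix=""):
        """List (string, count) pairs for every string stored in the trie."""
        node = self.root if node is None else node
        path = prefix + self.label(node)
        found = [(path, node.count)] if node.count else []
        for child in node.children:
            found.extend(self.words(child, path))
        return found


trie = CompressedTrie()
for w in ["apple", "apply", "applying", "seated", "seating"]:
    trie.insert(w)

print(trie.chars)             # appleyingseated
print(sorted(trie.words()))   # each of the five strings with a count of 1
```

In this sketch a split never copies characters: the head and tail of a divided node both point into text that is already in the shared string, so only genuinely new sub-strings grow it.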
Another exemplary embodiment of the present invention enables the trie to be used to estimate the number of rows in a data table that match a like predicate. Beginning at a root node, the nodes in the trie are examined to determine if they match the like predicate. After examination, the counts of occurrences at each matching node are accumulated to determine a non-scaled estimate. The non-scaled estimate is then scaled based on the representative portion of the table that is included in the trie.
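A rough sketch of the estimate-then-scale idea follows. Everything here is assumed for illustration: the tuple layout (label, count, children), the function names, the translation of a like predicate into a regular expression, and the scale factor of table rows divided by rows represented in the trie. The sketch also visits every node rather than pruning non-matching branches, which a practical implementation would likely avoid.

```python
import re


def like_to_regex(pattern):
    """Translate a like pattern ('%' = any run, '_' = any one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def non_scaled_estimate(node, regex, prefix=""):
    """Accumulate the counts at every node whose full path matches the predicate."""
    label, count, children = node
    path = prefix + label
    total = count if count and regex.fullmatch(path) else 0
    for child in children:
        total += non_scaled_estimate(child, regex, path)
    return total


def estimate_rows(root, pattern, rows_in_trie, rows_in_table):
    """Scale the raw count by the portion of the table the trie represents (assumed ratio)."""
    raw = non_scaled_estimate(root, like_to_regex(pattern))
    return raw * rows_in_table / rows_in_trie


# Hypothetical compressed trie: (label, count of strings ending here, children).
trie = ("", 0, [
    ("appl", 0, [("e", 1, []), ("y", 1, [("ing", 1, [])])]),
    ("seat", 0, [("ed", 1, []), ("ing", 1, [])]),
])

# Assume the trie was built from 5 sampled rows of a 500-row table.
print(estimate_rows(trie, "appl%", 5, 500))   # 300.0
print(estimate_rows(trie, "%ing", 5, 500))    # 200.0
```

With these assumed numbers, three of the five sampled strings match "appl%", so the non-scaled estimate of 3 is scaled up to 300 rows.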


