Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-11-30
2001-08-21
Black, Thomas (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06279007
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to data storage and retrieval, and more particularly to data structures used in storing and retrieving hierarchical strings.
COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright©1997, Microsoft Corporation, All Rights Reserved.
BACKGROUND OF THE INVENTION
A node in a hierarchical data structure can be addressed by traversing a data tree from a top, or root, node down through the “branches” of the data tree until reaching the target node. The path from the root node to the target node can be used to uniquely identify the node. Thus, in a data tree, such as data tree
100
shown in
FIG. 1
, node D
109
is uniquely defined by the string A/B/C/D, referred to as a “hierarchical value.” Each node above the node D
109
is a “parent” node for node D
109
, i.e., node A
101
, node b
103
, and node c
105
, and node D
109
itself is a “child” node for each of its parent nodes
101
,
102
and
103
. The node values of A, B, C and D in the hierarchical value can be replaced by other identifiers that define the parent nodes of the target node.
Traditional approaches to storing hierarchical values in standard relational databases create inefficiencies in performing database queries on the hierarchical values as can be seen in referring to the tables in
FIG. 2
used to store the hierarchical values for the data tree
100
of FIG.
1
. Hierarchical values for nodes are stored in multiple rows
201
in a relational database table
200
. Each row
201
contains a hierarchical value identifier
203
(assigned when the node is stored and often the next number in a sequence of identifiers) for a node, a parent hierarchical value identifier
205
that defines the immediate parent node, and the node value
207
. Comment column
209
contains the hierarchical value defining the node in the row and is shown in the table
200
to clarify the explanation of the table
200
but is not usually stored as part of the table
200
.
The structure of the table
200
provides for very efficient updates, such as inserting a new parent node, deleting a node, or renaming a node. However, the structure of the table
200
requires traversing many rows of the table
200
to find all children nodes of a particular parent node or all children nodes N levels down in the hierarchy.
For example, suppose a user wants to know all the children of node A
101
. First, the hierarchical value of “A” must be converted to the corresponding hierarchical value identifier by searching table
200
to find the row
201
with a node value
207
matching “A.” Once found, the hierarchical value identifier
205
for node A, or
1000
in this example, is used to search the table
200
to find all rows
201
with a parent hierarchical value identifier equal to
1000
. The node values
207
of all the matching rows
201
are cached, usually in memory, for return to the user. In the current example, only child node B for parent node A is found at this level. However, the query is not completely satisfied as deeper levels of children nodes for parent node A remain on the A/B branch of the data tree
100
. Therefore, the hierarchical value identifier
205
for A/B (
1001
) is used to find the children nodes of A/B. In the table
200
, two children nodes for A/B exist, A/B/C and A/B/E. Both the A/B/C and A/B/E branches of data tree
100
must be searched in the table
200
to find even deeper children nodes of parent node A. The search of the table
200
ends when the deepest node in all branches descending from parent node A have been found. The cached values are then returned to the user as the query result.
A similar traversing of the table
200
happens when a user wants to find all the children nodes of parent node A
101
that are four levels deep in the hierarchy of the data tree
100
. Counting parent node A
101
as level one, such a query will return hierarchical values A/B/C/D and A/B/E/F. However, the query process must traverse the entire table
200
as described above to return these two hierarchical values. The query process also must also keep track of its current depth within the hierarchy in order to know when it has reached the fourth level. Additional queries such as “all children greater than two and less than five levels deep” require still more logic in the query processing software.
Furthermore, determining the hierarchical value identifier for an existing hierarchical value or determining the hierarchical value represented by a hierarchical value identifier also requires a similar traversing of the table
200
.
The two tables shown in
FIG. 3
present a normalized version of the table
200
which eliminates the duplication of node values within the table
200
by storing the node values in a separate table
300
and assigning a node identifier
303
to each node value
305
. The node identifiers
303
are then used in the table
310
instead of the actual node values
207
which are shown in the table
200
. Although the normalization decreases the amount of data stored in the table
300
, it does not increase the efficiency of queries run against the tables
300
and
301
.
An alternate approach to speed up queries by using a relational database table with a different structure to store the hierarchical values. The table
400
in
FIG. 4
stores the hierarchical value
407
for a node as a full string in each row
401
in the table
400
. Each row
401
also contains a depth value
405
which represents the child node's position below the highest parent node in terms of level depth. Querying for all children nodes of a particular parent node matches the hierarchical value for the parent node against the full string
407
in each row (“prefix string match query”). With this approach it is also possible to query on children nodes of a certain depth in the hierarchy without the special processing described above. However, prefix string match queries are highly inefficient for all but the smallest tables. Moreover, the database table structure shown in
FIG. 4
requires more processing for update operations, and the size of the hierarchical value
407
can easily exceed the column limit for many relational database implementations.
Therefore, the is a need for a system that can quickly process complex queries on data stores containing hierarchical values with minimal impact on the processing time required to insert new hierarchical values in the data store.
SUMMARY OF THE INVENTION
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
An architecture for managing query friendly hierarchical values contains a data structure having node value entries for node values that make up the hierarchical values, hierarchical value entries for the hierarchical values expressed in terms of node value identifiers found in the node value entries, and hierarchy parent entries for parent-child pairs of hierarchy values. A node value entry contains a node value, a node hash value generated from the node value by a first hashing algorithm, and the node value identifier. The node hash value defines the node value entry in which the corresponding node value is stored. The hierarchical value entry contains a hierarchical value represented by the node value identifiers that correspond to the node values that make up the hierarchical value. The hierarchical value entry also contains a hierarchical value hash value derived from the node value identifier representation of the hierarchical value using a second hashin
Black Thomas
Le Uyen
Lee & Hayes PLLC
Microsoft Corporation
LandOfFree
Architecture for managing query friendly hierarchical values does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Architecture for managing query friendly hierarchical values, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Architecture for managing query friendly hierarchical values will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2438591