Apparatus and method for aggregate indexes

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C707S793000

Reexamination Certificate

active

06487546

ABSTRACT:

I. BACKGROUND OF THE INVENTION
A. Field of the Invention
The present invention relates generally to database systems, and in particular to a database system using an aggregate index.
B. Description of the Prior Art
Relational databases typically store information in tables comprised of one or more rows, each row consisting of one or more fields, and each field storing a value for a respective column. The tables are managed by a database management system (DBMS). The DBMS receives queries for the database, and retrieves information from the database in response to the query. The DBMS also updates information in the database.
In order to process a query, a DBMS may require information stored in one or more rows from a database table. To find the required rows in a table, the DBMS either scans all the rows of the table or uses an index to locate the required rows. Some database queries (hereinafter “aggregate queries”) request information that is based on information related to one or more rows in the database tables. Such information is referred to as aggregate information. The requested aggregate information could be, for example, the average or sum of the values stored in the second field of the rows whose first field stores a particular value.
FIG. 1
shows a typical database table T
110
containing two columns, x and y. Consider an aggregate query which requests, for each respective value for column x stored in T, the average of the values stored for column y in the rows of T that store the respective value for column x.
select x, avg (y)
from T
group by x;
A DBMS typically would perform the above query by sorting the rows of T on column x, and then determining the average value for the group of column y values corresponding to each value of column x.
For example, for column x, value 1, the average of corresponding y values (i.e., 2, 5, 8, and 5) is equal to (2+5+8+5)/4=5; for column x, value 2, (6+3)/2=4.5; for column x, value 3, (3+9+9)/3=7; and for column x, value 4, (8+1)/2=4.5. Determining aggregate information in this manner, however, requires the DBMS to scan the entire table, which can be extremely CPU and I/O intensive.
To reduce the time it takes to locate rows in a table, an index to the table may be used. An index for a table is based on one or more columns of the table (referred to as the “key”) and contains one or more entries, each entry corresponding to a respective value of the key and storing information (typically location) relating to one or more rows in the table that store the respective value of the key. One commonly used type of index, a B-tree index, includes a tree data structure of branch nodes and, at the bottommost level, leaf nodes. Each branch node comprises one or more entries, each entry specifying a range of key values and a pointer to another branch node or to a leaf node. Each leaf node comprises one or more entries, each entry specifying the locations (e.g., rowIDs) of one or more rows storing a particular key value.
In the most common implementation of a B-tree, there is a separate entry for each rowID corresponding to a particular key value. In other implementations, a single entry stores all the rowIDs corresponding to a particular key. Each branch node contains information that is used by the DBMS to traverse the tree in order to arrive at the leaf node containing an entry that corresponds to a particular requested key value.
FIG. 2
is an example of such a B-tree index for the table T
110
shown in FIG.
1
. The key for the index is column x of Table T
110
. The tree is comprised of nodes
210
,
212
and
214
. Node
210
is a branch node. Branch nodes point to other branch nodes or to leaf nodes.
FIG. 2
shows a single branch node, node
210
, but more may be used. Each branch node includes at least one range field. The desired key or key range determines which branch should be followed from the node. In
FIG. 2
, for example, node
210
specifies two key ranges: <=2 and >2. Thus, a query looking for all values associated with column x values of 1 require traversal through the uppermost branch from node
210
because the value 1 is in the range of <=2.
The lowest level of nodes, nodes
212
and
214
, are leaf nodes that point to or contain a linked list of rowIDs. The leaf nodes are connected in a doubly-linked list, indicated by link
216
. Each rowID specifies a location of a row in the database. Nodes
212
and
214
contain key values corresponding to values in column x of table T
110
of FIG.
1
. Node
212
, value 1, points to a linked list of four rowIDs. The four rowIDs point to respective physical locations of the four rows of table T
110
that store a value of 1 in the field for column x. Node
212
with key value 2, node
214
with key value 3, and node
214
with key value 4 each point to respective linked lists of rowIDs pointing to corresponding rows.
The doubly-linked list can be used for operations that require obtaining values from a series of leaf nodes. A range scan is an example of such an operation. A range scan is performed by finding rows with keys in a given range. To perform a range scan using the doubly-linked list, the index is traversed to the leaf node associated with the beginning of the range. From that leaf node, the doubly-linked list is traversed through the leaf nodes until the leaf node at the end of the range is reached. Thus, the doubly-linked list is used instead of traversing the index from top to bottom to find each key value in the range.
An example of determining aggregate information is illustrative of how the index of
FIG. 2
could be used. If the query on the database requests the average of y values for the rows of table
110
having an x value of 3, the DBMS first visits branch node
210
, and determines that the desired key value 3 falls within the range “>2.” The branch node corresponding to a key range of “>=2” points to leaf node
214
. The DBMS then visits leaf block
214
, which contains a leaf node corresponding to a key value of 3. This leaf node points to a linked list of rowIDs specifying the physical locations of rows of table T
110
that store a value of 3 for column x. The DBMS then uses the rowIDs to access the rows, and determine their average y value. The index, therefore, eliminates the need for scanning each row in the table, and provides a structure for accessing particular rows in the database more quickly.
The following is another example of an aggregate query:
select x, avg (y)
from T
where 1<=T.x<=3
group by x;
This aggregate query can be processed by a range scan of the index. The range scan is basically performed in the same manner as a table scan, but in this case the value is constrained to be a value of column x between 1 and 3 (i.e., “where 1<=T.x<=3”). At node
210
, the criteria “1<=T.x<=3” is met by range <=2, which results in DBMS traversing the index to node
212
. At node
212
, the criteria 1<=T.x<=3 is met by the value 1, which points to a linked list of leaf nodes containing rowIDs pointing to the physical location of the values in column y corresponding to the column x, value 1. The corresponding y values are then accessed using the rowIDs. The DBMS repeats this process for column x, value 2, then traverses link
216
and repeats the process for column x value 3. Because value 3 is the end of the range, the DBMS stops accessing the values at the rowIDs associated with the column x values. The DBMS then determines the average of the accessed y values, thus completing the range scan. The process of stepping through the linked list of column x values, and accessing the physical locations corresponding to the rowIDs is extremely time consuming.
As the above examples indicate, traditional techniques for processing aggregate queries involve reading several index and table blocks to determine aggregate information. There is, therefore, a need for an apparatu

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Apparatus and method for aggregate indexes does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Apparatus and method for aggregate indexes, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Apparatus and method for aggregate indexes will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2970010

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.