Storage format for encoded vector indexes

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06725223

ABSTRACT:

FIELD OF THE INVENTION
This invention generally relates to a database management system performed by computers, and more specifically relates to the storage format of an encoded vector index (EVI).
BACKGROUND OF THE INVENTION
An index in a book facilitates locating information on a specific topic quickly and without blindly paging through the book. Database indexes provide similar benefits by providing a method to quickly locate data of interest. Without an index, a database performs a full table scan, blindly searching through every row in a database table until the target data is located. Thus, depending upon where data resides in a database table, such a scan can be a lengthy and inefficient process.
Indexed scans of database tables are more efficient than full table scans since the length of database index entries are in most cases shorter than the database table entries. Shorter entries mean that more index entries can be stored in a single computer page. Indexed scans can therefore result in a considerable reduction in the total number of computer pages that must be processed in order to locate the requested data.
While indexed scans of database tables can improve performance, the complexity of the data being scanned and of the nature of the database query still determine how effectively a query can be implemented. Different queries place differing levels of processing demands on the database in unique ways. As a result, different index types are needed to cope with a users' ever-changing workloads. One type of index is the encoded vector index (EVI), disclosed U.S. Pat. No. 5,706,495, Chadha et al., Jan. 6, 1998, Encoded-Vector Indices For Decision Support and Warehousing (hereinafter “Chadha”), which is incorporated by reference.
An encoded vector index (EVI) is a variation of the bitmap index concept. A bitmap index indicates whether a specific value exists for each row in a particular column. One bit represents each row. Thus, in the bitmap index for the value “MN” in the column “LOCATION,” the nth bit equals 1 if the nth row of the data table contains “LOCATION”=“MN,” or 0 if that row holds a value other than “MN.” An EVI serves a similar purpose, but only one index is necessary to account for all the values occurring in the column (whether they be “NY,” “MN,” or any other). So in an EVI on the “LOCATION” column, the nth position of the EVI contains a bit code that identifies the value of “LOCATION” in the nth row of the table. Thus, whereas a separate bitmap index is required to map each particular key value in a database field, only one EVI is required to represent the same information. Thus, an EVI saves computer memory by including all possible key values for a given field in one database index. Chadha discloses a method to efficiently scan relational database information by performing bit-vector operations on EVI's, instead of performing analogous operations on the relational database table itself.
Referring now to
FIG. 1
, a diagram explaining the components of a typical encoded vector index is illustrated. In this example, the EVI indicates which key value exists in the encoded database field for each relative database record number in a database table 40. Database table 40 is an exemplary database identifying locations and departments, for example, of a corporate organization. The EVI is formed over the location field
42
of the database
40
, which may include a large number of other fields (not shown).
The EVI
45
is made up of two tables: EVI symbol table 50 and EVI vector
60
. EVI symbol table 50 has an entry for each particular key value that can be found in the database field (in this case, the LOCATION field) of the particular database or database subset
40
for which the EVI is an index.
FIG. 1
illustrates an EVI for a subset of the database
40
, namely, the LOCATION field. Since only three different values appear in the LOCATION field of the database, EVI symbol table 50 contains three rows, one for each particular key value: “MN,” “ND,” and “WY.” Notably, the key values in the EVI symbol table 50 are stored in a sorted order, so that a given key value can be found in the table using an alphabetic binary search through the table. The EVI symbol table 50 provides a code for each of these key values, and further provides a count for each of these key values, indicating how many records in the database table contain the key value. The codes in the EVI symbol table 50 are used to decode EVI vector
60
, as described below. Notably, the codes are assigned to keys so that the codes are also in sorted order. The purpose of this will be explained below.
EVI vector
60
contains a row for every record in the database
40
for which the EVI is an index. Each vector row corresponds to a database record, and contains a code for the key value contained in the EVI indexed field. EVI vector
60
contains 20 rows, because there are 20 records in the database for which EVI is an index. Each code stored in EVI vector
60
corresponds to the value that exists in the EVI field in the corresponding database record.
The translation of the code is made possible by EVI symbol table 50. For example, for the first record in the database
40
, the LOCATION field
42
in database
40
has a value of “MN,” which corresponds to a code of 1 in EVI symbol table 50. The first relative record in EVI vector
60
. By looking at EVI symbol table 50, it can be seen that code
0
equates to a key value in the EVI field of “MN.”
Use of the EVI proceeds as outlined above. To search for a particular key value, that key value is converted to a code, and then the EVI vector is scanned, identifying each row in which there is a match to the desired code. Often the results are represented as a bit vector index, which can then be combined with other bit vector indexes to identify the results of a complex query. To search for records using a key range, the EVI symbol table is used to convert keys at the beginning and end of the range into codes. Then, the EVI vector is scanned, identifying each row having a code that falls between the codes for the range endpoints. As noted above, the codes are assigned to the keys in a sorted order, that is, so that the codes sort into the same order as the keys; thus, codes may be numerically compared to the range endpoint codes as the EVI vector is scanned to identify rows that meet a key range.
As noted above, EVI's are built to reflect the counts and key values in one or more particular database fields, as those values exist in a database at the time that the EVI is built. Unfortunately, databases are frequently updated. In order for an EVI to stay current and accurately reflect a database, the EVI must he updated whenever the value of the one or more field(s) over which the EVI is built changes. This also applies when new records are added to the database and when new records are deleted.
Changes to the relational database table can affect an EVI symbol table in two ways. First, a change to the database may only require a change in key count for one or more key values. An example of this first type of change is deleting a database record, adding a record having a key value that already exists in the EVI symbol table 50, or changing a database field from one key value to a second key value that also exists in the EVI symbol table 50. In this first type of change, the EVI symbol table 50 is updated by updating the key count(s) to reflect the changes made to the database fields itself, without requiring a new EVI symbol table entry. At the same time, the EVI vector
60
is updated by deleting, adding or modifying the record corresponding to the changed record.
The second type of change to a database requires a new EVI symbol table entry. An example of this type of change is when a new key value is added to a database field, that is, a record that is modified or created is given a key value that does not exist in that particular database field in any other record in the database. A new key value might replace an old key value, as p

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Storage format for encoded vector indexes does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Storage format for encoded vector indexes, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Storage format for encoded vector indexes will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3188315

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.