Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-11-09
2001-08-14
Gregory, Bernarr E. (Department: 3662)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06275822
ABSTRACT:
FIELD OF THE INVENTION
The present invention is directed to an improvement in relational database systems and in particular to the maintenance of very large indexes that support efficient relational queries on databases.
BACKGROUND OF THE INVENTION
In relational database systems, it is important to create indexes on columns of the tables in the database. It is well-known that the efficiency of relational operations such as the JOIN operation or the evaluation of query constraints (SELECTION) is improved if the relevant columns of the table across which the operation take place are indexed. From the point of view of speeding query processing, it is desirable to have available indexes for all columns (and combinations) of all tables in a relational database.
In relational database systems, it is important to be able to efficiently add new data to an existing database. The addition of data to the table space may involve not only the addition of data, but the maintenance of indexes. The indexes for the existing data will be merged with the indexes for the data to be added.
For very small collections of data a merger of relational data tables and indexes may not pose a significant problem. For large data sets, however, the merging of the new data with the existing data, and the merging of associated indexes into the resultant table may be constrained by system limitations. The internal machine memory may be too small to accommodate the data structures that are being merged, resulting in time-expensive swapping of data from larger but slower data storage devices (typically magnetic disks).
Relational database indexing schemes are developed to permit the rapid execution of relational operations on the relational data. The techniques for maintaining relational database indexes will depend on the structure of the indexes to be merged.
One type of data structure which is of use in indexing relational data is the sort vector data structure. In the co-pending application entitled “Indexing Databases For Efficient Relational Querying”, naming Mariano Consens as inventor, and Joint Technology Corpomdon as assignee, a collection of data structures is described that indexes all the columns on all the tables of a database in a space and time efficient manner. The co-pending application describes the use of sort vector files, and associated join bit files in art index structure for relational databases. The use of sort vectors is not restricted to such applications, however, and issues relating to the merger of sort vector indexes have a wider application.
Because sort vectors are by definition self-referential, the merging of such data structures presents efficiency issues. To follow the self-referential pointers in large sort vector data structures may result in slow performance for database systems which rely on sort vector structures in the indexes. This is particularly true for large indexes.
It is therefore desirable to have a computer system which permits sort vector data structures to be merged efficiently. In particular, a method which avoids the need to follow the self-referential pointers in both the sort vectors to be is merged is desirable.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, there is provided a system for the merger of two sort vector data structures.
According to a further aspect of the present invention, there is provided a method for merging a first sort vector and a second sort vector, the respective sort vectors each being represented by an appropriate data structure, each sort vector comprising a set of runs of contiguous entries comprising self-referencing pointers defining sequences of data, the method comprising the steps of
a) creating a location data structure comprising, for each entry in the first sort vector, a low limit pointer and high limit pointer pair,
b) for each entry in the first sort vector, when there is a run in the second sort vector corresponding to the entry in the first sort vector, initializing the location data structure by defining the low limit pointer and the high limit pointer to point to the entire range of the first run in the second sort vector corresponding to the entry in the first sort vector, and where there is no run in the second sort vector corresponding to the entry in the first sort vector setting the low limit pointer and the high limit pointer to values representing a no match condition,
c) making successive refinement passes through the first sort vector and the second sort vector until each low limit pointer and high limit pointer pair has converged, each refinement pass comprising the steps of
i) selecting successive entries in the first sort vector,
ii) for the selected entry in the first sort vector, following the entry's self-referential pointer to a following first sort vector entry,
iii) defining as target values the location data structure low limit pointer and high limit pointer for the location data structure entry corresponding to the following first sort vector entry,
iv) defining as current second sort vector entry values the location data structure low limit pointer and high limit pointer corresponding to the selected first sort vector entry,
v) in the range in the second sort vector defined by the current second sort vector entry values, locating the insertion points of the target values,
vi) replacing the target values in the location data structure entry corresponding to the following first sort vector entry with the insertion points of the target values, and
d) using the corresponding converged low limit and high limit pointer pair to determine an insertion point into the second sort vector for each entry in the first sort vector and to thereby merge the first sort vector ad the second sort vector into a merged sort vector.
According to a further aspect of the present invention, there is provided the above method further comprising a join bit merger for merger of a first join bit vector and a second join bit vector into a merged join bit vector, the first, second and merged join bit vectors corresponding to the first, second and merged sort vectors, respectively, the join bit merger comprising the steps of
a) initializing and equivalence flag for each entry in the first sort vector,
b) propagating the equivalence flags in the successive refinement passes of the method of claim 
1
 such that the equivalence flag for an entry in the first sort vector is set where the token sequence for the entry in the first sort vector is matched in the second sort vector, and
c) using the respective equivalence flags to determine the merged joint bit vector for the merged sort vector.
According to a further aspect of the present invention, there is provided the above method in which a first word list string data structure and a second word list string data structure are maintained for the tokens in the first sort vector and the second sort vector, respectively, the method further comprising
a) merging the first word list string data structure and the second word list string data structure to form a merged word list string data structure,
b) generating a first mapping data structure and a second mapping data structure, for recording the insertion points of entries in the first word list string data structure and of entries in the second word list string data structure, in the merged word list string data structure, respectively
c) generating a first inverse mapping data structure and a second inverse mapping data structure to record the inverse of the first mapping data structure and the second mapping data structure, respectively,
d) using the first and second mapping data structures, and the first and second inverse mapping data structures, to initialize the low limit pointer and the high limit pointer.
According to a further aspect of the present invention, there is provided the above method in which the values representing a no match condition are set to negative value pointers indicating the insertion point for the previous run in the second sort vector.
According to a further aspec
Consens Mariano Paulo
Snider Timothy
Beyer Weaver & Thomas LLP
Gregory Bernarr E.
Joint Technology Corporation
LandOfFree
Maintaining very large indexes supporting efficient... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Maintaining very large indexes supporting efficient..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Maintaining very large indexes supporting efficient... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2536222