Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-05-25
2003-09-16
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06622141
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a bulk loading method for use in a high-dimensional index structure; and, more particularly, to a bulk loading method of using some parts of dimensions to thereby accelerate an index construction and improve a search performance, and a computer readable recording medium in which a program implementing the method is recorded.
PRIOR ART OF THE INVENTION
In general, a bulk loading method is a computational method which constructs an index by inserting a substantial amount of static high-dimensional data at one time instead of inserting one-by-one data, thereby reducing an index construction time and improving a search performance.
Conventional bulk loading algorithms are classified according to whether or not employing a sort and further divided into a bottom-up and a top-down fashion.
First of all, as methods for performing the sort, there are following methods such as NX[Roussopoulos, “Direct Spatial Search On Pictorial Databases Using Packed R-Trees”, Proc. ACM SIGMOD, 1985], HS[Kamel, “On Packing R-Trees”, Proc. 2nd International Conference on Information and Knowledge Management, 1993], STR[Scott T. Leutenegger, “STR: A Simple and Efficient Algorithm for R-Tree Packing”, Proc. 13th ICDE, 1997], TGS[Yvan J. Garcia, “A Greedy Algorithm for Bulk Loading R-Trees”, University of Denver Computer Science Tech., 1997], BnC[Aggarwal et al., System and Method for Construction of a Data Structure for Indexing Multidimensional Objects, U.S. Pat. No. 5,781,906], and the like.
Among them, algorithms performing the index construction using the bottom-up fashion will be first explained.
It is easy to implement an NX algorithm that constructs an index after executing a sort only for an X-axis and this algorithm has a good search performance for a point query. However, since Y-axis values are neglected and thus perimeters of parent nodes become larger, there occurs a defect to deteriorate a performance for a region query.
Therefore, HS has introduced an algorithm using a storage sequence determined by a Hilbert curve. Since the Hilbert algorithm reduces a perimeter of an entire region as managing a size of the entire region as small, it can substantially advance a performance for a region query compared with the NX algorithm.
Meanwhile, it is easily implemented a STR(Sort-Tile-Recursive) algorithm clustering the rectangles in order to minimize the number of visiting nodes during the execution of query processing and this algorithm can obtain a substantially enhanced performance compared with conventional algorithms.
However, sometimes the Hilbert algorithm can accomplish a better performance for VLSI data compared with the STR algorithm. Therefore, it can be said that the performances of the Hilbert algorithm and the STR algorithm depend on the type of given data. The above NX, HS and STR algorithms construct the index in the bottom-up fashion by using a fixed pre-processing.
On the other hand, TGS has proposed a Top-down Greedy Split (TGS) algorithm constructing a tree in the top-down fashion. The TGS algorithm uses a pre-processing for rearranging the data contained in a database and can construct a tree requiring fewer disk accesses than trees made by conventional methods. However, since a multiple rearrangement should be considered, there is a drawback in which a longer time is required to construct a tree.
In order to prevent a search performance from being deteriorated by an increase of an MBR (Minimum Bounding Region) and an overlapped region of an algorithm such as the HS algorithm using a linear ordering method, a newly proposed BnC method constructs an index tree by using two stages, i.e., binarization and compression. However, this method consumes a longer construction time because of using a large amount of sort for split dimensions in the binarization stage.
Then, as bulk loading algorithms not performing the sort, there are BFT [Van Den Bercken, “A General Approach to Bulk Loading Multidimensional Index Structures”, 23rd Conf. VLDB, 1997] and UBBT[Christian Böhm, “Efficient Bulk Loading of Large High-Dimensional Indexes”, Int. Conf. on Data Warehousing and Knowledge Discovery Dawak, 1999]. The bulk loading algorithm of BFT constructs an effective index structure by using a split or merging scheme instead of a data sort scheme. Specially, this algorithm can reduce an index construction time by using a buffer tree including a buffer on an index node.
The UBBT is a method composed of the bottom-up fashion without performing the sort. This method has characteristics of using a binarization method when dividing a data set instead of the sort and applying an unbalanced split method for the binarization. The conventional algorithms except for the UBBT have a disadvantage of improving a performance only for one of an index construction time and a search time.
That is, although the methods employing the sort improve the search performance, it increases an index construction time by computations required for performing the sort. On the other hand, the methods not performing the sort can reduce the index construction time since it does not perform the sort, whereas it deteriorates the search performance.
SUMMARY OF THE INVENTION
It is, therefore, an object of the present invention to provide a bulk loading method, for use in a high-dimensional index structure using some parts of dimensions based on an unbalanced binarization scheme, for reducing the index construction time and improving the search performance, and a computer readable recording medium in which a program implementing the method is recorded.
In accordance with one aspect of the present invention, there is provided a bulk loading method employed in a high-dimensional index structure for constructing an index, which comprises the steps of: (a) calculating the topology of the index by recognizing the information for the index to be constructed with a given data set; (b) splitting the given data set into sub-sets of data by repeatedly performing an establishment of a split strategy and a binarization based on the calculated topology of the index; (c) if a leaf node is derived from the sub-sets of data divided through a top-down recursive split process, reflecting a MBR of the leaf node on a higher node; and (d) if a non-leaf node is generated, repeatedly performing the steps (a) to (c) for another sub-set of data to thereby produce a final root node.
In accordance with another aspect of the present invention, there is provided a computer program product for use in an indexing unit including a process, comprising: a computer readable medium; a first program instruction code for calculating the topology of the index by recognizing the information for the index to be constructed with a given data set; a second program instruction code for splitting the given data set into sub-sets of data by repeatedly performing an establishment of a split strategy and a binarization based on the calculated topology of the index; a third program instruction code for, if a leaf node is derived from the sub-sets of data divided through a top-down recursive split process, reflecting a MBR of the leaf node on a higher node; and a fourth program instruction code for, if a non-leaf node is generated, repeatedly executing the above processes performed by the first to third program instruction codes for another sub-set of data to thereby produce a final root node.
REFERENCES:
patent: 5781906 (1998-07-01), Aggarwal et al.
patent: 6154746 (2000-11-01), Berchtold et al.
patent: 6263334 (2001-07-01), Fayyad et al.
patent: 6408300 (2002-06-01), Bergman et al.
Efficient Bulk Loading of Large High-Dimensional Indexes, 10 pages.
A Generic Approach to Bulk Loading Multidimensional Index Structures, 14 pages.
A Greedy Algorithm for Bulk Loading R-trees, 19 pages.
Kim June
Lee Hun-Soon
Pee June-Il
Song Seok-Il
Yoo Jae-Soo
Blakely & Sokoloff, Taylor & Zafman
Electronics and Telecommunications Research Institute
Mizrahi Diane D.
Wu Yicun
LandOfFree
Bulk loading method for a high-dimensional index structure does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Bulk loading method for a high-dimensional index structure, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Bulk loading method for a high-dimensional index structure will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3009131