Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-06-30
2001-04-24
Amsbury, Wayne (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06223182
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to computer systems and more particularly to efficient techniques for organizing data.
BACKGROUND OF THE INVENTION
Very large databases pose significant challenges in storing, accessing, and managing data. Often, large databases attribute their size to a few very large tables or other database objects as opposed to a very large number of smaller tables. For example, most data warehouses and decision support systems have a few very large fact tables and several smaller lookup tables. As another example, spatial databases typically store the shape and location of large numbers of objects such as houses, streets, and zip code areas.
Partitioning is a strategy employed to improve the storage, access, and administrative performance of very large database objects. More specifically, a database object such as a relational database table or index is partitioned by subdividing the database object into several smaller independent subsets of the database object based on a “partition key.” For “range partitioning” of non-spatial, relational database tables, the partition key is typically selected from among the date, primary key, or foreign key columns. The partition key is then used to divide the table into one or more ranges of values. When a query is issued to access and retrieve data from a range partitioned table, the database server can ignore partitions outside of certain ranges, thereby improving query performance and response time. For example, a very large on-line transaction processing (OLTP) database can be range partitioned based on the year. Thus, queries on the OLTP database that specify the year 1997 can safely ignore data in partitions for other years.
Spatial databases also benefit from partitioning. However, range partitioning on a spatial dimension is often inappropriate for spatial databases. For example, a common spatial query is: given a particular point in space, what objects are within a certain distance of the point? With this type of query, it is desirable to group data that is spatially close to each other, but range partitioning, on the other hand, groups data primarily according to a single dimension. For example, range partitioning along the x-axis may groups the point (1, 1) in the same partition as point (1, 100) even though point (2, 2) is much closer. Range partitioning is also inappropriate for multi-dimensional non-spatiotemporal data such as sales data (e.g., dollars and product codes) when no one dimension dominates over the other. Therefore, there is a need for a data partitioning methodology that can handle spatiotemporal data as well as non-spatiotemporal data in a way that does not cause one particular dimension to dominate the partitioning criteria.
Conventionally, data partitioning operates best if the entire set of data is provided at the time of partitioning, because an optimal size of the partitions is a function of the size of the data set and influences the performance of the query. In many real world situations, however, such as receiving spatial data from a satellite, the entirety of the data is not immediately available and therefore the data set needs to be incrementally partitioned.
Under conventional methodology, the data is originally partitioned based on a guess of the optimal partition size and then repartitioned after all the data has been collected. This process of repartitioning is very expensive and typically involves exporting the data from the database to a flat file outside the database system. The flat file is then analyzed to determine the partitions. While the data is being analyzed in the flat file, it is unavailable for queries. Furthermore, many of the high performance and reliability features that are characteristic of relational databases are unavailable to this extra-database partitioning process. For example, if the system goes down during the off-line repartitioning, the recovery features of the database system cannot be used to recover the repartition operation. Therefore, the repartition process is forced to start all over again. Consequently, the “downtime” of the data can be considerable if a large data set is partitioned. Therefore, there is a need for a data partitioning methodology that efficiently handles incremental partitioning and reduces the downtime during partitioning.
In conventional relational database systems, partitions are stored as tables organized into rows and columns. However, rows may be stored in the table in any order, so that rows that contain information about two points that are spatially close to each other could reside on separate disk blocks in the database, requiring multiple disk input/output operations to retrieve them. Therefore, there is a need for intra-partition physical clustering of spatial data so that the records for closely related spatial data will reside close to each other.
SUMMARY OF THE INVENTION
These and other needs are addressed by the present invention, in which a code for use as a partition key is generated by bit-interleaving values from at least two fields of a data container. In one aspect, bit-interleaving values to form a code for a partition key enables multi-dimensional non-spatial and non-temporal data to be used for partitioning even when no particular dimension is dominant.
Another aspect of the invention relates to a computer-implemented method and a computer-readable medium bearing instructions for organizing data in a data container with records and fields. Codes for records are determined based on bit-interleaving values from at least two of the fields of the corresponding records. A database object is created having rows, where each row corresponds to a record and has a first column for holding the codes for the record and a second column for holding a reference to the record. The database container is subdivided into subsets based on the database object. In one embodiment, another database object is created containing code prefixes corresponding to the subsets. Use of a database object within the database for partitioning reduces the downtime associated with conventional techniques that export the data outside the database.
Yet another aspect of the invention is directed to intra-partition physical clustering of data. Accordingly, data within a data container is organized by determining codes based on bit-interleaved field values and subdividing the data container into subsets based on the codes. The subsets are stored as a tree data structure, such as B-tree index, with entries corresponding to records in the data container and indexed on the codes for the corresponding records.
Still other objects and advantages of the present invention will become readily apparent from the following detailed description, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.
REFERENCES:
patent: 5495608 (1996-02-01), Antoshenkov
patent: 5568639 (1996-10-01), Wilcox et al.
patent: 5649181 (1997-07-01), French et al.
patent: 5652879 (1997-07-01), Harris et al.
patent: 5710915 (1998-01-01), McElhiney
patent: 5873097 (1999-02-01), Harris et al.
patent: 5893104 (1999-04-01), Srinivasan
patent: 5963956 (1999-10-01), Smartt
Agarwal Nipun
Feng Linda
Robertson Timothy
Amsbury Wayne
Ditthavong & Carlson P.C.
Oracle Corporation
LandOfFree
Dynamic data organization does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Dynamic data organization, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Dynamic data organization will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2537828