Coarse indexes for a data warehouse

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C706S012000, C705S052000, C705S054000, C379S220010

Reexamination Certificate

active

06216125

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to indexing of very large databases.
BACKGROUND OF THE INVENTION
Data warehouses allow an organization to gather, store and make use of operational data, that is, information collected during normal operations. For example, a large retail store chain may warehouse cash register transactions to identify trends, e.g., what products are popular in different regions of the country. A typical data warehouse will receive a feed of detail dam Because of the large volume of detail data in the warehouse, many data warehouse features involve computing aggregate statistics of the detail data, and mining the aggregate statistics.
In many cases, it is desirable to retrieve specific records from the stored detail data For example, a telecommunications provider must be able to provide records of phone calls originated by or received by individuals in response to law enforcement requests. Since records of phone calls involving a particular phone number are very sparse in the data set, indexing is necessary to retrieve these records efficiently.
The detail data is usually collected continuously in time. The large size of the detail data set requires the use of horizontal partitioning. One reason is the limit on the maximum file size in many operating systems (typically 2 Gbytes). Another reason to use partitioning is to simplify data management. Keeping a rolling 2-year window of data is made easier if each partition corresponds to one day's worth of data finally, partitioning can cluster data relevant to answering a query (i.e., sum sales by day). A very large data table will be composed of hundreds to thousands of data partitions. For example, a typical conventional database
100
is shown in FIG.
1
. Database
100
includes a number of data partitions, such as data partitions
102
A-C. Each data partition includes a number of rows of data, such as rows
104
A and
104
B. Each row includes a number of keys, such as keys
106
A and
106
B.
A typical database organization is to have separate indexes for each data partition. Such an organization simplifies data management (indexes are dropped with data), reduces index size (which can be a significant fraction of the data size), and increases index concurrency (which can be a bottleneck). For example, in
FIG. 1
, each data partition
102
A-C of database
100
has its own indexes
108
A-C. These prior art indexes are termed dense indexes, because each index refers to every record in its data partition.
Data warehouses allow users to make sense of large quantities of detail data, typically by extracting summaries small enough for convenient manipulation. While most queries can be answered through the summary data, some queries can only be answered by accessing the detail data. For example, after using the summaries to identify a set of “interesting” customers, it may be desired to extract all detail records that describe an interaction with those customers. If the number of customers in the database is very large, it is likely that most of the data partitions of the table do not contain records describing an interaction with a particular customer. In a conventional database architecture, searching for these records requires that every index be searched for the key value. The cost of opening the index files and searching them can be much larger than the cost of retrieving the records that match the key value. The problem is that many index searches return a “not found” answer.
A need arises for a technique by which a very large database can be searched more quickly and with lower cost than can be achieved using a conventional database access technique.
SUMMARY OF THE INVENTION
The present invention is a coarse database index, and system and method of use therefor, that will quickly indicate which data partitions of a table contain a given key. Once the target data partitions are located, the exact record locations can be found using traditional indexes. The coarse indexes take little space, can be updated quickly, and searched quickly.
A coarse index, according to a preferred embodiment of the present invention, is in conjunction with a database including a plurality of data partitions. Each data partition includes data, including a plurality of key values of at least one key, and at least one dense index referencing the data. The coarse index indexes the plurality of key values according to data partitions containing each key value. The coarse index includes a first bitmap, which includes a plurality of bits arranged in a matrix. The first axis of the matrix represents data partitions, while the second axis of the matrix represents key values. Each bit indicates whether a key value is present in a data partition. Preferably, the first bitmap is arranged in key value major format.
The coarse index may also include a second bitmap, which includes a plurality of bits arranged in a matrix. The first axis of the matrix represents data partitions, while the second axis of the matrix represents key values. Each bit indicates whether a key value is present in a data partition. Preferably, the second bitmap is arranged in data partition major format.
In one aspect of the present invention, the second bitmap may be transformed from data partition major format to key value major format. In another aspect of the present invention, the first bitmap is compressed and is partitioned into a plurality of blocks. The coarse index further includes an index referencing each block of the first bitmap to a portion of an uncompressed first bitmap corresponding to each block In still another aspect of the present invention, the second bitmap is compressed and is partitioned into a plurality of blocks. The coarse index further includes an index referencing each block of the second bitmap to a portion of an uncompressed second bitmap corresponding to each block.


REFERENCES:
patent: 3678461 (1972-07-01), Choate et al.
patent: 5551027 (1996-08-01), Choy et al.
patent: 5649181 (1997-07-01), French et al.
patent: 5710915 (1998-01-01), McElhiney
patent: 5794229 (1998-08-01), French et al.
patent: 5819256 (1998-10-01), Ozbutun et al.
patent: 5907297 (1999-05-01), Cohen et al.
patent: 5924088 (1999-07-01), Jakobsson et al.
patent: 5960194 (1999-09-01), Choy et al.
patent: 6003036 (1999-12-01), Martin
Dehmeshki, J. et al., “Classification of coal images by a multi-scale segmentation techniques”, Proceedings of the 1995 International Symposium on Computer Vision, Nov. 21-23, 1995, pp. 271-276.*
Dharanipragada, S. et al., “A fast vocabulary independent algorithm for spotting words in speech”, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal processings, May 12-15, 1998, vol. 1, pp. 233-236.*
Raisch, J., “Qualitative control with quantitative Models”, Second International Conference on Intelligent Systems Engineering, Sep. 5-9, 1994, Conference Publication No. 395, pp. 229-234.*
Yuen H. et al., “Efficient variable rate vector quantization using quadtree segmentation”, IEEE Transaction on Consumer Electronics, May 1996, vol. 42, Issue: 2, pp. 212-215.*
P. O'Neil and D. Quass, “Improved Query Performance With Variant Indexes,”SIGMOD'97,Tucson, Arizona, May 1997, pp. 1-12.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Coarse indexes for a data warehouse does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Coarse indexes for a data warehouse, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Coarse indexes for a data warehouse will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2444417

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.