Method and system for efficiently searching an encoded...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000

Reexamination Certificate

active

06285994

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to encoded vector indices, and more particularly to efficiently searching an encoded vector index.
BACKGROUND OF THE INVENTION
The increased focus on complex queries for data warehousing and OLAP (OnLine Analytical Processing) has resulted in a revival of interest in bitmap indexes. The basic idea behind a bitmap is to use a single bit (instead of multiple bytes of data) to indicate that a specific value of an attribute is associated with an entity. A bit-mapped index is simply a very, very long string of bits, commonly called bit vector or bitmap. Each bit in the bitmap represents each row in a table, and the bit is set to 1 if an associated entry is contained in the list represented; otherwise, the bit is set to 0. The relative position of the bit within the bitmap can be mapped to the relevant record ID of the row in the table.
This technique is particularly attractive when the set of possible values for the index key is small. Input/Output (I/O) is significantly reduced when a large fraction of a large table is represented using bitmap lists. However, when a large number of values exist in an index, it would require large number of bitmaps that are likely to be rather sparse (i.e., very few bits will be 1 in the bitmaps) and would result in heavy storage requirements for storing a lot of zeros.
Therefore, the bit mapped approach is not practical for large dimensions and fact tables. The impracticality leads to a better bitmap schema called Encoded Vector Index (EVI) that retains much of the processing advantages of bit-mapped indexing and can also support very large tables with larger cardinalities. An EVI consists of a Symbol Table and an Encoded Vector. The Symbol Table contains a sorted list of all the distinct values of a column in a table, a unique code assigned for each distinct value, and an occurrence count for each distinct value that indicates the number of rows in the table with that distinct value. The Encoded Vector is an array with a dimension equal to the number of rows in the table. Each entry in the Encoded Vector contains the code from the Symbol Table that corresponds to the value contained in the row of the table.
By way of example,
FIG. 1
illustrates a data table
10
, Table A, a symbol table
12
, and an encoded vector
14
. The data table
10
includes data identified in an ID column by appropriate symbols. The symbols include the alphabetic characters ‘A’, ‘E’, ‘I’, ‘K’ and ‘W’, as shown in the symbol column of symbol table
12
. The associated encoded value for each symbol is also included in the symbol table
12
. These encoded values, ‘0’, ‘1’, ‘2’, ‘3’, and ‘4’, are utilized to represent the data in the data table
10
in the encoded vector
14
, as shown.
FIG. 2
illustrates a prior art approach to data searching in an environment that utilizes an encoded vector index. When performing a search of the encoded vector index, the process initiates with the receipt of a search query (step
20
). The items to be searched, either range or point, are then used in conjunction with the symbol table. Thus, a sequential look-up of the symbol table is performed with each search key to develop a candidate code list (step
22
). Then, using the candidate code list, the candidates from the candidate code list are compared with each entry in the encoded vector (step
24
). When any one of the candidates in the candidate code list matches the entry data of the encoded vector, a bit is set in a temporary bitmap (step
26
). The temporary bitmap provides the results to the search query over the entire encoded vector.
To illustrate the prior art approach, the following search query is presented and performed using the example data table
10
, symbol table
12
, and encoded vector
14
from FIG.
1
:
Select * from TableA
where ‘A’<=TableA.Key<=‘F’ OR
TableA.Key=‘J’ OR
Table A.Key=‘K’
The resultant candidate code list
28
is shown in FIG.
3
and includes the range of values [0,1] and the value ‘3’ in accordance with the encoded values associated with the symbols that match the search query.
FIG. 3
further illustrates the use of the candidate code list in conjunction with the encoded vector
14
that results in the temporary bitmap of search results
30
. As described above, for each entry in the encoded vector
14
, the entire candidate code list
28
is compared against each entry to determine whether the entry meets the search criteria. For each entry that does meet the search criteria, a bit is set to a ‘1’ value in the temporary bitmap
30
.
While the temporary bitmap does provide sufficient search results, the process of producing the temporary bitmap may be quite time-consuming due to two possible problems. When the search query produces a long list of search keys and the symbol list is long, the sequential process that produces the candidate code list takes a significant amount of time. Further, when the candidate code list is long, the sequential process of comparing each candidate to the encoded v tor entry also takes a significant amount of time.
Accordingly, a need exists for more efficient vector index searching. The present invention addresses such a need.
SUMMARY OF THE INVENTION
The present invention provides a method and system for efficiently searching an encoded vector index. The aspects include the translation of a search query into a candidate bitmap, and the mapping of data from the candidate bitmap into a search result bitmap according to entry values in the encoded vector index. Further, the translation includes the setting of a bit in the candidate bitmap for each entry in a symbol table that corresponds to candidate of the search query. Also included in the mapping is the identification of a bit value in the candidate bitmap pointed to by an entry in an encoded vector.
Through the present invention, scanning of an encoded vector becomes straightforward and fast. For each code in the encoded vector, a simple bit lookup is performed, rather than looping through the entire candidate code list until a match is found. These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.


REFERENCES:
patent: 4727354 (1988-02-01), Lindsay
patent: 5226135 (1993-07-01), Mishina et al.
patent: 5287494 (1994-02-01), Garcia et al.
patent: 5444488 (1995-08-01), Goubault et al.
patent: 5477221 (1995-12-01), Chang et al.
patent: 5706495 (1998-01-01), Chadha et al.
patent: 5761652 (1998-06-01), Wu et al.
patent: 5778361 (1998-07-01), Nanjo et al.
patent: 5779184 (1998-08-01), Fulton et al.
patent: 5781128 (1998-07-01), Shlomot
patent: 5822723 (1998-10-01), Kim et al.
patent: 5963935 (1999-10-01), Ozbutun et al.
patent: 6141656 (2000-10-01), Ozbutun et al.
patent: 6173281 (2001-01-01), Bestgen et al.
patent: 0446968A2 (1984-06-01), None
IBM Technical Disclosure Bulletin, “Means for Updating and Searching Sparse Tables”, vol. 32, No. 4A, Sep. 1989.
IBM Technical Disclosure Bulletin, “C++ Data Structures to Support Finite State Machine”, vol. 38, No. 12, Dec. 1995.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and system for efficiently searching an encoded... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and system for efficiently searching an encoded..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for efficiently searching an encoded... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2507824

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.