Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-09-28
2002-12-03
Metjahic, Safet (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06490578
ABSTRACT:
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
The present invention relates generally to data processing environments and, more particularly, to system and methods for improved indexing and processing of date information present in data records in those environments.
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
One purpose of a database system is to answer decision support queries and support transactions. A query may be defined as a logical expression over the data and the data relationships set forth in the database, and results in identification of a subset of the database. For example, a typical query might be a request, in SQL syntax, for data values corresponding to all customers having account balances above required limits. During query processing, a database system typically utilizes one or more indexes to answer queries. Indexes are organized structures associated with the data to speed up access to particular data values (i.e., answer values). Indexes are usually stored in the database and are accessible to a database administrator as well as end users. The basic operation of database systems, including the syntax of SQL (Structured Query Language), is well documented in the technical, trade, and patent literature; see, e.g., Date, C.,
An Introduction to Database Systems
, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
“Data warehouse” systems represent a type of database system optimized for supporting management decision making by tracking and processing large amounts of aggregate database information—that is, the data warehouse. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data. A well known example of a data warehouse system is Sybase® Adaptive Server® IQ (ASIQ), available from Sybase, Inc. of Emeryville, Calif.
The desire to support fast, ad hoc query processing in large data warehouse applications has led to the development of novel indexing schemes. Within ASIQ, for instance, there are several special-purpose indexing structures, including FastProjection (FP), LowFast (LF), HighNonGroup (HNG), LowDisk (LD), and HighGroup (HG). Each will be briefly described in turn.
FastProjection methodology entails vertical partitioning of the data into a single column stored as an array of data, where each cell in the array is as wide as the data column and the number of cells in the array matches the number of rows in the table. This index is used to provide fast access to the original data value given a row number. In LowFast methodology, an index is employed which comprises a B-Tree together with a set of bitmaps. The B-Tree has one node for each unique value in the column and each node has a bitmap associated with it. The associated bitmap has the nth bit on if the nth row of the table contains the value in the B-Tree. This index is used to provide fast access to a set of row numbers given a value among a small set of distinct values (under 1000). Without further optimization or compression, the technique requires a fair amount of disk space to do this.
In HighNonGroup methodology, an index which comprises a set of bitmaps is employed. The number of bitmaps in the set (i.e., 8×width-in-bytes) depends on the maximum width of the data in the column. The value for each row in the table is broken into its component bits. The nth bitmap has the mth bit on if the nth bit of the value (taken left to right) at the mth row of the table is on. This index is used to provide fast access to a set of row numbers given a range of values among a large set of distinct values (over 1000), when query grouping operations are not needed. It uses a moderately small amount of disk space to do this.
In the LowDisk methodology, an index which comprises a B-Tree and an HNG index is employed. The B-Tree has one node for each unique value in the column and each node has a small unique integer assigned to it. The small unique integer assigned to the value for each row in the table is broken into its component bits. The nth bitmap has the mth bit on if the nth bit of the small unique integer (taken left to right) assigned to the value at the mth row of the table is on. This index is used to provide moderately fast access to a set of row numbers given a value among a small set of distinct values (e.g., under 1000). It uses a very small amount of disk space to do this but is typically not as fast as the LowFast index.
With HighGroup methodology, an index is employed comprising a B-Tree, an HNG index, and a variation on the FP index. The B-Tree has one node for each unique value in the column and each node has a location in the modified FP index. The embedded HNG index is a normal HNG index and is used to satisfy wide range queries and aggregation functions (i.e., SUM). The B-Tree and modified FP are used to provide fast access to a set of row numbers given a value among a large set of distinct values (over 1000). They are also used to provide efficient handling of grouping queries. The technique uses a relatively large amount of disk space to do this.
For further description of the above-mentioned indexing methodologies, as well as a description of the ASIQ system and architecture, see, e.g., U.S. Pat. Nos. 5,649,181, 5,852,821, 5,794,229, 5,794,228, and 5,924,091, the disclosures of which are hereby incorporated by reference.
During database system operation, query processing depends on the ability to have an index perform comparison operations (e.g., <, <=, =, !=, >, and >=operations) and range operations (e.g., A<=×<B). The cost of performing any one of these operations directly affects how well suited the index is for resolving a given query. Often, data warehouses track historical (i.e., date-based) information, such as sales information gathered over a period of time. Expectedly, date processing within queries tends to be quite common in the typical data warehouse. Representative examples include the following, shown as sample SQL “select” queries.
select . . . from table where 1_shipdate>=‘1993-07-01’ and 1_shipdate<‘1993-10’01
select . . . from table where o_orderdate<‘1993-03-10’
These examples may serve to illustrate some of the indexing operations performed on date columns. Choosing an appropriate index for the date column will directly affect how quickly the query can be resolved. For a low number of date values (i.e., low cardinality data), one would typically select an HNG index type. These indexes are good for a wide range of queries, including aggregations, since query results can be obtained by reducing a given query into a sequence of logical AND, OR, and XOR operations performed on these bit vectors or bitmaps.
In gauging query processing efficiency, a good measure of the performance of any given operation (e.g., comparison operation) can be thought of as the number of bitmap-logical operations the system has to perfor
Alaubaidi Haythim J.
Metjahic Safet
Smart John A.
Sybase Inc.
LandOfFree
Database system with methodology for high-performance date does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Database system with methodology for high-performance date, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Database system with methodology for high-performance date will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2930809