Database system providing optimization of group by operator...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000

Reexamination Certificate

active

06691101

ABSTRACT:

COMPUTER PROGRAM LISTING APPENDIX
A Computer Program Listing Appendix is included with this application.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to system and methods for improved optimization and execution of queries accessing information stored in multiple physically separate database tables.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
One purpose of a database system is to answer decision support queries and support transactions. A query may be defined as a logical expression over the data and the data relationships set forth in the database, and results in identification of a subset of the database. For example, a typical query might be a request, in SQL syntax, for data values corresponding to all customers having account balances above required limits. During query processing, a database system typically utilizes one or more indexes to answer queries. Indexes are organized structures associated with the data to speed up access to particular data values (i.e., answer values). Indexes are usually stored in the database and are accessible to a database administrator as well as end users. The basic operation of database systems, including the syntax of SQL (Structured Query Language), is well documented in the technical, trade, and patent literature; see, e.g., Date, C.,
An Introduction to Database Systems
, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
“Data warehouse” systems represent a type of database system optimized for supporting management decision making by tracking and processing large amounts of aggregate database information—that is, the data warehouse. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data. A well-known example of a data warehouse system is Sybase® Adaptive Server® IQ (ASIQ), available from Sybase, Inc. of Dublin, Calif.
A data warehousing system typically handles large pools of historical information representing different portions of a business. These different portions may, for example, represent a number of different geographical areas, such as information relating to sales in France, Italy, Germany, and other countries or regions. The information may also represent different periods of time, such as sales in October, November, December, and so forth. A number of different products and/or business units may also be involved. Frequently, such a large quantity of information is involved that in order to store the information it must be broken down into segments that are maintained in a number of separate database tables. For example, one table may contain information regarding sales in France in December. Another table may have all sales in Italy in January. Separate tables may be created based upon geography, time period, business unit and various other factors.
Because of the large quantities of information involved in a typical data warehouse, a database administrator will typically divide the information among multiple database tables rather than including all of the information in a single table. One reason an administrator will usually avoid placing a large quantity of information into a single table is that the time taken to load to an index is a logarithmic function of the size of the table. Adding one million rows into an empty table is 100 times quicker than adding one million rows into a table that already has 100 million rows in it. An index on a 100 million-row table is quite large and adding to this index requires changing a considerable number of database pages. Thus, if an administrator is continuously loading large quantities of information on a weekly or monthly basis, it is much more efficient for the administrator to load information into a number of separate, physically smaller tables (with physically smaller indexes) than to load the information into a single large table which requires a large index to be updated. Besides enabling faster loading and indexing, the use of multiple tables also enables back up and restore operations to be handled more efficiently. For example, a manager may wish to maintain information in a data warehouse for a certain period of time (e.g., 12 months of sales data). Once a new month has been loaded in to the warehouse, an old month (e.g., the same month of the prior year) is dropped. Dropping one-twelfth of the rows from a large physical table is a very expensive operation, in terms of system performance, because it requires updating a very large index containing a large quantity of data. On the other hand, if the information is maintained in monthly tables, then the warehouse can be updated by simply dropping the tables for the old month.
Storing information in a number of separate, physically smaller tables means that one frequently has to bring multiple tables together in order to generate reports for business analysis, data mining, financial planning, or similar purposes. For example, if one stores a physical table of sales in France by week, a report that provides a view of an entire calendar quarter requires the joining of approximately thirteen weekly tables. This is typically accomplished by a SQL UnionAll operation to merge together these separate tables. An administrator may simply create a database view called “Q
1
” or “First Quarter” that looks and feels to a user of the data warehouse as if the information is stored in a single table. A user may have no knowledge that behind the scenes the administrator has chosen to store the underlying information in multiple separate tables. The user simply interacts with an object called Q
1
to obtain access to the information. The problem is that if the user is summarizing information (which is implemented by the SQL GroupBy clause), the process of summarizing all of the rows that fall within a particular group requires bringing all the rows together into the summary (a physical group) before the results may be returned to the user. If the information is stored in separate tables, this means that one is required to sequentially drain all of the separate physical tables into this single summary (i.e., scanning each physical table in turn row by row to generate the result) before the summarized result is returned to the user. In other words, a serial operation is required in order to build the summary table.
In a data warehouse environment that is processing large quantities of information, this type of serial operation results in a considerable performance penalty. Analysts or other users of the data warehouse system may be delayed considerably in receiving answers to their queries. This delay makes the type of ad-hoc analysis that is typical in a data warehouse environment much less efficient and, therefore, more difficult to conduct. Adding to this problem is the fact that many businesses purchase lar

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Database system providing optimization of group by operator... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Database system providing optimization of group by operator..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Database system providing optimization of group by operator... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3285886

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.