Reexamination Certificate
1998-11-05
2001-08-14
Breene, John (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06275818
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to computer-implemented database management systems, and, in particular, to cost-based optimization of queries by identifying and eliminating redundant execution steps in processing the queries and to identifying generic views for use in executing the queries.
2. Description of Related Art
Next generation decision support applications are typically capable of processing huge amounts of data, and they may have the ability to integrate data from multiple, heterogeneous data sources. Such data sources may include traditional database systems, repositories on the Internet/World Wide Web (“the Web”), semi-structured documents, and file systems. These data sources often differ in a variety of aspects, such as their data models, the query languages they support, and their network protocols. Additionally, they are frequently spread over a wide geographical area. Decision support queries may be used to analyze and compare information from diverse sources. Processing decision support queries in this setting often involves redundant processing because comparing information requires comparing the same data with different GROUP BY operations. A GROUP BY operation causes rows in an intermediate query answer set to be grouped according to the values in the column(s) specified in the GROUP BY operation. Exemplary redundancies may include repeated access of the same data source and multiple executions of similar processing sequences. Thus, the cost of processing decision support queries in this setting can be quite high.
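As a minimal sketch of the GROUP BY behavior described above, the following Python example uses the standard sqlite3 module. The table name trades, its columns, and all of its values are hypothetical and serve only as an illustration. It groups the same rows in two different ways, which is the situation in which the same data source ends up being accessed repeatedly.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, category TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("AAA", "techno", 100),
        ("AAA", "techno", 300),
        ("BBB", "oil", 50),
        ("CCC", "oil", 150),
    ],
)


def grouped_totals(column):
    """Return {group value: total volume} for the given grouping column."""
    # Only known column names are accepted, so the f-string below is safe.
    if column not in ("ticker", "category"):
        raise ValueError("unsupported grouping column")
    sql = f"SELECT {column}, SUM(volume) FROM trades GROUP BY {column}"
    return dict(conn.execute(sql).fetchall())


# The same rows, grouped two different ways; each call reads the table again.
by_ticker = grouped_totals("ticker")
by_category = grouped_totals("category")
```

Each call to grouped_totals reads the whole table, so comparing the two groupings costs two passes over the same data.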
This problem of efficiently processing heterogeneous decision support queries has recently received considerable attention from database researchers: Ahmed, R., Smedt, P., Du, W., Kent, W., Ketabchi, A., and Litwin, W., The Pegasus Heterogeneous Multidatabase System, IEEE Computer, December 1991, [hereinafter “[ASD+91]”]; Chawathe, S., Garcia-Molina, H., Hammer, H., Ireland, K., Papakonstantinou, Y., Ullman, J. D., and Widom, J., The TSIMMIS Project: Integration of Heterogeneous Information Sources, In Proc. of IPSJ, Tokyo, Japan, 1994, [hereinafter “[CGH+94]”]; Christophides, V., Cluet, S., Abiteboul, S., and Scholl, M., From Structured Documents to Novel Query Facilities, In ACM SIGMOD Intl. Conf. on Management of Data, 1994, [hereinafter “[CAS94]”]; Papakonstantinou, Yannis, Garcia-Molina, H., and Widom, Jennifer, Object Exchange Across Heterogeneous Information Sources, In Proc. Intl. Conf. on Data Engineering, Taipei, Taiwan, February 1995, [hereinafter “[PGW95]”]; Subrahmanian, V. S., Adali, S., Brink, A., Emery, R., Lu, J. J., Rajput, A., Rogers, T. J., Ross, R., and Ward, C., Hermes, Heterogeneous Reasoning and Mediator System, Tech. report, submitted for publication, Institute for Advanced Computer Studies and Department of Computer Science, University of Maryland, College Park, Md. 20742, 1995, [hereinafter “[SAB+95]”]; Levy, A. Y., Rajaraman, A., and Ordille, J. J., Querying Heterogeneous Information Sources Using Source Descriptions, In Proc. 22nd VLDB Conf., pages 251-262, 1996, [hereinafter “[LRO96]”]; Tomasic, A., Raschid, L., and Valduriez, P., Scaling Heterogeneous Databases and the Design of Disco, In Proc. IEEE Intl. Conf. on Distributed Computing Systems, 1996, [hereinafter “[TRV96]”]; Lakshmanan, L.V.S., Sadri, F., and Subramanian, I. N., SchemaSQL - a language for querying and restructuring multidatabase systems, In Proc. IEEE Int. Conf. on Very Large Databases (VLDB'96), pages 239-250, Bombay, India, September 1996, [hereinafter “[LSS96]”]; Atzeni, Paolo, Mecca, Giansalvatore, Merialdo, Paolo, and Tabet, Elena, Structures in the Web, Technical Report, DDS, Sezione Informatica, Universita di Roma Tre, 1996, [hereinafter “[AMMT96]”]; L. M. Haas, D. Kossmann, E. L. Wimmers, and J. Yang, Optimizing Queries Across Diverse Data Sources, In Proceeding of the VLDB Conference, Aug. 1997, [hereinafter “[HKWY97]”]; and Abiteboul, Serge, Querying Semi-Structured Data, In 6th International Conf. on Database Theory, Delphi, Greece, January 1997, [hereinafter “[Abi97]”], which are incorporated by reference herein.
The majority of the approaches are based on the idea of developing a database-like ‘wrapper’ for data sources and implementing queries against these sources [CGH+94], [HKWY97], and Tork R. M. and P. Schwarz, Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources, In Proceeding of the VLDB Conference, Aug. 1997, [hereinafter “[RS97]”], which are incorporated by reference herein. Typically, wrappers provide a relational or object-relational view of the data in the non-traditional sources and enable the user to use a common language/interface to query data from the diverse sources. Systems that provide end users with an integrated view of data in multiple data sources are referred to as Heterogeneous Database Systems (HDBS) or Multi-database Systems (MDBS), and they are becoming increasingly relevant in the context of real-life business applications.
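The wrapper idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture of any of the cited systems: the class names SpreadsheetWrapper and KeyValueWrapper, the function select_where_price_above, and the sample data are all hypothetical. Each wrapper exposes its source as rows of (ticker, price), so a single query routine can run against both.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]  # hypothetical common row shape: (ticker, price)


class SpreadsheetWrapper:
    """Presents CSV text (a stand-in for a spreadsheet) as (ticker, price) rows."""

    def __init__(self, csv_text: str) -> None:
        self._csv_text = csv_text

    def rows(self) -> Iterator[Row]:
        reader = csv.DictReader(io.StringIO(self._csv_text))
        for record in reader:
            yield (record["ticker"], float(record["price"]))


class KeyValueWrapper:
    """Presents a dict (a stand-in for a Web source) as (ticker, price) rows."""

    def __init__(self, quotes: Dict[str, float]) -> None:
        self._quotes = quotes

    def rows(self) -> Iterator[Row]:
        for ticker, price in self._quotes.items():
            yield (ticker, float(price))


def select_where_price_above(sources: Iterable, threshold: float) -> List[Row]:
    """One query routine that works against any source exposing rows()."""
    return sorted(
        row for source in sources for row in source.rows() if row[1] > threshold
    )


sheet = SpreadsheetWrapper("ticker,price\nAAA,10.5\nBBB,99.0\n")
web = KeyValueWrapper({"CCC": 120.0, "DDD": 5.0})
result = select_where_price_above([sheet, web], 50.0)
```

The query routine never inspects the underlying format; it relies only on the common row shape that the wrappers provide.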
As an illustration, consider an application in which an investment broker manages the investment portfolios of his clients. The portfolio information may be stored in a relational database, which also contains other information about the clients such as their address, profession, etc. The broker obtains the latest stock price, as well as historical stock price information from the stock exchange servers on the Web. The broker also maintains account information in a spreadsheet for each client. In order to make complex decisions involving the buying and selling of stocks for the clients, the broker would have to use decision support queries to analyze and compare information from all of these sources.
Decision support queries analyze and compare information from diverse sources. Comparing information from diverse sources may require comparing the same data with different GROUP BY operations. Such a comparison may result in a query specification that contains computational redundancies. An analysis of TPCD benchmark queries, which were modeled after conventional decision support queries, reveals that redundancies exist even in the computation of answers for simple queries. See TPC, TPC Benchmark® D (Decision Support), Working draft 6.0, Transaction Processing Performance Council, August 1993, [hereinafter “[TPC93]”], which is incorporated by reference herein (see examples in Appendix A). Conventional database query optimizers generally lack the capability of identifying these redundancies. Hence, the results of one executed query segment are rarely used for processing another query segment. Since decision support queries are typically time consuming to run, especially in an HDBS setting, identifying and sharing computational results judiciously could lead to significant improvements in performance. The example that follows illustrates the kind of redundant computation that is typical of decision support queries.
Consider the following decision support query of the investment broker discussed above: list techno stocks owned by computer engineers that had a higher average sales volume over the past year than the maximum sales volume, which was reached in the first six months of the year, of any oil stock owned by a chemical engineer; and list the name of the computer engineer.
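The redundancy in this query can be sketched as follows. Both the average volume over the year and the maximum volume in the first six months are computed per ticker from the same stock-history data. The Python sketch below is only an illustration under stated assumptions and is not the optimization method of the invention: the rows of (ticker, category, month, volume), the CountingSource class, and the function names are hypothetical, and the join with client ownership and profession is omitted. It contrasts two separate passes over the source with one shared pass.

```python
from collections import defaultdict

# Hypothetical rows: (ticker, category, month, volume). All values are invented.
ROWS = [
    ("TEC1", "techno", 2, 100),
    ("TEC1", "techno", 9, 300),
    ("OIL1", "oil", 3, 150),
    ("OIL1", "oil", 10, 500),
]


class CountingSource:
    """Wraps the rows and counts how many full scans are requested."""

    def __init__(self, rows):
        self._rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._rows)


def separate_scans(source):
    """Compute each aggregate with its own pass over the source."""
    totals = defaultdict(lambda: [0, 0])  # ticker -> [sum, count]
    for ticker, _category, _month, volume in source.scan():
        totals[ticker][0] += volume
        totals[ticker][1] += 1
    yearly_avg = {t: s / n for t, (s, n) in totals.items()}

    first_half_max = {}
    for ticker, _category, month, volume in source.scan():
        if month <= 6:
            first_half_max[ticker] = max(first_half_max.get(ticker, volume), volume)
    return yearly_avg, first_half_max


def shared_scan(source):
    """Compute both aggregates in a single pass over the source."""
    totals = defaultdict(lambda: [0, 0])
    first_half_max = {}
    for ticker, _category, month, volume in source.scan():
        totals[ticker][0] += volume
        totals[ticker][1] += 1
        if month <= 6:
            first_half_max[ticker] = max(first_half_max.get(ticker, volume), volume)
    yearly_avg = {t: s / n for t, (s, n) in totals.items()}
    return yearly_avg, first_half_max


def techno_above_oil_peak(rows, yearly_avg, first_half_max):
    """Techno tickers whose yearly average exceeds every oil first-half maximum."""
    category = {ticker: cat for ticker, cat, _month, _volume in rows}
    oil_peak = max(v for t, v in first_half_max.items() if category[t] == "oil")
    return sorted(
        t for t, avg in yearly_avg.items()
        if category[t] == "techno" and avg > oil_peak
    )


two_pass_source = CountingSource(ROWS)
one_pass_source = CountingSource(ROWS)
two_pass_result = separate_scans(two_pass_source)
one_pass_result = shared_scan(one_pass_source)
answer = techno_above_oil_peak(ROWS, *one_pass_result)
```

Both functions return the same aggregates, but the shared version requests one scan of the source where the separate version requests two. When the source is a remote Web server, as in the broker example, each avoided scan is a potentially expensive remote access.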
For this example, a relational wrapper is implemented which enables the user to utilize a common language/interface, e.g., a Structured Query Language (SQL) interface. Accordingly, a representative SQL query is shown below. In the example below, Rinvest resides in a relational database and is represented as Rinvest(name, profession, ticker, qty, buyDate, buyPrice). Wstock is a Web data source, represented as Wstock(ticker, category, date, volume, endprice).
SELECT Rinvest.name,
Subramanian Narayana Iyer
Venkataraman Shivakumar
Breene John
Gates & Cooper LLP
International Business Machines - Corporation
Rayyan Susan F.