System and method for query processing using virtual table...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details


active

06694306

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to data processing, particularly systems and methods for query processing which realize integrated access to a plurality of databases.
2. Description of the Prior Art
Today, with the increasing tendency towards reorganization of in-house information systems and tie-ups between companies, information systems which can cope with this rapidly changing social situation are in growing demand. Usually each company has a number of databases, each of which stores a huge volume of data in many files or tables. Such data consists of heterogeneous data created under different conditions over a long time and thus lacks consistency. Therefore, it has been pointed out that there are two problems to be solved: (1) different kinds of data have to be accessed using different applications; (2) when starting a new service or modifying a service, it is necessary to develop a new application or modify an existing application. This approach, which uses a specific application to access a specific type of data, is clearly inefficient because of the following disadvantages: since many different applications must be handled, the management task becomes more complicated; considerable cost is required in developing and maintaining applications; and sometimes a delay in service occurs due to application development time.
U.S. Pat. No. 5,873,088 (method (1)) and U.S. Pat. No. 5,675,785 (method (2)) disclose methods for realizing transparent access from application programs to a plurality of databases which really exist (hereinafter called “real databases”). These methods create virtual tables and utilize mappings from columns in the virtual tables to columns in the real databases in order to conceal the plurality of databases from the application programs. In method (1), a logical definition of the real databases is used to achieve transparent access to a plurality of databases, while in method (2), a query issued to a schema composed of virtual tables is converted so as to access the real databases. Both methods are characterized in that the real databases are concealed and accessed from application programs through virtual tables, and queries issued from the application programs to the virtual tables are transformed to access the real databases. The approach of accessing real databases through a virtual schema, which is called database integration or schema integration, has been studied by many researchers in the academic community since around 1980. In fact, various integration methods have been proposed, as typified by the federated database systems introduced in A. Sheth and J. Larson, “Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases,” ACM Computing Surveys, Vol. 22, No. 3, pp. 183-236. All these methods use mappings from virtual schemata or virtual tables to real databases to conceal the real databases from users or application programs (hereinafter called “application”) for logical integration. However, this prior art has not taken the approach of assigning multiple mappings to a single virtual table and selecting one of those mappings according to access conditions.
The reason for this is that, in the prior art, the advantage of assigning multiple heterogeneous data to one virtual table was not clear, and neither criteria nor a system for selecting among the assigned mappings had been studied. Recently, as the need to cope with the increasing complexity and diversity of information systems has arisen, there has been a growing demand for a technique that allows a virtual table to be shared by a plurality of applications and makes columns in different real databases accessible according to the conditions under which applications access the virtual table. However, conventional methods, which realize such mapping switching simply within an application, have the problem that a structurally complicated application is needed.
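The idea of one virtual table carrying several mappings, with one mapping chosen according to access conditions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Mapping` and `VirtualTable` classes, the `application` access condition, and the database, table, and column names are hypothetical and do not come from the patent or the cited prior art.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Mapping:
    """One candidate mapping from virtual columns to real columns (hypothetical)."""
    name: str
    applies: Callable[[Dict[str, str]], bool]  # predicate over access conditions
    columns: Dict[str, str]                    # virtual column -> "db.table.column"


@dataclass
class VirtualTable:
    """A virtual table that holds several mappings instead of exactly one."""
    name: str
    mappings: List[Mapping]

    def select_mapping(self, access: Dict[str, str]) -> Mapping:
        # Pick the first mapping whose predicate accepts the access conditions.
        for mapping in self.mappings:
            if mapping.applies(access):
                return mapping
        raise LookupError(f"no mapping of {self.name} matches {access}")

    def resolve(self, column: str, access: Dict[str, str]) -> str:
        # Translate a virtual column into the real column for this access.
        return self.select_mapping(access).columns[column]


# Illustrative data only: two applications share the CUSTOMER virtual table.
customer = VirtualTable(
    name="CUSTOMER",
    mappings=[
        Mapping(
            "sales_view",
            lambda a: a.get("application") == "sales",
            {"id": "salesdb.clients.client_no", "name": "salesdb.clients.full_name"},
        ),
        Mapping(
            "support_view",
            lambda a: a.get("application") == "support",
            {"id": "crmdb.accounts.acct_id", "name": "crmdb.accounts.contact"},
        ),
    ],
)

sales_target = customer.resolve("id", {"application": "sales"})
support_target = customer.resolve("id", {"application": "support"})
print(sales_target)
print(support_target)
```

In this sketch the switching logic lives in the virtual table rather than in each application, which is the kind of separation the paragraph above says conventional methods lack.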
In constructing a large-scale corporate information system based on transparent access through virtual tables, another major problem is that performance adequate for executing services in practice cannot be obtained. This problem is particularly evident when complicated queries, typically OLAP (OnLine Analytical Processing) queries, are executed in environments where distributed query processing is performed across a plurality of databases or data warehouses. In terms of data scale, there are already data warehouses on the order of terabytes (10^12 bytes) as of July 1999; it is reported that, in pioneering U.S. companies, users are emerging who issue complicated queries that take one day or more from query input to receipt of an answer. Such a complicated query contains heavy-load processing jobs such as join processing of many large-scale tables. Join processing refers to a process of joining tables, which frequently occurs in ad-hoc analytical query processing. If the target table is present in a different database (processing in this condition is hereinafter called distributed query processing), data transfer occurs between databases, which leads to serious inefficiency.
There are two possible methods for improving the efficiency of distributed query processing: (1) data transfer and processing volumes are reduced by optimizing queries so that processing jobs which individual real databases can handle are pushed down to those databases; and (2) data to be processed is cached and the cached data is used to omit data transfer for quicker processing. Regarding the push-down method (1), U.S. Pat. No. 5,590,321 discloses one possible approach. In this approach, query processing is pushed down to the real databases holding the data and capabilities necessary for query processing, and each push-down takes place on a per-query or per-subquery basis. Therefore, the approach cannot be applied to complicated queries as typified by the OLAP queries mentioned above, for which push-down is possible and effective only when a query or subquery is further divided into smaller query units for each push-down. In join processing between different databases (hereinafter called “distributed join”), processing volumes that affect the entire system, such as data transfer volumes and database internal processing volumes, vary considerably depending on where and how the join processing is performed. However, conventional methods have not incorporated any means to minimize the total processing volume by properly selecting the method and location for executing the distributed join in consideration of the data transfer volume or database internal processing volume.
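To make concrete why the location of a distributed join matters, the following Python sketch compares the network transfer volume for three candidate join locations. It is a simplified illustration under stated assumptions, not a method taken from the patent: the cost model counts only bytes moved (it ignores database internal processing volume), it assumes the two tables sit at different sites and the result must reach a third site called `integrator`, and all names and sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RemoteTable:
    """A table stored in one real database (hypothetical description)."""
    site: str
    rows: int
    row_bytes: int

    @property
    def size(self) -> int:
        # Total bytes that must be shipped if the whole table is moved.
        return self.rows * self.row_bytes


def transfer_costs(left: RemoteTable, right: RemoteTable,
                   result_bytes: int, integrator: str = "integrator") -> Dict[str, int]:
    """Bytes moved over the network for each candidate join location.

    Assumes left.site != right.site and that the join result is finally
    needed at the integrator site.
    """
    return {
        left.site: right.size + result_bytes,   # ship right table, then the result
        right.site: left.size + result_bytes,   # ship left table, then the result
        integrator: left.size + right.size,     # ship both tables, join centrally
    }


def cheapest_join_site(left: RemoteTable, right: RemoteTable,
                       result_bytes: int) -> Tuple[str, int]:
    costs = transfer_costs(left, right, result_bytes)
    site = min(costs, key=costs.get)
    return site, costs[site]


# Illustrative sizes only.
orders = RemoteTable(site="db_a", rows=1_000_000, row_bytes=100)   # 100 MB
customers = RemoteTable(site="db_b", rows=10_000, row_bytes=200)   # 2 MB
costs = transfer_costs(orders, customers, result_bytes=5_000_000)
best_site, best_cost = cheapest_join_site(orders, customers, result_bytes=5_000_000)
print(costs)
print(best_site, best_cost)
```

With these made-up sizes, joining at the site of the large table moves 7,000,000 bytes, against more than 100,000,000 bytes for either alternative, which shows how strongly the choice of join location can affect total transfer volume.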
To cache data as mentioned in method (2), there are three types of cache means: (i) cache memories as built into conventional computer systems; (ii) Web caches, which have been studied by many researchers recently; and (iii) caches specially developed for databases. In the case of (i), address-data sets are stored in a cache memory and, when data at a certain address is requested, if the address is in the cache, the data corresponding to the address is returned from the cache. In the case of (ii), the cache stores address-data sets, each set consisting of data and a URL (Uniform Resource Locator), an address which uniquely identifies specific Web data in the cache, and, when a URL is requested, if the URL is in the cache, the data corresponding to the URL is returned from the cache. In other words, in cases (i) and (ii), a unique address which identifies specific data is given, and the decision as to whether the cached data is usable is made solely on whether the address is present in the cache. Therefore, no attention has been paid to the possibility that part of the cached data may be usable for a different request. In analytical processing of huge volumes of data, as typically seen in OLAP, the data is analyzed in various ways while conditions are gradually changed, so that the exact same query as a previous one is rarely issued; it is thus difficult to use methods (i) and (ii) for such analytical processing.
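The limitation of address-keyed caches described above can be seen in a short Python sketch. It models cache types (i) and (ii) as a plain dictionary keyed by the exact request; the `ExactMatchCache` class, the SQL text, and the cached rows are hypothetical and serve only as an illustration.

```python
class ExactMatchCache:
    """Address-keyed cache: a hit requires exactly the same key as before."""

    def __init__(self):
        self._store = {}

    def put(self, address, data):
        self._store[address] = data

    def get(self, address):
        # Returns the cached data, or None when the address is not present.
        return self._store.get(address)


cache = ExactMatchCache()

# First analytical query and its (made-up) result.
q1 = "SELECT region, SUM(amount) FROM sales WHERE year = 1998 GROUP BY region"
cache.put(q1, [("East", 120), ("West", 95)])

# Second query narrows the condition slightly; its answer is contained in the
# rows cached for q1, but the key differs, so the cache cannot use them.
q2 = ("SELECT region, SUM(amount) FROM sales "
      "WHERE year = 1998 AND region = 'East' GROUP BY region")

hit = cache.get(q1)
miss = cache.get(q2)
print(hit)
print(miss)
```

The second lookup misses even though the needed row is already cached, which is the situation the paragraph above describes for analyses that change their conditions step by step.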
As an ex


Profile ID: LFUS-PAI-O-3292778
