Fast refresh of snapshots containing subqueries

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06289335

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to snapshots for database systems and more particularly to a method and system that allows for improved “fast refreshes” of snapshots.
BACKGROUND OF THE INVENTION
A snapshot is a body of data constructed of data from a “master” table. The master table may be local or remote relative to the snapshot. The data contained within a snapshot is defined by a query that references the master table and optionally other tables, views or snapshots. A snapshot can be refreshed on a periodic basis to reflect the current state of its corresponding base tables. An “updatable snapshot” is a snapshot to which updates may be directly made. Such updates are propagated from the snapshot back to the master table.
One method of refreshing snapshots is to reissue the defining query for the snapshot and simply replace the previous snapshot data with the results of the reissued query. This method is referred to as a “complete refresh.” Complete refreshes are particularly disadvantageous when the master and snapshot tables are located at different sites in a network, because data for the entire refreshed snapshot table must be sent through the network.
Alternatively, another method known as a “fast refresh” can be performed to expedite the refresh operation, by transferring to the snapshot only those changes to the master table which have been made since the last refresh of the snapshot. A log file (referred to as a “master log”) can be employed to track and record the rows that have been updated in the master table. When a snapshot is refreshed, only the appropriate rows in the master log need to be applied to the snapshot table. In a networked environment, only those modified rows found at the master site are transferred across the network and updated or inserted into the snapshot. Rows deleted in the master table are also deleted in the snapshot. Fast refresh is typically faster, more efficient, and involves less network traffic than a complete refresh.
For example, a company with
100
sales associates maintains three tables, shown in FIG.
2
(
a
) to keep track of its sales information. Customer table
204
is a table that contains information about the company's customers. One of the columns in the table is a customer identifier (CID), which is a primary key, uniquely assigned to each customer. In the figures, primary key columns are indicated by an asterisk (*). Another column in table
204
is ZIP, which stores the zip code of the customer. Other columns, not shown, include street addresses, telephone numbers, and so forth. Order table
202
contains information about a customer's order. Each row in the order table has an order identifier (OID) and a CID. Each customer may place several orders, so there is a many-to-one relationship between the CID column of customer table
204
and the CID column of order table
202
. In the figures, the many-to-one relationship is designated by a line connecting two tables with an inverted “V” on the “many” side. Order line table
200
stores an order line identifier (OLID) and an OID. Since each customer can have several orders, and each order several order lines, order line table
200
is usually much larger than customer table
204
.
In this example, sales associates are assigned their territories by zip code, and they would like to keep a copy of only the relevant sales information on their laptop computers. Thus, if sales associate Smith is assigned only to zip codes 19555 and above, then Smith is only interested in the customers, orders, and order lines for zip code 19555 and above. Consequently, Smith creates a snapshot with a snapshot definition query for each table to retrieve that information in the snapshot. Since a snapshot also stores administrative information, the database system presents to Smith a snapshot view, which hides that administrative information. In FIG.
2
(
b
), customer snapshot view
214
is the result of creating the snapshot with the following SQL query:
[QUERY 1]
select * from customer where ZIP>=19555.
Order snapshot view
212
is more complicated, because order table
202
is partitioned on data found in customer table
204
. The select statement in the order snapshot definition query:
[QUERY 2]
select * from order
where exists (select CID from customer
  where order.CID = customer.CID and ZIP >= 19555)
is a subquery. The table outside the subquery is an “outer table,” and the table inside the subquery is an “inner table.” In this example, order table
202
is the outer table, and customer table
204
is an inner table. This subquery includes an “equijoin” predicate, order.CID=customer.CID, and a filter predicate, ZIP>=19555. An equijoin predicate connects a unique key of an inner table to an “equijoin” column of an outer table, in this case, column CID. A filter predicate refers to a “filter” column that is used in partitioning the data. In this case, since sales territories are assigned according to zip codes, the ZIP column of customer table
204
serves as a filter column.
A snapshot definition query may include nested subqueries. For example, order line snapshot view
210
, containing the relevant rows of order line table
200
for sales associate Smith, is defined by the following query containing subqueries:
[QUERY 3]
select * from order_line
where exists (select OID from order
  where order_line.OID = order.OID
    and exists (select CID from customer
      where order.CID = customer.CID and ZIP >= 19555)).
The order line table for a company is often too large to fit on a sales associate's laptop, but an order line snapshot defined for a particular sales associate is typically small enough for the portable laptop computer.
Snapshots are useful for sales associate Smith, because Smith can keep information about Smith's customers on a portable laptop computer. Before each sales trip, Smith requests a refresh of the snapshots on the laptop. In this situation, fast refresh is preferable to complete refresh, because fast refresh is much faster. Relatively few changes need to be brought over to the snapshot because customers rarely change their addresses, and most of the changes are new orders and order lines. However, in prior systems only customer snapshot
214
is fast refreshable, because prior systems cannot fast refresh snapshots whose snapshot definition queries use subqueries.
To circumvent this restriction, database users are forced to “denormalize” their tables if they want the performance benefits of fast refresh. For example, order table
202
is denormalized by appending the information contained in customer table
204
, resulting in denormalized order table
222
in FIG.
2
(
c
). Consequently, the zip code information for a particular customer is now found in two separate tables, in customer table
204
and denormalized order table
222
. Order line table
200
is denormalized by appending information found in customer table
204
and order table
202
to create denormalized order line table
220
. In denormalized order line table
220
, the zip code information for a particular customer is now found in separate rows of the same table as well as in three separate tables.
Denormalization imposes significant database maintenance costs. For example, if customer
2
moves to zip code 19554, then rows in denormalized order table
222
and denormalized order line table
220
must be updated in addition to the row in customer table
204
. Every point of change of a particular piece of information carries a risk of error, and it is desirable to reduce the number of points of change.
Furthermore, a company may change the criteria according to which sales associates are assigned to customers. For example, a company may decide to assign sales associates on an individual basis, storing in the assignments in assignment table
300
of FIG.
3
. Assignment table
300
lists the assignments o

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Fast refresh of snapshots containing subqueries does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Fast refresh of snapshots containing subqueries, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Fast refresh of snapshots containing subqueries will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2470790

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.