Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-06-30
2003-06-10
Rones, Charles (Department: 2175)
C707S793000, C707S793000, C709S219000
active
06578041
ABSTRACT:
TECHNICAL FIELD
This invention relates to database computer systems and applications that execute on them. More particularly, this invention relates to methods for backing up a database so that the data therein is recoverable from media failures.
BACKGROUND OF THE INVENTION
Computer systems occasionally crash. A “system crash” is an event in which the computer quits operating the way it is supposed to operate. Common causes of system crashes include power outage, application operating error, and other unknown and often unexplained malfunctions that tend to plague even the best-devised systems and applications. System crashes are unpredictable, and hence, essentially impossible to anticipate and prevent.
A system crash is at the very least annoying, and may result in serious or irreparable damage. For standalone computers or client workstations, a local system crash typically results in loss of work product since the last save interval. The user is inconvenienced by having to reboot the computer and redo the lost work. For servers and larger computer systems, a system crash can have a devastating impact on many users, including both a company's employees and its customers.
Being unable to prevent system crashes, computer system designers attempt to limit the effect of system crashes. The field of study concerning how computers recover from system crashes is known as “recovery.” Recovery from system crashes has been the subject of much research and development.
In general, the goal of redo recovery is to return the computer system after a crash to a previous and presumed correct state in which the computer system was operating immediately prior to the crash. Then, transactions whose continuations are impossible can be aborted. Much of the recovery research focuses on database recovery for database computer systems, such as network database servers or mainframe database systems. Database system designers attempt to design database recovery techniques that minimize the amount of data lost in a system crash, minimize the amount of work needed following the crash to recover to the pre-crash operating state, and minimize the performance impact of recovery on the database system during normal operation.
FIG. 1 shows a database computer system 20 having a computing unit 22 with processing and computational capabilities 24 and a volatile main memory 26. The volatile main memory 26 is not persistent across crashes and hence is presumed to lose all of its data in the event of a crash. The computer system also has a non-volatile or stable database 28 and a stable log 30, both of which are contained on stable memory devices, e.g. magnetic disks, tapes, etc., connected to the computing unit 22. The stable database 28 and log 30 are presumed to persist across a system crash. The stable database 28 and log 30 can be combined in the same storage, although they are illustrated separately for discussion purposes.
The volatile memory 26 stores one or more applications 32, which execute on the processor 24, and a resource manager 34. The resource manager 34 includes a volatile cache 36, which temporarily stores data destined for the stable database 28. The data is typically stored in the stable database and volatile cache in individual units, such as “pages.” A cache manager 38 executes on the processor 24 to manage movement of data pages between the volatile cache 36 and the stable database 28. In particular, the cache manager 38 is responsible for deciding which data pages should be moved to the stable database 28 and when the data pages are moved. Data pages that are moved from the cache to the stable database are said to be “flushed” to the stable state. In other words, the cache manager 38 periodically flushes the cached state of a data page to the stable database 28 to produce a stable state of that data page which persists in the event of a crash, making recovery possible.
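The flushing behavior described above can be illustrated with a minimal sketch. It is not the patent's implementation: the class name `CacheManager`, its methods, and the use of plain dictionaries to stand in for the volatile cache 36 and the stable database 28 are all assumptions made for illustration.

```python
class CacheManager:
    """Toy cache manager: pages live in a volatile dict and are flushed to a stable dict."""

    def __init__(self, stable_db):
        self.stable_db = stable_db  # stands in for the stable database (28)
        self.cache = {}             # stands in for the volatile cache (36)
        self.dirty = set()          # ids of pages changed since their last flush

    def read(self, page_id):
        # Bring the page into the cache on demand from the stable database.
        if page_id not in self.cache:
            self.cache[page_id] = self.stable_db.get(page_id)
        return self.cache[page_id]

    def write(self, page_id, value):
        # Updates touch only the volatile cache until the page is flushed.
        self.cache[page_id] = value
        self.dirty.add(page_id)

    def flush(self, page_id):
        # Copy the cached state of one page to the stable database.
        if page_id in self.dirty:
            self.stable_db[page_id] = self.cache[page_id]
            self.dirty.discard(page_id)

    def crash(self):
        # A crash loses everything volatile; the stable database survives.
        self.cache.clear()
        self.dirty.clear()


stable = {"p1": "old"}
cm = CacheManager(stable)
cm.write("p1", "new")
cm.write("p2", "unflushed")
cm.flush("p1")   # only p1 reaches the stable database
cm.crash()       # the update to p2 is lost with the cache
```

In this sketch the caller decides when to flush; in the system described, that decision belongs to the cache manager and is made independently of the application.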
The resource manager 34 also has a volatile log 40 that temporarily stores computing operations to be moved into the stable log 30. A log manager 42 executes on the processor 24 to manage when the operations are moved from the volatile log 40 to the stable log 30. The transfer of an operation from the volatile log 40 to the stable log 30 is known as a log flush.
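A log flush can be sketched in a few lines. The `LogManager` class below is hypothetical, and Python lists stand in for the volatile log 40 and the stable log 30; the point is only that records buffered after the last flush do not survive a crash.

```python
class LogManager:
    """Toy log manager: records are buffered in a volatile log, then flushed in order."""

    def __init__(self):
        self.volatile_log = []  # stands in for the volatile log (40)
        self.stable_log = []    # stands in for the stable log (30)

    def append(self, operation):
        # New operations are recorded in volatile memory first.
        self.volatile_log.append(operation)

    def flush(self):
        # Log flush: move buffered records to the stable log, preserving order.
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def crash(self):
        # Unflushed records vanish; the stable log persists.
        self.volatile_log.clear()


log = LogManager()
log.append("op1")
log.append("op2")
log.flush()
log.append("op3")  # never flushed
log.crash()
```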
During normal operation, an application 32 executes on the processor 24. The resource manager 34 receives requests to perform operations on data from the application. As a result, data pages are transferred to the volatile cache 36 on demand from the stable database 28 for use by the application. During execution, the resource manager 34 reads, processes, and writes data to and from the volatile cache 36 on behalf of the application. The cache manager 38 determines, independently of the application, when the cached data state is flushed to the stable database 28.
Concurrently, the operations being performed by the resource manager on behalf of the application are being recorded in the volatile log 40. The log manager 42 determines, as guided by the cache manager 38 and the transactional requirements imposed by the application, when the operations are posted as log records on the stable log 30. A logged operation is said to be “installed” when it does not need to be replayed in order to recover the database state. This is usually accomplished by flushing the versions of the pages containing the changes made by the operation to the stable database 28.
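One way to picture the “installed” condition is with sequence numbers. The sketch below assumes a scheme the text does not describe: each logged operation carries a hypothetical log sequence number (`lsn`), and a hypothetical table `flushed_lsn` records, for each page, the LSN of the latest change included in its flushed version. Under that assumption, an operation is installed once every page it changed has been flushed at or beyond the operation's LSN.

```python
def is_installed(op, flushed_lsn):
    """Return True if every page changed by `op` was flushed with that change included.

    `op` is a dict with an "lsn" and the list of "pages" it changed (assumed layout).
    `flushed_lsn` maps page id -> LSN of the latest change in the flushed page version.
    """
    return all(flushed_lsn.get(page, 0) >= op["lsn"] for page in op["pages"])


ops = [
    {"lsn": 1, "pages": ["p1"]},
    {"lsn": 2, "pages": ["p1", "p2"]},
]
# p1 was flushed after operation 2 changed it; p2 has never been flushed.
flushed_lsn = {"p1": 2}

status = [is_installed(op, flushed_lsn) for op in ops]
```

Here the first operation would not need to be replayed, while the second would, because its change to `p2` exists only in the cache.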
When a crash occurs, the application state (i.e., address space) of any executing application 32, the data pages in volatile cache 36, and the operations in volatile log 40 all vanish. The computer system 20 invokes a recovery manager which begins at the last flushed state on the stable database 28 and replays the operations posted to the stable log 30 to restore the database of the computer system to the state as of the last logged operation just prior to the crash.
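The replay step can be sketched as follows, under a simplifying assumption that is not in the text: every log record is a `(page_id, value)` assignment, so replaying a record that is already reflected in the stable database is harmless. The function name `recover` and the dictionary representation are likewise illustrative.

```python
def recover(stable_db, stable_log):
    """Replay logged operations, in log order, on top of the last flushed state."""
    db = dict(stable_db)  # start from whatever reached the stable database
    for page_id, value in stable_log:
        db[page_id] = value  # assumed idempotent: a plain page assignment
    return db


# State that survived the crash: the update to "b" and the creation of "c"
# were logged but their pages were never flushed.
stable_db = {"a": 1, "b": 2}
stable_log = [("a", 1), ("b", 5), ("c", 7)]

recovered = recover(stable_db, stable_log)
```

Operations that are not plain assignments (for example, an increment) cannot be replayed blindly like this, which is why a recovery manager needs a redo test to decide which logged operations to reapply.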
One prior art approach to database recovery is to require the cache manager to flush the entire cache state periodically. The last such flushed state is identified in a “checkpoint record” that is inserted into the stable log. During recovery, a redo test is performed to determine whether a logged operation needs to be redone to help restore the system to its pre-crash state. The redo test is simply whether an operation follows the last checkpoint record on the log. If so (meaning that a later operation occurred and was posted to the stable log, but the results of the operation were not installed in the stable database), the computer system performs a redo operation using the log record.
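The checkpoint-based redo test amounts to a position check on the log. In the sketch below, the marker `CHECKPOINT`, the function name, and the list-based log are assumptions for illustration; the only rule taken from the text is that an operation is redone if it follows the last checkpoint record.

```python
CHECKPOINT = "CHECKPOINT"  # hypothetical marker for a checkpoint record


def operations_to_redo(stable_log):
    """Return the logged operations that follow the last checkpoint record."""
    last_checkpoint = -1  # no checkpoint found: every operation must be redone
    for index, record in enumerate(stable_log):
        if record == CHECKPOINT:
            last_checkpoint = index
    return stable_log[last_checkpoint + 1:]


stable_log = [("a", 1), CHECKPOINT, ("b", 5), CHECKPOINT, ("c", 7), ("d", 9)]
redo = operations_to_redo(stable_log)
```

Everything before the last checkpoint is skipped because the full cache flush at that checkpoint already installed it.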
This simple approach has a major drawback in that writing every change of the cached state out to the stable database 28 is practically infeasible because it involves a high volume of input/output (I/O) activity that consumes a disproportionate amount of processing resources and slows the system operation. It also requires atomic flushing of multiple pages, which is a troublesome complication. This was the approach used in System R, described in Gray, McJones, et al., The Recovery Manager of the System R Database Manager, ACM Computing Surveys 13,2 (June, 1981) pages 223-242.
Crash recovery requires that the stable database 28 be accessible and correct. Media recovery provides recovery from failures involving data in the stable database. It is also a last resort to cope with erroneous applications that have corrupted the stable database. In some systems, to guard against stable database failures, the media recovery system provides an additional copy of the database called a backup database 29, and a media recovery log (e.g., stable log 30) is applied to the backup database 29 to roll its state forward to the desired state, usually the most recent committed state. To recover from failures, the media recovery system first restores the stable database 28 by copying the backup database 29, perhaps stored on tertiary storage, to the usual seco
Mahmoudi Hassan
Microsoft Corporation
Rones Charles
Woodcock & Washburn LLP