Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-09-17
2001-09-25
Le, Dieu-Minh (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C707S793000
Reexamination Certificate
active
06295610
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to database management systems (DBMS). More specifically, the present invention relates to a method and apparatus for providing a parallel recovery mechanism for recovering dead transactions.
BACKGROUND OF THE INVENTION
In typical database systems, users store, update and retrieve information by submitting commands to a database application. To be correctly processed, the commands must comply with the database language that is supported by the database application. One popular database language is known as Structured Query Language (SQL).
A logical unit of work that is atomic and comprised of one or more database language statements is referred to as a transaction. In a database sever, an area of system memory is allocated and one or more processes are started to execute one or more transactions. The database server communicates with connected user processes and performs tasks on behalf of the user. These tasks typically include the execution of transactions. The combination of the allocated system memory and the processes executing transactions is commonly termed a database instance.
A buffer cache resides in a portion of the system memory and holds database information. Buffers in the buffer cache hold copies of data blocks that have been read from the data files residing on disk. The buffers are typically shared by all user processes that are concurrently connected to the instance. When a transaction desires to make a change to a data block, a copy of the data block is loaded into a buffer and the change is made to the copy of the data block stored in the database buffer cache in dynamic memory. Afterwards, the modified blocks of data are written from the database buffer cache back into the data files on disk.
The system memory also contains a redo log buffer. A redo log buffer is a circular buffer that holds information about update operations recently performed by transactions. This information is stored in redo entries. Redo entries contain the information necessary to reconstruct, or redo, changes made by operations such as INSERT, UPDATE, DELETE, CREATE, ALTER, or DROP, for example. Redo entries are generated for each change made to a copy of a data block stored in the database buffer cache. The redo log buffer is written to an active online redo log file group on disk by a background process. The records in the online redo log file group on disk are commonly referred to as redo logs.
An instance failure can occur when a problem arises that prevents an instance from continuing work. Instance failures may result from hardware problems such as a power outage, or software problems such as an operating system or database system crash. Instance failures can also occur expectedly, for example, when a SHUTDOWN ABORT or a STARTUP FORCE statement is issued.
Due to the way in which database updates are performed to data files in some database systems, at any given point in time, a data file may contain some data blocks that (1) have been tentatively modified by uncommitted transactions and/or (2) do not yet reflect updates performed by committed transactions. Thus, an instance recovery operation must be performed after an instance failure to restore a database to the transaction consistent state it possessed just prior to the instance failure. In a transaction consistent state, a database reflects all the changes made by transactions which are committed and none of the changes made by transactions which are not committed.
A typical DBMS performs several steps during an instance recovery. First, the DBMS rolls forward, or reapplies to the data files all of the changes recorded in the redo log. Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time to reflect all of the changes made prior to the time of the crash. Rolling forward usually includes applying the changes in online redo log files, and may also include applying changes recorded in archived redo log files (online redo files which are archived before being reused). After rolling forward, the data blocks contain all committed changes as well as any uncommitted changes that were recorded in the redo log prior to the crash.
Rollback segments include undo records for undoing uncommitted changes made during the roll-forward operation. These undo records contain data block change information. When a transaction updates a copy of a data block that has been read into a buffer, the update is stored as undo change information in an undo record. The updating of the copy of the data block and the storing of the undo change information is performed as an atomic unit of change such that either both events occur or none of the events occur. During database recovery, the information contained in the rollback segments is used to undo the changes made by transactions that were uncommitted at the time of the crash. The process of undoing changes made by the uncommitted transactions is referred to as “rolling back” the transactions.
FIG. 1
illustrates an example of rolling forward and rolling back after a crash. Database
110
is a database requiring recovery at time t
1
. Database
120
represents the database after a redo log is applied at time t
2
. The database
120
comprises both changes made by committed transactions
121
and changes made by uncommitted transactions
122
. Database
130
represents the database at time t
3
after a rollback segment is applied. The database
130
comprises only changes made by committed transactions
121
.
When rolling back a transaction, the DBMS releases any resources (locks) held by the transaction at the time of failure. Lastly, the DBMS resolves any pending distributed transactions that were undergoing a two-phase commit coordinated by the DBMS at the time of the instance failure.
Conventionally, to recover after a database instance crash, the DBMS invokes a single process to rollback any changes that were made by dead transactions. Once invoked, the single process performs the necessary recovery tasks. In general, using a single process to rollback all changes can help ensure proper consistency of the database once the changes have been applied.
A drawback associated with using a single process to recover after a database instance crash is that it can induce a bottleneck in the system as changes that were made by uncommitted transactions are forced to be rolled back serially. In certain cases, a large number of transactions may be active when an instance crashes, and therefore, a large number of uncommitted transactions may need to be rolled back. This large number of uncommitted transactions may include changes to portions of the database which are of immediate need. Thus, access to certain portions of the database may be delayed for a relatively long period of time, as serially rolling back a large number of uncommitted transactions can be very time consuming.
One disadvantage of the prior method of recovering after a crash of an instance of a database is that undo records for all uncommitted (dead) transactions must be applied before a user is allowed to execute a new transaction to access the database system. This can take a long period of time if there are a large number of undo records that need to be applied, even if many of the undo records contain undo change information that corresponds to resources that are not of interest to the new transaction.
One method that can be used to help reduce the period of time that a user must wait before executing a new transaction is by rolling back changes on an as needed transaction-by-transaction basis depending on the particular data blocks that are of interest to the new transaction. For example, if a new transaction requires access to a data block that is locked by a particular dead transaction, the recovery process can cause the particular dead transaction to be recovered first in order to make the data block available for the new transaction.
However, this method also suffers from inefficiencies if the recovery of the particular dead trans
Ganesh Amit
Ngai Gary C.
Baderman Scott
Brandt Carl L.
Hickman Palermo & Truong & Becker LLP
Le Dieu-Minh
Oracle Corporation
LandOfFree
Recovering resources in parallel does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Recovering resources in parallel, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Recovering resources in parallel will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2473240