Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-09-30
2002-01-08
Ray, Gopal C. (Department: 2181)
C714S013000
active
06338146
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates generally to database transactions on fault-tolerant multi-processor systems. In particular, this invention relates to methods for flushing in the commit phase of database transactions on cluster computer systems.
FIG. 1 illustrates a network node 100 in a multi-node system of the prior art. In FIG. 1, the node 100 includes loosely coupled processors 110 containing execution spaces 120 connected by a bus 130. The system 100 is a flat arrangement of the processors 110.
This bus-and-processor arrangement constitutes a single network node 100 on a network 140. The constituent processors 110 of the network node 100 have no shared memory processor (SMP) characteristics, e.g., memory sharing between some of the processors 110, and have no separate network presence.
The systems 100 and a subset of the processes thereon cooperate to provide a transaction service. The transaction service includes three elements: a commit coordinator, a resource manager and a Log. Each of the elements is a fault-tolerant process pair having primary and backup processes.
The primary and backup of each process pair are located at the same network address, i.e., at the address of the single network node 100 running both processes. Thus, for example, if the node 100 of the primary commit coordinator process becomes unavailable to the network 140, the backup commit coordinator process goes offline as well. Process pairs implementing transaction services are described in the book entitled “TRANSACTION PROCESSING: CONCEPTS AND TECHNIQUES”, by Gray et al., 1993, Morgan Kaufmann Publishers, Inc., San Mateo, Calif., at pages 132-138.
A standard two-phase commit algorithm is described at pages 562-568 of the above-referenced book by Gray et al. The two-phase commit algorithm involves the following steps:
PREPARE: send a flush broadcast invoking each resource manager involved in the transaction to vote on whether to commit;
DECIDE: collect the flush results of voting and, if all vote yes, write the transaction commit log record;
COMMIT: invoke each involved resource manager, telling it the commit decision; and
COMPLETE: when all acknowledge the commit message, force-write a commit completion record to the log.
The prepare phase is also called phase 1 of the commit, and the commit phase is called phase 2.
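The four steps above can be sketched in a few lines of Python. This is only an illustrative model, not code from the patent or from Gray et al.: the class names `ResourceManager` and `CommitCoordinator`, the in-memory `log` list standing in for the force-written log, and the handling of a "no" vote as an abort record are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical resource manager: votes in phase 1, acknowledges in phase 2."""
    name: str
    will_vote_yes: bool = True
    outcome: str = "pending"

    def prepare(self) -> bool:
        # PREPARE: flush local work and vote on whether to commit.
        return self.will_vote_yes

    def finish(self, decision: str) -> bool:
        # COMMIT: learn the coordinator's decision and acknowledge it.
        self.outcome = decision
        return True


@dataclass
class CommitCoordinator:
    """Hypothetical coordinator; `log` is an in-memory stand-in for the log."""
    log: list = field(default_factory=list)

    def run(self, txn_id: str, managers: list) -> str:
        # Phase 1 (PREPARE + DECIDE): collect the votes and record the decision.
        votes = [rm.prepare() for rm in managers]
        # Assumption: any "no" vote is recorded as an abort decision.
        decision = "commit" if all(votes) else "abort"
        self.log.append((txn_id, decision))
        # Phase 2 (COMMIT + COMPLETE): tell every manager, then log completion.
        acks = [rm.finish(decision) for rm in managers]
        if all(acks):
            self.log.append((txn_id, "complete"))
        return decision
```

Calling `CommitCoordinator().run("t1", [ResourceManager("a"), ResourceManager("b")])` walks both phases once; setting `will_vote_yes=False` on any manager makes the same call return the abort decision instead.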
In a prior art system, a primary and backup commit coordinator are both located on a single network node. Any processor failure or other node-related failure causes the entire node to become inoperative, i.e., the granularity of failure is the entire node. The sharing of a network address between primary and backup commit coordinator processes in the prior art system 100 prevents that system from being non-blocking, because a failure of the node at the shared network address disables the commit operation. The flushing of resource managers in such an arrangement is not truly non-blocking in the classic network sense.
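The effect of failure granularity on a process pair can be illustrated with a small model. The sketch below is an assumption-laden illustration rather than anything specified in the patent: `ProcessPair`, its fields, and the node names are hypothetical, and a node failure is modelled simply as every process on that node becoming unavailable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessPair:
    """Hypothetical process pair, identified by the nodes hosting each member."""
    primary_node: str
    backup_node: str

    def available_after(self, failed_node: str) -> bool:
        # The pair stays available if at least one member runs on a node
        # other than the one that failed.
        members = (self.primary_node, self.backup_node)
        return any(node != failed_node for node in members)


# Prior-art style placement: both members share the single node 100.
co_located = ProcessPair(primary_node="node-100", backup_node="node-100")
# Contrasting placement with the members on two distinct (hypothetical) nodes.
separated = ProcessPair(primary_node="node-A", backup_node="node-B")
```

Under this model `co_located.available_after("node-100")` is `False`, since the one node failure removes both members, while `separated.available_after("node-A")` is `True`.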
SUMMARY OF THE INVENTION
Accordingly, one goal of the invention is a transaction processor in which processors are either connected to each other using SMP memory sharing with tightly-coupled synchronization primitives (first tier) or connected across the network (second tier).
Such a configuration is two-tiered, with “near processor” and “far processor/node” relationships. The prior art configuration has two execution space contexts: here and there. The new configuration has three execution contexts: here, near-there, and far-there.
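One possible reading of the three execution contexts is sketched below. The mapping is an assumption made for illustration: "here" is taken to mean the same execution space, "near-there" another processor on the same node (first tier, shared memory), and "far-there" a processor on a different node (second tier, across the network). The names `ExecutionSpace` and `execution_context` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionSpace:
    """Hypothetical address of an execution space: a node and a processor on it."""
    node: str
    processor: int


def execution_context(local: ExecutionSpace, remote: ExecutionSpace) -> str:
    """Classify `remote` relative to `local` under the assumed mapping."""
    if local == remote:
        return "here"          # same processor on the same node
    if local.node == remote.node:
        return "near-there"    # first tier: shared memory within the node
    return "far-there"         # second tier: reached across the network
```

For example, `execution_context(ExecutionSpace("A", 0), ExecutionSpace("A", 1))` classifies a neighbouring processor on node A, whereas changing the second argument to a space on node B yields the network tier.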
According to one aspect of the invention, a transaction service includes a three-phase algorithm requiring a backup commit coordinator process at a different network location than the primary.
According to another aspect of the invention, the primary and backup commit coordinator processes in the process pair execute on different nodes having different network presences. Upon receiving the flush results, the primary commit coordinator synchronizes the results to the backup commit coordinator utilizing a network message system, so that the flush results are durably recorded at separate network nodes. Thus, the failure of any system on either node will not result in a loss of the flush results.
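The synchronization step described in this paragraph can be sketched as follows. This is a minimal model under stated assumptions, not the patented implementation: `Coordinator`, `synchronize`, and `recover` are hypothetical names, a plain dictionary stands in for the network message, and in-memory dictionaries stand in for durable storage at each node.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical commit coordinator process hosted on a named node."""
    node: str
    alive: bool = True
    flush_results: dict = field(default_factory=dict)

    def record(self, txn_id: str, results: dict) -> None:
        # In-memory stand-in for durably recording the flush results.
        self.flush_results[txn_id] = dict(results)


def synchronize(primary: Coordinator, backup: Coordinator,
                txn_id: str, results: dict) -> None:
    """Record flush results at the primary, then copy them to the backup."""
    if primary.node == backup.node:
        raise ValueError("primary and backup must run on different nodes")
    primary.record(txn_id, results)
    # The dictionary below stands in for a message sent over the network.
    message = {"txn_id": txn_id, "results": dict(results)}
    backup.record(message["txn_id"], message["results"])


def recover(primary: Coordinator, backup: Coordinator, txn_id: str) -> dict:
    """Read the flush results from whichever coordinator is still alive."""
    survivor = primary if primary.alive else backup
    return survivor.flush_results[txn_id]
```

After `synchronize(primary, backup, "t1", {"rm1": "yes"})`, marking `primary.alive = False` and calling `recover(primary, backup, "t1")` still returns the recorded votes, because the copy lives on the other node.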
According to another aspect of the invention, all processors in the node are coupled to a shared memory. Messages between processors in a node are implemented by memory copying. Each processor has an associated execution space in the shared memory, with processes being attached to an execution space. During synchronization, the messages are transferred from the execution space having the primary commit coordinator attached in a first node to the execution space having the backup commit coordinator attached in a second node.
According to another aspect of the invention, all processes of a transaction service are implemented as process pairs having primary and backup processes executing on different nodes having a different network presence.
Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.
REFERENCES:
patent: 4683563 (1987-07-01), Rouse et al.
patent: 5757526 (1998-05-01), Shiragaki et al.
patent: 0 295 424 (1988-10-01), None
“Transaction Processing: Concepts and Techniques,” Gray et al., 1993, Morgan Kaufmann Publishers, Inc., San Mateo, CA, pp. 132-138 and 562-568.
ACM Transactions on Database Systems, vol. 17, No. 1, Mar. 1992, New York, N.Y., pp. 94-162.
Sigmod Record, vol. 21, No. 2, Jun. 1992, New York, N.Y., pp. 371-380, C. Mohan et al. “ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging”.
Cheung Y. C.
Johnson Charles S.
Shariq Muhammad
Tung Shang-Shen
Compaq Computer Corporation
Oppenheimer Wolff & Donnelly
Ray Gopal C.