Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-11-30
2001-12-11
Iqbal, Nadeem (Department: 2184)
C709S248000
active
06330686
ABSTRACT:
I. DESCRIPTION OF THE INVENTION
IA. Field of the Invention
This invention relates to computer operating systems. More specifically, this invention relates to techniques for handling all types of IMS messages including protected conversations between two processes in the same or different execution environments after a corresponding IMS system crashes during phase one of a commit procedure.
The invention is embodied in an apparatus and method for handling messages by either committing corresponding messages after restart of the IMS system or aborting the commit procedure altogether.
IB. Background of the Invention
The present invention can be used in a network of computer systems that form part of a distributed computer system. Such a distributed computer system typically includes a central host computer and a plurality of virtual machines or other types of execution environments. A real machine includes a central processor and associated virtual machines. Within each such real machine, a central computer, which includes the central processor, manages the central resources of the real machine, including a large memory and communication facilities. The central processor controls access between the virtual machines and the resources so that each virtual machine appears to be a separate computer. The real machines may in turn be interconnected into a global network to enable communications between applications running in execution environments belonging to different real machines. Each virtual machine is provided with its own conversation monitor system (CMS) to interact with (i.e., receive instructions from and provide prompts for) users of the virtual machine. CMS is a portion of the system control program. Certain resources, such as shared file system (SFS) and shared structured query language (SQL) relational databases, may be accessed by any user of the virtual machine and the host.
Each such system is a real machine. Two or more real machines can be connected to form a network, and data can be transferred using communications between virtual machines belonging to different real machines. Such a transfer is made via communication facilities such as AVS Gateway and VTAM facilities (“AVS Gateway and VTAM” are trademarks of IBM Corp. of Armonk, N.Y.).
An application running on any of the virtual machines may communicate with the coupling facility as well as with other applications running on the same or different virtual machines. Applications communicate by sending a message to the coupling facility. Like files and databases, communications are also protected resources.
An application can make changes to a database, file resource, or state of communication by first making a work request defining the changes. In response to a request for a change, provisional changes are made in shadow files while the original database or file is unchanged. Changes made to shadow files are not yet committed. The application has the option of requesting that the changes be committed to validate the shadow file changes. The changes made to the shadow file are thereby transferred to the original file.
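The shadow-file mechanism described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `ShadowFile` class, the `.shadow` suffix, and the `demo` helper are all hypothetical names chosen for illustration, and an ordinary file stands in for a protected resource.

```python
import os
import shutil
import tempfile


class ShadowFile:
    """Hypothetical helper: provisional changes go to a shadow copy."""

    def __init__(self, path):
        self.path = path
        self.shadow = path + ".shadow"  # assumed naming convention
        shutil.copyfile(self.path, self.shadow)

    def write(self, text):
        # Only the shadow copy changes; the original stays untouched.
        with open(self.shadow, "a") as f:
            f.write(text)

    def commit(self):
        # Transfer the shadow changes to the original in one step.
        os.replace(self.shadow, self.path)

    def abort(self):
        # Discard the provisional changes.
        os.remove(self.shadow)


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    """Return the original file's contents before and after commit."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "account.txt")
        with open(path, "w") as f:
            f.write("balance=100\n")
        shadow = ShadowFile(path)
        shadow.write("balance=75\n")
        before = read(path)  # original is unchanged at this point
        shadow.commit()
        after = read(path)   # original now holds the shadow changes
        return before, after
```

Calling `demo()` shows that the original file is unchanged until `commit()` is called, which matches the distinction the text draws between provisional and committed changes.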
A one-phase commit procedure is often utilized to commit changes to the original file. The one-phase commit procedure consists of a command to commit changes to the resource as contained in the shadow file. When resources such as SFS or SQL resources are changed, the commits to the resources can be completed in separate one-phase commit procedures. In the vast majority of cases, all resources will be committed using separate procedures without error or interruption. However, if a problem arises during a one-phase commit procedure, some of the separate commits may have already been completed while others may not, causing inconsistencies. Such a problem can be solved only by rebuilding resources. However, the cost of rebuilding non-critical resources is more than compensated by the improved efficiency of the one-phase commit procedure.
A two-phase commit procedure is required to protect critical resources and critical communications. For example, assume that a first person's checking account is represented in a first database and a second person's savings account is represented in a second database. If the first person writes a check to the second person and the second person deposits the check in his/her savings account, the two-phase commit procedure ensures that if the first person's checking account is debited then the second person's savings account is credited or else neither account is changed. The checking and savings accounts are considered protected, critical resources because it is very important that data transfers involving the checking and savings accounts be handled reliably.
An application program can perform the two-phase commit procedure using a single command. The procedure consists of two phases. During the prepare phase, the sync point manager polls each participating resource (in the example above, the debit and the credit) to determine whether it is ready to commit all changes. Each resource promises to complete its update if all resources successfully complete the prepare phase, i.e., are ready to be updated. During the commit phase, the sync point manager directs all resources to finalize the updates, or to back them out if any resource could not complete the prepare phase successfully.
The above-described two-phase commit procedure ensures consistent modification of critical resources in most cases. It is possible, however, that a message sent by the application to the coupling facility (by executing the common queues system (CQS) PUT command) fails during the last stage of the commit procedure, when all the other participants of the protected conversation have already committed their changes. In such a case, the changes that have already been made cannot be backed out because the protected resources are polled for readiness during the first phase of the commit procedure. This problem can be solved by retrying the CQS PUT command for the failed message. If the retry succeeds, the consistency of the protected resources is restored. However, conventional techniques fail to provide a method for retrying the CQS PUT procedure to restore consistency in the state of protected system resources.
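The prepare and commit phases described above can be sketched in Python using the checking and savings example. This is an illustrative sketch only: the `Resource` class, the `two_phase_commit` function (which plays the role of the sync point manager), and the rule that a resource refuses to prepare when its balance would go negative are assumptions, not details from the patent.

```python
class Resource:
    """Hypothetical protected resource holding a single balance."""

    def __init__(self, name, balance, can_prepare=True):
        self.name = name
        self.balance = balance
        self.can_prepare = can_prepare  # lets a test simulate a failed prepare
        self.pending = None

    def prepare(self, delta):
        # Phase one: promise to apply the change, or refuse.
        if not self.can_prepare or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase two: finalize the promised change.
        self.balance += self.pending
        self.pending = None

    def back_out(self):
        # Phase two: drop the promised change.
        self.pending = None


def two_phase_commit(changes):
    """Act as the sync point manager for a list of (resource, delta) pairs.

    Returns True if every resource committed, False if all were backed out.
    """
    prepared = []
    all_ready = True
    for resource, delta in changes:
        if resource.prepare(delta):
            prepared.append(resource)
        else:
            all_ready = False
            break
    for resource in prepared:
        if all_ready:
            resource.commit()
        else:
            resource.back_out()
    return all_ready
```

With a checking account of 100 and a savings account of 50, `two_phase_commit([(checking, -30), (savings, 30)])` leaves balances of 70 and 80. If either resource refuses to prepare, both balances stay as they were.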
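The idea of retrying a failed PUT can be sketched as follows. The sketch does not use any real CQS interface: `retry_put`, `FlakyQueue`, the bounded attempt count, and the use of `ConnectionError` to signal a failed PUT are all hypothetical choices made for illustration.

```python
def retry_put(put, message, max_attempts=3):
    """Call put(message) until it succeeds or the attempts run out.

    Returns the number of attempts used; re-raises the last error
    if every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            put(message)
            return attempt
        except ConnectionError as err:
            last_error = err
    raise last_error


class FlakyQueue:
    """Stand-in for a shared queue whose first `failures` puts fail."""

    def __init__(self, failures):
        self.failures = failures
        self.messages = []

    def put(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("put failed")
        self.messages.append(message)
```

In this sketch the message is stored exactly once, on the first successful attempt, so a retry that succeeds brings the queue into line with the changes the other participants have already committed.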
II. SUMMARY OF THE INVENTION
It is therefore an object of this invention to provide a method for handling IMS messages including protected conversation messages across the IMS system restart.
Specifically, it is an objective of the present invention to provide a method for handling IMS messages including protected conversation messages across the IMS restart using corresponding unit of work elements.
It is another objective of the present invention to provide a system for handling IMS messages including protected conversation messages across the IMS system restart.
To achieve the objectives and the advantages of the present invention there is provided a distributed computer system comprising a plurality of execution environments and a coupling facility, wherein: each of said plurality of execution environments comprises a private storage memory for storing unit of work elements and a log data set for logging data related to the activity of each of said execution environments; and said coupling facility comprises a staging queue for storing messages that are not ready to be committed and a ready queue for storing messages that are ready to be committed.
Further improvements include the above distributed computer system wherein each of said stored unit of work elements comprises a recovery token containing information on all resources participating in a commit procedure.
Still further improvements include the above distributed computer system wherein each of said stored unit of work elements has an abort indicator, wherein the abort indicator indicates that a corresponding abort procedure is designated for abort when the abort indicator is set.
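The structures named above (unit of work elements with a recovery token and an abort indicator, and a coupling facility with a staging queue and a ready queue) can be modeled in a short Python sketch. The field names, the use of dictionaries keyed by message ID, and the `resolve_after_restart` policy of moving staged messages to the ready queue on commit and discarding them on abort are assumptions made for illustration; the text above does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class UnitOfWorkElement:
    """Hypothetical model of a unit of work element."""
    recovery_token: str   # identifies the resources in the commit procedure
    message_ids: list     # assumed: IDs of the messages this unit of work owns
    abort: bool = False   # abort indicator


@dataclass
class CouplingFacility:
    """Hypothetical model of the two queues, keyed by message ID."""
    staging: dict = field(default_factory=dict)  # not ready to be committed
    ready: dict = field(default_factory=dict)    # ready to be committed


def resolve_after_restart(uow_elements, facility):
    """Commit or abort each unit of work's staged messages.

    Assumed policy: committed messages move from the staging queue to the
    ready queue; messages of an aborted unit of work are discarded.
    """
    for uow in uow_elements:
        for message_id in uow.message_ids:
            message = facility.staging.pop(message_id, None)
            if message is None:
                continue  # nothing staged under this ID
            if not uow.abort:
                facility.ready[message_id] = message
```

For example, with messages `m1` and `m2` staged under a unit of work that is not marked for abort and `m3` staged under one that is, `resolve_after_restart` leaves `m1` and `m2` on the ready queue and empties the staging queue.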
Still further improvements include the above distributed computer system wherein each of said stored unit of wo
Denny George Steven
Hughes Gerald Dean
Kennedy Michael Bruce
Nguyen Khiet Quang
International Business Machines Corp.
Iqbal Nadeem
Sughrue Mion Zinn Macpeak & Seas, PLLC