Checkpointing for recovery of channels in a data processing...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S043000

active

06687853

ABSTRACT:

The present invention is related to checkpointing for recovery of channels in a data processing system, and is more particularly related to checkpointing for recovery of channels using a protocol which allows for multiplexing operations at the frame level and streaming of commands and data.
BACKGROUND OF THE INVENTION
In a data processing system, such as the IBM S/390 system having channels whose operation is controlled by Channel Command Words (CCWs), and whose Input/Output (I/O) links are fiber optics using the IBM FICON connectivity architecture, when a channel is attempting to recover from interface errors on the fiber link and the subchannel is in the active state, the channel can attempt retry of the operation from the point of failure by issuing a selective reset with request for retry, specifying which CCW to retry. When, as a conclusion to an unsuccessful retry recovery action, the Interface Control Check (IFCC) status is presented to the S/390 operating system, fields in the Extended Status Word/Extended Report Word (ESW/ERW) must be set up, as explained in IBM Enterprise Systems Architecture/390 Principles of Operation, SA22-7201-06, available from International Business Machines Corporation of Armonk, N.Y. Among these is the primary CCW address, which communicates back to the operating system the progress the channel has made through the CCW chain at the time of the error. Based on this information, the operating system can determine what storage has been updated for use in its error recovery procedures. On S/390 channels prior to FICON, the protocols only allowed the channel to send the next command in a CCW chain upon receipt of an explicit indication (status or data) that the prior command execution was complete. However, FICON protocols allow the channel to stream commands and/or data out to a single device, while simultaneously doing the same for multiple devices.
U.S. Pat. No. 5,392,425, issued Feb. 21, 1995 to Elliott et al. for CHANNEL-INITIATED RETRY AND UNIT CHECK FOR PERIPHERAL DEVICES, discloses retrying a command from a CCW in a data processing I/O system having a channel connected to a control unit in which the channel detects an error condition and requests the control unit to retry the current command of an I/O operation.
SUMMARY OF THE INVENTION
The present invention provides a method, program product and apparatus that allow the channel to: 1) manage the data necessary for the recovery of an operation for a single device while multiple devices are active (checkpointing) and 2) determine the correct primary CCW address to report in the IFCC status by tracking and examining relevant checkpoints.
With the implementation of IBM FICON architecture, the channel is allowed to stream multiple commands out to a control unit without waiting for positive confirmation that any of the preceding commands are complete. In addition, this may occur for multiple devices simultaneously. An object of the present invention is to track, within the FICON channel, the progress of CCWs through their various stages, so that when an error is detected and an operation is aborted, the channel can properly select which CCW to attempt to retry with the control unit and, for unsuccessful retries, report back to software the correct primary CCW address indicating the extent to which the channel completed modifying and accessing S/390 storage. FICON architecture establishes two checkpointing events: if the CCW is a ‘Read’ with a non-zero byte count, or the CCW flags contain Program Controlled Interruption (PCI), a checkpoint is established between the channel and control unit for that CCW number.
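The two checkpointing events named above can be expressed as a simple predicate over a CCW. The following Python sketch is illustrative only: the `CCW` record, its field names, the `establishes_checkpoint` function, and the `PCI_FLAG` bit value are assumptions made for the example and are not taken from the patent text.

```python
from dataclasses import dataclass

# Assumed bit position for the Program Controlled Interruption flag.
# The value is illustrative; the text above does not specify it.
PCI_FLAG = 0x08


@dataclass(frozen=True)
class CCW:
    """Hypothetical, simplified view of one Channel Command Word."""
    number: int       # CCW number within its chain
    is_read: bool     # True when the command is a 'Read'
    byte_count: int   # byte count carried by the CCW
    flags: int = 0    # CCW flag bits


def establishes_checkpoint(ccw: CCW) -> bool:
    """Return True if either checkpointing event applies to this CCW."""
    read_with_data = ccw.is_read and ccw.byte_count != 0
    pci_requested = bool(ccw.flags & PCI_FLAG)
    return read_with_data or pci_requested


# A four-CCW chain: a write, a read with data, a zero-length read,
# and a write that carries the PCI flag.
chain = [
    CCW(number=1, is_read=False, byte_count=16),
    CCW(number=2, is_read=True, byte_count=4096),
    CCW(number=3, is_read=True, byte_count=0),
    CCW(number=4, is_read=False, byte_count=8, flags=PCI_FLAG),
]
checkpoint_numbers = [c.number for c in chain if establishes_checkpoint(c)]
print(checkpoint_numbers)
```

In this chain only CCW numbers 2 and 4 qualify: the zero-length read fails the non-zero byte count condition, and the plain write satisfies neither event.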
It is also an object of the present invention to implement checkpointing concepts in a manner that has minimal impact on functional performance, tracking only the minimal data needed during normal operation and using that data in lengthier analysis performed during error recovery. This data is tracked on a ‘per operation’ basis so that many operations can be concurrently ongoing, and utilizes the architectural concept of CCW numbering for each CCW in a chain.
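As a rough illustration of tracking on a ‘per operation’ basis, the Python sketch below keeps a small record for each active device, keyed by CCW number, and exposes it for later analysis. All names here (`OperationState`, `CheckpointTracker`, the device labels) are hypothetical. The excerpt does not state the rule used to choose the retry CCW or the primary CCW address, so the sketch deliberately stops at exposing the recorded data and does not implement that selection.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class OperationState:
    """Minimal data kept for one operation during normal running."""
    last_sent: int = 0                              # highest CCW number streamed out
    pending: Set[int] = field(default_factory=set)  # checkpoints not yet confirmed
    last_confirmed: int = 0                         # highest confirmed checkpoint


class CheckpointTracker:
    """Hypothetical per-operation tracker; one record per active device."""

    def __init__(self) -> None:
        self._ops: Dict[str, OperationState] = {}

    def sent(self, device: str, ccw_number: int, is_checkpoint: bool) -> None:
        """Record that a CCW was streamed out for this device."""
        op = self._ops.setdefault(device, OperationState())
        op.last_sent = max(op.last_sent, ccw_number)
        if is_checkpoint:
            op.pending.add(ccw_number)

    def confirmed(self, device: str, ccw_number: int) -> None:
        """Record that a checkpoint for this CCW number was confirmed."""
        op = self._ops[device]
        op.pending.discard(ccw_number)
        op.last_confirmed = max(op.last_confirmed, ccw_number)

    def snapshot(self, device: str) -> dict:
        """Return the data a recovery routine would examine after an error."""
        op = self._ops[device]
        return {
            "last_sent": op.last_sent,
            "last_confirmed": op.last_confirmed,
            "pending": sorted(op.pending),
        }


# Two devices are active at once; each keeps its own record.
tracker = CheckpointTracker()
tracker.sent("dev-A", 1, is_checkpoint=False)
tracker.sent("dev-A", 2, is_checkpoint=True)
tracker.sent("dev-B", 1, is_checkpoint=True)
tracker.sent("dev-A", 3, is_checkpoint=False)
tracker.confirmed("dev-A", 2)

snap_a = tracker.snapshot("dev-A")
snap_b = tracker.snapshot("dev-B")
print(snap_a)
print(snap_b)
```

The normal-path methods only update a few integers and a set, which mirrors the stated goal of keeping the cost low during normal operation and deferring the lengthier analysis to error recovery.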


REFERENCES:
patent: 3688274 (1972-08-01), Cormier et al.
patent: 3736566 (1973-05-01), Anderson et al.
patent: 4688221 (1987-08-01), Nakamura et al.
patent: 4912707 (1990-03-01), Kogge et al.
patent: 5392425 (1995-02-01), Elliott et al.
patent: 6035424 (2000-03-01), Freerksen et al.
patent: 6519712 (2003-02-01), Kim et al.
American National Standard of Accredited Standards Committee NCITS working draft Fibre Channel Single-Byte Command Code Sets Mapping Protocol—2 (FC-SB-2) Rev. 1.4, May 23, 2000.
