Reexamination Certificate
1998-10-29
2001-05-08
Ray, Gopal C. (Department: 2181)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C709S248000
active
06230283
ABSTRACT:
CROSS-REFERENCE TO RELATED APPLICATIONS
This application contains subject matter which is related to the subject matter of the following applications, each of which is assigned to the same assignee as this application and filed on the same day as this application. Each of the below-listed applications is hereby incorporated herein by reference in its entirety:
“METHOD FOR LOGICAL CONNECTION RESYNCHRONIZATION,” by Mark R. Gambino, Ser. No. 09/181,753; and
“SYSTEM FOR LOGICAL CONNECTION RESYNCHRONIZATION,” by Mark R. Gambino, Ser. No. 09/181,959.
TECHNICAL FIELD
The present invention relates in general to the operation of computerized data communication networks, and more particularly, to the recovery of communication network operations after a failure of one of the network components.
BACKGROUND OF THE INVENTION
Computer data communication networks are used to transmit information between geographically dispersed computers and between user devices, such as computer terminals or workstations, and host computer applications. A variety of communication architectures exist. Two such data communication architectures are the IBM Systems Network Architecture (SNA) and the International Organization for Standardization's (ISO) Open Systems Interconnection (OSI) architecture. One embodiment of IBM's Systems Network Architecture is described in a co-pending, commonly assigned United States patent application, Ser. No. 08/245,053, entitled “Virtual Route Resynchronization”, the entirety of which is hereby incorporated herein by reference.
High Performance Routing (HPR) is a recent enhancement to the IBM Systems Network Architecture. HPR uses rapid transport protocol (RTP), and the logical connection between two HPR-capable nodes is called an RTP connection. The ends of the connection are referred to as the RTP endpoints, while any intermediate nodes along the RTP connection route are called the automatic network routing (ANR) nodes. Error recovery on an RTP connection is done end-to-end rather than node-to-node, meaning that only the RTP endpoints are involved.
Many end-user sessions can flow on a given RTP connection. Data messages sent on an RTP connection can get lost in the network or might arrive out of order at the destination RTP endpoint. Each message that flows on an RTP connection is assigned a byte sequence number (BSN), which enables the destination node to determine when data is lost or arrives out of order. It is critical that the origin RTP endpoint fill in the correct BSN when sending out a message; otherwise, the RTP connection will fail, causing all the end-user sessions on it to fail as well.
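As an illustration of how a BSN lets the destination endpoint notice lost or out-of-order data, the following is a minimal sketch only. It assumes that a message's BSN is the offset of its first byte and that the next expected BSN advances by the message length; the class name `RtpReceiver`, the method `classify`, and the result labels are hypothetical and are not taken from the patent or from any RTP implementation. Sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class RtpReceiver:
    """Hypothetical destination endpoint that tracks the next expected BSN."""

    next_expected_bsn: int = 0

    def classify(self, bsn: int, length: int) -> str:
        """Classify an arriving message by comparing its BSN to the expected one.

        Assumption: the BSN is the byte offset of the message's first byte,
        so an in-order message advances the expected BSN by its length.
        """
        if bsn == self.next_expected_bsn:
            self.next_expected_bsn += length
            return "in_order"
        if bsn < self.next_expected_bsn:
            # Older than expected: already received, so it can be discarded.
            return "old"
        # Newer than expected: earlier data was lost or is still in transit.
        return "gap"


receiver = RtpReceiver()
first = receiver.classify(bsn=0, length=100)     # matches the expected BSN
early = receiver.classify(bsn=300, length=50)    # bytes 100..299 are missing
repeat = receiver.classify(bsn=0, length=100)    # duplicate of the first message
```

Under these assumptions, an origin endpoint that restarts with the wrong BSN would have every message classified as `old` or `gap`, which is why the correct BSN matters to the connection.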
Because of the need to maintain the sequence of messages between the data host and other components, communications with a failing unit can only be restarted if the sequence number information is known or if the entire communications network is reinitialized. Reinitialization of a large network is highly undesirable because of the considerable time required. This lost time can be costly to a business that is dependent upon transaction processing for its operations. Various schemes have been proposed for retaining sequence information so that the network can be restarted without reinitialization. However, data host failure may occur unpredictably and may not afford an opportunity to save the necessary sequencing information. In these situations, a network reinitialization is required. There is therefore a need to have a system or method for resynchronizing data communications without reinitializing the network.
The present invention addresses the technical problems of recovering synchronization information lost during a network component failure. It is also directed to the problem of resynchronizing message traffic between adjacent communication components following a component failure.
DISCLOSURE OF THE INVENTION
Briefly summarized, this invention comprises in one aspect an article of manufacture which includes at least one computer usable medium having computer readable program code means embodied therein for synchronizing message traffic between a first data processing system and a second data processing system connected by a data communications network after a failure of the first data processing system. Within the network, message traffic travels over a logical connection linking the first and second data processing systems. Each message in the message traffic includes a SYNC number and a byte sequence number. A recipient of the message tests to determine whether the message has a next expected byte sequence number and discards any message whose byte sequence number is older than the next expected byte sequence number. The computer readable program code means in the article of manufacture includes computer readable program code means for causing a computer to effect:
(i) retrieving a stored SYNC number and byte sequence number (BSN) from external memory;
(ii) incrementing the SYNC number by a predetermined amount to obtain a new SYNC number, the predetermined amount being sufficient to ensure that the new SYNC number comprises a current SYNC number;
(iii) sending a status request message from the first data processing system to the second data processing system, the status request including the new SYNC number and the BSN read from the external memory;
(iv) receiving at the first data processing system a response message to the status request message, wherein the response message contains a BSN of a next piece of data that the second data processing system is expecting; and
(v) updating logical connection control information at the first data processing system with the BSN value for the next piece of data expected by the second data processing system.
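Steps (i) through (v) can be sketched as follows. This is a hedged illustration, not the patented implementation: the names `ConnectionState`, `StatusRequest`, `StatusResponse`, `Peer`, `resynchronize`, the dictionary key `"rtp_connection"`, and the value of `SYNC_INCREMENT` are all hypothetical. The peer's rule of ignoring a status request whose SYNC number is not newer than the last one it saw is likewise an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical "predetermined amount": assumed large enough to exceed any
# SYNC number used after the stored copy was last written to external memory.
SYNC_INCREMENT = 1000


@dataclass
class ConnectionState:
    sync: int
    bsn: int


@dataclass
class StatusRequest:
    sync: int
    bsn: int


@dataclass
class StatusResponse:
    next_expected_bsn: int


@dataclass
class Peer:
    """Stand-in for the second data processing system."""

    next_expected_bsn: int
    last_sync: int

    def handle_status_request(self, req: StatusRequest) -> Optional[StatusResponse]:
        # Assumption: a request carrying a stale SYNC number is ignored.
        if req.sync <= self.last_sync:
            return None
        self.last_sync = req.sync
        return StatusResponse(self.next_expected_bsn)


def resynchronize(external_memory: dict, peer: Peer) -> ConnectionState:
    saved = external_memory["rtp_connection"]           # (i) read stored SYNC and BSN
    new_sync = saved.sync + SYNC_INCREMENT              # (ii) jump past any stale value
    request = StatusRequest(new_sync, saved.bsn)        # (iii) build the status request
    response = peer.handle_status_request(request)      # (iv) receive the response
    if response is None:
        raise RuntimeError("peer ignored status request: SYNC number not current")
    # (v) adopt the BSN that the peer expects next
    return ConnectionState(new_sync, response.next_expected_bsn)


memory = {"rtp_connection": ConnectionState(sync=5, bsn=1000)}
peer = Peer(next_expected_bsn=1480, last_sync=7)
state = resynchronize(memory, peer)
```

In this example the stored SYNC number (5) is older than the one the peer last saw (7), so the increment is what makes the status request acceptable, and the restarted system resumes sending at the BSN the peer reports rather than at the stale stored value.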
To restate, provided herein is a technique for rapidly resynchronizing and recovering virtual network routes without reinitializing the communications network upon startup from a component failure. Further, the process described herein achieves resynchronization of message traffic quickly and with low system processing overhead. The solution is described herein with reference to IBM's Transaction Processing Facility (TPF) operating system; however, it is applicable to various other systems, as will be understood by those skilled in the data communications art.
Lily Neff, Esq.
Heslin & Rothenberg, P.C.
International Business Machines Corporation