Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction
Reexamination Certificate
1998-05-29
2003-09-02
Baker, Stephen M. (Department: 2133)
Error detection/correction and fault detection/recovery
Pulse or data error handling
Digital data error correction
Reexamination Certificate
active
06615383
ABSTRACT:
The present invention relates generally to systems and methods for enabling a first computer to transmit messages and data to a second computer; and more particularly to a system and method for ensuring that each message is sent to the second computer once and only once while retaining a high level of message transmission reliability and using a “write only” message sending protocol to make such remote write operations efficient.
BACKGROUND OF THE INVENTION
In many multiple-processor computer systems it is important for processes or tasks running on one computer node (sometimes called the sender) to be able to transmit a message or data to another computer node (sometimes called the receiver), and to do so with absolute reliability. Also, it is extremely important that transmitted messages have a property called “idempotency,” which means that each message must be processed by the receiver exactly once. The reason message processing must be idempotent is best explained by example. If the message to be processed is “move the elevator up one floor,” and it is processed the wrong number of times, the elevator will go up the wrong number of floors, or a failure condition may be generated if the elevator is ordered to go past a topmost or bottommost floor. If the message to be processed is “transfer $1000 from account A to Account B,” and it is processed the wrong number of times, accounts A and B will have the wrong amount of money.
Message transmission reliability can be improved using both hardware and software mechanisms. An example of a hardware mechanism for improving reliability is to provide two parallel communication links between network nodes (or between each network node and the network medium), instead of just one. An example of a software mechanism for improving reliability is to verify each remote message write operation by performing a synchronous remote read operation after the remote message write operation. Another software mechanism for improving message transmission reliability is for the receiving system to explicitly acknowledge receipt of every message sent to it. In this latter example, the sending system may process the message acknowledgments asynchronously, allowing other messages to be sent before the acknowledgment of a prior message is processed.
Generally, transmitting messages between computer nodes is expensive in terms of latency and resources used if the successful transmission of each message is verified by performing a remote read operation after each such remote message write operation.
Alternately, instead of using remote reads to verify the successful transmission of each message, in some prior art systems a message is written locally to a local buffer, and then a “cookie” (which is primarily a data structure pointing to the memory location or locations where the message is stored) or other notification message is sent to the receiving system. The receiving system then performs a remote read operation to read the message from the remote memory location indicated in the notification message. In another implementation of this same basic prior art technique, both the message and the cookie are stored locally in the sending system and only a trigger message is transmitted to the receiving system. The receiving system responds to the trigger message by performing a first remote read operation to read the cookie and a second remote read operation to read the message at the location indicated by the cookie.
An advantage of the prior art techniques using remote read operations as an integral part of every message transmission is that remote reads are synchronous, and thus the system performing the remote read is notified immediately if the message transmission fails.
Another advantage of using remote read operations to transmit messages is that remote read operations make it relatively easy to ensure that each message is received and processed by the receiving system once and only once (i.e., idempotent). In most networked computer systems it is essential not to send the receiving system the same message twice. As already mentioned above, sending the same message twice could cause the receiving system to perform an operation twice that should only be performed once. Each message must be reliably received and processed by the receiving system exactly once to ensure proper system operation.
Remote write operations are relatively “inexpensive,” compared to remote read operations, in terms of system latency and system resources used, because the receiving CPU does not need to be involved in completing the write operation.
Referring to
FIG. 1
, there is shown a highly simplified representation of two prior art computer nodes herein called Node A
50
, and Node B
52
. The computer at each node can be any type of computer. In other words, the particular brand, architecture and operating system is of no importance to the present discussion, so long as each computer node is configured to operate in a networked environment. Each computer node
50
,
52
will typically include a central processing unit (CPU)
54
, random access memory
56
, an internal memory bus
58
and one or more communications interfaces
60
, often called network interface cards (NIC's). The computer nodes communicate with each other by transmitting messages or packets to each other via a network interconnect
62
, which may include one or more types of communication media, switching mechanisms and the like.
Each computer node
50
,
52
typically also has a non-volatile, random access memory device
64
, such as a high speed magnetic disk, and a corresponding disk controller
66
.
In this example, each computer node is shown as having two communications interfaces
60
for connecting that node to the network fabric. Providing two parallel communication links improves system reliability, since failure of a node's primary communication interface
60
, or failure or disconnection of its cabling to the network interconnect, does not prevent the node from participating in network communications. In many systems, failure of a node's communication link is tantamount to failure of the entire node, because the node is essentially useless to the system without its network connection. Providing a redundant network connections (herein called parallel links) is a well known strategy for addressing this problem.
A well known problem associated with the use of parallel links, is that the link failover mechanism must either avoid resending messages that have already been received and processed by the receiving system(s), or it must provide some other mechanism for ensuring idempotency (e.g., providing a receiver side mechanism for recognizing and discarding duplicate messages). The idempotency problem is not created by or unique to systems using parallel links; rather, the problem is exacerbated because the use of parallel links introduces additional opportunities for inadvertent retransmission of messages. For example, a link may fail after a message has been successfully transmitted, but before the receiving system has had the opportunity to acknowledge receipt or processing of the message. Alternately, the receiving system may have transmitted a message acknowledgment, but the acknowledgment may be lost due to improper operation of a damaged link. The present invention solves the idempotency problem in a manner that addresses the link failure problem.
FIG. 2
shows a simplified representation of a conventional communications interface (or NIC)
60
, such the ones used in the computer nodes of
FIG. 1
, showing only the components of particular interest. The NIC
60
typically includes two address mapping mechanisms: an incoming memory management unit (IMMU)
70
and an outgoing memory management unit (OMMU)
72
. The purpose of the two memory management units are to map local physical addresses (PA's) in each computer node to global addresses (GA's) and back. Transport logic
74
in the NIC
60
handles the mechanics of transmitting and receiving message packets, includin
Khalidi Yousef A.
Talluri Madhusudhan
Baker Stephen M.
Pennie & Edmonds LLP
Sun Microsystems Inc.
LandOfFree
System and method for message transmission between network... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for message transmission between network..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for message transmission between network... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3011309