Electrical computers and digital processing systems: multicomput – Computer-to-computer direct memory accessing
Reexamination Certificate
2000-07-18
2004-05-11
Maung, Zarni (Department: 2154)
C709S245000
active
06735620
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention is directed to an efficient method and system for transmitting messages from a user's address space on one system directly into a user's address space on a second system using direct memory access. The present invention is also directed to a data transmission protocol that eliminates unnecessary retransmission of data packets.
The present invention is employed in a number of different circumstances. It is employed in data processing systems which are remote from one another and which communicate by means of data packet transmission from a source system to a receiving system. Additionally, the present invention is employed in SAN (system area network) systems, in which nodes or clusters of processors are packaged as a single unit (frame/rack in a frame). The receiving application, possibly running on a different physical system, may belong to the same user as the sending application or to a different user.
The present application is not, however, directed to the usual protocols for transmitting message data from one system to another. Rather, the present invention is specifically directed to protocols that utilize direct memory access (DMA) hardware and techniques for directly (zero copy) transferring information from the address space of a user's application running on one system to the address space of an application of another or the same user running on a second system. Direct memory access provides the most efficient mechanism for transferring messages and information into specific memory locations of another process because it avoids passage of data through central processing units, which makes it an exceedingly fast mode of operation. When DMA is used to go directly to a user buffer rather than a system buffer in a message passing system, it is referred to as a zero-copy protocol.
The zero-copy protocol requires that the receiving system be fully prepared to receive the amount of data sent and that it have a mechanism for identifying the exact location for data storage. Since the transmission contemplated in the present invention is directly into the memory locations of another processing system, the right amount of data must be supplied to exactly the correct memory locations. If this is not the case, data may be lost or corrupted, and this could conceivably be data belonging to a different user than the one who is transmitting the message packets. Clearly, corruption or loss of data in this manner is an unacceptable operating condition.
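The readiness requirement can be illustrated with a minimal sketch. Every name in it (`PostedBuffer`, `Receiver`, `post_buffer`, `deliver`, the integer message identifier) is hypothetical and not taken from the patent, and a `bytearray` merely stands in for user memory that a DMA engine would write into. The sketch shows only the check described above: data is accepted only when a buffer of exactly the right size has been prepared for it.

```python
from dataclasses import dataclass


@dataclass
class PostedBuffer:
    """A user buffer the receiver has prepared for one incoming message."""
    memory: bytearray          # stands in for user memory targeted by DMA
    filled: bool = False


class Receiver:
    def __init__(self):
        # Hypothetical table: message id -> buffer prepared for that message.
        self.posted = {}

    def post_buffer(self, msg_id, length):
        """Prepare a buffer of exactly `length` bytes for message `msg_id`."""
        self.posted[msg_id] = PostedBuffer(bytearray(length))

    def deliver(self, msg_id, payload):
        """Place the payload directly into the posted buffer, or refuse it.

        Returns False (and writes nothing) if no buffer was prepared, the
        buffer was already filled, or the size does not match, so that no
        other memory can be overwritten.
        """
        buf = self.posted.get(msg_id)
        if buf is None or buf.filled or len(payload) != len(buf.memory):
            return False
        buf.memory[:] = payload   # simulated DMA write into user memory
        buf.filled = True
        return True
```

In this sketch a payload that arrives before `post_buffer` has been called, or whose length differs from the posted length, is rejected rather than written anywhere.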
Because data transmission via DMA procedures is directed to specified real memory locations which are not statically associated with fixed virtual memory locations, error conditions which arise can be particularly difficult to handle. In particular, if the sender never receives an acknowledgment from the receiver that a particular data packet has been received, it is undesirable for the sender simply to resend the same packet. If the sender were to wait for a given period of time (the elapsed “time out” amount) without receiving an acknowledgment and then retransmit, the retransmission could result in a wasted transmission and/or the insertion of incorrect data into inappropriate memory locations (or into real memory which is not associated with the intended virtual memory target location). Accordingly, in the present invention the sender negotiates retransmission with the receiver. This avoids unnecessary transmission of data, particularly when the only error that has occurred is the loss of the acknowledgment on its return from the receiver to the sender. Such circumstances do not warrant the retransmission of the data packet but only require an indication that the data has indeed been received. If, however, there is a more significant problem than the mere loss of an acknowledgment, the sender must ensure that the receiver has prepared the zero copy buffers for DMA access before retransmitting the packet.
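The negotiation described above might be sketched as follows. This is an illustration under stated assumptions, not the patented protocol itself: the names `Reply`, `NegotiatingReceiver`, `handle_query`, and `on_ack_timeout` are hypothetical, and a direct method call stands in for the small control message exchanged over the network.

```python
from enum import Enum, auto


class Reply(Enum):
    ALREADY_RECEIVED = auto()   # data arrived; only the acknowledgment was lost
    READY_FOR_RESEND = auto()   # data missing; zero copy buffers are prepared


class NegotiatingReceiver:
    def __init__(self):
        self.received = set()   # ids of packets whose data has arrived
        self.prepared = set()   # ids with zero copy buffers ready for DMA

    def prepare(self, packet_id):
        """Prepare zero copy buffers for the given packet."""
        self.prepared.add(packet_id)

    def accept(self, packet_id):
        """Simulate arrival of a data packet into a prepared buffer."""
        if packet_id in self.prepared:
            self.prepared.discard(packet_id)
            self.received.add(packet_id)

    def handle_query(self, packet_id):
        """Answer the sender's small control message."""
        if packet_id in self.received:
            return Reply.ALREADY_RECEIVED
        self.prepare(packet_id)          # buffers must be ready before a resend
        return Reply.READY_FOR_RESEND


def on_ack_timeout(receiver, packet_id, resend):
    """Sender side: negotiate instead of retransmitting blindly.

    `resend` is a callable that retransmits the large data packet.
    Returns True only if a retransmission was actually performed.
    """
    reply = receiver.handle_query(packet_id)
    if reply is Reply.ALREADY_RECEIVED:
        return False                     # the reply serves as the lost ack
    resend(packet_id)
    return True
```

When only the acknowledgment was lost, `on_ack_timeout` returns without calling `resend`, so the large packet is not sent a second time. When the data itself is missing, the receiver prepares its buffers before the sender retransmits.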
In clustered systems (SANs), the processing power of each node in the cluster (CPU speed) is increasing very fast, and so are the speeds of the interconnects linking the various nodes in the cluster. However, memory bandwidth (a function of how fast the CPU can move data from one region of its memory to another) is not keeping pace with CPU speeds and interconnect speeds. As a result, the cost of protocol processing in reliable messaging systems is increasingly dominated (bottlenecked) by the copy cost in the protocol path. The copy cost also increases with the size of the message being transported. With the emergence of technologies like clustered file systems, the size of the data that needs to be transported from one node to another in clustered systems has been consistently increasing, hence the need for zero copy protocols. In order to reliably transport data in a zero copy fashion, acknowledgments are used to ensure guaranteed delivery. It is wasteful to retransmit large data packets if only the acknowledgment was lost. Hence the motivation for this invention, in which a small control message is used to negotiate with the receiver and determine whether a retransmit of the zero copy transported packet is actually required. We limit ourselves to the design of an efficient retransmission mechanism for a reliable zero copy transport mechanism.
TERMINOLOGY
Reliable Transport: A transport mechanism in which the transport protocol guarantees that messages submitted for sending will be received by the target, recovering from transient network and network adapter failures transparently to the application. This is typically accomplished in the art by ensuring that every packet sent is acknowledged by the receiver and that the sender retransmits the packet if an acknowledgment is not received within a well-defined interval of time. The interval of time is a function of the efficiency parameters of the system (node, processor, network, etc.).
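The conventional acknowledge-or-retransmit scheme in this definition can be sketched in a few lines. The function name `send_reliably`, its parameters, and the `max_attempts` limit are assumptions made for illustration; `send` and `wait_for_ack` are caller-supplied callables standing in for the network.

```python
def send_reliably(send, wait_for_ack, packet, timeout, max_attempts=5):
    """Send `packet`, retransmitting it whenever no ack arrives in time.

    `send(packet)` transmits the packet; `wait_for_ack(timeout)` returns
    True if an acknowledgment arrived within `timeout` seconds.
    Returns the number of transmissions that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        send(packet)                 # the whole packet is sent every time
        if wait_for_ack(timeout):
            return attempt
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")
```

Note that this baseline resends the full packet on every timeout, whether the data or only the acknowledgment was lost.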
Zero Copy Transport: A mechanism for message passing in which the DMA (direct memory access) engines (possibly on the network adapter connecting the node to the network) are programmed to move data directly from system (node) memory into the network on the sending side, and from the network directly into system memory on the receiving side, without involving the CPU (central processing unit) on the node in the movement of data at either end. This mechanism frees up the CPU on the node from the data movement aspects of protocol processing. It is also sometimes loosely referred to as the Direct Memory Access method.
The present application also hereby incorporates by reference the entire contents of application Ser. No. 09/619,053 filed concurrently herewith.
SUMMARY OF THE INVENTION
In accordance with a preferred embodiment of the present invention, a method for transmitting a data packet stored in a first data processing system directly into a list of addresses in a second data processing system comprises a plurality of steps, starting with providing the data packet in the first processor (sender) with a header which includes a tag which is associatable with a real address (possibly a list of real addresses) within the second processor (receiver). This data packet is transmitted with its header, via an adapter which is coupled to the sender, to the network adapter which is coupled to the receiver. This network adapter is provided with the mapping between the tag in the header and a real address (or possibly a list of real addresses) within the memory of the receiving system. Data in this data packet is transferred from the adapters to real address locations in the memory of the second system via direct memory access (DMA) (i.e. by programming the DMA engines typically on the network adapter to effect the movement of
Blackmore Robert S.
Govindaraju Rama K.
Shah Gautam H.
Cutter Lawrence D.
International Business Machines - Corporation
Maung Zarni