Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-10-19
2004-06-08
Beausoliel, Robert (Department: 2113)
C714S004110, C709S226000
active
06748559
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to computer networks and, in particular, to communications between nodes on a computer network. Still more particularly, the present invention relates to a method and system for determining timeout values for network processes and utilizing the timeout values to free network resources in a System Area Network.
2. Description of the Related Art
Distributed computer networks are known in the art. In a traditional network, individual components are interconnected via a parallel bus, such as a PCI-X bus. A parallel bus has a relatively small number of plug-in ports for connecting components, and that number is fixed (i.e., it cannot be increased). At maximum loading, a PCI-X bus transmits data at about 1 Gbyte/second.
The introduction of high-performance adapters (e.g., SCSI adapters), Internet-based networks, and other high-performance network components has increased demand in four areas: bandwidth, faster network connections, distributed processing functionality, and scaling with processor performance. These demands are quickly outpacing current parallel bus technology and are making its limitations more visible. The PCI-X bus, for example, is not scalable: the bus length and the number of slots available at a given frequency cannot be expanded to accommodate more components. This limitation hinders the development of fast, efficient distributed networks, such as system area networks. New switched network topologies and systems are required to keep up with the increasing demands.
The present invention recognizes the need for faster, more efficient computer networks that offer the features demanded by current technological developments. More specifically, it recognizes the need for a management system that efficiently allocates the resources of a distributed computer network to the processes or operations running on that network.
SUMMARY OF THE INVENTION
A method for managing the allocation of network resources within a distributed computer system is provided. The invention applies to a distributed computing system, such as a system area network, that has end nodes, switches, routers, and links interconnecting these components. Each end node uses send and receive queue pairs to transmit and receive messages. A source end node segments a message into packets and transmits the packets over the links. The switches and routers interconnect the end nodes and route the packets to the appropriate target end node. The target end node then reassembles the packets into the message.
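The segmentation and reassembly step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's mechanism. The `Packet` fields, the `mtu` parameter, and the functions `segment` and `reassemble` are hypothetical names chosen for the example. The target is also assumed to reassemble by sequence number, so packets may arrive in any order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical packet layout, for illustration only.
    msg_id: int    # which message this packet belongs to
    seq: int       # position of this packet within the message
    total: int     # number of packets the message was split into
    payload: bytes


def segment(msg_id: int, message: bytes, mtu: int) -> list[Packet]:
    """Source end node: split a message into packets of at most `mtu` bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    # An empty message still produces one (empty) packet.
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [Packet(msg_id, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Target end node: rebuild the message once every packet has arrived."""
    if not packets:
        raise ValueError("no packets")
    total = packets[0].total
    by_seq = {p.seq: p.payload for p in packets}
    missing = set(range(total)) - by_seq.keys()
    if missing:
        raise ValueError(f"missing packets: {sorted(missing)}")
    # Order by sequence number, regardless of arrival order.
    return b"".join(by_seq[i] for i in range(total))
```

Because reassembly keys on `seq`, a reversed arrival order still yields the original message. A missing packet raises an error instead of producing a truncated message.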
In the method of the invention, the network traversal time and the end node response time for requests and/or packets routed in a switch-connected system area network are used to determine the total round-trip time for completing a particular network operation. The timeout values of all switches that participate in routing the request from the requester (source) to the receptor (target) node are summed, and the sum is provided to the requester's channel adapter (CA). The switch timeout values are supplied by the switch manufacturer and are sent to the network Subnet Manager (SM) in SM packets (SMPs). Added together, these values represent the SubnetTimeout. The timeout value of the target CA, the ResponseTime, is also provided to the requester. The requester then uses one of two timeout equations to calculate the overall response time required for the request to complete. A timer is started, and the elapsed time is monitored and compared with the calculated overall response time. If the timer expires before a response is received at the requester, the operation is assumed to have failed. The network resources used by the request may then be reallocated to another network operation.
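The timeout calculation and the requester-side timer can be illustrated with a small Python sketch. The two timeout equations are not reproduced in this text, so `overall_timeout` uses an assumed, illustrative model. It charges the SubnetTimeout twice (once for the request path, once for the response path) and adds the target CA's ResponseTime. The microsecond units, the injectable `clock`, and the names `subnet_timeout`, `RequestTimer`, and `poll` are likewise hypothetical.

```python
import time
from typing import Callable, Iterable


def subnet_timeout(switch_timeouts_us: Iterable[float]) -> float:
    """SubnetTimeout: sum of the per-switch timeout values on the request path."""
    return sum(switch_timeouts_us)


def overall_timeout(subnet_us: float, response_time_us: float) -> float:
    # ASSUMPTION: illustrative model only, not one of the patent's equations.
    # Subnet traversal counted for the request and for the reply, plus the
    # target CA's ResponseTime.
    return 2 * subnet_us + response_time_us


class RequestTimer:
    """Requester-side timer started when the request is sent (hypothetical)."""

    def __init__(self, timeout_us: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_us / 1e6   # clock() is assumed to return seconds
        self.clock = clock
        self.start = clock()

    def expired(self) -> bool:
        return self.clock() - self.start > self.timeout_s


def poll(timer: RequestTimer, response_received: bool) -> str:
    """Classify the request's state at the moment of the call."""
    if response_received:
        return "completed"
    if timer.expired():
        # Presumed failure: the request's resources may be reallocated.
        return "failed"
    return "pending"
```

Injecting the clock makes the expiry logic easy to exercise with a fake time source instead of real waiting.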
Another embodiment of the invention operates at an end node that receives the packets of a message (i.e., a target end node). After a packet arrives, the target end node starts a time count and monitors the time until the next packet is received. If the next packet is not received within a predetermined timeout value, the target end node's resources are released for use by other network operations.
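The target-side behavior can be sketched in the same way. `InboundMessage`, its `timeout_s` parameter, the injectable `clock`, and the choice to discard buffered packets on expiry are all assumptions for illustration. The text specifies only two things: the count restarts when a packet arrives, and the target's resources are released if the next packet does not arrive in time.

```python
import time
from typing import Callable, Optional


class InboundMessage:
    """Target end node state for one partially received message (hypothetical)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_arrival: Optional[float] = None  # no count until a packet arrives
        self.packets: list[bytes] = []
        self.released = False

    def on_packet(self, payload: bytes) -> None:
        if self.released:
            raise RuntimeError("resources already released")
        self.packets.append(payload)
        self.last_arrival = self.clock()  # restart the inter-packet time count

    def check(self) -> bool:
        """Release resources if the next packet is overdue; return True once released."""
        if (not self.released and self.last_arrival is not None
                and self.clock() - self.last_arrival > self.timeout_s):
            # ASSUMPTION: releasing resources is modeled as dropping the buffers.
            self.packets.clear()
            self.released = True
        return self.released
```

Restarting the count on every arrival means that only the gap between consecutive packets is timed, not the duration of the whole message.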
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.
Frazier Giles Roger
Neal Danny Marvin
Pfister Gregory Francis
Thurber Steven Mark
Bracewell & Patterson L.L.P.
Duncan Marc
International Business Machines Corporation
McBurney Mark E.
Method and system for reliably defining and determining...