Coordinated multinode dump collection in response to a fault

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


C714S031000


active

06643802

ABSTRACT:

BACKGROUND
The invention relates to storing information in response to a fault occurring in a parallel processing system.
Software in a computer system may be made up of many layers. The highest layer is usually referred to as the application layer, followed by lower layers that include the operating system, device drivers (which usually are part of the operating system), and other layers. In a system that is coupled to a network, various transport and network layers may also be present.
During execution of various software routines or modules in the several layers of a system, errors or faults may occur. Such faults may include addressing exceptions, arithmetic faults, and other system errors. A fault handling mechanism is needed to handle such faults so that a software routine or module or even the system can shut down gracefully. For example, clean-up operations may be performed by the fault handling mechanism, and may include the deletion of temporary files and freeing up of system resources. In many operating systems, exception handlers are provided to handle various types of faults (or exceptions). For example, exception handlers are provided in WINDOWS® operating systems and in UNIX operating systems.
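The following is a minimal Python sketch of the clean-up role described above. It assumes that Python exceptions stand in for operating-system faults. The hypothetical `FaultHandler` class tracks the temporary files and resources a routine registers with it. When a fault is raised, it deletes those files, releases those resources, and lets the routine exit gracefully. It illustrates the idea only and is not the WINDOWS® or UNIX exception-handling interface.

```python
import os
import tempfile


class FaultHandler:
    """Hypothetical handler: on a fault, delete temp files and release resources."""

    def __init__(self):
        self.temp_files = []  # temporary files created by the guarded routine
        self.resources = []   # callables that free resources the routine holds
        self.faults = []      # names of the faults this handler has caught

    def make_temp_file(self):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        self.temp_files.append(path)
        return path

    def register_resource(self, release):
        self.resources.append(release)

    def cleanup(self):
        # Delete temporary files left behind by the routine.
        for path in self.temp_files:
            if os.path.exists(path):
                os.remove(path)
        self.temp_files.clear()
        # Release resources in reverse order of registration.
        while self.resources:
            self.resources.pop()()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.faults.append(exc_type.__name__)
            self.cleanup()
            return True  # suppress the fault so the routine ends gracefully
        return False


# Usage: an arithmetic fault inside the guarded routine triggers clean-up.
released = []
with FaultHandler() as h:
    path = h.make_temp_file()
    h.register_resource(lambda: released.append("lock"))
    1 / 0  # arithmetic fault (ZeroDivisionError)
```

After the `with` block, the temporary file is gone, the registered release callable has run, and `h.faults` records the fault that was handled.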
Software may be run on single processor systems, multiprocessor systems, or multi-node parallel processing systems. Examples of single processor systems include standard desktop or portable systems. A multiprocessor system may consist of a single node containing multiple processors. Such systems may include symmetric multiprocessor (SMP) systems. A multi-node parallel processing system may include multiple nodes that may be connected by an interconnect network.
Faults may occur during execution of software routines or modules in each node of a multi-node parallel processing system. When a fault occurs in a multi-node parallel processing system, it may be desirable to capture the state of each node in the system. A need thus exists for a method and apparatus for coordinating the handling of faults occurring in a system having multiple nodes.
SUMMARY
In general, according to one embodiment, a method of handling faults in a system having plural nodes includes detecting a fault condition in the system and starting a fault handling routine in each of the nodes. Selected information collected by each of the fault handling routines is communicated to a predetermined one of the plural nodes.
Other features and embodiments will become apparent from the following description, from the drawings, and from the claims.
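The summarized method can be sketched in Python. The sketch rests on several assumptions that do not come from the source:

- threads model the nodes;
- a `queue.Queue` models the interconnect link to the collecting node;
- node 0 is the predetermined node (`COORDINATOR`);
- a small dictionary stands in for each node's state.

The names `Node`, `run_step`, and `handle_fault` are hypothetical. The sketch detects a fault, starts a fault handling routine on every node, and gathers each routine's selected information at the predetermined node.

```python
import queue
import threading

COORDINATOR = 0  # hypothetical choice of the predetermined collecting node


class Node:
    def __init__(self, node_id, state):
        self.node_id = node_id
        self.state = state  # stand-in for the node's memory/process state

    def fault_handler(self, fault, link):
        # Collect selected information on this node and send it to the coordinator.
        info = {"node": self.node_id, "fault": fault, "state": dict(self.state)}
        link.put(info)


def run_step(node):
    # Hypothetical workload: node 2 divides by zero (an arithmetic fault).
    return 1 // (node.node_id - 2)


def handle_fault(nodes, fault):
    """Start a fault handling routine on every node; gather results at COORDINATOR."""
    link = queue.Queue()  # models the interconnect path to the coordinator
    threads = [threading.Thread(target=n.fault_handler, args=(fault, link))
               for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # every routine has finished, so the queue holds all reports
    dump = {}
    while not link.empty():
        info = link.get()
        dump[info["node"]] = info
    return {"collected_by": COORDINATOR, "dump": dump}


nodes = [Node(i, {"pc": 0x1000 + i}) for i in range(4)]
result = None
for n in nodes:
    try:
        run_step(n)
    except ZeroDivisionError as exc:  # fault condition detected on node n
        result = handle_fault(nodes, f"node {n.node_id}: {type(exc).__name__}")
        break

print(sorted(result["dump"]))  # [0, 1, 2, 3]
```

Gathering every node's information at a single predetermined node yields one system-wide dump. Without that step, partial dumps would be left scattered across the nodes.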


