Input/output recovery method which is based upon an error...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


Classification: C714S002000

Status: active

Patent number: 06336193

ABSTRACT:

TECHNICAL FIELD
This invention relates, in general, to processing within a computer environment and, in particular, to determining error conditions within the computer environment and to recovering from those error conditions.
BACKGROUND ART
Increasing pressure to provide highly available and continuously available computer systems places a great deal of emphasis on error detection and recovery. It is very important for errors to be detected and for recovery to be performed before the computer system crashes or is otherwise seriously impacted.
There are various types of errors and even more types of recovery processes. For example, missing interrupts and hot input/output (I/O) conditions are just two of the error conditions recognized by the Multiple Virtual Storage (or OS/390) operating system offered by International Business Machines Corporation.
A missing interrupt is an error that indicates that an input/output request has been initiated, but no response has been received for the request. A missing interrupt can be symptomatic of many different types of problems and there are different recovery processes to cover those different types of problems.
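As a minimal sketch of how a missing interrupt might be recognized, assume that each started I/O request is recorded with its start time and that a fixed, hypothetical interval `MIH_INTERVAL` defines how long a request may go without a response. The `IORequest` type and `find_missing_interrupts` function are illustrative names, not the operating system's actual interface.

```python
from dataclasses import dataclass

# Hypothetical missing-interrupt interval, in seconds.
MIH_INTERVAL = 15.0


@dataclass
class IORequest:
    device: str          # hypothetical device identifier, e.g. "0A80"
    started_at: float    # time at which the request was initiated
    completed: bool = False


def find_missing_interrupts(requests, now, interval=MIH_INTERVAL):
    """Return the devices of requests that were started but have not
    received a response within `interval` seconds."""
    return [r.device for r in requests
            if not r.completed and now - r.started_at > interval]


reqs = [IORequest("0A80", 0.0),                   # waited 20 s: missing
        IORequest("0A81", 10.0),                  # waited 10 s: still OK
        IORequest("0A82", 0.0, completed=True)]   # response received
print(find_missing_interrupts(reqs, now=20.0))    # prints ['0A80']
```

Detection only identifies the symptom; as the text notes, which recovery process applies depends on the underlying problem.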
A hot I/O condition occurs when there are continuous unsolicited I/O interrupts. These interrupts are typically caused by an I/O device, control unit or channel path, so recovery processes are provided to isolate the source of the interrupts and try to recover from the condition.
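One way to turn "continuous unsolicited interrupts" into a testable condition is a per-device sliding window. The sketch below assumes a hypothetical `HotIODetector` class with an illustrative `limit` (interrupts) and `window` (seconds); neither the class nor the values come from the patent.

```python
from collections import defaultdict, deque


class HotIODetector:
    """Flags a device as 'hot' when it presents more than `limit`
    unsolicited interrupts within `window` seconds (hypothetical values)."""

    def __init__(self, limit=100, window=1.0):
        self.limit = limit
        self.window = window
        # device -> timestamps of its recent unsolicited interrupts
        self._times = defaultdict(deque)

    def unsolicited_interrupt(self, device, now):
        """Record one unsolicited interrupt; return True if the device is hot."""
        times = self._times[device]
        times.append(now)
        # Drop interrupts that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit


d = HotIODetector(limit=3, window=1.0)
print([d.unsolicited_interrupt("0B00", t) for t in (0.0, 0.1, 0.2, 0.3)])
# prints [False, False, False, True]
```

Keeping timestamps per device means a single noisy device can be isolated without affecting the counts for its neighbors.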
There are also other types of errors that do not fall within the above categories. These errors, as well as the above errors, may cause critical system resources to become exhausted, thereby causing the computer system to crash. This is particularly devastating when several computer systems are coupled to one another and all of the systems crash.
Therefore, a need exists for an enhanced recovery capability that takes into account different types of errors. Further, a need exists for a recovery capability that monitors critical system resources, and takes action to avoid exhaustion of those resources. A yet further need exists for a recovery capability that provides enhanced system availability.
SUMMARY OF THE INVENTION
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method of recovering from errors in a computer environment. In one embodiment, the method includes determining whether an error rate is above a predefined threshold, determining whether there is at least a potential shortage of a resource of the computer environment, and performing a recovery action when the error rate is above the predefined threshold and there exists at least a potential for a shortage.
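The three steps of this embodiment can be sketched as a single decision function. This is a minimal illustration, assuming the error rate is already computed and that the shortage check and the recovery action are supplied as callables; `recover_if_needed` is a hypothetical name.

```python
from typing import Callable


def recover_if_needed(error_rate: float,
                      rate_threshold: float,
                      shortage_possible: Callable[[], bool],
                      recovery_action: Callable[[], None]) -> bool:
    """Run `recovery_action` only when the error rate is above the
    predefined threshold AND at least a potential resource shortage
    exists. Returns True if recovery was performed."""
    if error_rate <= rate_threshold:
        return False   # error rate is tolerable; no recovery needed
    if not shortage_possible():
        return False   # errors are high, but resources are not at risk
    recovery_action()
    return True


actions = []
recover_if_needed(50.0, 10.0, lambda: True, lambda: actions.append("recover"))
print(actions)  # prints ['recover']
```

The sketch evaluates the error rate first so the resource check runs only when errors are already high; the summary does not prescribe an order, so this is a design assumption.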
In one example, the resource is storage and the determining of whether at least a potential shortage exists comprises checking a storage indicator indicative of a level of available storage.
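A storage indicator of this kind might be modeled as a snapshot of free versus total storage, with a potential shortage reported below a warning level. The `StorageIndicator` class, the frame-based units, and the 20% `WARNING_PCT_FREE` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical warning threshold: below this percentage of free storage,
# a shortage is treated as "potential".
WARNING_PCT_FREE = 20.0


@dataclass
class StorageIndicator:
    """Hypothetical snapshot of available storage, in page frames."""
    free_frames: int
    total_frames: int

    def percent_free(self) -> float:
        return 100.0 * self.free_frames / self.total_frames


def potential_shortage(ind: StorageIndicator,
                       warning_pct: float = WARNING_PCT_FREE) -> bool:
    """True when the level of available storage is below the warning level."""
    return ind.percent_free() < warning_pct


print(potential_shortage(StorageIndicator(150, 1000)))  # prints True
```

A function like `potential_shortage` could serve as the shortage check passed to a decision routine that also tests the error rate.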
In a further embodiment, the error rate is associated with a subsystem of the computer environment and, in one example, the subsystem is an input/output subsystem.
In one example, the recovery action includes at least one of the following: simulating the status of an error detected for the subsystem, where the simulation does not require a large amount of the resource; and slowing down activity to the subsystem.
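The two recovery actions can be sketched on a toy subsystem model. This assumes a hypothetical `Subsystem` class with a queue of pending request ids and a cap on concurrently active requests. Simulated status is a single pre-built constant, so ending each request allocates no new error data, which mirrors the idea that simulation avoids consuming the scarce resource. The status labels are illustrative only.

```python
from collections import deque

# Hypothetical pre-built status shared by every simulated completion, so
# simulating status does not build a new error record per request.
SIMULATED_ERROR_STATUS = ("UNIT_CHECK", "simulated")


class Subsystem:
    """Hypothetical I/O subsystem with a queue of pending requests."""

    def __init__(self, name: str, max_active: int):
        self.name = name
        self.max_active = max_active   # cap on concurrently started requests
        self.pending = deque()         # request ids still awaiting status
        self.completed = {}            # request id -> posted status

    def simulate_error_status(self):
        """Recovery action 1: end every pending request with the shared,
        pre-built error status instead of gathering full error data."""
        while self.pending:
            self.completed[self.pending.popleft()] = SIMULATED_ERROR_STATUS

    def slow_down(self, factor: int = 2):
        """Recovery action 2: throttle new activity by dividing the number
        of requests that may be active at once (never below 1)."""
        self.max_active = max(1, self.max_active // factor)


s = Subsystem("DASD", max_active=8)
s.pending.extend([1, 2, 3])
s.simulate_error_status()
s.slow_down()
print(len(s.completed), s.max_active)  # prints 3 4
```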
In another embodiment of the invention, a method of recovering from errors in a computer environment is provided. The method includes, for instance, determining whether an error rate is above a predefined threshold, determining whether a resource of the computer environment is below a predetermined threshold, and performing a recovery action when the error rate is above the predefined threshold and the resource is below the predetermined threshold.
In one example, the recovery action to be performed is based upon a severity level of the predetermined threshold.
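Choosing the recovery action by the severity level of the resource threshold can be sketched as a lookup over ordered thresholds. The percentages and action names in `SEVERITY_LEVELS` are hypothetical; the summary states only that the action depends on the severity level.

```python
from typing import Optional

# Hypothetical thresholds on percent of storage free, ordered from most to
# least severe, each paired with the recovery action it selects.
SEVERITY_LEVELS = [
    (5.0, "simulate_status_and_slow_down"),   # critical
    (10.0, "simulate_status"),                 # severe
    (20.0, "slow_down"),                       # warning
]


def choose_recovery(error_rate: float, rate_threshold: float,
                    pct_free: float) -> Optional[str]:
    """Return the action for the most severe threshold that pct_free is
    below, or None if the error rate is tolerable or storage is ample."""
    if error_rate <= rate_threshold:
        return None
    for limit, action in SEVERITY_LEVELS:
        if pct_free < limit:
            return action
    return None


print(choose_recovery(50.0, 10.0, pct_free=8.0))  # prints simulate_status
```

Ordering the list from most to least severe ensures the first match is the strongest applicable action.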
The error recovery capability of the present invention takes into account different types of error conditions, monitors critical system resources, and takes action to avoid exhaustion of those resources. It uses a statistical threshold of the number of errors over time to decide when a device is abnormally disrupting the computer environment. Further, the present invention is able to quiesce activity at a subsystem level, and it limits any outages to those applications and subsystems using the devices in error. Thus, the present invention provides enhanced system availability.
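Limiting outages to the subsystems that use failing devices can be sketched as follows. The function assumes an error log of `(timestamp, device)` records already restricted to an interval of `interval_s` seconds, a device-to-subsystem map, and a plain errors-per-second cutoff `max_rate`. This simple rate test stands in for whatever statistic the invention actually applies; all names are hypothetical.

```python
from collections import Counter


def subsystems_to_quiesce(error_log, device_to_subsystem, interval_s, max_rate):
    """Return the subsystems owning any device whose error rate (errors per
    second over the interval) exceeds `max_rate`. Subsystems whose devices
    stay under the threshold are left running."""
    counts = Counter(device for _, device in error_log)
    bad_devices = {d for d, n in counts.items() if n / interval_s > max_rate}
    return {device_to_subsystem[d] for d in bad_devices}


log = [(t * 0.1, "0A80") for t in range(30)] + [(0.5, "0C00")]
owners = {"0A80": "DASD-1", "0C00": "TAPE"}
print(subsystems_to_quiesce(log, owners, interval_s=10.0, max_rate=1.0))
# prints {'DASD-1'}
```

Here device 0A80 averages 3 errors per second and its subsystem is selected, while the tape subsystem, with one error in ten seconds, keeps running.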
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.


