Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-11-12
2002-02-05
Ray, Gopal C. (Department: 2181)
C714S025000, C702S132000
active
06345369
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to the field of computer systems, and more particularly, to techniques for detecting environmental and power problems, including those of redundant system components, which can have an adverse effect on the operation of the computer system. Still more particularly, the invention relates to a method and apparatus for generating environmental and power warnings and providing this information to computer service repair personnel for fast and accurate diagnosis and correction of environmental and power errors.
BACKGROUND OF THE INVENTION
Complex computer systems require stable environmental and power conditions to ensure proper operation. When site environmental problems occur, such as air conditioning malfunctions, restricted air flow around the computer system, or AC power glitches, the computer system may not perform properly. The result can be logical damage to important data stored on the computer (e.g., corrupted disk sectors) or even complete hardware malfunction. To keep pace with increasing market demand for higher reliability and availability in computer systems, newer systems are being designed with redundant hardware components. For example, systems are being designed with redundant power supply and cooling components (i.e., fans/blowers). With such redundant components, the system is expected to maintain operations in the event of a power supply or fan/blower failure.
Typical, non-redundant systems are provided with various sensors for detecting environmental and power problems and providing appropriate error messages to inform users of these problems. These error messages are also used by computer repair service personnel to diagnose and correct the problem. One exemplary environmental and power warning system is provided in the PowerPC Common Hardware Reference Platform (CHRP) and RS/6000 systems to inform the operating system of these types of events. The Common Hardware Reference Platform is described in detail in “PowerPC Microprocessor Common Reference Platform: A System Architecture,” ISBN 1-558603948, available from IBM.
In general, the CHRP employs a variety of sensors which detect and measure environmental conditions. If the measurements of these conditions exceed certain threshold values, then data reflecting the conditions is written into an environmental and power warning register (EPOW register) in the system. In the CHRP architecture, the data written into the EPOW register is referred to as an action code. However, the standard EPOW arrangement cannot adequately handle the complexity of redundant power and cooling components. Redundant failures need to be reported with an appropriate level of severity without restricting the ability to power up and use the system. Further, multiple levels of error reporting are required to handle the possibility of more than one failure occurring.
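The threshold-to-register flow described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Sensor` and `EpowRegister` classes, the sensor names, the limits, and the numeric action codes are hypothetical and are not the values or interfaces defined by the CHRP architecture.

```python
from dataclasses import dataclass

# Hypothetical action codes for illustration only; NOT the CHRP-defined values.
ACTION_NONE = 0
ACTION_WARN_COOLING = 1
ACTION_WARN_POWER = 2


@dataclass
class Sensor:
    name: str     # hypothetical sensor identifier
    kind: str     # "cooling" or "power"
    limit: float  # threshold; a reading above this counts as a problem


class EpowRegister:
    """Stand-in for an EPOW register: keeps the highest action code written."""

    def __init__(self):
        self.action_code = ACTION_NONE

    def write(self, code):
        # Assumption: a higher code is more severe and is never overwritten
        # by a lower one.
        self.action_code = max(self.action_code, code)


def check_sensors(sensors, readings, register):
    """Write an action code for every reading that exceeds its threshold."""
    for sensor in sensors:
        if readings[sensor.name] > sensor.limit:
            if sensor.kind == "cooling":
                register.write(ACTION_WARN_COOLING)
            else:
                register.write(ACTION_WARN_POWER)
    return register.action_code


sensors = [
    Sensor("inlet_temp_c", "cooling", 40.0),
    Sensor("supply_ripple_mv", "power", 50.0),
]
```

Note that a single register holding one action code is exactly the limitation raised in the text: it cannot say which of several redundant units failed, or how many.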
It is, therefore, one object of the present invention to provide an improved environmental and power warning system which addresses the difficulties associated with a system having redundant power and cooling components. Additional objects and advantages of the present invention will become apparent in view of the following disclosure.
SUMMARY OF THE INVENTION
The present invention provides aspects for detecting environmental faults in redundant components of a computer system. In an exemplary method aspect, the method includes monitoring system environment conditions, including a status for redundant power supply and cooling components. The method further includes registering a failure condition with an appropriate error type when a monitored system environment condition exceeds a design threshold, and utilizing the registered failure condition as data in an architected error log.
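As a rough sketch of the method aspect summarized above, the following shows one way a monitored redundant group could be mapped to an error type and recorded in a log. The class names, the two error-type strings, and the rule used to choose between them are hypothetical; the source does not specify the actual error types or the format of the architected error log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RedundantGroup:
    name: str        # e.g. "power_supply" or "fan" (hypothetical labels)
    installed: int   # units physically present
    required: int    # units needed to keep the system running
    failed: int = 0  # units currently reporting a failure


@dataclass
class LogEntry:
    component: str
    error_type: str
    failed_units: int


def classify(group: RedundantGroup) -> Optional[str]:
    """Pick a hypothetical error type based on how much redundancy remains."""
    if group.failed == 0:
        return None
    working = group.installed - group.failed
    if working >= group.required:
        # Enough units remain, so the system can still power up and run.
        return "REDUNDANT_UNIT_FAILED"
    return "INSUFFICIENT_UNITS"


def register_failures(groups: List[RedundantGroup],
                      error_log: List[LogEntry]) -> List[LogEntry]:
    """Append one log entry per group that has a failure condition."""
    for group in groups:
        error_type = classify(group)
        if error_type is not None:
            error_log.append(LogEntry(group.name, error_type, group.failed))
    return error_log


groups = [
    RedundantGroup("power_supply", installed=2, required=1, failed=1),
    RedundantGroup("fan", installed=3, required=2, failed=2),
    RedundantGroup("blower", installed=2, required=1),
]
log = register_failures(groups, [])
```

In this sketch the power supply group yields the less severe entry because one working unit still meets the requirement, while the fan group yields the more severe entry because only one of the two required fans remains.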
Through the present invention, a methodology for handling redundant failure situations is provided. The methodology integrates with and extends current EPOW error handling architectures. Further, the present invention provides additional power/cooling failure isolation capability for service personnel. These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.
REFERENCES:
patent: 4803592 (1989-02-01), Ashley
patent: 4881230 (1989-11-01), Clark et al.
patent: 5878377 (1999-03-01), Hamilton, II et al.
patent: 6035416 (2000-03-01), Abdelnour et al.
patent: 6044476 (2000-03-01), Ote et al.
Kitamorn Alongkorn
McLaughlin Charles Andrew
Patel Kanisha
Thorson Donald LeRoy
Leeuwen Leslie Van
Ray Gopal C.
Sawyer Law Group LLP