Reexamination Certificate
2000-02-08
2004-03-09
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S026000, C703S021000
active
06704888
ABSTRACT:
FIELD OF THE INVENTION
The field of the present invention is the troubleshooting of hardware failures and the maintenance of computers.
More particularly, it relates to a process for analyzing information that is recorded the moment a malfunction is detected in the computer, in order to locate the component or components that caused the failure and to replace only the malfunctioning components.
It also relates to a tool for analyzing and locating failures and a computer that incorporates the tool.
DESCRIPTION OF RELATED ART
The constant decrease in the price of computing machines sometimes leads manufacturers to lower the quality of certain hardware components.
A component can be, for example, an ASIC (“Application-Specific Integrated Circuit”) or a processor.
The user is therefore more and more frequently confronted with hardware-related errors. Current machines are all capable, to varying degrees, of detecting these errors, which can sometimes lead to failures in certain parts of the machine, or to a complete shutdown of the machine.
Each sensitive component of a machine has status registers indicating the operating state of the component in question.
A given status of the machine is characterized by a “signature” of its status registers, i.e., a characteristic value of each register for this given status.
It is these values that constitute the information that will subsequently be analyzed by the machine.
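As a minimal sketch of the idea of a signature, a machine status can be modeled as a mapping from register names to values, and a recorded snapshot can be compared against known signatures. The register names, values, and status labels below are hypothetical and are not taken from the patent.

```python
# Hypothetical register names and values, chosen for illustration only.
KNOWN_SIGNATURES = {
    "normal": {
        "CPU0_STATUS": 0x00,
        "MEM_CTRL_STATUS": 0x00,
        "IO_BRIDGE_STATUS": 0x00,
    },
    "memory_parity_error": {
        "CPU0_STATUS": 0x00,
        "MEM_CTRL_STATUS": 0x04,
        "IO_BRIDGE_STATUS": 0x00,
    },
}


def match_signature(snapshot, known=KNOWN_SIGNATURES):
    """Return the names of the known statuses whose signature equals the snapshot."""
    return [name for name, signature in known.items() if signature == snapshot]


# A snapshot of the status registers recorded when the malfunction was detected.
snapshot = {"CPU0_STATUS": 0x00, "MEM_CTRL_STATUS": 0x04, "IO_BRIDGE_STATUS": 0x00}
matches = match_signature(snapshot)
```

An exact match is the simplest possible comparison; a real analysis would more likely inspect individual bits or fields of each register.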
It is possible to distinguish several types of failures in a computing machine.
In a first type, the failure causes a minor error that remains localized at the component level and is immediately corrected by the software that controls this component, and therefore the user does not experience any disturbance of his work.
In a second type, the failure can cause an error whose seriousness makes it no longer possible to guarantee the integrity of the data processed and may make it necessary to restart the machine.
The present invention relates more specifically, though not exclusively, to this second type of failure, which can cause interruptions in the operation of the machine, also known by the respective terms “machine check” and “checkstop.”
In the case of an interruption of the “machine check” type, the information collected is targeted to the component that detected the error, while in the case of an interruption of the “checkstop” type, all the “signatures” of the status registers of the machine are collected.
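The difference in collection scope between the two interruption types can be sketched as follows. The component names, register names, and the `collect` function are assumptions made for illustration; the text does not specify how the collection is implemented.

```python
# Hypothetical status registers, grouped by the component that owns them.
REGISTERS = {
    "cpu0": {"CPU0_STATUS": 0x00},
    "mem_ctrl": {"MEM_CTRL_STATUS": 0x04},
    "io_bridge": {"IO_BRIDGE_STATUS": 0x00},
}


def collect(interruption, detecting_component=None, registers=REGISTERS):
    """Return the register values gathered for the given interruption type."""
    if interruption == "checkstop":
        # Checkstop: gather the signatures of all status registers of the machine.
        return {
            name: value
            for component_registers in registers.values()
            for name, value in component_registers.items()
        }
    if interruption == "machine check":
        # Machine check: gather only the registers of the detecting component.
        return dict(registers[detecting_component])
    raise ValueError(f"unknown interruption type: {interruption}")


targeted = collect("machine check", detecting_component="mem_ctrl")
everything = collect("checkstop")
```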
In both cases, it is then necessary to interpret the values of the status registers in order to determine the error and possibly deduce its cause.
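Working out what a status register value means typically involves mapping individual bits to their meaning. The sketch below assumes a hypothetical bit layout for a hypothetical memory controller register; actual layouts depend on the component and come from its documentation.

```python
# Hypothetical layout: bit position -> meaning of that bit when set.
MEM_CTRL_STATUS_BITS = {
    0: "correctable ECC error",
    2: "uncorrectable parity error",
    7: "bus timeout",
}


def decode_register(value, bit_meanings):
    """List the meanings of all described bits set in value, lowest bit first."""
    return [
        meaning
        for bit, meaning in sorted(bit_meanings.items())
        if value & (1 << bit)
    ]


# 0x84 has bits 2 and 7 set.
decoded = decode_register(0x84, MEM_CTRL_STATUS_BITS)
```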
Each component of the machine is more or less directly linked to one or more other components of this machine, which will be called “neighbor components.” If a component has a defect, it is revealed by the neighbor components in their status registers. The user is then warned that there has been a failure in the machine, but in certain cases, there is nothing that allows him to know exactly which component is the defective one that caused the error.
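One simple way to reason about neighbor components is to treat the machine as a graph and look for the components that are linked to every component reporting an error. This is an illustrative heuristic under assumed component names and links, not the method claimed by the patent.

```python
# Hypothetical topology: component -> set of its neighbor components.
NEIGHBORS = {
    "cpu0": {"bus"},
    "cpu1": {"bus"},
    "bus": {"cpu0", "cpu1", "mem_ctrl", "io_bridge"},
    "mem_ctrl": {"bus"},
    "io_bridge": {"bus"},
}


def suspects(reporting, neighbors=NEIGHBORS):
    """Return the components that are neighbors of every reporting component."""
    reporting = set(reporting)
    if not reporting:
        return set()
    return set.intersection(*(neighbors[component] for component in reporting))


# Two components report an error; their only common neighbor is the bus.
narrow = suspects({"cpu0", "mem_ctrl"})
# Only the bus reports an error; any of its neighbors could be the cause.
ambiguous = suspects({"bus"})
```

The second call shows the ambiguity described above: when a single component reports the error, the neighbor relation alone does not identify which component is defective.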
When an error occurs, the signature of the machine's status registers is available, but an overall view of the machine's status is not. This leaves a gap: the information is either precise but partial (the status registers) or global but imprecise (an operational error has occurred). When the error results in a hard stop of the machine, it is necessary to pore through a thick manual to find the meaning of the status registers, and the help of an expert is required to perform a global analysis of these registers a posteriori.
The existing error analysis tools can provide all of the values of the registers in text form and can even perform the analysis of these values. However, the description of the status registers and the rules for interpreting their contents are buried in the machine code of these tools.
Since a tool is generally dedicated to one hardware version, it is not possible to add new descriptions of registers or new rules of interpretation without creating a new version of the tool.
SUMMARY OF THE INVENTION
The object of the invention is to specifically eliminate these drawbacks.
To this end, the subject of the invention is a process for analyzing and locating hardware failures in a computing machine storing information on operational errors generated by the various sensitive hardware components of the machine.
It is characterized in that it consists of creating a man/machine interface through which the components and the rules for interpreting errors are described in a structured language and used by the machine as external parameters in correlation with the error information to detect the malfunctioning component or components.
Another subject of the invention is a tool for analyzing and locating hardware failures in a computing machine comprising means for storing error information generated by the sensitive components of the machine.
It is characterized in that it includes an error analysis engine receiving through a first series of inputs the error information, and receiving through a second series of inputs the parameters required for the description of the sensitive components of the machine and for the description of the rules for interpreting errors, and in that it includes a man/machine interface between the tool and the component expert to allow him to formulate the parameters in a structured language.
Finally, another subject of the invention is a computer that incorporates the tool defined above.
The formulation of the parameters for describing the registers and the rules for interpreting errors according to the invention makes it possible to add new descriptions or to enrich the interpretation simply by editing source files written in a given format, without having to create a new version of a tool with each hardware upgrade.
Moreover, the architecture of the tool according to the invention is scalable and its maintenance is facilitated by separating the analysis tool itself (the engine), which processes the information in “machine” code, from the descriptions of the status registers and the interpretation rules written in “source” code.
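The separation between a generic engine and externally supplied descriptions can be sketched as follows. JSON is used here only as a stand-in for the structured language, which this text does not specify; the register names, bit meanings, rule format, and the `analyze` function are all hypothetical.

```python
import json

# Hypothetical description file content. In practice this would be a source
# file edited by the component expert, kept separate from the engine.
PARAMETERS = json.loads("""
{
  "registers": {
    "MEM_CTRL_STATUS": {"component": "mem_ctrl", "bits": {"2": "parity error"}},
    "IO_BRIDGE_STATUS": {"component": "io_bridge", "bits": {"0": "bus timeout"}}
  },
  "rules": [
    {"if": ["MEM_CTRL_STATUS.parity error"], "then": "suspect mem_ctrl"},
    {"if": ["MEM_CTRL_STATUS.parity error", "IO_BRIDGE_STATUS.bus timeout"],
     "then": "suspect bus"}
  ]
}
""")


def analyze(error_info, parameters):
    """Generic engine: decode the registers, then fire every rule whose conditions all hold."""
    facts = set()
    for name, value in error_info.items():
        description = parameters["registers"].get(name)
        if description is None:
            continue  # register not described in the parameters; skip it
        for bit, meaning in description["bits"].items():
            if value & (1 << int(bit)):
                facts.add(f"{name}.{meaning}")
    return [rule["then"] for rule in parameters["rules"] if set(rule["if"]) <= facts]


# First input: the error information. Second input: the external parameters.
diagnosis = analyze({"MEM_CTRL_STATUS": 0x04, "IO_BRIDGE_STATUS": 0x00}, PARAMETERS)
```

In this sketch, supporting a new register or a new interpretation rule means editing the JSON text only; the `analyze` function is unchanged.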
REFERENCES:
patent: 4649515 (1987-03-01), Thompson et al.
patent: 4964125 (1990-10-01), Kim
patent: 5164912 (1992-11-01), Osborne et al.
patent: 5394543 (1995-02-01), Hill et al.
patent: 5548714 (1996-08-01), Becker
patent: 5944839 (1999-08-01), Isenberg
patent: 6041425 (2000-03-01), Kokunishi et al.
patent: 6105149 (2000-08-01), Bonissone et al.
patent: 6119246 (2000-09-01), McLaughlin et al.
patent: 6401219 (2002-06-01), Shigeta
patent: 6430707 (2002-08-01), Matthews et al.
patent: 6442542 (2002-08-01), Ramani et al.
patent: 6539429 (2003-03-01), Rakavy et al.
patent: 6587960 (2003-07-01), Barford et al.
Caudrelier Christian
Espie Eric
Garrigues Philippe
Randon Christian
Bull S.A.
Duncan Marc M
Kondracki Edward J.
Miles & Stockbridge P.C.