Reexamination Certificate
2000-07-07
2003-11-25
Baderman, Scott (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C701S029000
active
06654910
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates in general to an electronic control system for controlling the function of a processing system. In particular, the invention relates to a method for managing system fault situations in an electronic control system. More specifically, the invention concerns such a control system for use in an automotive vehicle.
BACKGROUND OF THE INVENTION
In recent years the complexity of electronic control systems used in consumer products, and specifically in automobile electronics, has increased dramatically. Although manufacturers of electronic subassemblies try to ensure that their products are reliable, it is almost impossible to guarantee that no fault exists anywhere in a system at any time during the product's lifecycle. As a result, the reliability and fault-tolerant behavior of complex systems have become topics of major concern to designers, manufacturers and users.
DESCRIPTION OF BACKGROUND ART
There are two fundamentally different approaches that are presently used to increase the reliability of computing systems.
The first approach is called fault prevention, also known as fault intolerance. The second approach is represented by real fault tolerance.
In the traditional fault prevention approach the objective is to increase the reliability of each part used within the overall system. Since it is almost impossible to achieve an absolutely reliable system in practice, the goal of fault prevention is to reduce the probability of system failure to an acceptably low value. The reliability of a system can be increased by employing the method of worst-case design and by using high-quality components. Since system interconnection devices represent a very common crystallization point for various failures, refining interconnections and imposing strict quality control procedures during the assembly phase are further important reliability-improving measures.
However, solutions and measures of this type will most likely increase the cost of a system significantly.
As to the fault tolerance approach, two major techniques are typically used:
(a) Incorporate redundancy (i.e. usage of additional, multiple identical resources) into a system with the aim of masking the effects of faults, and
(b) Use error correction (most commonly implemented in bus systems and in storage devices).
In systems of this type, faults are expected to occur during computation. When a failure is detected and identified,
(i) the system is reconfigured by enabling the respective redundant elements, and/or
(ii) the error correction circuitry, which generates, controls and monitors the error correction codes, automatically corrects the corrupted data.
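As an illustration of technique (b), the following is a minimal sketch of a single-error-correcting code. It assumes a Hamming(7,4) code, which the text above does not name; the function names are hypothetical and chosen only for this example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Assumed layout (positions 1..7): p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (data bits, error position); position 0 means no error seen.

    A single flipped bit is corrected; double errors are not handled.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of bad bit
    if position:
        w[position - 1] ^= 1  # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], position


# Usage: corrupt one bit of a codeword and recover the original data.
codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a single-bit fault on the bus or in storage
data, error_position = hamming74_correct(codeword)
```

The three extra parity bits per four data bits give a sense of the overhead that hardware error correction adds.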
Realizing a fault-tolerant system of this type requires providing and managing multiple instances of the redundant (identical) hardware elements and/or error correction circuits. As a drawback, this type of system implementation multiplies cost and, along with it, physical size and power consumption.
FIG. 1 illustrates a typical system using state-of-the-art techniques. The exemplary system uses a redundant instantiation of the NVRAM/VRAM (Non-Volatile and Volatile Random Access Memory) for the storage sub-system. The I/O devices are laid out redundantly, both for the I/O device controller and for the adjacent physical I/O device. A multiplexer element switches to the redundant data path if a failure occurs in this area of the system. A ‘system test and fault recovery controller unit’ 0 is implemented to monitor the system functionality and to manage and control the fault recovery steps to be performed. Additionally, a typical system supervising feature is provided by the Parity Checker. In this example, this feature also provides error correction for data integrity failures detected on the system bus.
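The switch-over behavior of such a multiplexer can be sketched in software as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, each data path is modeled as a callable returning a byte together with its even-parity bit, and a parity mismatch is used as the failure signal that triggers the switch, a coupling the description above does not specify.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of 1 bits even."""
    return bin(byte).count("1") % 2


class PathMultiplexer:
    """Selects the first data path that delivers a parity-consistent byte.

    Hypothetical model: each source is a callable returning (byte, parity).
    """

    def __init__(self, primary, redundant):
        self.sources = [primary, redundant]
        self.selected = 0  # index of the currently active data path

    def read(self):
        while True:
            byte, parity = self.sources[self.selected]()
            if even_parity_bit(byte) == parity:
                return byte
            if self.selected + 1 >= len(self.sources):
                raise RuntimeError("no healthy data path left")
            self.selected += 1  # switch to the redundant data path


# Usage: the primary path delivers a wrong parity bit, the redundant one is good.
def faulty_path():
    return 0b1011, 0


def healthy_path():
    return 0b1011, 1


mux = PathMultiplexer(faulty_path, healthy_path)
value = mux.read()
```

Once both paths have failed there is nothing left to switch to, which corresponds to the non-recoverable case in which such a system aborts.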
Most commonly, the system CPU performs failure detection and diagnostic routines as well. The application code for the additional diagnostic software routines is typically stored in the basic storage sub-system and, of course, is also held redundantly in the redundant storage devices.
The system CPU, which as explained runs failure detection routines in support of the ‘system test and fault recovery controller’, can in addition be used to test and verify the integrity of all implemented failure detection and fault management devices and sub-systems.
This type of fault-tolerant system implementation typically restores the original system functionality for all ‘recoverable’ fault situations that occur. Other failures, whether detected or not detected by the fault-management system, will lead to a (potentially hidden) system malfunction or to a general system abort.
Typically, this type of fault-tolerant system realization is used in expensive and safety-relevant commercial systems, which justify the extensive cost of implementation. For this reason, cost-sensitive embedded systems use only partial and drastically reduced implementations, with the drawback of providing only limited fault recovery capability and emergency running attributes.
Nevertheless, the effectiveness of fault tolerance for enhancing the reliability of processing systems is much more pronounced in a system composed of basically reliable components than in a system of unreliable components. In other words, while fault tolerance can be used to increase the reliability of an already reliable system significantly, it is of little use—and can even have a detrimental effect—if the original system is unreliable in the first place.
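A small numerical sketch can make this point concrete. It assumes triple modular redundancy (three identical, independently failing units and a perfect majority voter), a scheme chosen here purely for illustration and not named in the text. Under that assumption the system works when at least two units work, so its reliability is 3r^2 - 2r^3 for a unit reliability r.

```python
def tmr_reliability(r):
    """Reliability of a 2-out-of-3 majority system built from units of reliability r.

    Assumes independent unit failures and a fault-free voter.
    P(all three work) + P(exactly two work) = r**3 + 3 * r**2 * (1 - r).
    """
    return 3 * r**2 - 2 * r**3


# Compare a single unit with the redundant arrangement.
for unit in (0.99, 0.9, 0.5, 0.4):
    print(f"unit {unit:.2f} -> redundant system {tmr_reliability(unit):.4f}")
```

The difference to a single unit is r(1 - r)(2r - 1), which is positive only for r above 0.5: redundancy improves components that are already reliable and makes unreliable ones worse.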
Co-pending European Patent Application 99 101 817.7, assigned to the same assignee as the present application, discloses an electronic control system for controlling the function of a processing system, especially for use in an automotive vehicle, wherein said control system comprises a plurality of logical control elements, each of which is especially adapted to perform special tasks, whereby each of said control elements is able to communicate with every other control element.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a method to manage system fault situations that provides high system reliability and availability for Electronic Control Systems (ECUs) while maintaining low system cost.
It is a further object to keep the hardware and software overhead at a minimum, thus limiting the negative influence on power dissipation as well as on the physical measures of size and weight.
It is still a further object to provide a system that is able to overcome the above mentioned shortcomings of the prior art.
The present invention describes a principle (hereinafter called the “Intelligent Fault Management” (IFM) principle) that allows system malfunctions to be managed and the system vitality of complex electronic control systems featuring multiple cooperating processing elements to be restored, to an achievable extent and for a justifiable effort.
As mentioned above, IFM stands for a principle for handling system failure situations while maintaining minimum fault recovery time and providing high system availability. This principle provides unique solutions for fault analysis, fault recovery definition and system re-vitalization. A method applying graceful degradation of system functionality is proposed, allowing cost-effective systems to be implemented.
In contrast to typical, i.e., state-of-the-art, fault management systems, the proposed idea provides calculated, deterministic fall-back strategies that allow the fault/vital system behavior to be managed and controlled. The method used by the IFM principle supports prioritized, staggered fall-back solutions that degrade the system functionality in pre-assigned levels.
The application of the IFM principle focuses on the requirements most commonly encountered in embedded commercial systems and in demanding, advanced consumer electronics.
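One way to picture prioritized, pre-assigned fall-back levels is a fixed table that is searched from the highest level downward. The sketch below is an assumption-laden illustration, not the method of the invention: the level names, resource names and function lists are all hypothetical.

```python
# Hypothetical fall-back table, ordered from highest to lowest functionality.
# Each entry: (level name, resources required, functions offered at that level).
FALLBACK_LEVELS = [
    ("full", {"cpu", "ram", "nvram", "io"}, ["control", "diagnostics", "logging"]),
    ("reduced", {"cpu", "ram", "io"}, ["control", "diagnostics"]),
    ("limp_home", {"cpu", "io"}, ["control"]),
]


def select_level(healthy_resources):
    """Return the highest pre-assigned level whose requirements are all met.

    The table is fixed in advance, so the same fault pattern always
    yields the same degraded configuration (deterministic fall-back).
    """
    for name, required, functions in FALLBACK_LEVELS:
        if required <= healthy_resources:  # subset test: all required are healthy
            return name, functions
    return "safe_stop", []  # nothing usable is left


# Usage: the non-volatile memory has failed, so logging is given up.
level, functions = select_level({"cpu", "ram", "io"})
```

Because the levels are defined ahead of time, the behavior after a fault can be reviewed and tested for every fault pattern before the system is deployed.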
In particular the IFM principle is advantageous to be used in electronic control systems applied in highl
Eibach Wolfgang
Staiger Dieter E.
Baderman Scott
Senterfitt Akerman