Electronic system and method for implementing functional...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

: 1998-08-12
: 2002-03-12
: Iqbal, Nadeem (Department: 2184)
: Error detection/correction and fault detection/recovery
: Data processing system error or fault handling
: Reliability and availability

: Reexamination Certificate
: active
: 06357024
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to electronic computer systems, and more particularly to fault-tolerant or reliable electronic systems employing multiple processing units in order to reduce computational errors and/or determine the source of computational errors. The invention described herein may also be useful in supporting the development or investigation of improvements to components used in electronic systems employing multiple processing units.
2. Description of the Relevant Art
An electronic circuit such as a microprocessor may fail to produce a correct result due to “hard” failures or “soft” errors. Hard failures are permanent and reproducible, and typically result from design errors, fabrication errors, fabrication defects, and/or physical failures. A failure to properly implement a functional specification represents a design error. Fabrication errors are attributable to human error, and include the use of incorrect components, the incorrect installation of components, and incorrect wiring. Examples of fabrication defects, which result from imperfect manufacturing processes, include conductor opens and shorts, mask alignment errors, and improper doping profiles. Physical failures occur due to wear-out and/or environmental factors. The thinning and/or breakage of fine aluminum lead wires inside integrated circuit packages due to electromigration or corrosion are examples of physical failures. Soft errors, on the other hand, are temporary and non-reproducible. Soft errors are often the result of transient phenomena such as electrical noise (e.g., power supply “glitches” and ground “bounce”), energetic particles (e.g., alpha particles), or “marginal” circuit design.
Incorrect results cannot be tolerated in computer systems used in, for example, aircraft flight control systems, missile guidance systems, and banking transactions. Computer systems used in such critical applications must be highly reliable. One method used to increase the reliability of such computer systems is called functional redundancy checking (FRC). FRC typically employs two electronic microprocessor devices functioning as central processing units (CPUs). A first “master” microprocessor and a second “checker” microprocessor receive the same input signals and execute instructions simultaneously (i.e., in lock step). The checker microprocessor compares the output signals produced by the master microprocessor to its own internally-generated output signals. If any output signal produced by the master microprocessor does not match the respective output signal produced by the checker microprocessor, the checker microprocessor generates an error signal which initiates corrective action (i.e., “notification”).
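The master/checker comparison described above can be sketched in software. This is only an illustrative model, not the hardware mechanism itself: each CPU is represented by a hypothetical function mapping an input word to an output word, and the names run_lockstep, healthy, and faulty are assumptions introduced for this example.

```python
from typing import Callable, Iterable, Optional

# Hypothetical model of one CPU: input word -> output word.
Step = Callable[[int], int]


def run_lockstep(master: Step, checker: Step, inputs: Iterable[int]) -> Optional[int]:
    """Feed identical inputs to both CPUs and compare outputs each cycle.

    Returns the index of the first cycle whose outputs differ, or None
    if every output matched.
    """
    for cycle, word in enumerate(inputs):
        master_out = master(word)    # value the master drives on its outputs
        checker_out = checker(word)  # value the checker generates internally
        if master_out != checker_out:
            return cycle             # the checker would raise its error signal here
    return None


def healthy(word: int) -> int:
    """Assumed correct behavior: an arbitrary 8-bit function."""
    return (word * 3 + 1) & 0xFF


def faulty(word: int) -> int:
    """Assumed hard failure: bit 0 of the output is stuck at 0."""
    return healthy(word) & 0xFE
```

With inputs 1, 3, 5, 2 the stuck bit only matters on the last word, whose correct output is odd, so the mismatch is reported at index 3. The fault stays invisible until an output happens to exercise the defective bit.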
FIG. 1 is a block diagram of a typical electronic computer system 10 employing FRC. Electronic computer system 10 includes identical first and second CPUs 12a and 12b, a processor bus 14, chip set logic 16, a memory unit 18, a memory bus 20, a system bus 22, and a peripheral device 24. CPUs 12a and 12b are typically microprocessor integrated circuits formed upon a single monolithic semiconductor substrate. Processor bus 14 couples both CPU 12a and CPU 12b to each other and to chip set logic 16. Chip set logic 16 functions as an interface between CPUs 12a-b and system bus 22, and between CPUs 12a-b and memory unit 18. System bus 22 is adapted for coupling to one or more peripheral devices. Peripheral device 24 is coupled to system bus 22. Peripheral device 24 may be, for example, a disk drive unit, a video display unit, or a printer. Memory unit 18 stores data, and typically includes semiconductor memory devices. Chip set logic 16 is coupled to memory unit 18 via memory bus 20, and may include a memory controller.
CPUs 12a and 12b include built-in functional redundancy checking circuitry. During system initialization, either CPU 12a or CPU 12b is configured to be the master CPU, and the other CPU is configured to be the checker CPU. The master CPU drives its output terminals, while the checker CPU changes its output terminals to function as input terminals. The respective terminals (e.g., “pins”) of CPUs 12a and 12b are coupled together. The checker CPU compares its internally generated values to those produced by the master CPU and received at the respective terminals. If any output signal produced by the master CPU does not match the respective output signal produced by the checker CPU, the checker CPU produces an error signal. The error signal may serve as notification to external error recovery hardware (not shown). For example, the error signal may be routed to a third maintenance CPU (not shown) or an interrupt controller (not shown) which initiates an error recovery routine in response to the error signal. The error recovery routine may involve “backing up” the software program running at the time the error occurred to an established “checkpoint” at which instruction execution may be reinitiated.
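The checkpoint-based recovery mentioned above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the "program" is just a list of integers added to an accumulator, a soft error is modeled as a one-time bit flip in the master's result, and the function name run_with_checkpoints and its parameters are hypothetical.

```python
def run_with_checkpoints(program, master_faults, checkpoint_every=4):
    """Run `program` on a modeled master/checker pair with rollback.

    program          -- list of ints, each added to an accumulator
    master_faults    -- step indices at which the master suffers a
                        one-time soft error (assumed model)
    checkpoint_every -- a checkpoint is saved before every Nth step

    Returns (final accumulator value, number of rollbacks).
    """
    state = {"pc": 0, "acc": 0}
    checkpoint = dict(state)
    pending = set(master_faults)
    rollbacks = 0

    while state["pc"] < len(program):
        pc = state["pc"]
        if pc % checkpoint_every == 0:
            checkpoint = dict(state)          # establish a checkpoint

        checker_acc = state["acc"] + program[pc]
        master_acc = checker_acc
        if pc in pending:
            pending.discard(pc)               # soft error: does not repeat
            master_acc ^= 0x1                 # flip one bit of the master result

        if master_acc != checker_acc:         # checker raises the error signal
            state = dict(checkpoint)          # back up to the checkpoint
            rollbacks += 1
            continue

        state = {"pc": pc + 1, "acc": master_acc}

    return state["acc"], rollbacks
```

Because the modeled error is transient, re-executing from the checkpoint succeeds. A hard failure would mismatch on every retry, so a real recovery routine would need a retry limit, which this sketch omits.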
The master CPU initiates data read and write operations. In response to a memory read request from the master CPU, chip set logic 16 obtains data from memory unit 18 via memory bus 20 and provides the data to both CPU 12a and CPU 12b via processor bus 14. During a memory write operation, chip set logic 16 receives the data from the master CPU and stores the data within memory unit 18 via memory bus 20. In response to a read request from an address within an address range assigned to peripheral device 24, chip set logic 16 obtains data from peripheral device 24 via system bus 22 and provides the data to both CPU 12a and CPU 12b via processor bus 14. During a write operation to an address within an address range assigned to peripheral device 24, chip set logic 16 receives the data from the master CPU and provides the data to peripheral device 24 via system bus 22.
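The routing performed by the chip set logic can be sketched as a simple address decoder. The class below is an illustrative model only: the class name, the address layout, and the dictionary keys "cpu_12a" and "cpu_12b" are assumptions made for this example, and memory and the peripheral are modeled as plain lists.

```python
class ChipSetLogic:
    """Toy model of chip set logic 16 routing requests by address."""

    def __init__(self, memory_size, peripheral_base, peripheral_size):
        self.memory = [0] * memory_size          # stands in for memory unit 18
        self.peripheral = [0] * peripheral_size  # stands in for peripheral device 24
        self.peripheral_base = peripheral_base   # start of the assumed peripheral range

    def _target(self, address):
        """Pick the backing store and local index for an address."""
        offset = address - self.peripheral_base
        if 0 <= offset < len(self.peripheral):
            return self.peripheral, offset       # reached via the system bus
        return self.memory, address              # reached via the memory bus

    def read(self, address):
        """A read initiated by the master delivers the same data to both CPUs."""
        store, index = self._target(address)
        value = store[index]
        return {"cpu_12a": value, "cpu_12b": value}

    def write(self, address, value):
        """A write carries data from the master CPU only."""
        store, index = self._target(address)
        store[index] = value
```

For example, with `chip = ChipSetLogic(16, 0x100, 4)`, a write to address 3 lands in the memory model, while a write to address 0x101 lands in the peripheral model. Both CPUs see the same value on a read, which is what keeps the master and checker in lock step.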
Several problems occur when implementing electronic computer system 10. Most importantly, the signals driven upon the output terminals of a CPU often do not adequately reflect the current internal execution state of the CPU. For example, there may be a time delay of many system clock cycles before an activity within the CPU results in signals being driven upon the output terminals. In addition, CPUs 12a and 12b may include relatively large internal cache memory systems 26a and 26b. Such cache memory systems are capable of holding large numbers of instructions and data. CPUs 12a and 12b are capable of operating for extended periods using instructions and data stored in respective cache memory systems 26a and 26b. During these extended periods, any computational errors produced do not propagate to the terminals of CPUs 12a and 12b, and are hence not “visible” for detection using FRC. As a result, cache memory systems 26a and 26b tend to delay error detection. Early detection of an error is key to determining the cause of the error and reducing the likelihood that valuable data is lost due to the error.
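The detection delay caused by caching can be made concrete with a small calculation. The sketch below assumes a simplified model in which an internal error becomes visible at the first bus cycle at or after the cycle in which it occurs. The function name detection_latency and the cycle numbers used in the examples are hypothetical.

```python
def detection_latency(fault_cycle, bus_cycles):
    """Cycles between an internal fault and the first pin-level comparison.

    fault_cycle -- clock cycle at which the internal error occurs
    bus_cycles  -- cycles at which the CPU drives its output terminals

    Returns None if the CPU never drives its terminals after the fault,
    in which case pin-level FRC never sees the error in this model.
    """
    for cycle in sorted(bus_cycles):
        if cycle >= fault_cycle:
            return cycle - fault_cycle
    return None
```

If the terminals are driven every cycle, `detection_latency(130, range(1000))` is 0. If the CPU runs from its cache and touches the bus only at cycles 0, 400, and 800, the same fault at cycle 130 is not exposed until cycle 400, a latency of 270 cycles.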
Furthermore, the maximum amount of data which may be transferred over processor bus 14 in a given amount of time (i.e., the maximum “speed” of processor bus 14) is limited by the increased electrical loading of two CPUs and signal reflections within the signal lines of processor bus 14 due to the multiple connection points (i.e., terminations). Electronic computer system 10 does not support separate “point-to-point” processor buses capable of much higher speeds.
It would be beneficial to have an electronic system and method implementing FRC by comparing “signatures” generated by each CPU. Each “signature” would include a relatively small number of bits, and would preferably be representative of the internal execution state of the CPU. Immediate comparisons of representative signatures would facilitate earlier error detection, especially when the CPUs include relatively large internal cache memory systems. In addition, comparing only such sig

Affiliated with

Dutton Drew J.

Inventor

Mudgett Dan S.

Inventor

White Scott A.

Inventor

Also associated with

Conley & Rose & Tayon P.C.

Law Firm

Daffer Kevin L.

Attorney

Iqbal Nadeem

Examiner
