Reexamination Certificate
1999-12-21
2003-09-23
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S010000
active
06625756
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to microprocessors and, in particular, to microprocessors capable of operating in high-reliability modes.
2. Background Art
Soft errors arise when alpha particles or cosmic rays strike an integrated circuit and alter the charges stored on the voltage nodes of the circuit. If the charge alteration is sufficiently large, a voltage representing one logic state may be changed to a voltage representing a different logic state. For example, a voltage representing a logic true state may be altered to a voltage representing a logic false state, and any data that incorporates the logic state will be corrupted.
Soft error rates (SERs) for integrated circuits, such as microprocessors (“processors”), increase as semiconductor process technologies scale to smaller dimensions and lower operating voltages. Smaller process dimensions allow greater device densities to be achieved on the processor die. This increases the likelihood that an alpha particle or cosmic ray will strike one of the processor's voltage nodes. Lower operating voltages mean that smaller charge disruptions are sufficient to alter the logic state represented by the node voltages. Both trends point to higher SERs in the future. Soft errors may be corrected in a processor if they are detected before any corrupted results are used to update the processor's architectural state.
Processors frequently employ parity-based mechanisms to detect data corruption due to soft errors. A parity bit is associated with each block of data when it is stored. The bit is set to one or zero according to whether there is an odd or even number of ones in the data block. When the data block is read out of its storage location, the parity of the number of ones in the block is compared with the parity bit. A discrepancy between the values indicates that the data block has been corrupted. Agreement between the values indicates that either no corruption has occurred or an even number of bits (two, four, . . . ) have been altered. Since the latter events have very low probabilities of occurrence, parity provides a reliable indication of whether data corruption has occurred. Error correcting codes (ECCs) are parity-based mechanisms that track additional information for each data block. The additional information allows the corrupted bit(s) to be identified and corrected.
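The parity check described above can be illustrated with a minimal sketch. The function names `parity_bit` and `is_corrupted` and the 8-bit sample block are assumptions made for illustration; they do not come from the patent, and real hardware computes parity with XOR trees rather than software.

```python
def parity_bit(block: int) -> int:
    """Return 1 if the block holds an odd number of ones, else 0."""
    return bin(block).count("1") & 1


def is_corrupted(block: int, stored_parity: int) -> bool:
    """Recompute parity on read-out and compare it with the stored bit."""
    return parity_bit(block) != stored_parity


data = 0b1011_0010                 # four ones, so the stored parity bit is 0
stored = parity_bit(data)

single_flip = data ^ 0b0000_0100   # one bit altered by a (simulated) soft error
double_flip = data ^ 0b0001_0100   # two bits altered

print(is_corrupted(data, stored))         # False
print(is_corrupted(single_flip, stored))  # True
print(is_corrupted(double_flip, stored))  # False
```

The last call shows the limitation noted in the text: when an even number of bits is altered, the recomputed parity still agrees with the stored bit and the corruption goes undetected.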
Parity/ECC mechanisms have been applied extensively to caches, memories, and similar data storage arrays. These structures have relatively high densities of data storing nodes and are susceptible to soft errors even at current device dimensions. Their localized array structures make it relatively easy to implement parity/ECC mechanisms. The remaining circuitry on a processor includes data paths, control logic, execution logic and registers (“execution core”). The varied structures of these circuits and their distribution over the processor die make it more difficult to apply parity/ECC mechanisms.
One approach to detecting soft errors in an execution core is to process instructions on duplicate execution cores and compare results determined by each on an instruction by instruction basis (“redundant execution”). For example, one computer system includes two separate processors that may be booted to run in a Functional Redundant Check unit (“FRC”) mode. In FRC mode, the processors execute identical code segments and compare their results on an instruction by instruction basis to determine whether an error has occurred. This dual processor approach is costly (in terms of silicon). In addition, inter-processor signaling through which results are compared is too slow to detect corrupted data before it updates the processors' architectural states. Consequently, this approach is not suitable for correcting detected soft errors.
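As a rough software analogy for redundant execution, the sketch below runs each instruction on two modeled execution cores and compares the results instruction by instruction. The three-field instruction format, the `OPS` table, and the `fault_mask` parameter used to inject a transient bit flip are all hypothetical; the sketch only detects a mismatch and makes no attempt to correct it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical instruction format: (opcode, operand_a, operand_b).
Instruction = Tuple[str, int, int]

OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
}


def execute(instr: Instruction, fault_mask: int = 0) -> int:
    """Model one execution core; fault_mask flips result bits to mimic a soft error."""
    op, a, b = instr
    return OPS[op](a, b) ^ fault_mask


def first_mismatch(program: List[Instruction],
                   faulty_index: Optional[int] = None) -> Optional[int]:
    """Run every instruction on two cores and return the index of the first disagreement."""
    for i, instr in enumerate(program):
        result_a = execute(instr, fault_mask=0b100 if i == faulty_index else 0)
        result_b = execute(instr)
        if result_a != result_b:
            return i
    return None


program = [("add", 2, 3), ("sub", 10, 4), ("and", 12, 10)]
print(first_mismatch(program))                  # None
print(first_mismatch(program, faulty_index=1))  # 1
```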
Another computer system provides execution redundancy using dual execution cores on a single processor chip. This approach eliminates the need for inter-processor signaling, and detected soft errors can usually be corrected. However, the processor employs an on-chip microcode to correct soft errors. This approach consumes significant processor area to store the microcode and it is a relatively slow correction mechanism.
The present invention addresses these and other deficiencies of available high reliability computer systems.
SUMMARY OF THE INVENTION
The present invention provides a mechanism for correcting soft errors in high reliability processors.
In accordance with the present invention, a processor includes a protected execution unit, a check unit to detect errors in results generated by the protected execution unit, and a replay unit to track selected instructions issued to the protected execution unit. When the check unit detects an error, it triggers the replay unit to reissue the selected instructions to the protected execution unit.
For one embodiment of the invention, the protected execution unit includes first and second execution units that provide redundant execution results to detect soft errors. For another embodiment of the invention, the protected execution unit includes parity protected storage structures to detect soft errors. For yet another embodiment of the invention, the replay unit provides an instruction buffer that includes pointers to track issue and retirement status of in-flight instructions. When the check unit indicates an error, the replay unit resets a pointer to reissue the instruction for which the error was detected.
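The interaction between the check unit and the replay unit summarized above can be sketched in software as follows. This is a simplified model under stated assumptions, not the patented implementation: the `ReplayBuffer` class, its `issue_ptr` and `retire_ptr` fields, the `run` helper, and the `faults` set used to inject transient errors are all hypothetical names, and only one instruction is in flight at a time.

```python
from typing import Callable, Dict, List, Set, Tuple

Instruction = Tuple[str, int, int]  # hypothetical (opcode, operand_a, operand_b)

OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


class ReplayBuffer:
    """Instruction buffer with pointers tracking issue and retirement status."""

    def __init__(self, instructions: List[Instruction]) -> None:
        self.slots = list(instructions)
        self.issue_ptr = 0    # next instruction to issue
        self.retire_ptr = 0   # oldest instruction not yet retired

    def issue(self) -> Instruction:
        instr = self.slots[self.issue_ptr]
        self.issue_ptr += 1
        return instr

    def retire(self) -> None:
        self.retire_ptr += 1

    def replay(self) -> None:
        # Reset the issue pointer so the unretired instruction is issued again.
        self.issue_ptr = self.retire_ptr

    def done(self) -> bool:
        return self.retire_ptr == len(self.slots)


def run(program: List[Instruction], faults: Set[int]) -> Tuple[List[int], int]:
    """Execute redundantly; `faults` holds issue counts at which one unit is corrupted."""
    buf = ReplayBuffer(program)
    committed: List[int] = []   # stands in for architectural state
    issue_count = 0
    replays = 0
    while not buf.done():
        op, a, b = buf.issue()
        good = OPS[op](a, b)
        result_0 = good ^ 1 if issue_count in faults else good  # transient bit flip
        result_1 = good
        issue_count += 1
        if result_0 != result_1:   # check unit detects a mismatch
            buf.replay()           # nothing is committed; the instruction is reissued
            replays += 1
        else:
            committed.append(result_0)
            buf.retire()
    return committed, replays


program = [("add", 2, 3), ("sub", 10, 4), ("add", 7, 7)]
print(run(program, faults={1}))  # ([5, 6, 14], 1)
```

Because the injected fault is transient, the reissued instruction produces matching results on the second attempt, and the committed values are the same as in a fault-free run.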
Grochowski Edward T.
Quach Nhon
Rash William