Recovery mechanism for L1 data cache parity errors

Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique

Reexamination Certificate

Details

C711S117000, C711S118000, C714S048000

active

06332181

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more specifically to a method of operating a computer system so as to allow software recovery from detected errors, such as a parity error arising during execution of program instructions, without requiring a machine check, that is, without requiring a “reboot” of the system. The method is particularly adapted for parity errors occurring in a first level data cache.
2. Description of Related Art
A typical structure for a conventional computer system includes one or more processing units connected to a system memory device (random access memory or RAM) and to various peripheral, or input/output (I/O), devices such as a display monitor, a keyboard, a graphical pointer (mouse), and a permanent storage device (hard disk). The system memory device is used by a processing unit in carrying out program instructions, and stores those instructions as well as data values fed to or generated by the programs. A processing unit communicates with the peripheral devices by various means, including a generalized interconnect or bus, or direct memory-access channels. A computer system may have many additional components, such as serial and parallel ports for connection to, e.g., modems, printers, and network adapters. Other components might further be used in conjunction with the foregoing; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access the system memory, etc.
A conventional processing unit includes a processor core having various execution units and registers, as well as branch and dispatch units which forward instructions to the appropriate execution units. Caches are commonly provided for both instructions and data, to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from system memory (RAM). These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip. Each cache is associated with a cache controller or bus interface unit that manages the transfer of values between the processor core and the cache memory.
A processing unit can include additional caches, such as a level 2 (L2) cache which supports the on-board (level 1) caches. In other words, the L2 cache acts as an intermediary between system memory and the on-board caches, and can store a much larger amount of information (both instructions and data) than the on-board caches can, but at a longer access penalty. Multi-level cache hierarchies can be provided where there are many levels of interconnected caches.
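The trade-off described above (a larger but slower L2 backing a smaller, faster L1) can be illustrated with a small model. This is only a sketch: the class name `CacheLevel`, the cycle counts, and the absence of capacity limits or eviction are all assumptions made for illustration and are not taken from the patent.

```python
# Illustrative model only: names and latencies are assumptions.
RAM_LATENCY = 100  # hypothetical cost, in cycles, of a system-memory access


class CacheLevel:
    def __init__(self, name, latency, backing):
        self.name = name
        self.latency = latency   # hypothetical access cost in cycles
        self.backing = backing   # next cache level, or a dict standing in for RAM
        self.lines = {}          # address -> cached value (no eviction modeled)

    def load(self, addr):
        """Return (value, total_cycles) for a load of addr."""
        if addr in self.lines:
            return self.lines[addr], self.latency
        if isinstance(self.backing, CacheLevel):
            value, cost = self.backing.load(addr)
        else:
            value, cost = self.backing[addr], RAM_LATENCY
        self.lines[addr] = value  # fill this level on the way back
        return value, self.latency + cost


ram = {0x1000: 42}
l2 = CacheLevel("L2", 10, ram)
l1 = CacheLevel("L1", 1, l2)

first = l1.load(0x1000)   # misses L1 and L2, so the value comes from RAM
second = l1.load(0x1000)  # now hits in L1
```

In this model the first load pays the cost of every level it passes through, while the repeated load is served by the on-board cache alone.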
A typical system architecture is shown in FIG. 1, and is exemplary of the PowerPC™ processor marketed by International Business Machines Corporation (IBM, assignee of the present invention). Computer system 10 includes a processing unit 12a, various I/O devices 14, RAM 16, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals whenever the computer is first turned on. Processing unit 12a communicates with the peripheral devices using a system bus 20 (a local peripheral bus (e.g., PCI) can be used in conjunction with the system bus). Processing unit 12a includes a processor core 22, and an instruction cache 24 and a data cache 26, which are implemented using high speed memory devices, and are integrally packaged with the processor core on a single integrated chip 28.
Cache 30 (L2) supports caches 24 and 26 via a processor bus 32. For example, cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be a PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. More than one processor may be provided, as indicated by processing unit 12b.
Values are stored in a computer using bits (binary digits), which can have a value of zero or one. A bit in a given cache block may contain an incorrect value, either due to a soft error (a random, transient condition caused by, e.g., stray radiation or electrostatic discharge) or to a hard error (a permanent condition, e.g., defective cell). One common cause of errors is a soft error resulting from alpha radiation emitted by the lead in the solder (C4) bumps used to form wire bonds with circuit leads. Most errors are single-bit errors, that is, only one bit in the field is incorrect.
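To make the single-bit error case concrete, the following minimal sketch shows how one stored parity bit exposes a single flipped bit. It assumes an even-parity convention and an 8-bit field; the patent does not state which convention or field width the cache uses, and the function names are hypothetical.

```python
def parity_bit(value: int) -> int:
    """Even-parity bit: 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") % 2


def store(value: int):
    """Return the value together with the parity bit saved alongside it."""
    return value, parity_bit(value)


def check(value: int, stored_parity: int) -> bool:
    """True if the value read back still agrees with its stored parity."""
    return parity_bit(value) == stored_parity


data, p = store(0b10110010)              # four set bits, so the parity bit is 0
flipped_once = data ^ (1 << 3)           # a single-bit soft error
flipped_twice = flipped_once ^ (1 << 6)  # a second flipped bit in the same field

single_detected = not check(flipped_once, p)
double_detected = not check(flipped_twice, p)
```

Here `single_detected` is true and `double_detected` is false: parity catches any odd number of flipped bits but cannot say which bit is wrong, so it detects the error without correcting it.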
If a cache block, such as in data cache 26, contains an error (that cannot be corrected with, e.g., error-correcting code) then data cache 26 notifies processor core 22 of the parity error. While parity errors can be detected, there is currently no mechanism to enable the computer's operating system to identify offending instructions and provide context synchronization. As a result, for many systems, there is no software recovery mechanism defined and all parity errors result in a machine check, that is, a reboot of the system. “Rebooting” refers to the restarting of a computer system by reloading its most basic program instructions, viz., the operating system, and is very time-consuming. This limitation presents a serious quality issue for such systems, not only since it presents an inconvenience, but also because files or data can be lost. It would, therefore, be desirable to devise a method of allowing software recovery from, e.g., L1 data cache parity errors. It would be further advantageous if the method were to allow full context synchronization to enable multiple software recovery schemes.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved computer system having a processor which includes a cache memory.
It is another object of the present invention to provide such an improved computer system which allows software recovery of cache errors such as parity errors.
It is yet another object of the present invention to provide such a recovery mechanism for cache parity errors that enables context synchronization to reduce adverse effects of the failure.
The foregoing objects are achieved in a method of handling a cache parity error in a computer system, generally comprising the steps of constructing an interrupt service for suspending current operations of a processor of the computer system upon the occurrence of a defined condition other than a cache parity error, detecting a parity error in a cache associated with the processor, and reporting the parity error to the processor using the interrupt service. The described implementation is adapted for handling on-board (L1) cache parity errors arising from load instructions, although the invention can also be used to handle cache parity errors arising from store instructions. In a preferred embodiment, the parity error is reported by generating a data storage interrupt, and using a data storage interrupt status register (DSISR) to indicate that the data storage interrupt is a result of the parity error.
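The reporting path in the preceding paragraph, where an interrupt service built for another condition is reused and a status bit identifies the parity case, might be modeled as follows. The bit masks, class name, and handler messages are hypothetical; the text here does not say which DSISR bit carries the parity indication.

```python
# Hypothetical bit assignments, chosen only for this sketch.
DSISR_ORDINARY_FAULT = 0x1  # stands in for a pre-existing data storage interrupt cause
DSISR_L1_PARITY = 0x2       # stands in for the parity-error indication


class Processor:
    def __init__(self):
        self.dsisr = 0   # models the data storage interrupt status register
        self.log = []    # records which handler path was taken

    def raise_dsi(self, cause_bits):
        """Hardware side: record the cause, then take the interrupt."""
        self.dsisr = cause_bits
        self.data_storage_interrupt()

    def data_storage_interrupt(self):
        """Software side: one interrupt service, dispatched on the DSISR."""
        if self.dsisr & DSISR_L1_PARITY:
            self.log.append("recover from L1 parity error")
        else:
            self.log.append("handle ordinary storage fault")


cpu = Processor()
cpu.raise_dsi(DSISR_ORDINARY_FAULT)  # the condition the service was built for
cpu.raise_dsi(DSISR_L1_PARITY)       # the parity error reuses the same service
```

The point of the sketch is that no new interrupt vector is needed: the handler distinguishes the parity case purely by inspecting the status register.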
The invention advantageously allows synchronizing of the processor context. A machine status register of the processor (one of the save/restore registers) is used to hold a pointer to a next-to-complete instruction, such that all instructions preceding the next-to-complete instruction have already completed execution, and no instruction subsequent to the next-to-complete instruction has begun execution. A flush of the corresponding cache block is performed in response to the data storage interrupt.
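The recovery sequence above can be sketched as a toy in-order simulation: the faulting load's position is saved as the next-to-complete pointer, the bad cache block is flushed, and execution resumes at the saved pointer so the load refetches a good copy. Everything here is an assumption made for illustration, including the names `Core` and `saved_pointer`, the premise that the error is a soft error, and the premise that the block is unmodified, so that system memory still holds a correct copy.

```python
def parity_bit(value):
    """Even-parity bit of value."""
    return bin(value).count("1") % 2


class ParityError(Exception):
    pass


class Core:
    def __init__(self, memory):
        self.memory = memory
        self.l1 = {}               # addr -> (value, stored parity)
        self.regs = {}
        self.saved_pointer = None  # stands in for the save/restore register
        self.completed = []        # indices of completed instructions, in order

    def load(self, addr):
        if addr not in self.l1:    # miss: fill from memory with fresh parity
            value = self.memory[addr]
            self.l1[addr] = (value, parity_bit(value))
        value, parity = self.l1[addr]
        if parity_bit(value) != parity:
            raise ParityError(addr)
        return value

    def run(self, program):
        pc = 0
        while pc < len(program):
            reg, addr = program[pc]
            try:
                self.regs[reg] = self.load(addr)
            except ParityError as err:
                # Everything before pc has completed; nothing after it has begun.
                self.saved_pointer = pc
                self.l1.pop(err.args[0])  # flush the offending cache block
                pc = self.saved_pointer   # resume at the next-to-complete load
                continue
            self.completed.append(pc)
            pc += 1


core = Core({0x20: 7, 0x24: 5})
core.l1[0x20] = (6, parity_bit(7))  # inject a corrupted copy of the value 7
core.run([("r1", 0x24), ("r2", 0x20)])
```

After the run, `r2` holds the correct value 7 and the program has completed without a restart. A persistent hard error would make this loop retry indefinitely, which a real recovery scheme would have to bound.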
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.

