Cache memory and system with partial error detection and...

Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction

Reexamination Certificate

Details

C714S753000, C714S774000, C714S777000, C714S768000, C714S054000, C711S118000, C711S141000

active

06631489

ABSTRACT:

BACKGROUND OF THE INVENTION
It is axiomatic that data entering a data processor, whether it originates in a local memory or is received from a remote source via a communication link, must be correct. For this reason many error detecting codes (EDC) and error correcting codes (ECC) have been developed to ensure the integrity of the information to be processed. Common to all of these codes is redundancy: additional check bits, computed as a function of the information bits, are added so that the check bits can be recomputed at the destination for error detection and, if the code is sufficiently redundant, for error correction. Computer memories are an example of a source of data entering a data processor where it is advantageous to use error detecting and error correcting codes. The most likely source of errors in computer memories is corruption of the data during the time the data is held in the memory. Such soft (intermittent) errors may be induced by background cosmic radiation and alpha particle bombardment.
It is well known in the prior art to add a parity bit to units of data being stored in computer memories to detect a single bit error in the data unit when the unit is read. Typically, a parity bit is added for each 8 bit byte of data in the memory. Thus, 9 bits of storage are used for each 8 bit byte of data storage provided. Parity protected memories are limited in that the process requesting faulty data only knows that the data is faulty. There is no general mechanism to allow the process to recover from the error. Most often, a memory fault requires that the process accessing the faulty memory line be terminated or that the system be rebooted. It is also well known in the prior art to add error correction codes to units of data being stored to detect and correct errors. This provides a system that can recover from detected errors. For example, a 32 bit computer word can be protected by adding a 6 bit ECC. The ECC allows all single bit errors to be detected and corrected. A 7 bit ECC detects and corrects single bit errors and also detects double bit errors.
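The per-byte parity scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of even parity are assumptions made for the example, not details taken from the text.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte: int) -> int:
    """Pack an 8-bit byte and its parity bit into a 9-bit stored value."""
    return ((byte & 0xFF) << 1) | even_parity_bit(byte)


def parity_ok(stored: int) -> bool:
    """A 9-bit stored value is consistent if it holds an even number of 1s."""
    return bin(stored & 0x1FF).count("1") % 2 == 0


word = store(0b10110010)
flipped_once = word ^ (1 << 4)           # one bit corrupted
flipped_twice = flipped_once ^ (1 << 7)  # a second, compensating error

# Prints: True False True
print(parity_ok(word), parity_ok(flipped_once), parity_ok(flipped_twice))
```

The single flipped bit is detected, but the check says nothing about which bit is wrong, and the second flip restores even parity so the double error goes unnoticed.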
One class of error correcting codes, known as Hamming codes, is described by R. W. Hamming in “Error Detecting and Error Correcting Codes”, Bell System Technical Journal, 29, 1950, pages 147-160. Hamming described several specific instances of Hamming codes. The specific codes described were single error detection codes (SED), single error correction codes (SEC), and single error correction, double error detection codes (SEC/DED). Error correcting code theory and applications are treated in the text “Error Control Coding, Fundamentals and Applications” by Lin et al., published by Prentice-Hall, 1982.
The correction capabilities of any code are dependent upon redundancy. In the simplest case of an SED, a single parity bit, the redundancy is very low and there are no correction possibilities. In fact, two compensating errors will not be detected, as the parity is unchanged. A Hamming SEC is more redundant, with the number of redundant ECC bits related to the number of information bits that are to be protected. Three ECC bits are required to provide SEC for two to four information bits. When error correcting codes are used to protect a small number of information bits, the redundancy becomes very high.
In standard coding theory notation, “k” represents the number of data bits in a code word, “r” represents the number of check bits in the code word, and “n” represents the total number of bits in a code word (n=k+r). According to Hamming, a single error correcting code must satisfy the inequality $2^r \geq k + r + 1$.
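As a quick check of the inequality, the following Python sketch finds the smallest r that satisfies it for a given k. The function name `min_check_bits` is a hypothetical helper introduced only for this illustration.

```python
def min_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (single error correction)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


# Tabulate check bits and relative overhead for a few data widths.
for k in (2, 4, 8, 32, 64):
    r = min_check_bits(k)
    print(f"k={k:2d}  r={r}  n={k + r}  overhead={r / k:.0%}")
```

For k from 2 to 4 the result is r=3, and for k=32 it is r=6, which agrees with the figures quoted in the text. The overhead column shows how the redundancy grows as the number of protected bits shrinks.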
A geometric model is sometimes used to examine problems of error detection and correction. Each n-bit code symbol is viewed as a vertex of an n-dimensional unit cube, with the n bits serving as the coordinates of the vertex. This is readily visualized for n=3. As shown in FIG. 1, a 3-dimensional unit cube 100 has eight vertices: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1). Starting from (0,0,0) and moving a distance of 1 unit along an edge takes us to (0,0,1), (0,1,0), or (1,0,0). Moving a distance of 1 unit means that the value of one coordinate is changed. Starting from (0,0,0) and moving a distance of 2 units along the edges takes us to (0,1,1), (1,0,1), or (1,1,0). Moving a distance of 2 units means that the values of two coordinates are changed.
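The distance used in this geometric model is the number of coordinates in which two vertices differ. A minimal Python sketch, with the helper name `hamming_distance` chosen for this example, groups the vertices of the 3-dimensional cube by their distance from (0,0,0):

```python
from itertools import product


def hamming_distance(a, b):
    """Number of coordinates in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


vertices = list(product((0, 1), repeat=3))  # the eight cube vertices
origin = (0, 0, 0)

# Group every vertex by its distance from the origin.
by_distance = {}
for v in vertices:
    by_distance.setdefault(hamming_distance(origin, v), []).append(v)

print(by_distance[1])  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(by_distance[2])  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```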
An error is the unexpected change in value of one or more bits in the code. A single bit error changes the value of one coordinate, so a code symbol with a single bit error will be at a distance of 1 from the original valid code symbol. If all the possible $2^n$ code symbols of an n bit code are valid code symbols, then errors will produce other valid code symbols and no error detection or correction is possible.
To obtain an error detecting code or an error correcting code of n bits, the valid code symbols must be a subset of all the possible $2^n$ code symbols. If all the valid code symbols are separated by a distance of at least 2, then a code symbol with a single bit error will change one coordinate and be at a distance of 1 from the original valid code symbol. Because no valid code symbol lies at a distance of 1 from another, the erroneous symbol is not itself valid and the error is detected. However, the code symbol with a single bit error may also be at a distance of 1 from valid code symbols other than the original, because valid code symbols are separated by a distance of only 2, so the original cannot be identified. Therefore, a code where all the valid code symbols are separated by a distance of 2 is a single bit error detecting code. If the distance between the valid code symbols is increased to 3, then a code symbol with a single bit error will be at a distance of 1 from the original valid code symbol and at a distance of 2 or more from all other valid code symbols. Thus, a code where all the valid code symbols are separated by a distance of 3 is a single bit error correcting code.
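The difference between a distance of 2 and a distance of 3 can be seen by looking for the valid code symbols nearest to a corrupted one. The sketch below uses two small 3-bit codes picked for illustration (an even-parity code and a repetition code); neither the codes nor the helper names come from the text.

```python
from itertools import combinations, product


def hamming_distance(a, b):
    """Number of coordinates in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code):
    """Smallest distance between any two distinct valid code symbols."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def nearest_codewords(word, code):
    """All valid code symbols at the smallest distance from `word`."""
    best = min(hamming_distance(word, c) for c in code)
    return [c for c in code if hamming_distance(word, c) == best]


# Distance-2 code: all 3-bit symbols with an even number of 1s.
even_parity_code = [v for v in product((0, 1), repeat=3) if sum(v) % 2 == 0]
# Distance-3 code: the two symbols 000 and 111.
repetition_code = [(0, 0, 0), (1, 1, 1)]

received = (0, 0, 1)  # (0,0,0) with a single bit error

print(minimum_distance(even_parity_code), nearest_codewords(received, even_parity_code))
print(minimum_distance(repetition_code), nearest_codewords(received, repetition_code))
```

With the distance-2 code the corrupted symbol is invalid, so the error is detected, but three valid symbols are equally close and the original cannot be recovered. With the distance-3 code only (0,0,0) is nearest, so the error can be corrected.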
An exemplary application of error correcting codes is in the area of cache memory coherency state bits. To speed memory access, computers often use cache memory, which is a small high speed memory that provides fast access to a copy of the data in current use. When more than one device can write to the memory, a mechanism must be provided to maintain cache coherency. The activity of one device may invalidate the contents of a cache or main memory as used by another device. One cache coherency protocol well-known in the art is the MESI Protocol. The name is derived from four states which are maintained for the status of each line in the cache memory: Modified, Exclusive, Shared, and Invalid (MESI). The state is maintained by status bits associated with each cache line. While two bits are sufficient to maintain the four states, the status is typically protected by an ECC because of the grave consequences of an error in these bits. Three additional bits are required to provide an ECC for two data bits. Therefore, five status bits per line of cache memory are required.
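To make the five-bit figure concrete, the following Python sketch protects a two-bit MESI state with three check bits. The state-to-bit assignment, the check equations, and the names `State`, `encode`, and `decode` are all assumptions chosen for illustration; the text does not specify any particular encoding.

```python
from enum import Enum


class State(Enum):
    # Hypothetical two-bit assignment of the four MESI states.
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def encode(state):
    """Return 5 status bits: 2 data bits followed by 3 check bits."""
    d1 = (state.value >> 1) & 1
    d0 = state.value & 1
    # Assumed check equations; they give every pair of codewords
    # a distance of at least 3.
    return (d1, d0, d1, d0, d1 ^ d0)


CODEBOOK = {encode(s): s for s in State}


def decode(bits):
    """Return the state whose codeword is within distance 1, else None."""
    for codeword, state in CODEBOOK.items():
        if sum(a != b for a, b in zip(bits, codeword)) <= 1:
            return state
    return None


stored = list(encode(State.MODIFIED))
stored[3] ^= 1                # simulate a soft error in one status bit
print(decode(tuple(stored)))  # State.MODIFIED
```

Because the four codewords are at least a distance of 3 apart, at most one of them can lie within a distance of 1 of any stored value, so the first match in `decode` is the only match.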


REFERENCES:
patent: 2552629 (1951-05-01), Hamming et al.
patent: 4541095 (1985-09-01), Vries
patent: 4713816 (1987-12-01), Van Gils
patent: 5903410 (1999-05-01), Blaum et al.
patent: 5960457 (1999-09-01), Skrovan et al.
patent: 6269465 (2001-07-01), Hill et al.
patent: 6442653 (2002-08-01), Arimilli et al.
R. W. Hamming, Error Detecting and Error Correcting Codes, The Bell System Technical Journal, vol. XXVI, Apr. 1950, pp. 147-160.
