Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction
Reexamination Certificate
1998-09-24
2001-08-28
Decady, Albert (Department: 2133)
C714S766000
active
06282686
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to error detection and correction and, more particularly, to error codes that detect and correct bit errors in computer memory systems.
2. Description of the Relevant Art
Error codes are commonly used in computer systems to detect and/or correct data errors, such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors in data transmitted via a telephone line, a radio transmitter or a compact disk laser. Another common use of error codes is to detect and correct errors within data that are stored in and read from a memory of a computer system. For example, error correction bits, or check bits, may be generated for data prior to storing the data to one or more memory devices. When the data are read from the memory device, the check bits may be used to detect or correct errors within the data. Errors may be introduced either by faulty components or by noise within the computer system. Faulty components may include faulty memory devices or faulty data paths between devices within the computer system, such as faulty pins.
Hamming codes are one commonly used class of error codes. The check bits in a Hamming code are parity bits for portions of the data bits. Each check bit provides the parity for a unique subset of the data bits. If an error occurs, i.e. one or more bits change state, one or more syndrome bits will be asserted (assuming the error is within the class of errors covered by the code). Generally speaking, syndrome bits are generated by regenerating the check bits and comparing the regenerated check bits to the original check bits. If the regenerated check bits differ from the original check bits, an error has occurred and one or more syndrome bits will be asserted. The pattern of asserted syndrome bits may also be used to determine which data bit changed state, which enables correction of the error. For example, if one data bit changes state, this data bit will modify one or more check bits. Because each data bit contributes to a unique group of check bits, the check bits that are modified will identify the data bit that changed state. The error may be corrected by inverting the bit identified to be erroneous.
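The syndrome mechanism described above can be illustrated with a small sketch. It uses the textbook Hamming(7,4) layout (check bits at positions 1, 2 and 4), which is an assumption chosen for brevity and is not taken from this document; the function names `encode`, `syndrome` and `correct` are hypothetical.

```python
# Minimal Hamming(7,4) sketch. Bit positions are numbered 1..7;
# positions 1, 2 and 4 hold check bits (assumed textbook layout).

def encode(data_bits):
    """Take 4 data bits and return a 7-bit codeword (list index 0 = position 1)."""
    word = [0] * 8  # index 0 unused so list indices match positions 1..7
    for pos, bit in zip((3, 5, 6, 7), data_bits):
        word[pos] = bit
    for p in (1, 2, 4):
        # Check bit p is the parity of every other position whose number has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return word[1:]

def syndrome(codeword):
    """Recompute each parity; the asserted syndrome bits spell the error position (0 = none)."""
    word = [0] + list(codeword)
    s = 0
    for p in (1, 2, 4):
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            s |= p
    return s

def correct(codeword):
    """Invert the bit identified by the syndrome, if any. Returns (codeword, syndrome)."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s
```

Flipping any single bit of a valid codeword yields a syndrome equal to the position of that bit, so `correct` restores the original codeword. Two flipped bits also produce a non-zero syndrome, but it points at the wrong position, which is why a plain single error correcting code cannot safely handle double errors.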
One common use of Hamming codes is to correct single bit errors within a group of data. Generally speaking, the number of check bits must be large enough such that $2^k - 1$ is greater than or equal to $n$, where $k$ is the number of check bits and $n$ is the number of data bits plus the number of check bits. Accordingly, seven check bits are required to implement a single error correcting Hamming code for 64 data bits: $2^7 - 1 = 127 \ge 71$, whereas six check bits give only $2^6 - 1 = 63 < 70$. A single error correcting Hamming code is able to detect and correct a single error. The error detection capability of the code may be increased by adding an additional check bit. The use of an additional check bit allows the Hamming code to detect double bit errors and correct single bit errors. A Hamming code with a bit added to increase its error detection capability is referred to as an extended Hamming code. Extended Hamming codes are discussed in more detail below.
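The check-bit bound (2^k − 1 must be at least the number of data bits plus k) is easy to evaluate numerically. The following is a minimal sketch; the function name `min_check_bits` is hypothetical.

```python
def min_check_bits(data_bits):
    """Smallest k such that 2**k - 1 >= data_bits + k (single error correction)."""
    k = 1
    while 2**k - 1 < data_bits + k:
        k += 1
    return k

# Check bits needed for a few common data widths.
for width in (8, 16, 32, 64):
    print(width, min_check_bits(width))
```

For 64 data bits the function returns 7, matching the figure given in the text. An extended Hamming code would use one more bit than this value.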
Component failures are one problem that arises in computer memory systems. A component failure may introduce multiple errors that are uncorrectable by the error code. For example, if eight bits of a block of data are stored in the same memory device, the failure of the memory device may introduce eight bit errors into that block of data. Accordingly, one component failure may introduce a sufficient number of errors that the error correction code is not able to detect or correct the error. Likewise, a data path failure between a memory component and error correction circuitry, such as a pin failure, may introduce multiple errors into a block of data for which the error correction code is used.
One potential solution to prevent a component error from introducing multiple errors into a group of data is to store the data such that only one bit of data within the group is affected by any one component. For example, in a group of data with 64 data bits and 7 check bits, each bit of data may be stored in a different memory device. In this embodiment, 71 memory chips are required. Each memory device would store one bit of the 71-bit data group. Unfortunately, allocating bits to a group of data based on the number of data bits and check bits may not optimize the use of check bits within the system.
It is a common design goal of computer systems to reduce the number of check bits used to detect and correct errors. The check bits increase the amount of data handled by the system, which increases the number of memory components, data traces and other circuitry. Further, the increased number of bits increases the probability of an error. Although the check bits may make an error detectable and/or correctable, increasing the number of data bits within the system increases the probability of an error occurring. For at least these reasons, it is desirable to decrease the number of check bits for a given level of error detection and/or correction. It is further desired to increase the error correcting capability of a single error correcting code with a minimal number of additional bits.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a technique for sharing parity over multiple single error correcting code words in accordance with the present invention. The bits of a data block are assigned to a plurality of logical groups such that at most one bit from any component group is assigned to a logical group. This assignment ensures that a component failure may introduce at most one bit error to a logical group.
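One simple way to satisfy the assignment constraint is to interleave: logical group j takes bit j from every component. The sketch below assumes each component contributes the same number of bits to the block; that layout, the example sizes, and the names `assign_logical_groups` and `max_bits_from_one_component` are illustrative assumptions rather than details from this document.

```python
from collections import Counter

def assign_logical_groups(num_components, bits_per_component):
    """Return groups[j] = list of (component, bit_index) pairs.

    Group j holds bit j of every component, so no component
    contributes more than one bit to any group.
    """
    return [[(c, j) for c in range(num_components)]
            for j in range(bits_per_component)]

def max_bits_from_one_component(group):
    """Largest number of bits that any single component contributes to a group."""
    return max(Counter(component for component, _ in group).values())

# Hypothetical example: 18 components, each holding 4 bits of the block.
groups = assign_logical_groups(18, 4)
print(len(groups), len(groups[0]))
```

With this layout, the failure of an entire component corrupts at most one bit in each logical group, which is within the reach of a single error correcting code applied per group.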
Each logical group uses a single error correcting code to detect and correct bit errors. A parity bit is appended to a data block that includes a plurality of logical groups. The parity bit may be used in conjunction with the single error correcting codes to determine whether a detected error is a single bit error or a multiple bit error. If the detected error is a single bit error, the error correction codes may be used to correct the error. If the detected error is a multiple bit error, an uncorrectable error may be reported. By using one parity bit per data block, the number of bits required to extend the error detection capability of an error detection code may be reduced.
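The interaction between the per-group codes and the shared parity bit can be sketched as a decision rule. The rule below is one plausible reading and is not taken from this document: it assumes the parity bit covers every bit of every logical group, and it accepts a correction only when the number of groups reporting an error is consistent with the parity result. The function name `classify` and the returned labels are hypothetical.

```python
def classify(syndromes, parity_mismatch):
    """Classify a read of one data block.

    syndromes       -- one integer syndrome per logical group (0 = no error seen)
    parity_mismatch -- True if the regenerated block parity differs from the stored bit
    """
    flagged = sum(1 for s in syndromes if s != 0)
    if flagged == 0:
        # No group saw an error; a mismatch can only be the parity bit itself.
        return "parity bit error" if parity_mismatch else "no error"
    # If every flagged group holds exactly one flipped bit, the block parity
    # changes exactly when the number of flagged groups is odd.
    if (flagged % 2 == 1) == parity_mismatch:
        return "correctable"
    return "uncorrectable"
```

For example, a double bit error inside one group asserts that group's syndrome but leaves the block parity unchanged, so `classify([5, 0, 0, 0], False)` reports an uncorrectable error instead of miscorrecting. This simplified rule is conservative in some cases (a single data error combined with a flipped parity bit is also reported as uncorrectable) and does not catch every pattern of three or more errors. With four logical groups, it uses one extra bit where four separately extended Hamming codes would use four.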
Broadly speaking, the present invention contemplates a method of increasing the error detection capability of a single error correction code, comprising: generating a global check bit for a block of data wherein the block of data includes a plurality of groups of data and each group of data employs the single error correction code; detecting one or more bit errors within each group of data using the single error correction codes; determining whether the one or more bit errors detected using the error correction codes are single bit errors using the global check bit; and correcting the one or more bit errors if the one or more bit errors are single bit errors.
The present invention further contemplates a method of detecting errors in a data block of a computer system that includes a plurality of components, comprising: assigning the bits of the data block to a plurality of logical groups such that at most one bit from a component is assigned to a logical group; generating check bits for each group of data using single error correction codes; generating a global check bit for the data block; detecting one or more bit errors within each logical group using the single error correction codes; determining whether the one or more bit errors detected using the error correction codes are single bit errors using the global check bit; and correcting the bit errors if the errors are single bit errors.
The present invention still further contemplates a memory system including a plurality of memory devices configured to store a data block, and an error detection circuit coupled to the plurality of memory devices. Each bit of the data block is assigned to one of the plurality of memory devices. The bits of the da
Conley, Rose & Tayon, PC
De'cady Albert
Kivlin B. Noël
Sun Microsystems Inc.
Ton David