High performance fault tolerant memory system utilizing...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C711S133000

active

06732291

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to memory configurations for computing systems, and in particular to fault detection and correction. More specifically, the present invention relates to a method and system for providing a high performance fault tolerant memory system utilizing greater than four-bit data word memory arrays.
2. Description of the Related Art
Memory systems employed in conventional data processing systems, such as computer systems, typically include large arrays of physical memory cells that are utilized to store information in a binary manner. Generally in a conventional memory system, all of the memory cells on a memory chip are disposed in one or more memory arrays having a set number of rows and columns. Operatively, the rows are selected by row decoders that are typically located adjacent to the ends of the row lines. Each of the row lines is electrically connected to the row decoders so that the appropriate signals can be received and transmitted.
The columns of the memory array are connected to input/output (I/O) through column decode devices. In the case of dynamic random access memories (DRAMs), the memory array columns are also connected to line precharging circuits and sense amplifiers at the end of each column line to periodically sense amplify and restore data in the memory cells.
Two kinds of errors typically occur in a memory system: soft errors and hard errors. A soft error is a seemingly random inversion of stored data. This inversion may be caused by occasional electrical noise, environmental conditions and, in some cases, by bombardment by radioactive particles, the so-called alpha particle event. The soft error problem has grown as the individual cell sizes of memory arrays have shrunk, increasing their susceptibility to relatively low amounts of noise. Although soft error failure rates are generally two to three orders of magnitude higher than hard error failure rates in DRAM arrays, soft error failures typically cause only single-bit errors in memory system words. A hard error, in contrast, represents a permanent electrical failure of the memory array. It is often restricted to particular memory locations but may sometimes be associated with peripheral circuitry of the memory array, so that the entire array is affected. Naturally, designers of memory arrays have striven to reduce the occurrence of both hard and soft errors in their memory arrays. However, neither type of error has been completely eliminated and, indeed, it is not believed that they can be. Designing for high reliability beyond a certain point can be done only at the expense of reduced performance and increased cost.
One solution for detecting and correcting both hard and soft errors has been the implementation of error correction codes (ECC) in large computer memories. The fundamentals of error detecting and correcting are described by R. W. Hamming in a technical article entitled “Error Detecting and Error Correcting Codes” appearing in the Bell System Technical Journal, Volume 29, No. 2, 1950 at pages 147-160. Using one of the most popular Hamming codes, an 8-bit data word is encoded into a 13-bit word. A decoder can process the 13-bit word, correct any 1-bit error in the 13 bits, and detect any 2-bit error. The described code is thus classified as SEC/DED (single error correct/double error detect). The use of such codes has been particularly efficient for memory arrays having single-bit outputs. For instance, if a relatively simple computer were to have 16K (16,384) bytes of data where each byte contains 8 data bits, an efficient error-protected design would utilize thirteen 16K×1 memory arrays, with the extra five 16K memory arrays providing Hamming SEC/DED protection. The Hamming code not only can correct a single-bit hard or soft random error occurring in any byte, but can also tolerate the failure of any one 16K memory array, since any one memory array contributes only 1 bit to each error-protected word.
The above-described 13-bit Hamming code can correct only one error, whether hard or soft. Consequently, if one memory array has suffered a hard failure in all its locations, the remaining memory arrays are not protected against an occasional soft error: such an error could be detected but not corrected. To detect and correct more than one error, more elaborate error correcting codes have been developed and implemented. However, as a general rule, the more errors that can be corrected in a word, the more check bits the code requires.
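The SEC/DED behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit or code matrix of any particular memory system: the bit layout (overall parity at index 0, Hamming check bits at positions 1, 2, 4 and 8) and the names `encode` and `decode` are assumptions made for the example.

```python
# Sketch of a (13, 8) extended Hamming SEC/DED code.
# The bit layout below is an assumption chosen for illustration.

# Positions 1..12 that are not powers of two carry the 8 data bits.
DATA_POSITIONS = [p for p in range(1, 13) if p & (p - 1)]
CHECK_POSITIONS = [1, 2, 4, 8]


def encode(byte):
    """Encode an 8-bit value into a list of 13 bits (index 0 = overall parity)."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1
    # Each check bit makes the parity of the positions it covers even.
    for check in CHECK_POSITIONS:
        word[check] = sum(word[p] for p in range(1, 13) if p & check) % 2
    # The overall parity bit makes the parity of all 13 bits even.
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 13):
        if word[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    odd_parity = sum(word) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= 12:
        # Single error; a syndrome of 0 points at the overall parity bit.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits are wrong.
        status = "uncorrectable"
    byte = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


codeword = encode(0xA5)

# One flipped bit, e.g. a soft error or a single failed array.
single = list(codeword)
single[6] ^= 1
print(decode(single))  # (165, 'corrected')

# A failed array plus a soft error in the same word: detected, not corrected.
double = list(codeword)
double[6] ^= 1
double[11] ^= 1
print(decode(double)[1])  # uncorrectable
```

In the thirteen-array arrangement, each list index would correspond to one 16K×1 array, which is why a fully failed array consumes the code's entire correction capability for every word.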
Presently, memory arrays typically contain 256 Mbit devices and the trend is towards production of memory arrays that will contain 1 Gbit within two to four years. With the anticipated increase in memory array sizes, the present approach of utilizing 1 or 4 bit wide memory chip organization must be reconsidered. For example, employing the present 1 or 4 bit memory chip organization with a 32 bit wide data word will require 32 memory arrays (1 bit organization) or 8 memory arrays (4 bit organization). This will, in turn, result in a minimum granularity in, e.g., a personal computer (PC), of 8 GB or 2 GB, respectively. This large amount of memory in a desktop or laptop computer is excessive and also has the added disadvantage of increasing the overall cost of the computer system. In response to the minimum granularity problem, memory array manufacturers are moving to 8, 16 and even 32 bit wide memory organization schemes with the corresponding increase in the number of check bits required for error detection and correction.
Unfortunately, Hamming codes require several check bits to accomplish the error detection and correction. As discussed above, an eight-bit data word requires five check bits to detect two-bit errors and correct one-bit errors. As the bus grows wider and the number of bits of transmitted data increases, the number of check bits required also increases. Because modern memory buses are often 64 or 128 bits wide, the associated Hamming code would require substantially more check bits and increasing levels of logic circuits to implement the error correction code. Consequently, using powerful Hamming codes in large memory systems is expensive and consumes substantial memory resources. Additionally, these designs will result in slower memory access time due to the increasing number of logic circuits and logic levels necessary to implement the support and error correction code.
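The check-bit counts can be worked out with the standard Hamming condition for single-error correction: the smallest r such that 2^r ≥ k + r + 1 for k data bits, plus one overall parity bit for double-error detection. The helper name `secded_check_bits` is hypothetical, and the sketch counts bits only; it does not model the logic depth of the encoder or decoder.

```python
def secded_check_bits(data_bits):
    """Number of check bits a Hamming SEC/DED code needs for data_bits data bits."""
    r = 1
    # Smallest r whose syndrome can name every bit position plus "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall-parity bit upgrades SEC to SEC/DED


for width in (8, 16, 32, 64, 128):
    print(width, secded_check_bits(width))
# 8 5
# 16 6
# 32 7
# 64 8
# 128 9
```

The first row matches the five check bits cited for an eight-bit word. Each check bit is a parity over roughly half of the word, so the parity logic widens along with the bus.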
Accordingly, what is needed in the art is an improved error detection and correction scheme that mitigates the above-described limitations in the prior art. More particularly, what is needed in the art is a high performance fault tolerant memory system utilizing memory arrays organized to provide data words of greater than four bits.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide an improved memory system.
It is another object of the invention to provide a fault tolerant memory system utilizing greater than four bit data word memory arrays.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein, a method for providing a fault tolerant memory system having a number of memory arrays that includes at least one spare memory array and utilizing a data word organization of greater than 4 bits is disclosed. The method includes detecting a multi-bit word error in a memory array. In one advantageous embodiment, a single package detect (SPD) logic, for detecting a package error of 1-4 bits, is utilized to identify the failed memory array. Next, the content of a first row of cells in the failed memory array is read and a first complement of the content is generated. Subsequently, the first complement is written back to the first row of cells in the failed array. A second read operation is then initiated to retrieve the first complement from the failed memory array, follow
