High speed memory system capable of selectively operating in...

Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction

Reexamination Certificate

Details

C714S710000

active

06370668

ABSTRACT:

FIELD OF THE INVENTION
A memory system characterized by high speed data throughput on a bus between a memory controller and an associated plurality of memory devices is disclosed, wherein the memory system is capable of selectively operating in non-chip-kill and chip-kill modes. A method of selectively operating the memory system in either non-chip-kill or chip-kill mode is also disclosed.
BACKGROUND OF THE INVENTION
During the last several decades, memory technology has progressed dramatically. The density of commercial memory devices, taking Dynamic Random Access Memory (DRAM) as a convenient example, has increased from 1 Kbit to 64 Mbits per chip, a factor of 64,000. Unfortunately, memory device performance has not kept pace with increasing memory device densities. In fact, memory device access times during the same time period have only improved by a factor of 5. By comparison, during the past twenty years, microprocessor performance has increased by several orders of magnitude. This growing disparity between the speed of microprocessors and that of memory devices has forced memory system designers to create a variety of complicated and expensive hierarchical memory techniques, such as Static Random Access Memory (SRAM) caches and parallel DRAM arrays. Further, now that computer system users increasingly demand high-performance graphics and other memory-hungry applications, memory systems often rely on expensive frame buffers to provide the necessary data bandwidth. Increasing memory device densities satisfy the overall quantitative demand for data with fewer chips, but the problem of effectively accessing data at peak microprocessor speeds remains.
Overlaying the problem of data access speed, some computer systems have particularly high requirements for availability and reliability. Central data processing systems at banks and financial institutions, Internet service providers, and telecommunications control systems are ready examples of computer systems which simply cannot fail when accessed by a user. The inevitable occurrence of memory device failures within such computer systems has led to the development of numerous methods and features whereby memory device failures are detected and corrected without shutting down the computer system. One such method is called “chip-kill.”
Conventional chip-kill will be explained with reference to FIG. 1. FIG. 1 illustrates a conventional memory system with the architectural changes required to implement chip-kill. In FIG. 1, four memory devices 10 are arranged along a data bus 12. In the example, each memory device is a Dual In-Line Memory Module (DIMM) including 18 DRAMs, each DRAM communicating 4 data bits to/from data bus 12 (i.e., 18×4 DRAMs). For clarity, only the data line connections for a single DRAM are shown. This example assumes four (4) groups of 72 bits each (of which 64 bits are data to be returned to the requester and 8 bits are used for error correction) are communicated by the memory system, thus transferring 256 bits of data to a requester, normally a controller or microprocessor connected to the memory system. Notably, in the conventional chip-kill memory system two quantities of data are returned by each memory device during a read operation: (i) 16 bytes of data to be returned to the requester, and (ii) 2 additional bytes of data used for error detection and correction. These additional 2 bytes of data are called “syndrome.”
Syndrome is used in error detection and correction algorithms to determine whether data from a given memory device contains one or more errors. Some algorithms merely detect the presence of data error(s). Other algorithms have the ability to actually correct one or more detected errors. Single-error-correct/double-error-detect (SECDED) algorithms are well understood by those of ordinary skill in the art. Many other conventional error detection and correction algorithms are known, but as a rule the requirement for additional bits of syndrome increases with the increasing sophistication of the algorithm, i.e., the ability of an algorithm to detect and correct data errors depends on the quantity of associated syndrome provided. For one type of SECDED algorithm, the relationship between data and associated syndrome is well known: the number of syndrome bits increases as the log of the number of data bits. So, 64 bits of data require 8 bits of syndrome, 128 bits of data require 9 bits of syndrome, 256 bits of data require 10 bits of syndrome, etc.
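The 8, 9, and 10 bit figures quoted above can be reproduced with a short calculation. The sketch below assumes the "one type of SECDED algorithm" is an extended Hamming code, which the text does not state; the function name `secded_check_bits` is hypothetical. It finds the smallest number of Hamming check bits that can address every bit position, then adds one overall parity bit for double-error detection.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check ("syndrome") bits needed by an extended Hamming SECDED code.

    Assumption: the SECDED code is an extended Hamming code; the source
    text does not name the specific algorithm.
    """
    if data_bits < 1:
        raise ValueError("data_bits must be positive")
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1, so that every
    # bit position (plus the "no error" case) has a distinct syndrome value.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit upgrades the code to double-error detection.
    return r + 1


for width in (64, 128, 256):
    print(width, "data bits ->", secded_check_bits(width), "check bits")
```

Each doubling of the data width adds only one check bit, which is the logarithmic growth the paragraph describes.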
Returning to FIG. 1, each of the four memory devices returns 18 bits of data. Thus, 288 bits (256 bits of data and 32 bits of syndrome) are actually read during a read operation. In the example, 8 bits of syndrome are applied to each one of four error correcting code (ECC) generators 14 along with 64 bits of data. Using a known SECDED algorithm, this is enough syndrome to detect up to two bit errors in the 64 bits of data, and correct one bit error.
By having each DRAM in the example supply one bit of data to each ECC generator, the failure of one DRAM can be tolerated since each ECC generator will detect and be able to correct the resulting bit error. Once error detection and correction is complete, each ECC generator 14 strips syndrome from the data and communicates the data to the requester. During a write operation, the opposite flow of data occurs. A 256-bit block of data is presented by the requester to the memory system and divided between ECC generators 14 into separate 64-bit blocks of data. Each ECC generator computes the required syndrome bit values and adds syndrome data to the 64 bits of data. The resulting 72-bit data block is then stored in memory devices 10.
Error detection and correction by the ECC generators 14 is typically monitored within the computer system. Should any one DRAM fail, the system may “replace” the failed DRAM with a spare (not shown). This replacement process may be performed in background processing while the computer system remains available to users. In the unlikely event of simultaneous failures in two DRAMs, the computer system in the foregoing example could detect the two failures, but remedial action would require maintenance intervention. Such a happenstance would force a system shut-down or switch over to a back-up system. A more powerful error correction algorithm, one capable of correcting two bit errors, would avoid this event.
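The single-failure and double-failure behavior described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the circuit of FIG. 1: it assumes an extended Hamming (72,64) code as the SECDED algorithm, treats the four DIMMs as 72 DRAMs that each hold one bit of each of the four codewords, and models a failed DRAM as returning all four of its bits inverted. The helper names (`encode`, `decode`, `read_with_failures`) are hypothetical. Note that the code uses "syndrome" in the coding-theory sense (the computed error locator), whereas the text uses it for the stored check bits.

```python
import random

PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
CODE_BITS = 72  # position 0 = overall parity, positions 1..71 = Hamming code


def is_data_position(pos):
    return pos & (pos - 1) != 0  # True when pos is not a power of two


def encode(data):
    """Turn 64 data bits into a 72-bit SECDED codeword (assumed code)."""
    code = [0] * CODE_BITS
    bits = iter(data)
    for pos in range(1, CODE_BITS):
        if is_data_position(pos):
            code[pos] = next(bits)
    for p in PARITY_POSITIONS:
        # Parity over every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, CODE_BITS) if pos & p) % 2
    code[0] = sum(code) % 2  # overall parity enables double-error detection
    return code


def decode(code):
    """Return (64 data bits, status), status in {'ok','corrected','detected'}."""
    code = list(code)
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome < CODE_BITS:
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        status = "detected"  # two bit errors: detectable, not correctable
    data = [code[pos] for pos in range(1, CODE_BITS) if is_data_position(pos)]
    return data, status


def read_with_failures(block, failed_drams):
    """Write a 256-bit block, invert the bits of failed DRAMs, read it back."""
    words = [encode(block[g * 64:(g + 1) * 64]) for g in range(4)]
    # DRAM d holds bit d of each codeword, one bit per ECC generator.
    drams = [[words[g][d] for g in range(4)] for d in range(CODE_BITS)]
    for d in failed_drams:
        drams[d] = [bit ^ 1 for bit in drams[d]]
    results = [decode([drams[d][g] for d in range(CODE_BITS)]) for g in range(4)]
    recovered = [bit for data, _ in results for bit in data]
    return recovered, [status for _, status in results]


random.seed(0)
block = [random.randint(0, 1) for _ in range(256)]
print(read_with_failures(block, [17])[1])     # one failed DRAM
print(read_with_failures(block, [3, 40])[1])  # two failed DRAMs
```

With one failed DRAM, each codeword sees exactly one bit error and all four generators report a correction, so the original 256 bits are recovered. With two failed DRAMs, each codeword sees two bit errors, which this code can only flag as detected.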
In sum, conventional memory systems implementing chip-kill read and write both data and syndrome to an ECC generator(s) during each operation. Further, the amount of syndrome furnished by each DRAM to individual ECC generators is dependent on the type of error detection and correction algorithm being used by the computer system. More powerful error detection and correction algorithms require more syndrome bits.
As can be seen from the foregoing example, conventional memory systems use a large number of data lines, or a relatively wide bus. The term “line(s)” is used to describe the physical mechanism by which data bits are electronically communicated from one point to another in a system. A line may take the form, alone or in combination, of a printed circuit board (PCB) strip, metal contact, pin and/or via, microstrip, semiconductor channel, etc. A line may be single or may be associated with a bus. A “bus” is a collection, fixed or variable, of lines, and may also be used to describe the drivers, latches, buffers, and other elements associated with an operative collection of lines. A bus may communicate control information, address information, and/or data. In the foregoing example, four sets of 72 data bit lines connect the memory devices 10 and ECC generators 14. On the other side of the ECC generators, four sets of 64 data bit lines combine to form a 256 bit wide data bus.
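The line counts in this example follow directly from the figures given earlier (four DIMMs, 18 DRAMs per DIMM, 4 bits per DRAM, four ECC generators, 64 data bits and 8 syndrome bits per generator). The following sketch only restates that arithmetic; the constant and function names are hypothetical.

```python
# All quantities below are taken from the example in the text.
DIMMS = 4
DRAMS_PER_DIMM = 18
BITS_PER_DRAM = 4
ECC_GENERATORS = 4
DATA_BITS_PER_GENERATOR = 64
SYNDROME_BITS_PER_GENERATOR = 8


def bus_widths():
    """Return the data line counts on each side of the ECC generators."""
    memory_side = DIMMS * DRAMS_PER_DIMM * BITS_PER_DRAM
    per_generator = DATA_BITS_PER_GENERATOR + SYNDROME_BITS_PER_GENERATOR
    requester_side = ECC_GENERATORS * DATA_BITS_PER_GENERATOR
    syndrome_lines = ECC_GENERATORS * SYNDROME_BITS_PER_GENERATOR
    # The memory side must carry exactly what the generators consume.
    assert memory_side == ECC_GENERATORS * per_generator
    return {
        "memory_side": memory_side,
        "per_generator": per_generator,
        "requester_side": requester_side,
        "syndrome_lines": syndrome_lines,
    }


widths = bus_widths()
print(widths)
```

The memory side needs 288 lines to deliver 256 bits of usable data, so one line in nine carries syndrome rather than data.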
Such massively parallel, or wide, buses are required in conventional memory systems due to the slow access speed of memory devices. Wide buses have long been associated with implementation and performance problems, such as excessive power consumption, slow speed, loss