Multi-bank, fault-tolerant, high-performance memory...

Electrical computers and digital processing systems: memory – Addressing combined with specific memory configuration or... – For multiple memory modules

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S202000, C711S203000, C711S217000

Reexamination Certificate

active

06748480

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to a memory addressing system and method and, in particular, to a memory addressing system and method that provides high-performance access to a multi-bank memory having an arbitrary number of banks.
BACKGROUND OF THE INVENTION
For a conventional memory design to achieve the highest performance, the memory space is composed of individual components, typically called banks, whose number is typically a power of 2. The memory space is “interleaved” among the banks, meaning that consecutive addresses are typically mapped to different banks. The bank number in this case may be determined by simply looking at the lowest bits in the address, A: i.e., Bank=A mod 2
b
, where 2
b
is the number, N, of banks. This approach has been used in high performance systems using as many as 512 banks of memory. Increasing the number of memory banks generally increases the throughput of memory and thus the bandwidth from the memory system to the processing unit. This throughput has traditionally been the weakest point in computer operations.
A known problem with this memory representation lies in the performance degradation it incurs when accessing arrays, or other data structures, with a stride which is even or divisible by a higher power of 2. For example, in a 16-bank system, accesses of stride 16 result in the worst performance, since only one of 16 banks is accessed. In many practical applications, array accesses have strides divisible by a high power of 2. For example, in matrices of sizes 2
m
×2
m
, for m≧b, column accesses give only 1/N of the peak performance, since whole columns reside in the same memory bank. Similar performance degradation occurs for other types of explicit patterns of accesses (i.e., explicitly defined sequences of accesses, which are commonly referred to as regular access sequences, e.g., diagonal accesses in the above matrices).
Another problem with this memory representation is its lack of fault tolerance, especially for memory devices having multiple memory banks inside a single chip. In such a single chip device, a single bad bank (i.e., a bank having at least one unusable memory location) typically results in either the whole memory device being discarded or the number of banks used being cut to the next lower power of two. This problem is particularly significant because the yield on new memory parts can be very low; thus, a part of the capacity of a plant that manufactures such devices is devoted to making unsellable product. The problem is further exacerbated in the new generation of multiprocessor chips having embedded memory units. Such a chip may, for example, comprise 2
b
microprocessors and 2
b
memory units (with, for example, 1 to 8 Mbits of DRAM per each unit), communicating with each other over, ideally, a full 2
b
×2
b
crossbar switching network. The memories in such chips may be treated in a shared memory model as a flat address space of 2
b
·2
m
memory locations, where 2
m
is the size of each individual memory unit. Embedded memory chips are much more complex than ordinary memory units; accordingly, the cost of discarding or downgrading such chips is correspondingly greater than the cost of doing so for ordinary memory units.
Attempts to solve these problems have not been entirely successful.
For example, RAMBUS, and other similar technologies, attempt to alleviate the processor-memory bottleneck by providing faster memory operations to a non-banked memory or simply interleaved multi-bank memory. However, improvements are seen primarily for contiguous memory requests only. In addition, as the speed of processing units increases dramatically, the bottleneck remains.
Another technique, addressed particularly to the bank conflict problem, is described in P. P. Budnick and D. J. Kuck, “The organization and use of parallel memories,” IEEE Trans. Computers, 20, pp. 1566-1569 (1971). Budnick et al. suggests implementing a memory using p banks of memory, where p is a prime number. In this case, bank conflicts for linear arrays can only occur for strides divisible by p. This, arguably, makes bank conflicts less likely in practice. However, there is a significant increase in the decoding logic—in particular, a full integer division by p circuit is required. For a requested address A, the remainder, A mod p, gives the address' bank, while the quotient, A/p, gives the physical address within the bank. The early BSP (Burroughs Scientific Processor) had this type of memory system, with p=17. In addition to the increased decoding logic, this kind of solution is inadequate because limiting the number of banks to a prime number is too restrictive—for reasons of, e.g., placement, routing and interfaces, a non-prime number of banks, especially a power of 2, is a preferred choice.
Another remedy to the bank conflict problem is to use a pseudo-random number generator to generate a mapping between a logical address A and a corresponding bank. One such system is described, for example, in R. Raghavan, J. P. Hayes, “On randomly interleaved memories,”
Proceedings of Supercomputing
, pp. 49-58, 1990. A pseudo-random generator generates a random sequence of output values for an ordered sequence of input values, but will always produce the same output value for a given input value. One problem with this technique is that it produces bank conflicts for stride 1 accesses. Stride 1 accesses are the most common access patterns in most computer applications (occurring for example when reading an instruction stream) and any significant degradation in memory performance for such accesses is therefore unacceptable. The general problem is that a pseudo-random, or truly random, mapping produces, on average, bank conflicts in not less than 1/e% (i.e., 36.78 . . . %) of accesses (where e is the base of the natural log), even for large N. This tends to substantially reduce peak performance. Additionally, certain known pseudo-random number generators may not uniformly map the address space across all banks (i.e., some banks may have more addresses mapped to them than others), which in turn increases bank conflicts and reduces performance.
Thus, while simple address translation schemes (the standard interleaving scheme or various schemes derived from Budnick-Kuck translation) create simple periodic sequences of bank numbers for sequences of fixed stride patterns (and thus suffer repeatable bank conflicts at many strides), general address scrambling mappings produce random sequences of bank numbers for arbitrary exact access sequences. These sequences of bank numbers, where the corresponding physical addresses reside, have, as explained above, statistically significant bank conflicts (within the sequence of N addresses), and, being sufficiently randomized, do not have a period less than the size of the address space to which the scrambling is applied.
Accordingly, a low complexity, fault tolerant scrambling technique that would generally provide conflict-free accesses for stride 1 access patterns, other explicit access patterns of particular importance, such as even stride patterns, power of 2 stride patterns, or diagonal and other access patterns of interest, is thus extremely desirable.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a multi-bank memory addressing system and method which generally provides no bank conflicts for stride 1 access patterns and infrequent bank conflicts for other access patterns of interest. In one embodiment, a memory device is provided having a plurality, N, of memory banks comprising a plurality of addressable memory locations. Each memory location has a logical address and a corresponding physical address, the physical address comprising a memory bank number and a local address within the memory bank. The memory device comprises an address mapping system, including an address translation unit, that derives, for each logical address, the corresponding physical address. In a preferred embodiment, the address translation unit o

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Multi-bank, fault-tolerant, high-performance memory... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Multi-bank, fault-tolerant, high-performance memory..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Multi-bank, fault-tolerant, high-performance memory... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3360812

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.