Processor/memory device with integrated CPU, main memory,...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

Classifications: C711S005000, C711S105000, C711S123000, C711S133000, C711S144000, C711S168000

Type: Reexamination Certificate

Status: active

Patent number: 06199142

ABSTRACT:

The present invention relates generally to integrated processor/memory (P/M) devices with an on-chip cache and an on-chip main memory. In particular, it pertains to a P/M device with an on-chip cache that is as wide as the on-chip main memory (i.e., is full width).
BACKGROUND OF THE INVENTION
Traditionally, the development of processor and memory devices has proceeded independently. Advances in process technology, circuit design, and integrated circuit (IC) architecture have led to a near-exponential increase in processor speed and memory capacity. However, memory device latencies have not improved as dramatically, and access times are increasingly becoming the limiter of processor performance. This problem, known as the Memory Wall, is more fully described in
Hitting the Memory Wall: Implications of the Obvious,
by William A. Wulf and Sally A. McKee, ACM Computer Architecture News, Vol. 23, No. 1, March 1995, which is hereby explicitly incorporated by reference.
Current high performance processors, which use complex superscalar central processing units (CPUs) that interface to external off-chip main memory through a hierarchy of caches, are particularly affected by the Memory Wall problem. In fact, this CPU-centric design approach requires a large amount of power and chip area to bridge the gap between CPU and memory speeds.
The Memory Wall problem is commonly addressed by adding several levels of cache to the memory system so that small, high speed, static random access memory (SRAM) devices feed the CPU at low latencies. Combined with latency hiding techniques, such as prefetching and proper code scheduling, it is possible to run a high performance processor at reasonable efficiencies for applications with enough locality for the caches. However, while achieving impressive performance on applications that fit nicely into their caches, these processors have become increasingly application sensitive. For example, large applications such as CAD programs, database applications, or scientific applications often fail to meet CPU-based speed expectations by a wide margin.
Moreover, the CPU-centric design approach has led to very complex superscalar processors with deep pipelines. Much of this complexity, such as out-of-order execution and register scoreboarding, is devoted to hiding memory system latency. In addition, these processors demand a large amount of support logic in terms of caches, controllers, and data paths to talk to the external main memory. This adds considerable cost, power dissipation, and design complexity.
To fully utilize a superscalar processor, a large memory system is required. This creates a bottleneck by increasing the distance between the CPU and main memory: it adds interfaces and chip boundaries that reduce the available memory bandwidth due to packaging and connection constraints.
Integrating the processor with the memory device, however, avoids most of the problems of the CPU-centric design approach, and doing so offers a number of advantages that effectively compensate for the technological limitations of a single-chip design.
Specifically, in CPU-centric processor designs, the instruction and data cache lines have a width that is significantly less than the width of the main memory. This is primarily because filling these cache lines from the off-chip main memory would introduce severe second-order contention effects at the memory interface of the processor. As a result, such less-than-full-width caches are unable to take advantage of the often high spatial locality of instruction and data streams.
Thus, there is a need for full width instruction and data caches that take advantage of the high spatial locality of instruction and data streams in many applications. Moreover, the corresponding U.S. Pat. No. 5,900,011, issued May 4, 1999, and hereby explicitly incorporated by reference, describes and claims the use of a victim data cache to further improve the miss rate of such a full width data cache.
SUMMARY OF THE INVENTION
In summary, the present invention is an integrated processor/memory device. It comprises a main memory, a CPU, and a full width cache.
The main memory has a predefined address space and comprises main memory banks. Each of the main memory banks occupies a corresponding portion of the address space and stores rows of words at memory locations with addresses in the corresponding portion of the address space. The rows are a predetermined number of words wide.
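As a rough illustration of how a flat word address can map onto such banks, the sketch below decomposes an address into a bank number, a row within that bank, and a word within that row. The high-order-bits-select-the-bank scheme and the specific sizes are assumptions for clarity; the patent only requires that each bank own a portion of the address space.

```python
# Hypothetical decomposition of a word address into (bank, row, word).
# Assumes each bank owns a contiguous slice of the address space, so the
# high-order bits select the bank; other interleavings are possible.

WORDS_PER_ROW = 32    # the predetermined row width, in words (assumed)
ROWS_PER_BANK = 1024  # rows per main memory bank (assumed)

def decompose(address):
    word = address % WORDS_PER_ROW
    row = (address // WORDS_PER_ROW) % ROWS_PER_BANK
    bank = address // (WORDS_PER_ROW * ROWS_PER_BANK)
    return bank, row, word
```

With these sizes, an address in the second bank's slice of the address space yields bank 1, and the low-order bits pick out the row and word.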
The cache comprises cache banks. Each of the cache banks is coupled to a corresponding main memory bank of the main memory banks and to the CPU. Each of the cache banks comprises a cache bank line storage, a cache bank tag storage, and cache bank logic. The cache bank line storage is coupled to the corresponding main memory bank and stores one or more cache lines of words. Each of the cache lines has a corresponding row in the corresponding main memory bank. The cache lines are the predetermined number of words wide. The cache bank tag storage stores a corresponding tag for each of the cache lines. Each of the tags identifies the row in the corresponding memory bank that corresponds to the cache line. The cache bank logic is coupled to the CPU, the corresponding memory bank, and the cache bank line storage. When the CPU issues an address in the address space of the corresponding main memory bank, the cache bank logic determines from the address and the tags of the cache lines whether a cache bank hit or a cache bank miss has occurred in the cache bank line storage. When a cache bank miss occurs, the cache bank logic replaces a victim cache line of the cache lines with a new cache line comprising the corresponding row of the corresponding memory bank specified by the issued address.
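The lookup-and-replace behavior described above can be sketched in software. The class names, the direct-mapped victim selection, and the line count below are illustrative assumptions, not details claimed by the patent; the point of the sketch is that a cache line is as wide as a main memory row, so a miss is serviced by transferring one entire row.

```python
# Illustrative sketch of one cache bank in an integrated P/M device.
# A cache line is full width (as wide as a main memory row), so a miss
# replaces a victim line with a complete row in a single transfer.
# Names and the direct-mapped placement policy are assumptions.

class MainMemoryBank:
    def __init__(self, num_rows, words_per_row):
        self.words_per_row = words_per_row
        self.rows = [[0] * words_per_row for _ in range(num_rows)]

class CacheBank:
    def __init__(self, memory_bank, num_lines):
        self.mem = memory_bank
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # cache bank line storage
        self.tags = [None] * num_lines   # cache bank tag storage

    def access(self, address):
        """Return the word at `address` within this bank; fill on a miss."""
        row = address // self.mem.words_per_row
        word = address % self.mem.words_per_row
        index = row % self.num_lines        # direct-mapped placement (assumed)
        if self.tags[index] != row:         # cache bank miss
            # Replace the victim line with the full-width row.
            self.lines[index] = list(self.mem.rows[row])
            self.tags[index] = row
        return self.lines[index][word]

mem = MainMemoryBank(num_rows=64, words_per_row=32)
mem.rows[3][5] = 42
cache = CacheBank(mem, num_lines=4)
value = cache.access(3 * 32 + 5)  # miss: row 3 is brought in whole
```

Because the whole row is resident after one miss, subsequent accesses to any word of that row hit, which is how a full-width line exploits spatial locality.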


REFERENCES:
patent: 4577293 (1986-03-01), Matick et al.
patent: 4899275 (1990-02-01), Sachs et al.
patent: 5184320 (1993-02-01), Dye
patent: 5345576 (1994-09-01), Lee et al.
patent: 5353429 (1994-10-01), Fitch
patent: 5510934 (1996-04-01), Brennan et al.
patent: 5581725 (1996-12-01), Nakayama
patent: 5649154 (1997-07-01), Kumar et al.
patent: 5848004 (1998-12-01), Dosaka et al.
Iwata, S., et al., “Performance Evaluation of a Microprocessor with On-chip DRAM and High Bandwidth Internal Bus”, IEEE 1996 Custom Integrated Circuits Conference, pp. 269-272, 1996.
ADSP-21060 SHARC Microcomputer Family, Super Harvard Architecture Computer, Analog Devices, Norwood, MA, Oct. 1993.
Wulf, William A., et al., “Hitting the Memory Wall: Implications of the Obvious”, ACM Computer Architecture News, Vol. 23, No. 1, pp. 20-24, Mar. 1995.
Nowatzyk, A., et al., “The S3.mp Scalable Shared Memory Multiprocessor”, Proc. of the 24th Int'l Conference on Parallel Processing, 1995.
Nowatzyk, A., et al., “S-Connect: From Networks of Workstations to Supercomputer Performance”, Proc. of the 22nd Int'l Symp. on Computer Architecture, Jun. 1994.
Nowatzyk, A., et al., “Exploiting Parallelism in Cache Coherency Protocol Engines”, Europar 1995, Stockholm, Sweden.
Jouppi, Norman P., “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers”, Proceedings of the 17th Annual Int'l Symp. on Computer Architecture, pp. 364-373, 1990.
Fillo, M., et al., “The M-Machine Multicomputer”, Artificial Intelligence Lab, MIT, Cambridge, MA, Mar. 1995.
Kogge, P.M., et al., “EXECUBE - A New Architecture for Scaleable MPPs”, Int'l Conference on Parallel Processing, 1994.
Shimizu, T., et al., “A Multimedia 32b RISC Microprocessor with 16Mb DRAM”, IEEE, pp. 216-217 & 448, 1996.
Aimoto, Y., et al., “A 7.68GIPS 3.84GB/s 1W Parallel Image-Processing RAM Integrating a 16Mb DRAM and 128 Processors”, IEEE, pp. 372-373, 1996.
“The IMS T800 Transputer”, IEEE Micro, Vol. 7, No. 5, pp. 10-26, Oct. 1987.
“Mitsubishi Debuts Industry's First Microprocessor with On-Chip DRAM”, Mitsubishi Electronics America, Inc., Mar. 12, 1996.
Handy, Jim, “The Cache Memory Book”, Academic Press, pp. 37-107 and 120-125, 1993.
