Method and cache-coherence system allowing purging of...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

U.S. Classification: C711S120000, C711S124000, C711S138000
Type: Reexamination Certificate
Status: active
Patent number: 06681293

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to computer cache memories, and more particularly to a cache-coherence system and a method for allowing purging of mid-level cache entries without purging lower-level cache entries.
COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Silicon Graphics Incorporated, All Rights Reserved.
BACKGROUND OF THE INVENTION
Parallel computer systems provide economical, scalable, and high-availability approaches to computing. Managing such systems, including parallel-processor systems, requires a cache-coherence system and control in order to obtain correct system operation.
Conventional hierarchical cache systems provide small, fast cache memories next to fast information-processing units, and larger, slower memories that are further away in time and space. It is too expensive to make a fast memory large enough to hold all of the data for a large computer program; as memories are made larger, access times slow and heat dissipation becomes a problem.
Modern computer systems thus typically include a hierarchy of memory systems. For example, a processor might have an L0 cache on the same chip as the processor. This L0 cache is the smallest, perhaps 16 to 256 kilobytes (KB), and runs at the fastest speed, since there are no chip-boundary crossings. An L1 cache might be placed next to the processor chip on the same chip carrier. This L1 cache is the next smallest, perhaps 0.5 to 8 megabytes (MB), and runs at the next-fastest speed, since there are chip-boundary crossings but no card-boundary crossings. An L2 cache, if implemented, might be placed next to the processor card in the same box but on a different chip carrier. This L2 cache is typically larger still than the L1 and runs at the next-fastest speed, since there are card-boundary crossings but no box-boundary crossings.

A large main memory, typically implemented using RDRAMs (RAMBUS™ dynamic random-access memories) or DDR SDRAMs (double-data-rate synchronous dynamic random-access memories), is then typically provided. Beyond that, a disc array provides mass storage at a slower speed than main memory, and a tape farm can be provided to hold truly enormous amounts of data, accessible within seconds, minutes, or hours. At each level moving further from the processor, there is typically a larger store running at a slower speed.

For each level of storage, the level closer to the processor thus contains a proper subset of the data in the level further away. For example, in order to purge data from the main memory while leaving that data in disc storage, one must first purge all portions of that data that may reside in the L0, L1, and/or L2 levels of cache. Conventionally, this may not lead to any performance problems, since the processor is finished with the data by the time the main memory is purged.
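The inclusion property described above can be sketched in a few lines of Python. This is an illustrative model, not the patent's implementation; the names (CacheLevel, purge) and the three-level arrangement are assumptions made for the example. The key point is that, conventionally, purging a line from a far level first forces a purge at every closer level.

```python
# Minimal sketch of a conventional *inclusive* cache hierarchy, in which
# each level closer to the processor holds a proper subset of the data in
# the next level out. Class and method names are illustrative only.

class CacheLevel:
    def __init__(self, name, lower=None):
        self.name = name
        self.lines = {}          # address -> data
        self.lower = lower       # the next level closer to the processor

    def fill(self, addr, data):
        self.lines[addr] = data

    def purge(self, addr):
        # Inclusion: every closer level must be purged before this one.
        if self.lower is not None:
            self.lower.purge(addr)
        self.lines.pop(addr, None)

# Build a three-level hierarchy: L0 (closest to the processor) .. L2.
l0 = CacheLevel("L0")
l1 = CacheLevel("L1", lower=l0)
l2 = CacheLevel("L2", lower=l1)

for cache in (l2, l1, l0):
    cache.fill(0x100, "payload")

# Conventionally, purging the line at L2 also evicts it from L1 and L0.
l2.purge(0x100)
```

After the purge, no level retains address 0x100 — which is precisely the behavior the invention below relaxes for the middle level.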
However, as more processors and more caches are added to a system, there can be more competition for scarce cache resources. There is a need to maintain coherence of data (i.e., ensuring that, as data is modified, all cached copies are timely and properly updated) among the various cache types, levels, and locations. Thus there is a need for improved methods and apparatus that improve system performance while also maintaining system integrity and cache coherence.
SUMMARY OF THE INVENTION
The present invention provides solutions to the above-described shortcomings in conventional approaches, as well as other advantages apparent from the description and appendices below.
The present invention provides a method and apparatus for purging data (e.g., a first cache line) from a middle cache level without purging the corresponding data from a lower cache level (i.e., a cache level closer to the processor using the data). The purged first data in the middle-level cache is replaced with other data (e.g., another cache line) having a different memory address, while the data of the first cache line is left in the lower cache level. In some embodiments, in order to allow such mid-level purging, the first cache line must be in the “shared state,” which allows reading of the data but does not permit modifications to it. If it is desired to modify the data, a directory facility issues a purge of the shared-state data for that cache line to all caches, and the processor that wants to modify the data then requests an exclusive-state copy to be fetched into its lower-level cache and all intervening levels of cache. Later, when the data in the lower cache level is modified, the modified data can be moved from the caches back to the original memory.
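The mid-level purge and the subsequent exclusive-state fetch can be sketched as follows. This is a simplified illustration under assumed names (Cache, Directory, purge_shared) — not the patent's claimed implementation — showing the two behaviors described above: a shared-state line evicted from the middle cache while the lower-level copy stays valid, and a directory-driven purge of all shared copies before an exclusive copy is granted.

```python
# Illustrative model: a shared (read-only) line may be purged from a middle
# cache to make room for a line with a different address, without touching
# the lower-level copy. All names here are assumptions for the sketch.

SHARED, EXCLUSIVE = "shared", "exclusive"

class Cache:
    def __init__(self):
        self.lines = {}                      # addr -> (state, data)

    def fill(self, addr, state, data):
        self.lines[addr] = (state, data)

class Directory:
    def __init__(self, caches):
        self.caches = caches

    def purge_shared(self, addr):
        # Before exclusive access is granted, invalidate all shared copies.
        for cache in self.caches:
            entry = cache.lines.get(addr)
            if entry is not None and entry[0] == SHARED:
                del cache.lines[addr]

l0, l1 = Cache(), Cache()                    # lower (L0) and middle (L1) caches
directory = Directory([l0, l1])

# Both levels hold a shared-state copy of line 0x40.
l0.fill(0x40, SHARED, "old")
l1.fill(0x40, SHARED, "old")

# Mid-level purge: evict the shared line from L1 only, reusing its slot for
# a line with a different address; the L0 copy remains readable.
del l1.lines[0x40]
l1.fill(0x80, SHARED, "other")

# To modify the line, the directory first purges all shared copies; the
# requesting processor then fetches an exclusive copy into every level.
directory.purge_shared(0x40)
l1.fill(0x40, EXCLUSIVE, "new")
l0.fill(0x40, EXCLUSIVE, "new")
```

The design point this captures: because a shared-state line cannot be dirty, evicting it from the middle level loses nothing, so inclusion can safely be relaxed there.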


