Comprehensive multilevel cache preloading mechanism in a...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

C714S042000

active

06240490

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to cache preloading for multiprocessor system design simulation and in particular to comprehensive cache preloading ensuring that all possible legal cache coherency state combinations are reached. Still more particularly, the present invention relates to randomly preloading all possible legal cache coherency state combinations during simulation to verify proper design operation even in corner case state combinations.
2. Description of the Related Art
Caches have traditionally been designed to take advantage of the spatial and temporal locality of code sequences in commercial applications to reduce the memory access latency for load and store instructions by staging data predicted to be needed in the future into smaller memories having shorter latencies. As multiprocessing capabilities have increased in popularity, cache structures have been expanded and improved to support this functionality.
In a multiprocessor system, the same data may be shared and separately cached by different processors. To address the problem of multiple processors modifying the same data in their local caches without notifying one another, various cache states have been defined and incorporated into the cache organization to support the cache coherency protocols used by snooping mechanisms. Although many different cache coherency states have been defined for different multiprocessor systems, the MESI protocol states remain a very popular basic set of cache coherency states.
The modified (M) coherency state indicates that only one cache has a valid copy of the data, and that copy is “dirty”, or modified with respect to the copy in system memory. The exclusive (E) coherency state signifies that only one cache has a valid copy of the data, which is unmodified with respect to the data in system memory. The shared (S) coherency state denotes that one or more caches have copies of the data and that no copy is modified with respect to system memory. The invalid (I) coherency state indicates that the cache does not hold a valid copy of the data.
In multiprocessor systems employing the MESI protocol or a variant, a processor preparing to store data first examines the coherency state of the corresponding cache line in its local cache. If that cache line is either modified or exclusive, the store is performed immediately. Otherwise, the processor must invalidate all other copies of the data in the memory hierarchy before the store can be safely executed. All processors in a multiprocessor system follow these protocols to ensure that data coherency with respect to instruction execution sequences is maintained.
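The store rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes a single cache line, one local cache, and a list of peer caches at the same level. The names `MESI` and `perform_store` are hypothetical. The data fetch needed when the local line is invalid is not modeled.

```python
from __future__ import annotations
from enum import Enum

class MESI(Enum):
    M = "modified"   # only valid copy, dirty with respect to system memory
    E = "exclusive"  # only valid copy, clean
    S = "shared"     # one or more clean copies may exist
    I = "invalid"    # this cache holds no valid copy

def perform_store(local: MESI, others: list[MESI]) -> tuple[MESI, list[MESI], bool]:
    """Return (new local state, new peer states, whether invalidation was needed)."""
    if local in (MESI.M, MESI.E):
        # No other cache holds a valid copy: the store proceeds immediately.
        return MESI.M, list(others), False
    # S or I: all other copies must be invalidated before the store is safe.
    # (For I, the line must also be fetched; that step is not modeled here.)
    return MESI.M, [MESI.I] * len(others), True

print(perform_store(MESI.S, [MESI.S, MESI.I]))
```

In every case the storing cache ends in M. The only difference is whether peers had to be invalidated first.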
The protocols described, however, can become extremely complicated if multiple levels of caches (including internal or external in-line caches) are implemented for each processor, particularly when three or more cache levels are implemented. For single-level cache systems, only horizontal cache coherency (across processors) needs to be considered. Where multi-level cache hierarchies are implemented for each processor, however, both vertical coherency (among the cache levels serving one processor) and horizontal coherency must be maintained.
In a multiprocessor system having a multi-level cache hierarchy, the number of legal combinations of cache coherency states among the caches is extremely large. Even with a very thorough methodology, it would not be easy to reach all of the legal combinations within the limited number of simulation cycles conventionally run. Some legal combinations occur only after execution of a complex sequence of many load, store and castout operations.
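To give a sense of scale, the sketch below counts legal combinations for the simplest case the text mentions: one private cache per processor at a single level. The only rule it applies is the horizontal MESI invariant: a line held in M or E by one cache must be invalid in all others. Under that rule, n caches admit 2^n + 2n legal combinations out of 4^n raw ones. The vertical constraints of a multi-level hierarchy and shared lower-level caches are not modeled here. The function names are hypothetical.

```python
from itertools import product

STATES = "MESI"

def is_legal_single_level(combo):
    """Horizontal MESI invariant across peer caches holding one line."""
    owners = sum(1 for s in combo if s in "ME")
    if owners == 0:
        return True  # any mix of S and I is legal
    # Exactly one M/E holder, and every other cache must be invalid.
    return owners == 1 and combo.count("I") == len(combo) - 1

def count_legal(n_caches):
    """Brute-force count of legal state tuples for n peer caches."""
    return sum(1 for c in product(STATES, repeat=n_caches) if is_legal_single_level(c))

for n in range(1, 7):
    print(f"{n} caches: {4 ** n:5d} raw combinations, {count_legal(n):3d} legal")
```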
For instance, for data X to be in the invalid state in both the level one (L1) and level two (L2) caches but in the modified state in the level three (L3) cache, the processor must first store data X to the appropriate address, placing the L1 copy in the modified state. Next, a number of loads or stores (depending on the L1's replacement algorithm) that map to the same L1 set (congruence class) as data X must be executed, forcing a castout of X from the L1 to the L2. Finally, enough loads and stores that miss in the L1 must occur to force the L2 to select the line containing data X as the victim and cast that modified line out from the L2 to the L3.
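The three-step sequence above can be replayed on a toy model. This is a minimal sketch built on the following assumptions, none of which are taken from the patent:

- a single processor;
- fully associative LRU caches of 2 (L1), 4 (L2) and 64 (L3) lines;
- stores allocate in L1 only;
- a load that misses every level fills a clean E copy into both L1 and L2;
- only modified victims are cast out to the next level down.

The class names `Level` and `Hierarchy` are hypothetical. With these tiny sizes, one store and five loads to distinct addresses suffice. With realistic cache sizes and replacement policies, the sequence becomes far longer.

```python
from collections import OrderedDict

class Level:
    """One cache level: fully associative, LRU, address -> MESI letter."""
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity
        self.lines = OrderedDict()  # iteration order is LRU order, oldest first

    def state(self, addr):
        return self.lines.get(addr, "I")  # an absent line reads as Invalid

    def touch(self, addr):
        self.lines.move_to_end(addr)

    def install(self, addr, st):
        """Insert or update a line; return the evicted (addr, state) or None."""
        if addr in self.lines:
            self.lines[addr] = st
            self.touch(addr)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim = self.lines.popitem(last=False)
        self.lines[addr] = st
        return victim

class Hierarchy:
    def __init__(self, sizes=(2, 4, 64)):
        self.levels = [Level(n, c) for n, c in zip(("L1", "L2", "L3"), sizes)]

    def _install(self, i, addr, st):
        victim = self.levels[i].install(addr, st)
        # Castout: only a modified victim moves down a level; clean victims are
        # dropped (an M victim of the last level would go to memory).
        if victim and victim[1] == "M" and i + 1 < len(self.levels):
            self._install(i + 1, *victim)

    def store(self, addr):
        for lvl in self.levels[1:]:
            lvl.lines.pop(addr, None)  # older copies below are superseded
        self._install(0, addr, "M")

    def load(self, addr):
        if addr in self.levels[0].lines:
            self.levels[0].touch(addr)
            return
        for lvl in self.levels[1:]:
            if addr in lvl.lines:
                # Move the line, with its state, up into L1.
                self._install(0, addr, lvl.lines.pop(addr))
                return
        self._install(0, addr, "E")  # assumed fill policy: clean copy in L1 ...
        self._install(1, addr, "E")  # ... and in L2

h = Hierarchy()
h.store("X")              # step 1: X is modified in L1
for a in "ABCDE":         # steps 2-3: misses force X from L1 to L2, then to L3
    h.load(a)
print(" ".join(f"{lvl.name}={lvl.state('X')}" for lvl in h.levels))
# -> L1=I L2=I L3=M
```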
The sequence of operations required to reach a particular combination of cache coherency states may therefore be extremely long. Furthermore, the example described above assumes that cache levels are not shared between processors. Many additional legal combinations of cache coherency states become possible where several processors and their L1 caches share a common L2 and/or L3 cache, further lengthening the sequence of operations needed to reach a particular legal combination of coherency states.
Finally, a physical multiprocessor system running any kind of application at several hundred million or more instructions per second fills its caches within a very short time and eventually reaches every combination of cache states. Simulations, in contrast, run tests at a much lower rate, often less than 100 processor cycles per second. With this limitation, it is not practical to run long test cases (more than a thousand instructions, for example). It is therefore nearly impossible to reach every type of cache state combination by running limited simulations without cache preloading. An insufficient or limited preloading mechanism could leave hidden bugs in the design that would not be found until the real silicon comes back.
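A rough calculation makes the speed gap concrete. The rate of 100 cycles per second is the upper simulation rate quoted above. The other figures are illustrative assumptions: one cycle per instruction, and 3 × 10^8 instructions per second as a stand-in for "several hundred million".

```python
SIM_CYCLES_PER_SEC = 100     # simulation rate quoted in the text (upper bound)
HW_INSTR_PER_SEC = 3e8       # assumed stand-in for "several hundred million"
CYCLES_PER_INSTR = 1.0       # assumed, for simplicity

# A 1000-instruction test case in simulation:
sim_seconds_per_test = 1000 * CYCLES_PER_INSTR / SIM_CYCLES_PER_SEC

# Cycles simulated in an 8-hour run, and how long real hardware needs for them:
overnight_cycles = 8 * 3600 * SIM_CYCLES_PER_SEC
hw_seconds = overnight_cycles / CYCLES_PER_INSTR / HW_INSTR_PER_SEC

print(f"1000-instruction test: {sim_seconds_per_test:.0f} s simulated")
print(f"8 h of simulation = {overnight_cycles:,} cycles = {hw_seconds * 1e3:.1f} ms on hardware")
```

Under these assumptions, a full night of simulation covers roughly what the hardware executes in about ten milliseconds.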
It would be desirable, therefore, to provide a comprehensive and complete cache preloading mechanism for simulations. It would further be advantageous for the mechanism to use only possible cache coherency state combinations, excluding those which are legal in theory but can never occur. It would also be desirable for the cache preload mechanism to preload combinations randomly, including corner case combinations, for more complete verification.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved cache preloading mechanism for multiprocessor system design simulation.
It is another object of the present invention to provide a comprehensive cache preloading mechanism for multiprocessor system design simulation which ensures that all possible legal cache coherency state combinations are reached.
It is yet another object of the present invention to provide a cache preloading mechanism randomly preloading all possible legal cache coherency state combinations during simulation to verify proper design operation even in corner case state combinations.
The foregoing objects are achieved as is now described. For simulation of a multiprocessor system having a multi-level cache hierarchy, the possible and legal cache coherency state combinations are classified into major classes based on the state of one level one cache. They are then subclassified within the major classes to define unique combinations, whose number is significantly less than the number of all possible combinations. For each data word in the test case, a cache coherency state combination is randomly selected from a combination table listing all subclasses. Stale data, generated by inverting all or part of the original data from the test case, may be preloaded with the coherency states as necessary. When test case data is preloaded to a cache location that has already been preloaded, the existing coherency states are maintained, preventing previously loaded stale data from becoming valid under a new coherency state. The coherency state combinations that are preloaded are tracked to help ensure that all subclasses are preloaded and tested during simulation prior to tapeout. The cache preload mechanism of the present invention allows the detection of bugs which occur only when the caches are in certain corner case states.
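The preload flow summarized above can be sketched as follows. This illustration is not the patented implementation and rests on these assumptions:

- the combination table holds a few placeholder (L1, L2, L3) state tuples for a single processor, not the patent's actual subclasses;
- major classes are keyed by the L1 state;
- the cache line size is 128 bytes;
- as a simplifying rule, stale (inverted) data goes to every level whose state is I or that lies below a level holding the line in M.

All names (`COMBINATION_TABLE`, `Preloader`, `make_stale`) are hypothetical.

```python
import random
from collections import Counter

# Placeholder subclasses, grouped into major classes by the L1 state.
# Hypothetical entries for illustration only; not the patent's table.
COMBINATION_TABLE = {
    "M": [("M", "I", "I"), ("M", "M", "M")],
    "E": [("E", "E", "E"), ("E", "I", "I")],
    "S": [("S", "S", "S"), ("S", "I", "S")],
    "I": [("I", "I", "I"), ("I", "I", "M"), ("I", "M", "I")],
}
SUBCLASSES = [combo for group in COMBINATION_TABLE.values() for combo in group]

def make_stale(word, mask=0xFFFFFFFF):
    """Invert all (default) or only the masked bits of a 32-bit data word."""
    return (word ^ mask) & 0xFFFFFFFF

class Preloader:
    def __init__(self, seed=None, line_bytes=128):
        self.rng = random.Random(seed)
        self.line_bytes = line_bytes   # assumed cache line size
        self.line_combo = {}           # line index -> chosen subclass
        self.coverage = Counter()      # subclass -> lines preloaded with it

    def preload(self, addr, word):
        """Return [(level, state, data), ...] to write for one test-case word."""
        line = addr // self.line_bytes
        combo = self.line_combo.get(line)
        if combo is None:
            combo = self.rng.choice(SUBCLASSES)   # random subclass
            self.line_combo[line] = combo
            self.coverage[combo] += 1
        # Otherwise the line was already preloaded: keep its existing states so
        # stale data placed there earlier never becomes valid under a new state.
        writes, newer_above = [], False
        for level, state in zip(("L1", "L2", "L3"), combo):
            stale = state == "I" or newer_above
            writes.append((level, state, make_stale(word) if stale else word))
            newer_above = newer_above or state == "M"
        return writes

    def uncovered(self):
        """Subclasses not yet preloaded in any line."""
        return [c for c in SUBCLASSES if self.coverage[c] == 0]
```

A test generator could call `preload` for each data word and keep generating test cases until `uncovered()` returns an empty list. Reusing the stored combination for a line that is already preloaded is what keeps earlier stale data from being exposed under a new state.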
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.
