Title: Mechanism for invalidating instruction cache blocks in a...
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate
Filed: 1999-10-01
Issued: 2002-05-21
Examiner: Kim, Matthew (Department: 2186)
U.S. Class: C711S123000
Status: active
Patent Number: 06393523
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to cache management in microprocessors and, more particularly, to a system, method, and mechanism for instruction cache block invalidation.
2. Relevant Background
Computer programs comprise a series of instructions that direct a data processing mechanism to perform specific operations on data. These operations include loading data from memory, storing data to memory, adding, multiplying, and the like. Data processors, including microprocessors, microcontrollers, and the like, include a central processing unit (CPU) comprising one or more functional units that perform various tasks. Typical functional units include a decoder, an instruction cache, a data cache, an integer execution unit, a floating point execution unit, a load/store unit, and the like. A given program may run on a variety of data processing hardware.
As used herein the term "data processor" includes complex instruction set computers (CISC), reduced instruction set computers (RISC), and hybrids. A data processor may be a stand-alone central processing unit (CPU) or an embedded system comprising a processor core integrated with other components to form a special purpose data processing machine. The term "data" refers to digital or binary information that may represent memory addresses, data, instructions, or the like.
In response to the need for improved performance, several techniques have been used to extend processor capabilities, including pipelining, superpipelining, and superscalar execution. Pipelined architectures attempt to keep all the functional units of a processor busy at all times by overlapping the execution of several instructions. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction has finished executing. A simple pipeline may have only five stages, whereas an extended pipeline may have ten or more. In this manner, the pipeline hides the latency associated with the execution of any particular instruction.
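A quick way to see the latency hiding is to count cycles: with one instruction issued per cycle and no stalls, N instructions complete in an S-stage pipeline after S + N - 1 cycles, versus S * N cycles if each instruction ran alone. A minimal sketch in C (the stage and instruction counts are illustrative, not taken from the patent):

```c
#include <stdio.h>

/* Cycles to retire n instructions in an s-stage pipeline, assuming one
 * instruction issues per cycle with no stalls: the first instruction
 * fills the pipe, and every later one finishes one cycle after it. */
static unsigned pipelined_cycles(unsigned s, unsigned n) {
    return s + n - 1;
}

/* Cycles if each instruction must run start-to-finish by itself. */
static unsigned serial_cycles(unsigned s, unsigned n) {
    return s * n;
}

int main(void) {
    unsigned s = 5, n = 100;  /* hypothetical 5-stage pipeline, 100 instructions */
    printf("pipelined: %u cycles\n", pipelined_cycles(s, n)); /* 104 */
    printf("serial:    %u cycles\n", serial_cycles(s, n));    /* 500 */
    return 0;
}
```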
The ability of processors to execute instructions has typically outpaced the ability of memory subsystems to supply instructions and data to the processors. Most processors use a cache memory system to speed memory access. Cache memory comprises one or more levels of dedicated high-speed memory holding recently accessed instructions and data, designed to speed up subsequent access to the same data. Cache may be implemented as a unified cache in which data and instructions are cached together, or as a split cache having separate instruction and data caches.
Cache technology is based on the premise that programs frequently reuse the same instructions and data. When data is read from main system memory, a copy is also saved in the cache memory. Subsequent requests for instructions are checked against the cache to see whether the needed information has already been stored. If the instruction has indeed been stored in the cache, it is delivered with low latency to the processor. If, on the other hand, the instruction has not been previously stored in the cache, it is fetched from main memory and also saved in the cache for future access.
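The read path just described can be sketched as a direct-mapped lookup. Everything below (the cache geometry, the names, and the main_memory_read stand-in) is a hypothetical illustration of the general technique, not a design from the patent:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 256              /* hypothetical direct-mapped cache size */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t word;                 /* one word per line, for simplicity */
} cache_line_t;

static cache_line_t cache[NUM_LINES];

/* Stand-in for a slow main-memory access. */
static uint32_t main_memory_read(uint32_t addr) {
    return addr * 3u;              /* dummy data */
}

/* On a hit the word comes back with low latency; on a miss it is fetched
 * from main memory and also saved in the cache for future access. */
uint32_t cached_read(uint32_t addr) {
    uint32_t index = (addr >> 2) % NUM_LINES;   /* word address -> line index */
    uint32_t tag   = addr / (NUM_LINES * 4);    /* bits above index + offset */
    cache_line_t *line = &cache[index];

    if (line->valid && line->tag == tag)
        return line->word;                      /* cache hit */

    uint32_t data = main_memory_read(addr);     /* cache miss */
    line->valid = true;
    line->tag   = tag;
    line->word  = data;                         /* fill for future accesses */
    return data;
}
```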
A feature of program instructions is that they often exhibit "spatial locality". Spatial locality is the property that the information (i.e., instructions and data) required to execute a program often resides in the memory media (e.g., random access memory (RAM), disk storage, and the like) at addresses close to other information that will be needed in the near future. Instructions tend to have higher spatial locality than data. Cache designs take advantage of spatial locality by filling the cache not only with the information that is specifically requested, but also with additional information from addresses sequentially adjacent to the currently fetched address. In this manner, if the sequentially adjacent instructions are actually needed, they will already be loaded into the cache.
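The effect of spatial locality on a cache fill is visible in how an address decomposes. Assuming a hypothetical 32-byte cache line, the sequential bytes around a fetched instruction land in the same line, so one fill also delivers the adjacent instructions:

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32u   /* hypothetical cache line size */

/* Two addresses fall in the same cache line when they differ only in
 * their offset within the 32-byte block. */
static int same_line(uint32_t a, uint32_t b) {
    return (a / LINE_BYTES) == (b / LINE_BYTES);
}

int main(void) {
    /* Filling the line for 0x1000 also brings in 0x1004..0x101C, the
     * sequentially adjacent instructions spatial locality predicts. */
    printf("%d\n", same_line(0x1000, 0x101C)); /* 1: one fill serves both */
    printf("%d\n", same_line(0x1000, 0x1020)); /* 0: next line, new fill  */
    return 0;
}
```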
In a split cache or "Harvard architecture" cache it is necessary to maintain coherency between the instruction and data caches. In this type of architecture the instruction cache is usually optimized for read operations and has little support for write operations, as most implementations do not allow writes to the instruction cache. As a result, the content of the instruction cache can get out of sync with the data cache and main memory when the program performs a store operation into the address space occupied by the program itself. This occurs in self-modifying code, for example.
One solution to this problem is to define special instructions, special instruction sequences, or both, that maintain instruction cache coherency. These instructions and instruction sequences function to discard or invalidate the portions of the cache that are inconsistent and to explicitly synchronize the instruction cache with other instructions. Generally, such instructions must be handled carefully by software. All instructions subsequent to an instruction cache block invalidate (ICBI) instruction must be assured that the preceding ICBI instruction has completed. In prior solutions the only way to assure completion was to serialize ICBI execution (i.e., execute each ICBI by itself in the pipeline) so that the ICBI was committed to the instruction cache before a subsequent instruction was issued to the pipeline. As a result of serialization, each ICBI consumed multiple pipeline cycles before a subsequent instruction was issued. Such restrictions reduce instruction throughput and can significantly affect processor performance in cases where an instruction is changed by a previous instruction or new instructions are brought in from external sources. It is desirable to implement instruction cache invalidate instructions and cache synchronization instructions using existing hardware in an efficient manner that also avoids the need to serialize the instructions.
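The coherency maintenance the passage describes can be sketched in portable C. The __builtin___clear_cache intrinsic is a real GCC/Clang builtin that emits the target's invalidate-and-synchronize sequence (ICBI-style operations on architectures that have them); the patch_code() helper and its arguments are hypothetical:

```c
#include <stdint.h>

/* After storing into the program's own address space, the stale
 * instruction-cache blocks must be discarded before the new code runs. */
void patch_code(uint8_t *code, const uint8_t *new_bytes, unsigned len) {
    for (unsigned i = 0; i < len; i++)
        code[i] = new_bytes[i];   /* store into instruction space: the
                                     I-cache now disagrees with memory */

    /* Discard the affected instruction cache blocks and synchronize so
     * the processor cannot keep executing the stale copy. */
    __builtin___clear_cache((char *)code, (char *)(code + len));
}
```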
SUMMARY OF THE INVENTION
The present invention involves a processor having an execution pipeline. A cache memory includes a plurality of cache blocks with instruction words held in selected ones of the cache blocks. An ICBI address buffer is provided for holding addresses of instruction cache blocks to be invalidated by ICBI instructions pending in the processor's execution pipeline. An instruction cache controller coupled to the cache memory generates cache accesses to invalidate specified cache blocks in response to receiving buffered addresses from the ICBI address buffer. Preferably the cache accesses serve to commit ICBI instructions to the instruction cache asynchronously with respect to the processor's execution pipeline.
In a particular example, the execution pipeline includes a fetch stage, a decode stage, one or more execution stages, and a writeback stage. The fetch unit is also coupled to receive interim results generated by the execution stages from a result bus. A decode unit obtains instructions fetched by the fetch unit and can detect an ICBI instruction. The decode unit notifies the fetch unit upon detection of an ICBI. At least one execution unit implements the decoded ICBI, determines an address identifying the cache block to be invalidated and places the address on the result bus. The ICBI address buffer is coupled to the result bus and stores the determined addresses for one or more pending ICBI instructions.
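One way to picture this structure is as a small FIFO that the execute stage fills from the result bus and the instruction cache controller drains on its own schedule. All names and sizes below are hypothetical illustrations of the described arrangement, not the patent's implementation:

```c
#include <stdint.h>
#include <stdbool.h>

#define ICBI_BUF_DEPTH 4           /* hypothetical buffer depth */

typedef struct {
    uint32_t addr[ICBI_BUF_DEPTH]; /* block addresses awaiting invalidation */
    unsigned head, tail, count;
} icbi_buffer_t;

static icbi_buffer_t icbi_buf;

/* Stand-in for the controller's cache access: would clear the block's
 * valid bit in the instruction cache. */
static void icache_invalidate_block(uint32_t block_addr) {
    (void)block_addr;
}

/* Execute stage: the resolved ICBI target comes off the result bus and
 * is buffered, letting the pipeline move on without serializing. */
bool icbi_buffer_push(uint32_t result_bus_addr) {
    if (icbi_buf.count == ICBI_BUF_DEPTH)
        return false;              /* buffer full: this ICBI must wait */
    icbi_buf.addr[icbi_buf.tail] = result_bus_addr;
    icbi_buf.tail = (icbi_buf.tail + 1) % ICBI_BUF_DEPTH;
    icbi_buf.count++;
    return true;
}

/* Cache controller: commits one pending invalidation per free cache
 * cycle, asynchronously with respect to the execution pipeline. */
void icbi_controller_step(void) {
    if (icbi_buf.count == 0)
        return;
    icache_invalidate_block(icbi_buf.addr[icbi_buf.head]);
    icbi_buf.head = (icbi_buf.head + 1) % ICBI_BUF_DEPTH;
    icbi_buf.count--;              /* this ICBI is now committed */
}
```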
In another aspect the present invention involves a cache synchronization technique in which one or more instruction cache block addresses are buffered, where each buffered address is associated with a pending ICBI request. A synchronization instruction (SYNCI) is executed following the pending ICBI instructions. In response to the SYNCI instruction the processor prevents instructions following the SYNCI from being executed until the pending ICBI instructions are committed to the instruction cache. In this manner, the instructions following the SYNCI are not exposed to the incomplete state created by the pending, uncommitted ICBI instructions.
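Under that model, SYNCI reduces to a drain barrier. Continuing the hypothetical sketch above, instructions after the SYNCI are held until the buffer empties:

```c
/* SYNCI semantics against the hypothetical icbi_buffer_t above: hold
 * younger instructions until every buffered invalidation has been
 * committed to the instruction cache. */
bool synci_may_retire(void) {
    return icbi_buf.count == 0;    /* no pending ICBIs remain */
}

void execute_synci(void) {
    while (!synci_may_retire())
        icbi_controller_step();    /* let the controller drain the buffer */
    /* instructions after the SYNCI can now safely issue */
}
```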
Inventors: Margaret Gearty, Naohiko Irie, Chih-Jui Peng, Tony L. Werner
Assignee: Hitachi, Ltd.
Primary Examiner: Matthew Kim
Assistant Examiner: Stephen Elmore
Agent: Townsend and Townsend and Crew LLP