Electrical computers and digital processing systems: memory – Storage accessing and control – Access timing
Reexamination Certificate
2000-03-30
2003-04-22
Yoo, Do Hyun (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Access timing
C711S125000, C711S126000, C711S128000, C711S123000
Reexamination Certificate
active
06553473
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus and method for allocating lines within a data cache upon writes to memory.
2. Description of the Related Art
The architecture of a present day pipeline microprocessor consists of a path, or channel, or pipeline, that is divided into stages. Each of the pipeline stages performs specific tasks related to the accomplishment of an overall operation that is directed by a programmed instruction. Software application programs are composed of a number of programmed instructions. As an instruction enters the first stage of the pipeline, certain tasks are accomplished. The instruction is then passed to subsequent stages of the pipeline for the execution of subsequent tasks. Following completion of a final task, the instruction completes execution and exits the pipeline. Execution of programmed instructions by a pipeline microprocessor is very much analogous to the manufacture of items on an assembly line.
The efficiency of any assembly line is primarily a function of the following two factors: 1) the degree to which each stage of the assembly line can be occupied with productive work; and 2) the balance of the effort required to perform tasks within each individual stage as compared to that required to perform tasks in the other stages, in other words, the degree to which bottlenecks are avoided in the assembly line. These same factors can also be said to affect the efficiency of a pipeline microprocessor. Consequently, it is incumbent upon microprocessor designers 1) to provide logic within each of the stages that maximizes the probability that none of the stages in the pipeline will sit idle and 2) to distribute the tasks among the architected pipeline stages such that no one stage will be the source of a bottleneck in the pipeline. Bottlenecks, or pipeline stalls, cause delays in the execution of application programs.
A pipeline microprocessor receives its data inputs from and provides its results to the outside world through memory devices that are external to the microprocessor. These external memory devices, along with the microprocessor are interconnected in parallel via a system bus. The system bus interconnects other devices as well within a computing system so that the other devices require can access data in memory or communicate with the microprocessor.
The memory devices used within present day computing systems operate almost an order of magnitude slower than logic devices internal to the microprocessor. Hence, when the microprocessor has to access external memory to read or write data, the program instruction directing the memory access is stalled in the pipeline. And if other devices are accessing data over the system bus at the same time that the microprocessor wants to access memory, then the program instruction may experience more lengthy delays until the system bus becomes available.
For these two reasons, a present day microprocessor incorporates a smaller-yet significantly faster-memory device within the microprocessor itself. This memory device, referred to as a cache, retains a copy of frequently used data so that when the frequently used data is required by instructions within an application program, rather than experiencing the delays associated with accessing the system bus and external memory, the data can be quickly accessed from within the cache.
The management of data within a cache, however, is a very complex task involving algorithms and logic that identify frequently used data and predict when one block of data is to be cast out of the cache and another block of data is to be retrieved into the cache. The goal of an effective data cache is to minimize the number of external memory accesses by the microprocessor. And to minimize the number of accesses to the system bus, present day cache logic does not read data from memory one byte at a time. Rather, memory is read into a cache in multiple-byte bursts. The number of bytes accessed within a burst is called a cache line. Cache lines are typically on the order of tens of bytes. Many pipeline microprocessors today employ 32-byte cache lines. Thus, when the system bus is accessed to retrieve data from external memory, an entire cache line is read that contains the required data along with surrounding data. Reading in the surrounding data is beneficial as well because one of the characteristics of application programs is that they tend to use data that is adjacent to that which has just been accessed. Consequently, when a program instruction requires a data entity that is not within the cache, the cache line that contains the data entity is retrieved from memory and placed into the cache. As a result, if following instructions require access to the data entity or surrounding data entities, they can execute much faster because the cache line is already present in the cache.
But program instructions not only read data from memory; they also write data. And the attribute of application programs discussed above applies just as well to writing data to memory as it does to reading data from memory. More specifically, when a program instruction directs a write to a location in memory, it is also very probable that following program instructions will either want to read or write that location or other locations within the same cache line. Hence, when a program instruction is executed that directs a write to a memory location that is not in the cache, a present day microprocessor first reads the corresponding cache line into the cache and then writes the data to the cache line. This technique for writing data is commonly referred to as blocking write allocate because a cache line entry within a cache is reserved, or allocated, only in response to a read operation. Consequently, every time that a program instruction directs a write to a memory location whose corresponding cache line is not within the cache, a read of the cache line is performed prior to writing the data.
For the program instruction directing the write to external memory, the above scenario is inconsequential because most microprocessors today provide store buffers within which memory write data can be buffered. Thus, the program instruction can continue to proceed through the pipeline without delay. Cache control logic within the microprocessor will complete the write to the allocated cache line within the cache after the cache line has been read.
But there is a problem associated with the blocking write allocate technique when viewed from the standpoint of following instructions. While the data associated with the first write to the cache line is retained within the store buffer, subsequent writes to the same cache line must be stalled only when the complete cache line is retrieved from memory and updated in the data cache can the following write instructions be allowed to proceed. This is a problem. More specifically, application programs that exhibit a significant number of writes to external memory experience considerable delays when they are executed on present day microprocessors employing blocking write allocate techniques.
Therefore, what is needed is an apparatus for allocating a cache line within a data cache corresponding to a memory write that does not require that the cache line first be loaded into the cache from memory.
In addition, what is needed is a pipeline microprocessor that can execute multiple writes to the same cache line much faster than has heretofore been provided.
Furthermore, what is needed is a data cache apparatus in a pipeline microprocessor that allows subsequent writes to a cache line to proceed without delay while waiting for the cache line to be provided from memory.
Moreover, what is needed is a method for improving the processing speed of a pipeline microprocessor executing multiple writes to adjacent memory locations that are not presently within its cache.
SUMMARY OF THE INVENTION
To address the above-detailed deficiencies, it is an obje
Gaskins Darius D.
Hooker Rodney E.
Huffman James W.
Huffman Richard K.
IP-First LLC
Peugh Brian R.
Yoo Do Hyun
LandOfFree
Byte-wise tracking on write allocate does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Byte-wise tracking on write allocate, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Byte-wise tracking on write allocate will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3057376