Multiple store miss handling in a cache memory system

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

Classification: C711S143000
Status: active
Patent number: 06311254


FIELD OF THE INVENTION
The present invention relates generally to data processing systems, and specifically to memory control of a data cache.
BACKGROUND OF THE INVENTION
A known way to increase the performance of a computer system is to include a local high-speed memory known as a cache. A cache increases system performance in part because once the central processing unit (CPU) accesses data at a particular address, there is a high probability that it will soon access an adjacent address. A well-designed cache therefore fetches from slower main memory, or from a lower level cache, a quantity of data commonly referred to as a line, which includes the data at the desired memory address as well as data from addresses in its vicinity. In very high performance computer systems, several caches may be placed in a hierarchy. The cache closest to the CPU, known as the upper level or L1 cache, is the highest level cache in the hierarchy and is generally the fastest. Other, generally slower, caches are placed in descending order in the hierarchy, starting with the L2 cache, down to the lowest level cache, which is connected to main memory. Typically, the L1 cache is located on the same integrated circuit as the CPU, whereas the L2 cache may be located off chip.
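To make the notion of a line concrete, the following C sketch (illustrative only; the 64-byte line size is an assumption, as the text does not specify one) shows how a byte address maps to a line address. Two accesses fall in the same line exactly when their line addresses match, which is why fetching whole lines exploits the spatial locality described above.

    #include <stdint.h>

    #define LINE_SIZE 64  /* assumed line size in bytes; real designs vary */

    /* Address of the first byte of the line containing addr. */
    static inline uint64_t line_address(uint64_t addr)
    {
        return addr & ~(uint64_t)(LINE_SIZE - 1);
    }

    /* Two accesses map to the same cache line iff their line addresses match. */
    static inline int same_line(uint64_t a, uint64_t b)
    {
        return line_address(a) == line_address(b);
    }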
Recently, microprocessors designed for desktop applications such as personal computers (PCs) have been modified to increase processing efficiency for multimedia applications. For example, a video program may be stored in a compression format known as the Moving Picture Experts Group (MPEG-2) format. When processing MPEG-2 data, the microprocessor must create frames of decompressed data quickly enough for real-time display on the computer screen. However, the data set may be large enough to cause high cache miss rates, resulting in a fetch latency that can be as long as 100 to 150 processor clock cycles.
Even with aggressive out-of-order processor micro-architectures, it is difficult for the processor to make forward progress in program execution while waiting for data from long-latency memories when cache miss rates are significant. The difficulty is compounded in data processing systems that require coherent data sharing between a processor and another peripheral device, such as a graphics card, or between multiple processors. Accordingly, a need exists for processors and processing systems that make efficient use of memory subsystem resources and prevent memory stalls on cache misses.
SUMMARY OF THE INVENTION
The problems identified above are addressed by a cache memory system according to the present invention in which transactions, initiated and placed in a transaction queue in response to load/store operations generated by a CPU, are modified while pending in the queue in recognition of additional load/store operations that alter the data requirements of the originally issued transaction. Additional utility is achieved in one embodiment of the invention by merging multiple store operations that miss to a common cache line into a single entry. In another embodiment, a similar benefit is achieved through a mechanism and method by which multiple load operations that miss to a common cache line are satisfied or completed from a buffer, thereby reducing cache pipeline stalls.
Broadly speaking, a first application of the present invention contemplates a computer and its corresponding cache system that includes a cache memory, a buffer unit, and a bus transaction queue. The cache memory is coupled to a load/store unit of a CPU. The buffer unit is coupled to the cache memory and includes a plurality of entries suitable for temporarily storing the data, address, and attribute information of operations generated by the CPU. The bus transaction queue is coupled to the buffer unit and includes a plurality of entries, each of which includes a pointer to one of the buffer unit entries. A first operation initiated by the load/store unit is buffered in a first entry of the buffer unit, which in turn causes a first transaction to be queued in a first entry of the bus transaction queue, where the first transaction points to the first buffer unit entry. Preferably, the buffer unit is configured to modify the first transaction from a first transaction type to a second transaction type prior to execution in response to an event occurring after the queuing of the first transaction.
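A minimal data-structure sketch of this arrangement is given below; all names, sizes, and fields are illustrative assumptions rather than details taken from the patent. The key point it captures is that a transaction queue entry holds only a pointer (here an index) into the buffer unit, so a queued transaction can be modified in place simply by updating the buffer entry it references. The needs_data flag anticipates the single-bit transaction type field discussed further below.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE   64  /* assumed line size: 64 bytes, one valid bit each */
    #define NUM_BUFFERS  8  /* assumed buffer unit depth */
    #define QUEUE_DEPTH  8  /* assumed transaction queue depth */

    /* One buffer unit entry: data, address, and attribute information
     * for a pending operation (field names are hypothetical). */
    typedef struct {
        uint64_t line_addr;        /* cache line address of the operation  */
        uint8_t  data[LINE_SIZE];  /* store data gathered for the line     */
        uint64_t byte_valid;       /* one bit per byte of the line written */
        bool     needs_data;       /* transaction type: data vs. no data   */
        bool     in_use;
    } buffer_entry_t;

    /* One bus transaction queue entry: a pointer into the buffer unit. */
    typedef struct {
        int  buffer_index;  /* which buffer entry this transaction serves */
        bool valid;
    } txq_entry_t;

    static buffer_entry_t buffer_unit[NUM_BUFFERS];
    static txq_entry_t    tx_queue[QUEUE_DEPTH];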
In one embodiment, the first transaction type requires data from a system memory or from a lower order cache memory, while the second transaction type requires no data. The required data for the first transaction type may be provided via a system bus to which the cache system is connected. In one embodiment, the first operation comprises a store operation that misses in the cache memory, and the first transaction is a read-with-intent-to-modify (RWITM) transaction. The event that results in the modification of the transaction type may comprise additional store miss operations occurring after the first operation but prior to execution of the first transaction, wherein the additional store miss operations and the first operation map to a common cache line (i.e., the operations share a common cache line address).
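Continuing the sketch above, one plausible way to detect that a later store miss maps to a line with a pending transaction is a simple lookup over the buffer unit; find_pending_entry is a hypothetical helper, and real hardware would more likely use a content-addressable match than a loop.

    /* Return the index of a pending buffer entry for the same cache
     * line as addr, or -1 if none exists. A miss with no match would
     * allocate a fresh entry and queue an RWITM (needs_data = true);
     * a match lets the new store merge into the pending entry instead. */
    static int find_pending_entry(uint64_t addr)
    {
        uint64_t line = addr & ~(uint64_t)(LINE_SIZE - 1);
        for (int i = 0; i < NUM_BUFFERS; i++) {
            if (buffer_unit[i].in_use && buffer_unit[i].line_addr == line)
                return i;
        }
        return -1;
    }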
In another embodiment, the first transaction type requires no data and the second transaction type requires data. In this embodiment, the first operation may include a store operation that hits in the cache memory to a shared cache line, and the first transaction may comprise a KILL transaction that invalidates all other cached copies of the cache line. An event that might suitably initiate modification of the first transaction in this embodiment is a snooped transaction on the system bus, detected by a snoop control unit coupled between the buffer unit and the system bus, whose cache line address is the same as that of the shared cache line.
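The reverse modification can be sketched in the same style. Under the assumptions above, a snooped invalidation hitting the shared line behind a pending KILL means the local copy can no longer be relied upon, so the pending transaction is upgraded to one that refetches the line; snoop_hit_pending and the choice of RWITM as the upgraded type are illustrative assumptions, not the patent's exact mechanism.

    /* Hypothetical snoop hook: if a snooped transaction on the system
     * bus carries the same cache line address as a pending no-data
     * (KILL-style) transaction, upgrade that transaction so it fetches
     * the line's data when it eventually executes. */
    static void snoop_hit_pending(buffer_entry_t *e, uint64_t snoop_addr)
    {
        uint64_t snoop_line = snoop_addr & ~(uint64_t)(LINE_SIZE - 1);
        if (e->in_use && !e->needs_data && e->line_addr == snoop_line)
            e->needs_data = true;  /* KILL -> data-requiring (e.g. RWITM) */
    }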
In one embodiment, each buffer unit entry includes a transaction type field that indicates whether the corresponding transaction requires data. In a presently preferred embodiment, a single-bit transaction type field is sufficient to differentiate between transactions requiring data and transactions not requiring data.
The first application of the invention further contemplates a method of handling operations in a cache system. Initially, in response to a CPU issuing a first operation that is unable to complete in a cache memory, the operation is stored in an entry of a buffer unit and a first transaction is queued in a first entry of a bus transaction queue that points to the buffer unit entry. Thereafter, the transaction type of the first transaction is modified in response to an event occurring prior to execution of the first transaction, where the transaction type indicates whether the first transaction requires data.
In one embodiment, the first transaction requires data prior to the modification and requires no data after the modification. In this embodiment, the first operation may suitably comprise a store operation that misses in the cache memory, and the event responsible for the modification of the transaction type may comprise at least one subsequent store operation, where the first and subsequent store operations share a common cache line address. In one embodiment, the first and subsequent store operations may be merged into a single buffer unit entry, and the modification of the first transaction occurs if the first and subsequent store operations affect each byte of the buffer unit entry's data buffer. In this embodiment, the first transaction may suitably be an RWITM transaction prior to modification and a KILL transaction after modification.
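A sketch of that merge-and-downgrade step, under the same assumptions as above (64-byte line, one valid bit per byte): once the accumulated stores cover every byte of the line, the data fetch becomes pointless, so the single transaction type bit is cleared and the pending RWITM effectively becomes a KILL. merge_store is a hypothetical helper.

    /* Merge a store of len bytes at addr into the pending buffer entry
     * for the same line. If the merged stores now cover every byte of
     * the line, downgrade the pending transaction from RWITM (needs
     * data) to KILL (needs no data) before it executes. */
    static void merge_store(buffer_entry_t *e, uint64_t addr,
                            const uint8_t *bytes, int len)
    {
        int offset = (int)(addr & (LINE_SIZE - 1));
        for (int i = 0; i < len && offset + i < LINE_SIZE; i++) {
            e->data[offset + i] = bytes[i];
            e->byte_valid |= 1ULL << (offset + i);
        }
        if (e->byte_valid == ~0ULL)   /* all 64 bytes written */
            e->needs_data = false;    /* RWITM -> KILL */
    }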
In another embodiment, the
