Efficient store machine in cache based microprocessor

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

Classes: C711S133000, C711S210000
Type: Reexamination Certificate
Status: active
Patent number: 06446170

ABSTRACT:

BACKGROUND
1. Field of the Present Invention
The present invention generally relates to the field of computer architecture and, more particularly, to a method and circuit for improving the efficiency of retiring store operations in microprocessor-based computers.
2. History of Related Art
In typical modern microprocessor designs, cache-able store instructions are executed and retired to cache memory in program order. Because the number of load operations exceeds the number of stores by a significant margin in typical code sequences, and because many load operations may be speculatively executed to take advantage of processor parallelism, cache access arbitration schemes commonly assign relatively low priority to store operations. This prioritization hierarchy can potentially result in a backlog of executed store operations awaiting an opportunity to access the cache. The constraint of in-order execution and retirement is accommodated by placing completed store instructions in a completed store queue, where they await resolution of conflicts from higher priority cache access requests. Higher priority cache accesses may occur in the form of snoop requests, cache status bit updates, and other cache accesses, depending upon the environment. As a result, a large number of store instructions may become stockpiled in the store queue, especially in processor-intensive applications such as multi-processor systems, thereby making it imperative to take maximum advantage of each opportunity to retire store operations to cache.
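By way of illustration only, the short Python sketch below models a fixed-priority cache-access arbiter of the kind described above, in which store retirement is the lowest-priority requester. The priority ordering, request names, and the arbitrate function are assumptions chosen for this sketch rather than details taken from the text.

    # Illustrative fixed-priority cache-access arbiter; stores are lowest priority.
    from collections import deque

    PRIORITY = {"snoop": 0, "status_update": 1, "load": 2, "store": 3}  # lower wins

    def arbitrate(requests):
        """Grant the cache to the highest-priority requester this cycle."""
        return min(requests, key=lambda r: PRIORITY[r]) if requests else None

    # Completed stores wait in the store queue until they win arbitration,
    # so they back up behind snoops and (speculative) loads.
    store_queue = deque(["store A", "store B", "store C"])
    for requests in (["load", "snoop", "store"], ["load", "store"], ["store"]):
        if arbitrate(requests) == "store" and store_queue:
            print("retired", store_queue.popleft())
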
Conventional microprocessor architectures, unfortunately, do not typically handle the retiring of store operations in optimal fashion. Referring to FIG. 4 of the drawings, a timing representation of a store queue of a conventional microprocessor architecture is presented. For each cycle, the states of selected locations of the microprocessor are detailed. The “BST” represents the location within the store queue designed to hold the oldest pending store operation. In a typical microprocessor, the BST contents are transferred to a latch if a resource conflict is encountered during an attempt to access the cache from the store queue. In FIG. 4, a resource conflict denoted by reference numeral 402 is detected in cycle 0. In response to the resource conflict, the microprocessor transfers the BST contents (identified as op0) to the latch and shifts the next oldest pending operation (op1) to the BST. Thus, in cycle 1, op1 resides in the BST as indicated by reference numeral 408, while op0 is found in the latch as indicated by reference numeral 406. Because op0 is no longer present within the store queue, a select signal SEL is set to indicate that the next pending store operation retired must be selected from the latch. In the example of FIG. 4, no resource conflict exists during cycle 1. Accordingly, the cache is accessed from the latch with op0, as indicated by reference numeral 412. The result of the cache access (i.e., hit/shared hit/miss, etc.) is not known until the following cycle 2. When the cache access is returned as a hit, indicated by reference numeral 414, the select signal may be returned to 0 in the following cycle so that subsequently selected store operations are retired from the store queue. Unfortunately, this architecture ensures that no cache access may be attempted during cycle 2, despite the absence of a resource conflict, because the unknown result of the cache access prohibits updating the select signal until the following cycle. Thus, an opportunity to retire a pending store operation in cycle 2 has gone unfulfilled. Therefore, it would be desirable to provide an architecture in which the retiring of pending operations is handled in a more efficient manner, without incurring any performance degradation and without significantly increasing the cost or complexity of the circuit.
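The lost cycle can be seen in a simple cycle-level model. The following Python sketch is only an approximation of the conventional behavior of FIG. 4 (the class name ConventionalStoreMachine and its signals are assumptions, and queue-sourced accesses are treated as single-cycle for brevity): on a conflict the oldest store is moved from the queue to the latch, and SEL cannot be returned to the queue until the hit result comes back, wasting the cycle after the latch access.

    # Cycle-level sketch of the conventional scheme of FIG. 4 (illustrative only).
    from collections import deque

    class ConventionalStoreMachine:
        def __init__(self, stores):
            self.queue = deque(stores)  # store queue; queue[0] plays the role of the BST
            self.latch = None           # receives the BST contents on a resource conflict
            self.sel = 0                # 0: retire from the BST, 1: retire from the latch
            self.pending = None         # result of the previous latch access (known next cycle)

        def cycle(self, conflict, cache_hit=True):
            """Advance one cycle; return the op that accessed the cache, or None."""
            if self.pending is not None:
                # SEL may only be updated once the previous result is known,
                # so this cycle is lost even though no conflict exists.
                if self.pending == "hit":
                    self.latch, self.sel = None, 0
                self.pending = None
                return None
            if conflict:
                if self.sel == 0 and self.queue:
                    self.latch = self.queue.popleft()  # BST contents move to the latch
                    self.sel = 1
                return None
            if self.sel == 1 and self.latch is not None:
                self.pending = "hit" if cache_hit else "miss"
                return self.latch
            return self.queue.popleft() if self.queue else None

    # Reproducing FIG. 4: a conflict in cycle 0 and none afterwards.
    m = ConventionalStoreMachine(["op0", "op1", "op2"])
    print([m.cycle(conflict=(c == 0)) for c in range(5)])
    # [None, 'op0', None, 'op1', 'op2'] -- cycle 2 retires nothing
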
SUMMARY OF THE INVENTION
The problems identified above are in large part addressed by a method and corresponding circuit for retiring executed operations to cache in an efficient manner by maintaining the store machine in a preferred state when a resource conflict that prevents the store machine from accessing the cache is detected. This permits the store machine of the present invention to retire an operation in the cycle immediately following resolution of the resource conflict.
Broadly speaking, the present invention contemplates a method of retiring operations to a cache. Initially, a first operation is queued in a stack, such as the store queue of a retire unit. The first operation is then copied, in a first transfer, to a latch referred to as the miss latch in response to a resource conflict that prevents the first operation from accessing the cache. The first operation is maintained in the stack for the duration of the resource conflict. When the resource conflict is resolved, the cache is accessed, in a first cache access, with the first operation from the stack. Preferably, the first operation is removed from the stack when the resource conflict is resolved and the first cache access is initiated. In the preferred embodiment, the first operation is maintained in the miss latch until the first cache access results in a cache hit. One embodiment of the invention further includes accessing the cache, in a first miss access, with the first operation from the miss latch in response to a cache miss that resulted from the first cache access. In a presently preferred embodiment, a second access is executed to access the cache with a second operation queued in the stack in response to a cache hit resulting from the first cache access. The first and second cache accesses preferably occur in consecutive cycles. Typically, the first and second operations are store operations that are queued in the stack in program order. In one embodiment, the first operation is removed from the stack upon resolution of the resource conflict.
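A minimal sketch of this method is given below in Python, under the simplifying assumption that a cache result is resolved within the same call rather than one cycle later; the class StoreMachine and its field names are illustrative and do not come from the patent. The essential point is that a conflict only copies the oldest operation to the miss latch, so the stack itself sources the first cache access as soon as the conflict is resolved and a second operation can follow in the next cycle.

    # Illustrative model of the retirement method summarized above.
    class StoreMachine:
        def __init__(self, stores):
            self.stack = list(stores)      # store queue, oldest operation first
            self.miss_latch = None         # keeps a copy of the op until it hits
            self.retry_from_latch = False  # set after a cache miss

        def cycle(self, conflict, cache_hit=True):
            """Advance one cycle; return the op that accessed the cache, or None."""
            if conflict:
                if self.stack:
                    self.miss_latch = self.stack[0]  # copy only; the op stays queued
                return None
            if self.retry_from_latch and self.miss_latch is not None:
                op = self.miss_latch                 # re-access the cache after a miss
            elif self.stack:
                op = self.stack.pop(0)               # first access is sourced from the stack
                self.miss_latch = op                 # retained until the access hits
            else:
                return None
            if cache_hit:
                self.miss_latch, self.retry_from_latch = None, False
            else:
                self.retry_from_latch = True
            return op

    # A conflict in cycle 0, then hits: no cycle is wasted once the conflict clears.
    m = StoreMachine(["op0", "op1", "op2"])
    print([m.cycle(conflict=(c == 0)) for c in range(4)])
    # [None, 'op0', 'op1', 'op2']
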
The present invention still further contemplates a system for retiring operations to a cache memory. The system includes a stack that is configured to save a first operation destined for the cache memory. A miss latch is coupled to the stack and configured to receive a first operation from the stack. A multiplexer of the system includes a first input connected to the stack, a second input coupled to the miss latch, an output connected to the cache memory, and a select input. A control circuit is coupled to the select input of the multiplexer. The control circuit is configured to select the first input of the mux and initiate copying, in a first transfer, of the first operation from the stack to the miss latch while maintaining the first operation in the stack. The first transfer occurs in response to a resource conflict preventing the stack from accessing the cache.
The control circuit preferably continues to select the first input of the mux for the duration of the resource conflict. In this manner, the stack acts as the source of a first access of the cache following a resolution of the resource conflict. In one embodiment, the system is further configured to access the cache, in a first cache access, with the first operation from the stack, in response to detecting a resolution of the resource conflict. The system preferably maintains the first operation in the miss latch until the first cache access results in a cache hit. In one embodiment, the control circuit selects the second input of the mux if the first cache access results in a cache miss. The system preferably accesses the cache, in a second cache access, with a second operation from the stack, in response to a cache hit resulting from the first cache access.
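The select behavior attributed to the control circuit can be condensed into a small decision function, sketched below in Python; the signal names conflict and prev_access_missed are assumptions introduced only for this illustration.

    # Sketch of the multiplexer select policy implied by the description above.
    STACK_INPUT = 0  # first mux input: fed by the stack (store queue)
    LATCH_INPUT = 1  # second mux input: fed by the miss latch

    def mux_select(conflict: bool, prev_access_missed: bool) -> int:
        """Choose which mux input sources the next cache access."""
        if conflict:
            # Keep selecting the stack for the duration of the conflict so it
            # sources the first access once the conflict is resolved.
            return STACK_INPUT
        if prev_access_missed:
            # Only a cache miss redirects the access path to the miss latch.
            return LATCH_INPUT
        return STACK_INPUT

In this reading, the stack remains the selected source both during the conflict and for the first access after it, matching the embodiment in which only a cache miss switches the mux to the miss latch.
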
The present invention further contemplates a computer system including a processor, a cache memory, and a system memory. The processor is coupled to a processor bus via a bus interface unit. The cache memory is interfaced to the processing unit and the bus interface unit, and the system memory is coupled to the bus interface unit. The processor includes a control circuit, a store queue, and a miss latch. The store queue is configured to save a first operation destined for the cache memory.
