Electrical computers and digital processing systems: memory – Storage accessing and control – Access timing
Reexamination Certificate
1998-06-19
2001-03-13
Gossage, Glenn (Department: 2187)
C711S167000, C711S131000, C711S149000, C711S140000, C365S189050, C365S230050, C710S006000, C710S039000
active
06202139
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to the field of microprocessors and, more particularly, to data caches employed within microprocessors.
2. Description of the Related Art
Superscalar microprocessors attempt to achieve high performance by issuing/executing multiple instructions concurrently. To the extent that superscalar microprocessors are successful at issuing/executing multiple instructions concurrently, high performance may be realized. Several factors may influence the successful concurrent issue/execution of instructions. For example, a first instruction which is dependent upon a second instruction (e.g. for a source operand) generally does not issue/execute concurrently with the second instruction. Still further, the frequency of branch instructions (which determine which instructions will be fetched next from a variety of sources) may impact the number of instructions available for issue and hence the number of instructions issued concurrently.
In the continuing evolution of superscalar microprocessors, the maximum issue rate (i.e. the number of instructions which can be concurrently issued) has been increasing. In other words, a trend toward wider issue superscalar microprocessors has been occurring. While additional performance gains may be realized by allowing for larger numbers of instructions to concurrently issue, wider issue microprocessors may face additional design challenges as well.
Among the additional design challenges is providing sufficient data cache ports for the number of memory operations which may be concurrently issued. As used herein, the term “port”, in connection with a cache, refers to a facility for accessing the cache in response to one memory operation. Other memory operations use other ports for accessing the cache concurrently. Superscalar microprocessors generally include data caches to decrease the latency of access to memory operands. Instruction sequences include a certain number of memory operations to access and/or update memory operands. Generally speaking, a memory operation specifies the transfer of data between the microprocessor and a memory external to the microprocessor (although the transfer may be completed via an internal cache). Load memory operations specify the transfer of data from a memory to the microprocessor, while store memory operations specify the transfer of data from the microprocessor to the memory. Memory operations may be explicit instructions, or an implicit part of another instruction specifying a memory operand, depending upon the instruction set architecture employed by the microprocessor.
As issue rates increase, the number of memory operations for which concurrent access to a cache is desired increases as well. If concurrent access is not provided (by providing sufficient data cache ports), then performance generally degrades. For example, many instructions are dependent upon load memory operations (either directly or indirectly) for source operands. Such dependent instructions typically cannot execute if the load memory operations are stalled due to a lack of available cache ports. Additionally, pipeline stalls may develop if subsequent memory operations attempt to issue before earlier memory operations have executed and the available resources for queuing memory operations become full.
Various methods for multiporting data caches have been employed in the past. For example, the cache arrays may be physically multiported (allowing for concurrent access to any storage location within the array from each port in parallel with access to any other storage location from the other ports). Unfortunately, physically multiporting the array typically leads to large increases in the microprocessor chip area occupied by the array. The size of the chip is important to chip yields and number of chips per semiconductor wafer, and hence to the cost of producing the microprocessor. Accordingly, increase in the area occupied by a cache array is generally undesirable.
Another method employed to provide multiported cache access is to bank the cache. Each port may access one of the banks in parallel with a different port accessing a different bank. If two or more memory operations which would otherwise concurrently access the data cache actually access data within the same bank, one of the memory operations completes and the others are inhibited. Unfortunately, even with a large number of available ports, concurrent access to the data cache may not be achieved due to the occurrence of bank conflicts. Accordingly, a solution to multiporting a data cache which does not incur the disadvantages of physically multiporting the array or banking the cache is desired.
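The bank-conflict behavior of a banked cache can be illustrated with a minimal sketch. The address-to-bank mapping (low-order bits of the line index), the 32-byte line size, the choice of letting the first access to a bank complete, and the function name are assumptions made for illustration only; the description does not specify them.

```python
def resolve_bank_conflicts(addresses, num_banks=4, line_bytes=32):
    """Split concurrently presented accesses into completed and inhibited.

    Assumption (not from the text): bank = (address // line_bytes) % num_banks,
    and the first access presented to a bank is the one that completes.
    """
    busy_banks = set()
    completed, inhibited = [], []
    for addr in addresses:
        bank = (addr // line_bytes) % num_banks
        if bank in busy_banks:
            # Another access already claimed this bank in this cycle.
            inhibited.append(addr)
        else:
            busy_banks.add(bank)
            completed.append(addr)
    return completed, inhibited


# Addresses 0x00 and 0x80 map to the same bank under the assumed mapping,
# so only one of them completes even though three ports are available.
done, blocked = resolve_bank_conflicts([0x00, 0x20, 0x80])
```

Under these assumptions, the number of accesses that complete in a cycle depends on the address pattern rather than only on the number of ports.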
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a cache in accordance with the present invention. The cache includes multiple ports, although the storage array included within the cache employs fewer ports than the cache supports. The cache is pipelined and operates at a clock frequency higher than that employed by the remainder of a microprocessor including the cache. Advantageously, the multiple accesses can be pipelined through the cache and the cache may internally have fewer ports than the number of ports actually supported by the cache. Accordingly, the cache may be implementable in a smaller area than a cache supporting more ports internally. Additionally, since the accesses are pipelined instead of applied to separate banks, the performance losses due to bank conflicts may be avoided. The cache may provide multiport access to support wide issue superscalar microprocessors in a small area and with high performance.
In one embodiment, the cache preferably operates at a clock frequency which is at least a multiple of the clock frequency at which the remainder of the microprocessor operates. The multiple is equal to the number of ports provided on the cache (or the ratio of the number of ports provided on the cache to the number of ports provided internally, if more than one port is supported internally). Accordingly, the accesses provided on each port of the cache during a clock cycle of the microprocessor clock can be sequenced into the cache pipeline prior to commencement of the subsequent clock cycle.
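The relationship between port count and cache clock frequency stated above may be expressed as a short calculation. The function names and the 200 MHz example frequency are hypothetical; only the ratio itself (ports provided on the cache divided by ports provided internally) comes from the description.

```python
from fractions import Fraction


def min_cache_clock_multiple(external_ports, internal_ports=1):
    """Smallest cache-clock multiple relative to the microprocessor clock.

    Equal to the number of ports on the cache, or to the ratio of external
    ports to internal ports when more than one port is supported internally.
    """
    if external_ports < 1 or internal_ports < 1:
        raise ValueError("port counts must be positive")
    return Fraction(external_ports, internal_ports)


def min_cache_clock_mhz(cpu_clock_mhz, external_ports, internal_ports=1):
    """Minimum cache clock for a given microprocessor clock (hypothetical helper)."""
    return cpu_clock_mhz * min_cache_clock_multiple(external_ports, internal_ports)


# Hypothetical example: a 2-port cache with a single-ported array in a
# 200 MHz microprocessor needs a cache clock of at least 400 MHz.
example_mhz = min_cache_clock_mhz(200, external_ports=2)
```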
In one particular embodiment, the load/store unit of the microprocessor is configured to select only load memory operations or only store memory operations for concurrent presentation to the data cache. Accordingly, the data cache may be performing only reads or only writes to its internal array during a clock cycle. The data cache may implement several techniques for accelerating access time based upon this feature. For example, the bit lines within the data cache array may merely be balanced between accesses rather than precharged (and potentially balanced).
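A minimal sketch of such a selection follows. The description states only that the operations presented concurrently are all loads or all stores; the policy used here (take the type of the oldest pending operation, then select consecutive operations of that type in program order up to the port count) and the tuple representation of an operation are assumptions for illustration.

```python
def select_for_ports(pending_ops, num_ports):
    """Pick operations of a single type for concurrent presentation.

    pending_ops: list of (kind, address) tuples in program order, where kind
    is "load" or "store" (hypothetical representation).
    Assumed policy: follow the oldest operation's kind and stop at the first
    operation of the other kind, so no operation is reordered past another.
    """
    if not pending_ops:
        return []
    kind = pending_ops[0][0]
    selected = []
    for op in pending_ops:
        if len(selected) == num_ports or op[0] != kind:
            break
        selected.append(op)
    return selected


# Two loads are presented together; the store waits for a later cycle.
chosen = select_for_ports(
    [("load", 0x10), ("load", 0x20), ("store", 0x30)], num_ports=3
)
```

Because every selected operation has the same kind, the array performs only reads or only writes for that group of accesses.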
Broadly speaking, the present invention contemplates a cache comprising a plurality of ports and a pipeline. The plurality of ports are operable at a first clock frequency and each of the plurality of ports is configured to concurrently receive a different cache access according to a first clock signal operable at the first clock frequency. Coupled to the plurality of ports, the pipeline is configured to perform one cache access per clock cycle of a second clock signal operable at a second clock frequency. The second clock frequency is at least a multiple of the first clock frequency, wherein the multiple is equal to a number of the plurality of ports.
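The sequencing of port accesses into the pipeline may be sketched as follows, assuming a single-ported array and a second clock frequency exactly equal to the number of ports times the first. The function name, the tuple layout, and the choice to sequence ports in ascending port order are illustrative assumptions.

```python
def sequence_accesses(accesses_per_cycle, num_ports):
    """Map accesses received per first-clock cycle onto second-clock cycles.

    accesses_per_cycle: one list per cycle of the first clock, holding at most
    num_ports accesses (one per port).
    Returns (cache_cycle, port, access) tuples, with one access entering the
    pipeline per cycle of the second clock.
    Assumption: ports are sequenced in ascending port order.
    """
    schedule = []
    for cpu_cycle, accesses in enumerate(accesses_per_cycle):
        if len(accesses) > num_ports:
            raise ValueError("more accesses than ports in one cycle")
        for port, access in enumerate(accesses):
            # Each first-clock cycle spans num_ports second-clock cycles.
            cache_cycle = cpu_cycle * num_ports + port
            schedule.append((cache_cycle, port, access))
    return schedule


# With two ports, accesses A and B from the first cycle enter the pipeline
# in cache cycles 0 and 1, before C and D arrive with the next cycle.
plan = sequence_accesses([["A", "B"], ["C", "D"]], num_ports=2)
```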
The present invention further contemplates a processor comprising a data cache having a plurality of ports and a load/store unit. Coupled to the data cache, the load/store unit is configured to select a memory operation for each of the plurality of ports. The load/store unit is configured to select only load memory operations for concurrent presentation on the plurality of ports or only store memory operations for concurrent presentation on the plurality of ports.
Moreover, the present invention contemplates a computer system comprising a processor and an input/output (I/O)
Pickett James K.
Witt David B.
Advanced Micro Devices , Inc.
Conley Rose & Tayon PC
Gossage Glenn
Merkel Lawrence J.