Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
Filed: 2000-11-16
Issued: 2003-04-15
Examiner: Yoo, Do Hyun (Department: 2187)
Other classes: C711S129000, C711S140000, C365S049130, C712S239000
Status: active
Patent number: 06549987
ABSTRACT:
The present invention relates to a cache architecture for variable length data. When used in a processor core, the cache architecture can support storage of variable length instruction segments and can retrieve multiple instruction segments (or portions thereof) in a single clock cycle. The cache architecture also helps minimize fragmentation of the instruction segments.
BACKGROUND
FIG. 1 is a block diagram illustrating the process of program execution in a conventional processor. Program execution may include three stages: front end 110, execution 120 and memory 130. The front-end stage 110 performs instruction pre-processing. Front-end processing 110 is designed with the goal of supplying valid decoded instructions to an execution unit 120 with low latency and high bandwidth. Front-end processing 110 can include instruction prediction, decoding and renaming. As the name implies, the execution stage 120 performs instruction execution. The execution stage 120 typically communicates with a memory 130 to operate upon data stored therein.
Conventionally, front-end processing 110 may build instruction segments from stored program instructions to reduce the latency of instruction decoding and to increase front-end bandwidth. Instruction segments are sequences of dynamically executed instructions that are assembled into logical units. The program instructions may have been assembled into the instruction segment from non-contiguous regions of an external memory space but, when they are assembled in the instruction segment, the instructions appear in program order. The instruction segment may include instructions or uops (micro-instructions).
A trace is perhaps the most common type of instruction segment. Typically, a trace may begin with an instruction of any type. Traces have a single-entry, multiple-exit architecture. Instruction flow starts at the first instruction but may exit the trace at multiple points, depending on predictions made at branch instructions embedded within the trace. The trace may end when one of a number of predetermined end conditions occurs, such as a trace size limit, the occurrence of a maximum number of conditional branches or the occurrence of an indirect branch or a return instruction. Traces typically are indexed by the address of the first instruction therein.
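The end conditions described above can be sketched as a small simulation. This is a hypothetical illustration, not the patent's implementation; the dictionary representation of instructions and the particular size and branch limits are assumptions.

```python
# Hypothetical sketch of trace construction; the limits and the
# instruction representation are assumptions, not taken from the patent.

MAX_TRACE_SIZE = 16     # assumed trace size limit
MAX_COND_BRANCHES = 3   # assumed maximum number of conditional branches

def build_trace(stream):
    """Collect dynamically executed instructions until an end condition occurs."""
    trace = []
    cond_branches = 0
    for insn in stream:
        trace.append(insn)
        if insn["kind"] == "cond_branch":
            cond_branches += 1
            if cond_branches == MAX_COND_BRANCHES:
                break
        elif insn["kind"] in ("indirect_branch", "return"):
            # an indirect branch or a return instruction ends the trace
            break
        if len(trace) == MAX_TRACE_SIZE:
            break
    # the trace is indexed by the address of its first instruction
    return trace[0]["addr"], trace
```

Note how the single-entry, multiple-exit shape falls out: the index is always the first address, while each embedded conditional branch is a potential early exit at run time.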
Other instruction segments are known. The inventors have proposed an instruction segment, which they call an “extended block,” that has a different architecture than the trace. The extended block has a multiple-entry, single-exit architecture. Instruction flow may start at any point within an extended block but, when it enters the extended block, instruction flow must progress to a terminal instruction in the extended block. The extended block may terminate on a conditional branch, a return instruction or a size limit. The extended block may be indexed by the address of the last instruction therein.
A “basic block” is another example of an instruction segment; it is perhaps the simplest type available. The basic block may terminate on the occurrence of any kind of branch instruction, including an unconditional branch. The basic block may be characterized by a single-entry, single-exit architecture. Typically, the basic block is indexed by the address of the first instruction therein.
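The indexing difference between the three segment types described above can be summarized in a few lines. This is an illustrative sketch only; the list-of-dictionaries representation of a segment is an assumption.

```python
# Hypothetical sketch of how the three segment types are indexed in a
# segment cache; the segment representation is an assumption.

def segment_index(segment, kind):
    """Return the address used to index a cached instruction segment."""
    if kind in ("trace", "basic_block"):
        # single-entry segments: indexed by the first instruction
        return segment[0]["addr"]
    if kind == "extended_block":
        # multiple-entry, single-exit: indexed by the last instruction
        return segment[-1]["addr"]
    raise ValueError(f"unknown segment kind: {kind}")
```

Indexing an extended block by its terminal instruction reflects its shape: flow may enter anywhere but always drains through the same exit, so the exit address uniquely identifies the block.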
Regardless of the type of instruction segment used in a processor, the instruction segment typically is cached for later use. Reduced latency is achieved when program flow returns to the instruction segment because the instruction segment may store instructions already assembled in program order. The instructions in the cached instruction segment may be furnished to the execution stage 120 faster than they could be furnished from different locations in an ordinary instruction cache.
Caches typically have a predetermined width; the width determines the maximum amount of data that can be retrieved from the cache in a single clock cycle. The width of a segment cache typically determines the maximum size of the instruction segment. To retrieve data, a cache address is supplied to the cache, which causes the contents of a cache entry to be driven to a cache output.
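The address-in, line-out behavior just described can be sketched as follows. This is a minimal illustration that assumes a direct-mapped organization and particular sizes; the patent does not mandate any of these choices.

```python
# Minimal sketch of a fixed-width cache read, assuming a direct-mapped
# organization (an assumption; the text above does not specify one).

LINE_WIDTH = 16   # maximum instructions driven to the output per cycle
NUM_SETS = 256

cache = [None] * NUM_SETS   # each entry holds (tag, line) or None

def read(address):
    """Drive the contents of one cache entry to the cache output."""
    set_index = address % NUM_SETS
    tag = address // NUM_SETS
    entry = cache[set_index]
    if entry is not None and entry[0] == tag:
        return entry[1]   # at most LINE_WIDTH instructions per access
    return None           # cache miss
```

The key property for the discussion that follows is that every hit drives out exactly one fixed-width line, regardless of how much useful data that line actually holds.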
Because instruction segments are terminated based on the content of the instructions from which they are built, the instruction segments typically have variable length. So, while a segment cache may have the capacity to store, say, 16 instructions per segment, the average length of the instruction segments may be much shorter than this maximum. In fact, in many typical applications, an average instruction segment is only slightly more than 8 instructions long. If these instruction segments were stored in a traditional segment cache, the capacity of the segment cache may be under-utilized; an 8-instruction segment would occupy a much larger cache line, preventing the line's excess capacity from storing other data. Further, a traditional segment cache would output only the smaller instruction segment, when addressed, even though it may have the capacity for much larger data items.
Accordingly, there exists a need in the art for a cache structure that stores variable length data and can output data with higher utilization than would be provided by a traditional cache.
REFERENCES:
patent: 4905141 (1990-02-01), Brenza
patent: 5381533 (1995-01-01), Peleg et al.
patent: 5796978 (1998-08-01), Yoshioka et al.
patent: 6128704 (2000-10-01), Jun
patent: 6167510 (2000-12-01), Tran
patent: 6327643 (2001-12-01), Egan
patent: 6349364 (2002-02-01), Kai et al.
Black et al, “The Block-Based Trace Cache”, Proceedings of the 26th Int'l. Symposium on Computer Architecture, May 2-4, 1999, Atlanta, Georgia.
Conte et al, “Optimization of Instruction Fetch Mechanisms for High Issue Rates”, Proceedings of the 22nd Annual Int'l. Symposium on Computer Architecture, Jun. 22-24, 1995, Santa Margherita Ligure, Italy.
Dutta et al, “Control Flow Prediction with Tree-Like Subgraphs for Superscalar Processors”, Proceedings of the 28th Int'l. Symposium on Microarchitecture, Nov. 29-Dec. 1, 1995, Ann Arbor, Michigan.
Friendly et al, “Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism”, Proceedings of the 30th Annual IEEE/ACM Int'l. Symposium on Microarchitecture, Dec. 1-3, 1997, Research Triangle Park, North Carolina.
Intrater et al, “Performance Evaluation of a Decoded Instruction Cache for Variable Instruction-Length Computers”, Proceedings of the 19th Annual Int'l. Symposium on Computer Architecture, May 19-21, 1992, Gold Coast, Australia.
Jacobson et al, “Path-Based Next Trace Prediction”, Proceedings of the 30th Annual Int'l. Symposium on Microarchitecture, Dec. 1-3, 1997, Research Triangle Park, North Carolina.
McFarling, Scott, “Combining Branch Predictors”, Jun. 1993, WRL Technical Note TN-36, Digital Western Research Laboratory, Palo Alto, California.
Michaud et al, “Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors”, Proceedings of the 1999 Int'l. Conference on Parallel Architectures and Compilation Techniques, Oct. 12-16, 1999, Newport Beach, California.
Patel et al, “Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing”, Proceedings of the 25th Annual Int'l. Symposium on Computer Architecture, Jun. 27-Jul. 1, 1998, Barcelona, Spain.
Reinman et al, “A Scalable Front-End Architecture for Fast Instruction Delivery”, Proceedings of the 26th Int'l. Symposium on Computer Architecture, May 2-4, 1999, Atlanta, Georgia.
Rotenberg et al, “Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching”, Proceedings of the 29th Annual IEEE/ACM Int'l. Symposium on Microarchitecture, MICRO-29, Dec. 2-4, 1996, Paris, France.
Seznec et al, “Multiple-Block Ahead Branch Predictors”, Proceedings of the 7th Int'l. Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1-4, 1996, Cambridge, Massachusetts.
Yeh et al, “Increasing the Instruction Fetch Rate via Multiple Branch Prediction”.
Inventors: Jourdan, Stephan J.; Rappoport, Lihu; Ronen, Ronny
Assignee: Intel Corporation
Attorney/Agent: Kenyon & Kenyon
Examiners: Moazzami, Nasser; Yoo, Do Hyun