Electrical computers and digital processing systems: processing – Processing architecture – Long instruction word
Reexamination Certificate
1997-10-13
2001-05-22
Chan, Eddie (Department: 2183)
C712S215000, C712S234000, C712S241000, C717S152000
active
06237077
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to microprocessor architecture, and in particular to a system and method for processing branch instructions.
2. Background Art
Modern processors have the capacity to process multiple instructions concurrently at very high rates, with processor pipelines being clocked at frequencies that are rapidly approaching the gigahertz regime. Despite the impressive capabilities of these processors, their actual instruction throughput on a broad cross-section of applications is often limited by a lack of parallelism among the instructions to be processed. While there may be sufficient resources to process, for example, six instructions concurrently, dependencies between the instructions rarely allow all six execution units to be kept busy.
The problem is magnified by the long latency of certain operations that gate subsequent instructions. For example, long latency on a load instruction delays the execution of instructions that depend on the data being loaded. Likewise, long latency instruction fetches triggered by branch instructions starve the processor pipeline of instructions to execute. Memory latency problems are exacerbated in programs whose working sets are too large to fit in the nearest level cache. The result can be significant under-utilization of processor resources. Consequently, there has been an increasing focus on methods to identify and exploit the instruction level parallelism ("ILP") needed to fully utilize the capabilities of modern processors.
Different approaches have been adopted for identifying ILP and exposing it to the processor resources. For example, Reduced Instruction Set Computer (RISC) architectures employ relatively simple, fixed length instructions and issue them several at a time to their appropriate execution resources. Any dependencies among the issued instructions are resolved through extensive dependency checking and rescheduling hardware in the processor pipeline. Some advanced processors also employ complex, dynamic scheduling techniques in hardware.
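The dependency checking described above, and the earlier point that dependencies rarely let all six execution units stay busy, can be illustrated with a small model. This is a minimal sketch, not any particular processor's logic: the `Instr` type and `split_issue_groups` function are hypothetical, instructions are packed in program order, and only read-after-write and write-after-write hazards inside a group are checked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """Hypothetical register-to-register instruction: dest <- op(srcs)."""
    name: str
    dest: str
    srcs: tuple[str, ...]

def split_issue_groups(instrs: list[Instr], width: int) -> list[list[Instr]]:
    """Pack instructions, in program order, into issue groups of at most
    `width`. A group is closed early when an instruction reads (RAW) or
    overwrites (WAW) a register written earlier in the same group.
    WAR hazards are not modeled in this sketch."""
    if width < 1:
        raise ValueError("issue width must be at least 1")
    groups: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()
    for ins in instrs:
        hazard = ins.dest in written or any(s in written for s in ins.srcs)
        if hazard or len(current) == width:
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups

# A six-long dependence chain on a six-wide machine: one instruction per group.
chain = [Instr(f"i{k}", f"r{k + 1}", (f"r{k}",)) for k in range(6)]
# Six mutually independent instructions: all six issue together.
independent = [Instr(f"i{k}", f"r{k + 10}", ("r0",)) for k in range(6)]
```

With these inputs, `split_issue_groups(chain, 6)` yields six single-instruction groups, while `split_issue_groups(independent, 6)` yields one full group of six.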
Speculation and predication are alternative, compiler-driven approaches to the bottlenecks that limit ILP. Speculative instruction execution hides latencies by issuing selected instructions early and overlapping them with other, non-dependent instructions. Predicated execution reduces the number of branch instructions and their attendant latency problems: it replaces branch instructions and the code blocks that follow them with conditionally executed instructions, which can often be executed in parallel. Predication may also operate in conjunction with speculation to move additional instructions, increasing parallelism and reducing the overall execution latency of the program.
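To make the predication idea concrete, the sketch below models if-conversion of `if (a > b) r = a - b; else r = b - a;`. It assumes a hypothetical machine in which every instruction names a guarding predicate register, and `p0` is hard-wired true. The program runs straight through with no jumps, and an instruction whose guard is false is nullified. The register names, instruction tuple layout, and `run_straight_line` helper are illustrative assumptions, not an actual ISA encoding.

```python
from typing import Callable

Regs = dict[str, int]
# Hypothetical predicated instruction: (guard predicate, destination, compute).
PredInstr = tuple[str, str, Callable[[Regs], int]]

def run_straight_line(program: list[PredInstr], regs: Regs) -> Regs:
    """Execute every instruction in order, with no jumps. An instruction's
    result is written only if its guarding predicate register is nonzero."""
    regs = dict(regs, p0=1)  # p0 is treated as an always-true predicate
    for guard, dest, compute in program:
        if regs[guard]:  # a false guard nullifies the instruction
            regs[dest] = compute(regs)
    return regs

# if (a > b) { r = a - b } else { r = b - a }, after if-conversion:
abs_diff: list[PredInstr] = [
    ("p0", "p1", lambda r: int(r["a"] > r["b"])),   # compare sets p1 ...
    ("p0", "p2", lambda r: int(r["a"] <= r["b"])),  # ... and its complement p2
    ("p1", "r", lambda r: r["a"] - r["b"]),         # former then-block
    ("p2", "r", lambda r: r["b"] - r["a"]),         # former else-block
]
```

The two guarded subtractions do not depend on each other, so a wide machine could issue them in the same cycle. That is the parallelism the paragraph above attributes to predication.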
One side effect of the above-described code movement is that branch instructions tend to become clustered together. Even in the absence of predication and speculation, certain programming constructs, e.g. switch constructs and “if then else if” constructs, can cluster branch instructions in close proximity. There is thus a need for systems and methods that process clustered branch instructions efficiently.
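The clustering effect described above can be sketched by lowering an "if then else if" chain into pseudo-assembly. The assumptions here are that the conditions are side-effect free, so every compare may be hoisted ahead of the branches, and that the `cmp`/`br` syntax is purely hypothetical. What remains is a run of back-to-back branches. These branches must still be resolved in source order, because more than one predicate may be true.

```python
def lower_if_else_if(conds: list[str], targets: list[str], default: str) -> list[str]:
    """Hypothetical lowering of
        if c1 goto t1; else if c2 goto t2; ...; else goto default
    into pseudo-assembly. All compares are emitted first, so the
    conditional branches end up clustered together at the end."""
    if len(conds) != len(targets):
        raise ValueError("need exactly one target per condition")
    compares = [f"cmp p{i} = {c}" for i, c in enumerate(conds, start=1)]
    branches = [f"(p{i}) br {t}" for i, t in enumerate(targets, start=1)]
    return compares + branches + [f"br {default}"]

code = lower_if_else_if(["x == 1", "x == 2"], ["L1", "L2"], "L3")
```

Here `code` is `["cmp p1 = x == 1", "cmp p2 = x == 2", "(p1) br L1", "(p2) br L2", "br L3"]`, with the last three entries forming a branch cluster.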
SUMMARY OF THE INVENTION
The present invention is a method for processing branch instructions efficiently. It is generally applicable to any programming strategy that clusters branch instructions, and it is particularly useful for instruction set architectures (ISAs) that support speculation and predication.
In accordance with the present invention, one or more branch instructions are placed in an instruction bundle. The instructions are ordered in an execution sequence within the bundle, with the branch instructions ordered last in the sequence. The bundled instructions are transferred to execution units indicated by a template field that is associated with the bundle. The first branch instruction in the bundle's execution sequence that is resolved taken is determined, and retirement of subsequent instructions in the execution sequence is suppressed.
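The bundle processing in the preceding paragraph can be modeled as follows. This is a minimal sketch under stated assumptions: the three-slot format, the `M`/`I`/`B` unit letters, and the numeric template encodings in `TEMPLATES` are hypothetical, loosely modeled on IA-64 rather than taken from this patent. Each slot is routed to the unit its template names. Slots retire in execution order, and retirement stops after the first branch that resolves taken.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical template encodings: each value names the execution-unit type
# for the three slots of a bundle ("M" memory, "I" integer, "B" branch).
# Branch slots always come last in the execution sequence.
TEMPLATES: dict[int, tuple[str, str, str]] = {
    0: ("M", "I", "I"),
    1: ("M", "I", "B"),
    2: ("M", "B", "B"),
    3: ("B", "B", "B"),
}

@dataclass
class Slot:
    op: str
    taken: bool = False           # resolved outcome; used only in branch slots
    target: Optional[int] = None  # branch target address, if taken

@dataclass
class Bundle:
    template: int
    slots: tuple[Slot, Slot, Slot]

def execute_bundle(bundle: Bundle) -> tuple[list[str], Optional[int]]:
    """Route each slot to the unit named by the template, then retire slots
    in execution order up to and including the first taken branch.
    Returns the retired ops (tagged with their unit) and the redirect
    target, or None if execution falls through to the next bundle."""
    units = TEMPLATES[bundle.template]
    retired: list[str] = []
    for unit, slot in zip(units, bundle.slots):
        retired.append(f"{unit}:{slot.op}")
        if unit == "B" and slot.taken:
            # First taken branch wins; retirement of later slots is suppressed.
            return retired, slot.target
    return retired, None

bbb = Bundle(3, (Slot("br.cond A", False, 0x100),
                 Slot("br.cond B", True, 0x200),
                 Slot("br.cond C", True, 0x300)))
```

For `bbb`, the first branch falls through and the second is taken, so `execute_bundle(bbb)` retires `B:br.cond A` and `B:br.cond B`, suppresses the third slot, and redirects to `0x200`.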
In one embodiment of the invention, branch instructions are characterized according to their complexity, and more complex branch instructions are assigned to a selected position in the bundle. In another embodiment of the invention, the branch is a return from interrupt, and control of the processor is returned to the instruction in the execution sequence following the instruction that encountered the interruption (for traps), and to the instruction that encountered the interruption (for faults). In yet another embodiment of the invention, the branch is a return from call, and control of the processor is returned to an instruction bundle following the instruction bundle that contained the original call.
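The resume points in the last two embodiments can be expressed as small address calculations. This sketch assumes an instruction pointer made of a bundle address plus a slot index, with 16-byte (128-bit), three-slot bundles as in IA-64. The `IP` type and the helper names are hypothetical. A trap resumes at the next instruction in the execution sequence, and a fault re-executes the interrupted instruction. A return from call resumes at the bundle after the call bundle, whichever slot held the call.

```python
from typing import NamedTuple

BUNDLE_BYTES = 16      # assumption: 128-bit bundles, as in IA-64
SLOTS_PER_BUNDLE = 3   # assumption: three instruction slots per bundle

class IP(NamedTuple):
    """Hypothetical instruction pointer: bundle address plus slot index."""
    bundle: int
    slot: int

def next_slot(ip: IP) -> IP:
    # Next instruction in the execution sequence; after the last slot,
    # continue at slot 0 of the following bundle.
    if ip.slot + 1 < SLOTS_PER_BUNDLE:
        return IP(ip.bundle, ip.slot + 1)
    return IP(ip.bundle + BUNDLE_BYTES, 0)

def rfi_resume(interrupted: IP, kind: str) -> IP:
    """Resume point for a return from interrupt."""
    if kind == "trap":
        return next_slot(interrupted)  # instruction after the trapping one
    if kind == "fault":
        return interrupted             # re-execute the faulting instruction
    raise ValueError(f"unknown interruption kind: {kind!r}")

def call_return_point(call_ip: IP) -> IP:
    # Return from call resumes at the bundle following the call bundle.
    return IP(call_ip.bundle + BUNDLE_BYTES, 0)
```

For example, a trap in slot 2 of the bundle at `0x1000` resumes at slot 0 of `0x1010`. A fault at the same point resumes at slot 2 of `0x1000`.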
Corwin Michael Paul
Fielden Kent
Hull James
Morris Dale
Mulder Hans
Chan Eddie
Idea Corporation
Novakoski Leo V.
Patel Gautam R.