Optimization of initialization of parallel compare...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S216000

Reexamination Certificate

active

06505345

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
The present invention pertains generally to computers, and more particularly to processing of computer instructions in a computer system.
BACKGROUND OF THE INVENTION
Many microprocessors use instruction pipelining as one way to increase instruction throughput, sometimes using very deep instruction pipelines including many pipeline stages and substages. Another approach to improving instruction execution speed is called “out-of-order” execution. In a microprocessor providing for out-of-order execution, instructions may be executed out of program order to take advantage of instruction parallelism. Instruction pipelining and out-of-order execution techniques may be used separately or together in the same microprocessor.
In a microprocessor having a pipelined architecture, instruction throughput is most efficient when the pipeline or pipelines are kept “fall.” In other words, it is most advantageous to have instructions being processed at every pipeline stage in each processor clock cycle. To keep an instruction pipeline full, instructions must generally be fetched continuously. Often an instruction pipeline is many pipeline stages deep such that instructions are fetched several clock cycles before they are executed. This can be an issue where the instruction fetched is a program flow control instruction such as a conditional branch instruction.
Conditional branch instructions often rely on data from other instructions for resolution of the condition specified within the instruction. If an instruction effecting the resolution of the condition has not been executed at the time the conditional branch instruction is fetched, the processor does not have the information necessary to resolve the branch instruction. If the branch instruction cannot be resolved because the necessary data is not available, the processor does not know how to direct the program flow and therefore, which instructions are the correct instructions to fetch next.
If the branch instruction is resolved to be taken, the target instruction and instructions sequentially following the target instruction, referred to herein as the target instruction stream, must be fetched. If the branch instruction is resolved to be not taken, instructions sequentially following the branch instruction must be fetched. Thus, the program flow depends on resolution of the branch instruction. Where the branch instruction cannot be resolved at the time it is fetched because the required data is not available, there can be an issue.
Some processors address this issue by using some form of branch prediction logic. Although the use of branch prediction logic can help to improve microprocessor performance, it is virtually impossible to accurately predict the resolution of a branch instruction every time regardless of the branch prediction scheme used. Mispredictions by the branch prediction logic can have a significant impact on microprocessor performance. The microprocessor relies on the branch prediction logic to determine whether to fetch instructions from the sequential instruction stream following the conditional branch instruction or the target instruction stream. A misprediction means that the wrong instructions are being processed in the microprocessor pipeline.
When a misprediction is identified, the processor instruction pipeline is usually “flushed”. Thus, instructions in the microprocessor at various phases of execution are cleared from the pipeline and instructions from the correct instruction stream, and thus the correct program flow, must be fetched. Instructions flushed from the pipeline create “bubbles” in the pipeline such that several clock cycles may be required before the next instruction completes execution. Thus, instruction throughput and overall microprocessor performance is compromised.
One approach addresses the misprediction penalty using a method called predication, also referred to as guarding. Predication is actually used in lieu of branch instructions in this approach. Predicate registers are defined by special instructions and these “predicate” registers are associated with subsequent instructions and may be used to guard them. The predicate specifies a condition that determines whether or not the instruction will be executed. Once the condition is resolved, the predicated instructions are executed and the architectural state of the processor is updated appropriately.
Predication can be used to perform parallel compares, which are used to collapse the height of two or more conditional compares for execution in parallel within the same cycle despite output dependency. Parallel ‘and’ is used for Booleanand conditions and parallel ‘or’ is used for Boolean-or conditions. Parallel compares that execute simultaneously and write the same target register must write the same value. Parallel-and compares can only write a value of zero, thus the target register must be initialized to a one. Parallel-or compares can only write a value of one, thus the target register must be initialized to a zero.
One drawback of using parallel compares is the initialization of predicate registers. An initialization instruction is an additional instruction at the head of a dependence chain of instructions. For parallel-and compares, the predicate register initialization placement is important because the initial value is true (one). If there is an execution path on which the predicate register is undefined, the initial value should be false (zero) for the path. This situation restricts the initialization of the parallel-and result register to be directly before the parallel compares. Since there is a dependency between the initialization and the parallel compare instruction group, this adds to the overall critical path length. In general, materialization of initialization instructions that are restricted in code motion is not desirable because of the increase in resource usage and addition to the critical path length. Thus, it is desirable to optimize these predicate initialization instructions away.
SUMMARY OF THE INVENTION
According to one embodiment of the invention, there is provided a method and apparatus for finding a parallel compare sequence in a computer program using predication wherein the sequence requires an initialization, and using a predicate defined in the program in advance of the parallel compare sequence to initialize the parallel compares.


REFERENCES:
patent: 5832260 (1998-11-01), Arora et al.
patent: 5920716 (1999-07-01), Johnson et al.
patent: 5937195 (1999-08-01), Ju et al.
patent: 5999736 (1999-12-01), Gupta et al.
patent: 6026241 (2000-02-01), Chow et al.
patent: 6170052 (2001-01-01), Morrison
patent: 6192515 (2001-02-01), Doshi et al.
patent: 6286135 (2001-09-01), Santhanam
patent: 6321330 (2001-11-01), Doshi et al.
patent: 6353883 (2002-03-01), Grochowski et al.
Geva-Morris, IA-64 Architecture Disclosure White Paper, Feb. 1999, Intel Developers' Forum.*
Dehnert, J.C., et al., “Compiling for the Cydra 5”,The Journal of Supercomputing, 7(1/2), pp. 181-227, (1993).
Kathail, V., et al., “HPL-PD Architecture Specification: Version 1.1”,Hewlett Packard, Revision of HPL PlayDoh Architecture Specification: Version 1.0, Technical Report HPL-93-80, Feb. 1994, pp. 1-58, (Feb. 2000 (Revised)).
Mahlke, S.A., et al., “Effective Compiler Support for Predicated Execution Using the Hyperblock”,Proceedings of the 25th Annual International Symposium on Microarchitecture(MICRO 25), Portland, Oregan, pp. 45-54, (1992).
Park, J.C., et al., “On Predicated Execution”,Hewlet Packard, HPL-91-58, Software and Systems Laboratory, pp. 1-25, (May 1991).

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Optimization of initialization of parallel compare... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Optimization of initialization of parallel compare..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Optimization of initialization of parallel compare... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3028111

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.