Link pipe system for storage and retrieval of sequences of...

Electrical computers and digital processing systems: processing – Processing control – Branching

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06640297

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to digital processors and, more particularly, to a novel method and apparatus for increasing the speed of execution of a sequence of indirect branches by permitting the latencies between the identification of the indirect branch targets and the execution of the indirect branch instructions to be overlapped.
BACKGROUND
One of the important known techniques for effectively improving processor performance is the use of an execution pipeline through which instructions are “pipelined” for execution. In that technique, the execution of each instruction is separated or broken into multiple sub-steps, and those sub-steps are performed (e.g. processed) in an overlapping “stair step” manner. The pipelining technique works well for sequential execution of instructions. A recognized difficulty, however, is how to maintain the overlap of the sub-steps in the presence of instructions, such as branches, which are able to alter the flow of control of the processing. When the flow of control changes unexpectedly, the future instruction sub-steps which have and are being performed in the pipeline must be voided and the pipeline restarted at the new execution address. The time taken to make that changeover generally detracts from the execution efficiency of the program being processed and is often referred to as a pipeline or branch “bubble”.
Typically, branch bubbles are addressed by identifying or, more specifically, attempting to identify branch instructions early, that is, in the earliest stage or stages of the instruction pipeline. When the branch and branch target address can be identified early enough, pipelining can be started beginning at the branch target. Should the branch then be taken, the branch bubble is thereby reduced or, possibly, eliminated.
For a large class of branches, often referred to as direct branches, the branch target address is embedded in the branch instruction either as a direct address or as an offset relative to the address of the branch instruction. For the foregoing class of branches, early identification of both the branch instruction and the branch target in the pipeline process is relatively easy of accomplishment by the pipelining mechanism. The pipelining mechanism is thus able to initiate procedures which reduce or eliminate pipeline bubbles.
For another important class of branches, often referred to as indirect branches, the branch target address is not embedded in the branch instruction but is located in a register within the processor. Such a branch presents a greater difficulty for the pipelining mechanism. Although the pipeline mechanism is able to identify the branch instruction early, which would also identify which register holds the target address, the pipeline mechanism is unable to correctly identify the target address. The reason for that inability is that the value that is contained in the identified register (e.g. a branch address) could be changed during the interval between the time when the initial identification was made of the assertion of the branch instruction and the time, shortly thereafter, when the branch instruction actually reaches the execution stage of the pipeline. As a consequence on many processors indirect branches always cause the maximum length of branch bubble, increasing the processing time for instruction execution.
A partial solution to the foregoing problem was to trigger the pipelining of indirect branch target instruction, not on the identification of the indirect branch instruction in the pipeline, but, instead, on the writing of a branch target address to a special branch target address register. This technique may be implemented with one or more of such branch target registers and has proven effective in many classes of computation.
However, there are some important classes of computation, notably interpreters, whose heavy and stylized use of indirect branch instructions continues to result in excessive performance-limiting branch bubbles during processing of a program. In these cases, there is a need to execute a sequence of indirect branches with few intervening branch instructions.
An interpreter typically contains a main loop which fetches an instruction, breaks the instruction into pieces and performs subroutine calls or jumps to perform the actions specified by the pieces. As an example, assume an interpreter that handles a sixteen bit instruction set, such that a typical instruction is of the form: [opcode-7 bits] [operand reg
1
-3 bits] [operand reg
2
-3 bits] [result reg-3 bits]. One instruction might be to “add r
1
, r
2
, r
3
”.
The interpreter would fetch the full sixteen bit instruction, break the instruction into the four pieces, above bracketed and then use the opcode field to select the address of the “add” handling code from a code pointer table. Then the interpreter would take an indirect branch to that code. The “add” handling code could then use the operand fields to select subroutine addresses to fetch the values of reg
1
and reg
2
; then execute those subroutines using indirect branches or calls; and then perform the add. Finally, the interpreter would use the result register field to select a subroutine to effect a store of the (reg
1
+reg
2
) value into the result register. The foregoing results in a good number of indirect branches with few intervening instructions. In other words the register fetch and register store subroutines or code sections may be only one or two instructions in length.
Although the foregoing processing technique for the interpreter allows overlapping of the indirect branch latency with the execution of the intervening non-indirect branch instructions, it is not possible to overlap the latencies of the indirect branch instructions with respect to each other in processor systems containing a single branch target register; and it is very difficult to overlap those latencies in those processing systems that contain multiple directly-addressed branch target registers.
As an advantage, the present invention introduces a system and method of permitting the overlapping of indirect branch latencies with respect to each other, effectively speeding up processing of application programs.
Accordingly, an object of the invention is to increase the processing speed of a digital processor.
Another object of the invention is to permit an overlap of the latencies between the identification of the indirect branch targets and the execution of the indirect branch instructions in a sequence of indirect branch instructions.
SUMMARY
In accordance with the foregoing objects and advantages, a computer implemented method for increasing the speed of processing of a code sequence containing two or more indirect branches comprises the writing of target branch addresses for a sequence of indirect branch instructions to a single storage address; and reading those target branch addresses from another storage address in the same sequence earlier written. The respective writes and reads of the target branch addresses are accomplished on a first-in, first-out basis. Accordingly, in a pipelined processing system separate branch instructions in a sequence may retrieve the associated branch addresses in the same sequence, incrementally spaced apart in time, permitting the respective branch instructions to be processed in overlapped relationship, and thereby expedite processing of the program associated with such branch instructions.
A digital processor constructed in accordance with the present invention employs (or reserves) a series of registers, defining or termed a link pipe for temporarily storing branch addresses, a first pointer circuit, called a head pointer, that points the processor to successive different registers in the link pipe when the processor is to write, store, multiple branch instruction addresses; and a second pointer circuit, defining or termed a tail pointer, that points the processor to successive registers in the link pipe when the processor is to read, retrieve, multiple branch instr

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Link pipe system for storage and retrieval of sequences of... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Link pipe system for storage and retrieval of sequences of..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Link pipe system for storage and retrieval of sequences of... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3151256

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.