Hardware system for fetching mapped branch target...

Electrical computers and digital processing systems: processing – Instruction fetching

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S206000, C712S207000, C711S118000, C711S119000, C711S120000, C711S137000

Reexamination Certificate

active

06430675

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
This application relates in general to run-time optimizers, and in specific to hardware embedded run-time optimizers.
BACKGROUND OF THE INVENTION
A run-time optimizer is an adaptive software system that transparently optimizes applications at run-time. The optimizer rewrites the binary code of an application on-the-fly to achieve a higher execution efficiency.
FIG. 2
depicts prior art run time optimizer
20
. The control loop
21
begins execution of a block of program code, via emulation performed by the profiling emulator
22
. The profiling aspect of emulator
22
allows the control loop
21
to track the number of times the particular block of code has been executed via emulation. Note that a run time optimization system is different from a run time binary translation system, in that the latter is for architecture migration, while the former is to decrease execution time. The run time optimization system is using the emulator
22
for profiling in order to guide optimizations, i.e. the code is running on its native system. After a predetermined number of executions via emulation, the control loop
21
designates the block of code as hot code, and desirable for optimization. The control loop
21
then activates trace selector
23
to translate the block of code. The trace selector
23
forms a trace of the instructions that comprise the block of code by following the instructions in the block. When a branch instruction is encountered, the trace selector makes a prediction as to whether the branch is taken or falls through. If the selector decides the branch is mostly taken, then the trace is formed by extending the code from the branch target block. If the selector decides not to take the branch, then the branch falls through, and the trace continues within the fall through block. The trace terminates at a backward branch predicted to take or when the trace becomes sufficiently large. After the trace is completed, the code is rewritten with machine dependent and machine independent optimizations. The optimized code is then placed into the code cache
24
. The next time the control loop
21
encounters a condition to execute this block of code, then the control loop
21
will execute the code in the code cache
24
and not emulate the code via emulator
22
.
As shown in
FIG. 3
, if the target of a branch which is taken to exit trace
1
, as shown by branch instruction
31
, then control is returned to the run time system RTS
20
and to control loop
21
, which determines if the target resides in the code cache. If the target resides in code cache, then the control loop
21
modifies the target of the branch instruction
31
to be the trace
2
in code cache as shown by branch instruction
33
. This modification is called backpatching. Thus, if the exit of the trace is already translated, then the branch is backpatched such that a subsequent execution will directly branch to the new trace without returning to the control loop. Backpatching increases the speed of execution of the code, as returning to the RTS significantly slows down execution time.
A problem with the prior art RTS is that it cannot backpatch an indirect branch. The RTS cannot backpatch an indirect branch because the target address is unknown. The target address is typically in a register or memory location, and not written directly in code. Thus, the RTS will shift control back to the control loop
21
to determine whether the target address has been translated, which is expensive in terms of time. The prior art has attempted to minimize this problem by inlining a code sequence to search a smaller look up table in the optimized traces, however, these mechanism still incur high overhead. Moreover, if small table lookup fails then the RTS will shift control back to the control loop, as described above. Examples of indirect branches are return branches and switch branches. This software approach adds an additional 10-100 s of cycles to the processing time.
Therefore, there is a need in the art for a RTS that can handle indirect branches without returning control to a control loop.
SUMMARY OF THE INVENTION
These and other objects, features and technical advantages are achieved by a system and method that uses a table to map branch targets that is built into the hardware as cache. Thus, when a fetch instruction is initiated, the IP-to-TM cache is examined to determine whether the branch target instruction has been optimized and placed into the trace memory. If there is a match with the P-to-TM cache, then the code in the trace is executed.
The P-to-TM cache is in the instruction fetch unit. This cache maps branch targets to optimized traces. This cache is examined in parallel with Instruction Translation Lookup Buffer (ITLB), if a match is found, the control will transfer to the optimized code. Otherwise, the execution control continues on the original code. This cache can significantly speed up the process of mapping a branch target to an optimized trace. Protection information will be included in the cache to enforce various protection needs typically served by ITLB.
The inventive mechanism eliminates the need to add a table look up to access separate pipeline stages, and thus increases the speed of the pipeline steps. This eliminates the need to use a software table look up and special handling of indirect branches.
When instructions are executed, the instructions typically have a virtual address. This virtual address needs to be translated into a physical memory address, particularly when assessing caches like the instruction cache or data cache. This translation is usually done by the ITLB. Thus, every time an instruction needs to be fetched, the ITLB needs to be examined. The inventive mechanism uses this requirement to perform a parallel lookup in the IP-to-TM cache.
The inventive mechanism first determines if the target address is a trace address, if so then the IP-to-TM cache does not need to be examined, and the instruction can be fetched directly from the trace memory. If the target address is virtual address in original code address, then the mechanism examines both the IP-to-TM cache and ITLB. If the address matches an entry in the IP-to-TM cache, then the instruction is fetched from trace memory. Note that if the IP-to-TM cache hits, then the ITLB should also hit, however, the ITLB hit is ignored. If the IP-to-TM cache misses, and the ITLB hits, then the instruction is fetched from original code and executed from there. If both the IP-to-TM cache and ITLB miss, then the mechanism invokes a hardware walker to load the correct translation into the ITLB.
Therefore, it is a technical advantage of the present invention to have the mapping from a branch to a trace stored in the IP-to-TM cache that is embedded into the hardware.
It is further technical advantage of the present invention that the IP-to-TM cache handles indirect branches much more efficiently than a software approach.
It is a further technical advantage of the present invention that the IP-to-TM cache and the ITLB are examined in parallel.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.


REFERENCES:
patent: 4611281 (1986-09-01), Suko et al.
patent: 5668969 (1997-09-01), Fitch
patent: 5909578 (1999-06-01), Buzbee
patent: 5949995 (1999-09-01), Freeman
patent: 5950009 (1999-09-01), Bortnikov et al.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Hardware system for fetching mapped branch target... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Hardware system for fetching mapped branch target..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Hardware system for fetching mapped branch target... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2927959

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.