System and method for fusing instructions

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Details

C717S106000, C717S154000, C712S030000, C712S042000, C712S206000, C712S221000, C712S223000, C708S205000

Reexamination Certificate

active

06675376

ABSTRACT:

BACKGROUND OF THE INVENTION
I. Field of the Invention
This invention relates generally to computer technology, and more particularly, to improving processor performance in a computer system.
II. Background Information
Developers are continually trying to improve processor performance and program execution time. Both can be improved using hardware and software techniques. Hardware techniques include pipelining, where the fetch, decode, and execute logic stages are overlapped such that the processor operates on several instructions simultaneously. Software techniques include having a compiler optimize the program code. Normally, passes in the compiler transform programs written in a high-level language (e.g., the “C” programming language) into progressively lower-level representations, eventually reaching instructions drawn from the instruction set. The instruction set is the collection of different instructions that the processor can execute (e.g., the Intel Architecture 32-bit (“IA-32”) instruction set from Intel Corporation).
An optimizing compiler is a compiler that analyzes its output to produce a more efficient (smaller or faster) sequence of instructions. The optimizing compiler may use multiple passes to convert high-level code to low-level code (instructions drawn from the instruction set). One way that the optimizing compiler improves program execution time is by reducing the code footprint (the number of assembly-language instructions generated from the high-level program code). Reducing the code footprint improves program execution time because the program code has fewer instructions, and thus fewer instructions are fetched from a memory unit in the fetch stage (the memory unit is slower than the processor) and fewer instructions are decoded in the decode stage.
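As a rough illustration of footprint reduction, the sketch below fuses a multiply followed by a dependent add into a single instruction. The three-address tuple format, the opcode names ("mul", "add", "fma"), and the function name fuse_mul_add are assumptions made for this example; it is not the method claimed in the patent, and it assumes temporaries are not needed after the listed code ends.

```python
from typing import List, Tuple

# An instruction is (opcode, destination, source operands). This toy
# three-address form is hypothetical and used for illustration only.
Instr = Tuple[str, str, Tuple[str, ...]]


def fuse_mul_add(code: List[Instr]) -> List[Instr]:
    """Fuse 'mul t, a, b' followed by 'add d, t, c' into 'fma d, a, b, c'.

    The pair is fused only when t is not read by any later instruction,
    so dropping the separate mul cannot change a later result.
    """
    out: List[Instr] = []
    i = 0
    while i < len(code):
        if i + 1 < len(code):
            op1, dst1, src1 = code[i]
            op2, dst2, src2 = code[i + 1]
            # Conservative check: any later read of the temporary blocks fusion.
            later_reads = any(dst1 in srcs for _, _, srcs in code[i + 2:])
            if (op1 == "mul" and op2 == "add"
                    and src2.count(dst1) == 1 and not later_reads):
                # The add operand that is not the mul result.
                other = src2[0] if src2[1] == dst1 else src2[1]
                out.append(("fma", dst2, (src1[0], src1[1], other)))
                i += 2  # two instructions consumed, one emitted
                continue
        out.append(code[i])
        i += 1
    return out


code = [
    ("mul", "t0", ("a", "b")),
    ("add", "d", ("t0", "c")),
    ("sub", "e", ("d", "a")),
]
fused = fuse_mul_add(code)
print(len(code), "->", len(fused))  # prints "3 -> 2"
```

Here the three-instruction sequence shrinks to two, so one fewer instruction has to be fetched and decoded.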
Reducing the code footprint also improves processor performance because cache memory is better utilized. Almost all modern processors use cache memory. Cache memory is a special memory subsystem in which frequently used data values are duplicated for quick access. Cache memory is useful when main memory accesses are slow compared with processor speed, because cache memory is faster than main memory. Cache memory has to be efficiently utilized in order to obtain a high ratio of “hits” (i.e., the data is found in the cache memory and thus access to the main memory is avoided) to “misses” (i.e., the main memory is accessed in order to obtain the data). Since a cache miss results in additional time to retrieve the data into the cache, processing time is lost waiting for this data to arrive when a cache miss occurs. An instruction cache is cache memory that stores instructions fetched from main memory. Reducing the code footprint allows more of the instructions that make up the program code to be stored in the instruction cache, thus increasing the likelihood of a cache hit and, with it, processor performance. Other means of instruction storage can also benefit from code footprint reduction. For example, a trace cache stores instructions that have already been executed. With a smaller code footprint, the trace cache can hold a larger share of the program's executed instructions, which increases the likelihood of cache hits and thus processor performance.
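The effect of footprint on hit ratio can be sketched with a toy model. The code below assumes a direct-mapped instruction cache that holds one instruction per line and a loop body executed repeatedly; the function name hit_ratio and the sizes used are hypothetical, and real caches (multi-instruction lines, associativity, prefetching) behave differently.

```python
def hit_ratio(num_instructions: int, iterations: int, cache_lines: int) -> float:
    """Hit ratio for a loop of num_instructions run `iterations` times
    through a direct-mapped cache with one instruction per line."""
    cache = [None] * cache_lines  # cache[slot] holds the cached instruction address
    hits = 0
    accesses = 0
    for _ in range(iterations):
        for addr in range(num_instructions):
            slot = addr % cache_lines  # direct-mapped placement
            accesses += 1
            if cache[slot] == addr:
                hits += 1              # instruction already cached
            else:
                cache[slot] = addr     # miss: fetch from main memory and replace
    return hits / accesses


# A 12-instruction loop does not fit in 8 lines; an 8-instruction loop does.
large = hit_ratio(num_instructions=12, iterations=10, cache_lines=8)
small = hit_ratio(num_instructions=8, iterations=10, cache_lines=8)
print(large < small)  # prints "True"
```

In this model the loop that fits entirely in the cache misses only on its first iteration, while the larger loop keeps evicting its own instructions.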
In a pipeline implementation, the bottleneck tends to be feeding an execution unit (the fetch and decode stages feed the execution unit) rather than executing the instructions themselves (which occurs in the execution stage). If two or more instructions are packed into the storage space of a single instruction, then multiple instructions can be fetched and decoded in the time that it takes to fetch and decode a single instruction. The execution unit is thus fed at a faster rate, which improves processor performance.
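The feed-rate argument reduces to simple arithmetic. The sketch below assumes a front end that fetches and decodes a fixed number of instruction slots per cycle and that every slot can be filled with the stated number of packed instructions, which is an optimistic upper bound; the function and parameter names are hypothetical.

```python
def front_end_cycles(num_instructions: int,
                     slots_per_cycle: int,
                     instructions_per_slot: int) -> int:
    """Cycles needed to fetch and decode num_instructions when each slot
    carries instructions_per_slot packed instructions."""
    per_cycle = slots_per_cycle * instructions_per_slot
    return -(-num_instructions // per_cycle)  # ceiling division on integers


unpacked = front_end_cycles(1000, slots_per_cycle=1, instructions_per_slot=1)
packed = front_end_cycles(1000, slots_per_cycle=1, instructions_per_slot=2)
print(unpacked, packed)  # prints "1000 500"
```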
A clock cycle determines how quickly the processor can execute instructions and is used to synchronize the activities of various components of a computer system. The length of the clock cycle is determined by the time required for the slowest instruction to execute. Typically, the execution unit (in the execution stage) executes one instruction per clock cycle (i.e., performs one operation per clock cycle). However, because the clock cycle is tailored to the slowest instruction, many instructions finish executing long before the clock cycle completes. Given this slack, one instruction performing two operations, or two instructions each performing only one operation, may be executed in one clock cycle if a specialized execution unit is available that can execute both operations simultaneously. If the specialized execution unit is employed, then upon decoding one or more instructions that can benefit from it, those instructions can be tagged for execution on the specialized execution unit.
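The slack argument can be made concrete with a small calculation. The latencies below are made-up illustrative values, not measurements, and the model assumes the two operations are chained back to back inside one cycle, which the text does not specify; the names latency_ps and fits_in_one_cycle are hypothetical.

```python
# Hypothetical operation latencies in picoseconds (illustrative values only).
latency_ps = {"and": 200, "shift": 300, "add": 400, "mul": 1000}

# The clock period is set by the slowest operation.
clock_period_ps = max(latency_ps.values())


def fits_in_one_cycle(op_a: str, op_b: str) -> bool:
    """True if op_a followed by op_b completes within one clock period,
    assuming the two operations run back to back with no extra overhead."""
    return latency_ps[op_a] + latency_ps[op_b] <= clock_period_ps


print(fits_in_one_cycle("shift", "add"))  # prints "True"
print(fits_in_one_cycle("mul", "add"))    # prints "False"
```

Under these assumptions, pairs of fast operations are candidates for a specialized unit, while any pair involving the slowest operation is not.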
For the foregoing reasons, there is a need to combine instructions whenever possible in order to minimize the program size and thus improve processor performance and program execution time. There is also a need for a specialized execution unit that can process two operations in one clock cycle.


