Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Reducing an impact of a stall or pipeline bubble
Reexamination Certificate
2000-10-31
2004-09-28
Pan, Daniel H. (Department: 2183)
C712S227000, C712S234000, C712S245000
active
06799266
ABSTRACT:
BACKGROUND
1. Field of the Invention
The invention relates to methods and apparatus for reducing code size of instructions on a microprocessor or micro-controller, e.g., on digital signal processing devices, with instructions requiring NOPs (hereinafter “processors”). In particular, the invention relates to methods and apparatus for reducing code size on architectures with an exposed pipeline, such as a very large instruction word (VLIW), by encoding NOP operations as an instruction operand.
2. Description of Related Art
VLIW describes an instruction-set philosophy in which a compiler packs a number of relatively simple, non-interdependent operations into a single instruction word. When fetched from a cache or memory into a processor, these words are readily broken up and the operations dispatched to independent execution units. VLIW may perhaps best be described as a software- or compiler-based, superscalar technology. VLIW architectures frequently have exposed pipelines.
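The packing step described above can be illustrated with a small greedy bundler. This is a minimal sketch under stated assumptions, not the compiler technique of any particular VLIW toolchain. The names `Op` and `pack_bundles` are hypothetical. Each operation is reduced to the registers it writes and reads. Operations are taken strictly in program order, which is a simplification because real schedulers also reorder. An operation joins the current instruction word only if the word has a free slot and the operation does not depend on anything already in it.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One simple operation: the registers it writes (dests) and reads (srcs)."""
    name: str
    dests: set[str] = field(default_factory=set)
    srcs: set[str] = field(default_factory=set)


def pack_bundles(ops: list[Op], width: int) -> list[list[Op]]:
    """Greedily pack ops, in program order, into words of at most `width` ops.

    An op joins the current word only if a slot is free and it has no register
    dependency on an op already in that word (read-after-write, write-after-write,
    or, conservatively, write-after-read). Otherwise a new word is started.
    """
    bundles: list[list[Op]] = []
    current: list[Op] = []
    for op in ops:
        written = set().union(*(o.dests for o in current))
        read = set().union(*(o.srcs for o in current))
        conflict = bool(op.srcs & written or op.dests & written or op.dests & read)
        if current and (len(current) == width or conflict):
            bundles.append(current)
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    return bundles


ops = [
    Op("add", {"a1"}, {"a2", "a3"}),
    Op("mpy", {"a4"}, {"a5", "a6"}),
    Op("sub", {"a7"}, {"a1", "a4"}),  # needs the results of add and mpy
]
print([[o.name for o in word] for word in pack_bundles(ops, width=8)])
# prints [['add', 'mpy'], ['sub']]
```

The independent `add` and `mpy` share one word, while `sub` must wait for a later word because it reads both of their results.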
Delayed-effect instructions are instructions for which one or more successive instructions may be executed before the initial instruction's effects are complete. NOP instructions are inserted to compensate for such instruction latencies. A NOP instruction is a dummy instruction that has no effect; it may be used as an explicit “do nothing” instruction to compensate for latencies in the instruction pipeline. However, such NOP instructions increase code size. NOPs may be encoded either as a single multiple-cycle NOP or as a series of individual NOPs, as follows:
Example A:
    inst
    nop m
Example B:
    inst
    nop
    nop
    . . .
    nop
NOPs occur frequently in code for VLIWs.
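The difference between Example A and Example B can be made concrete by rewriting runs of single-cycle NOPs as multi-cycle NOPs and counting instruction words. This is a minimal sketch that assumes one instruction per word. The name `collapse_nops` is hypothetical, and `max_count` is an assumed limit on how many cycles a single multi-cycle NOP can encode. Runs longer than that limit are split across several NOPs.

```python
def collapse_nops(program: list[str], max_count: int = 9) -> list[str]:
    """Replace runs of single-cycle 'nop' lines with multi-cycle 'nop m' lines.

    max_count is an assumed upper bound on the cycle count one multi-cycle
    NOP can encode; longer runs are split across several NOPs.
    """
    out: list[str] = []
    run = 0

    def flush() -> None:
        # Emit the pending run of NOPs as as few multi-cycle NOPs as possible.
        nonlocal run
        while run > 0:
            n = min(run, max_count)
            out.append("nop" if n == 1 else f"nop {n}")
            run -= n

    for inst in program:
        if inst.strip().lower() == "nop":
            run += 1
        else:
            flush()
            out.append(inst)
    flush()
    return out


example_b = ["inst", "nop", "nop", "nop", "nop"]
example_a = collapse_nops(example_b)
print(example_a, len(example_b), "->", len(example_a))
# prints ['inst', 'nop 4'] 5 -> 2
```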
Often NOP instructions are executed for multiple sequential cycles. The c6x series architecture has a multi-cycle NOP for encoding such a sequence of NOP instructions. The c6000 platform, available from Texas Instruments, Inc., of Dallas, Tex., provides a range of fixed- and floating-point digital signal processors (DSPs) that enable developers of high-performance systems to choose the device suiting their specific application. The platform combines several advantageous features with DSPs that achieve enhanced performance, improved cost efficiency, and reduced power dissipation. Among the industry's most powerful processors, the c62x fixed-point DSPs of the c6000 platform offer performance levels ranging from 1200 million instructions per second (MIPS) up to 2400 MIPS. The c67x floating-point devices range from 600 million floating-point operations per second (MFLOPS) to above 1 GFLOPS (1 billion floating-point operations per second). To accommodate the performance needs of emerging technologies, the c6000 platform provides a fixed-point and floating-point code-compatible roadmap to 5000 MIPS for the c62x generation fixed-point devices and to more than 3 GFLOPS for the floating-point devices.
Load (LD) and branch (B) instructions may have five (5) and six (6) cycle latencies, respectively. A latency may be defined as the period (measured in cycles or delay slots) within which all effects of an instruction are completed. Instruction scheduling is used to “fill” these latencies with other useful operations. When no such other instructions are available for execution during the instruction latency, NOPs are inserted after the instruction issues to maintain correct program execution. The following are examples of the use of NOPs in current pipelined operations:
Example 1a:
    LD *a0, a5        % load the value at address a0 into a5 (one (1) cycle)
    NOP 4             % no operations for four (4) cycles (delay slots)
    ADD a5, 6, a7;    % a5 value available
Example 2a:
    B Label           % a branch to label instruction (one (1) cycle)
    NOP 5             % no operations for five (5) cycles (delay slots)
    ;                 % branch occurs
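The NOP counts in Examples 1a and 2a follow from the latencies given above. An instruction with latency L has L - 1 delay slots after its issue cycle, and any slots that scheduling cannot fill with useful work must hold NOPs. This is a minimal sketch with hypothetical function names. It assumes each filler instruction occupies exactly one cycle, so it ignores the parallel issue a VLIW word allows.

```python
# Latencies quoted in the text: total cycles until all effects are complete.
LATENCY = {"LD": 5, "B": 6}


def delay_slots(opcode: str) -> int:
    """Cycles after the issue cycle before the result or branch takes effect."""
    return LATENCY[opcode] - 1


def nops_needed(opcode: str, independent_fill: int) -> int:
    """NOP cycles required after `opcode` when `independent_fill` useful
    single-cycle instructions can be scheduled into its delay slots."""
    return max(0, delay_slots(opcode) - independent_fill)


print(nops_needed("LD", 0))  # 4, as in Example 1a
print(nops_needed("B", 0))   # 5, as in Example 2a
print(nops_needed("LD", 3))  # 1
```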
NOPs compensate for the delayed effects of other instructions, and they may be associated with any type of instruction having a latency greater than one (1), not only loads and branches. Generally, complex operations, load instructions that read memory, and control-flow instructions (e.g., branches) have latencies greater than one (1), and their execute phases may take multiple cycles.
Pipelining is a method for executing instructions in an assembly-line fashion. Pipelining is a design technique for reducing the effective propagation delay per operation by partitioning the operation into a series of stages, each of which performs a portion of the operation. A series of data is typically clocked through the pipeline in sequential fashion, advancing one stage per clock period.
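The benefit of advancing one stage per clock period can be quantified with simple cycle counts. This is a minimal sketch assuming an ideal pipeline in which every stage takes one cycle and no stalls occur. The function names are hypothetical. The first instruction needs one cycle per stage to drain through, and each later instruction completes one cycle after its predecessor.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to complete n instructions in an ideal pipeline (one cycle per
    stage, no stalls): the first needs n_stages cycles, and each later one
    completes one cycle after its predecessor."""
    if n_instructions <= 0:
        return 0
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction passes through every stage before the next starts."""
    return max(0, n_instructions) * n_stages


print(pipelined_cycles(100, 3), unpipelined_cycles(100, 3))  # 102 300
```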
An instruction is the basic unit of programming that causes the execution of one operation. It consists of an op-code and operands along with optional labels and comments. An instruction is encoded by a number of bits, N. N may vary or be fixed depending on the architecture of a particular device. For example, the c6x family of processors, available from Texas Instruments, Inc., of Dallas, Tex., has a fixed, 32-bit instruction word. A register is a small area of high-speed memory, located within a processor or electronic device, that is used for temporarily storing data or instructions. Each register is given a name, contains a few bytes of information, and is referenced by programs.
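Encoding an op-code and its operands into a fixed N-bit word amounts to packing bit fields. The sketch below uses N = 32 and an entirely hypothetical field layout, given in `FIELDS`. It is not the actual c6x encoding and only illustrates how fields are packed into, and unpacked from, a fixed-width word. The names `encode` and `decode` are also hypothetical.

```python
# Hypothetical field layout, most significant field first. This is NOT the
# actual c6x encoding; it only illustrates packing fields into a fixed word.
WORD_BITS = 32
FIELDS = [("opcode", 7), ("dst", 5), ("src1", 5), ("src2", 5), ("spare", 10)]
assert sum(width for _, width in FIELDS) == WORD_BITS


def encode(values: dict[str, int]) -> int:
    """Pack named field values (missing fields default to 0) into one word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> dict[str, int]:
    """Unpack a 32-bit instruction word back into its named fields."""
    fields: dict[str, int] = {}
    shift = WORD_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


w = encode({"opcode": 0x12, "dst": 7, "src1": 5, "src2": 6})
print(f"{w:#010x}", decode(w))
```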
In one example of an instruction pipeline, the pipeline may consist of fetch, decode, and execute stages, each of which may take multiple cycles. The instruction-fetch phase is the first phase of the pipeline, in which the instruction is fetched from program memory. The instruction-decode phase is the next phase, in which the instruction is decoded. The operand-fetch phase is the third phase, in which an operand or operands are read from the register file. Operands are the parts of an instruction that designate where the central processing unit (CPU) will fetch or store information; they are the arguments (or parameters) of an assembly language instruction. Finally, in the instruction-execute phase, the instruction is executed. An instruction register (IREG or IR) is a register that contains the actual instruction being executed, and an instruction cache is an on-chip static RAM (SRAM) that contains current instructions being executed by one of the processors.
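The four phases can be visualized as a cycle-by-cycle occupancy table. This is a minimal sketch assuming each phase takes exactly one cycle and no stalls occur, which simplifies the multi-cycle phases mentioned above. `STAGES` and `pipeline_table` are hypothetical names. In each cycle, a new instruction enters fetch while older ones advance one phase.

```python
STAGES = ["fetch", "decode", "operand-fetch", "execute"]


def pipeline_table(instructions: list[str]) -> list[dict[str, str]]:
    """For each clock cycle, report which instruction occupies each phase,
    assuming every phase takes one cycle and there are no stalls."""
    if not instructions:
        return []
    n_cycles = len(instructions) + len(STAGES) - 1
    table: list[dict[str, str]] = []
    for cycle in range(n_cycles):
        row: dict[str, str] = {}
        for depth, stage in enumerate(STAGES):
            i = cycle - depth  # instruction i entered fetch `depth` cycles ago
            row[stage] = instructions[i] if 0 <= i < len(instructions) else "-"
        table.append(row)
    return table


for cycle, row in enumerate(pipeline_table(["i1", "i2", "i3"]), start=1):
    print(cycle, " ".join(f"{stage}={inst}" for stage, inst in row.items()))
```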
SUMMARY OF THE INVENTION
Thus, a need has arisen for methods and apparatus that reduce code size by reducing the number of NOP instructions, and in particular that reduce the total and average size of code developed for processors with an exposed pipeline. Because inserting NOPs as separate instructions increases code size, encoding the NOP as a field within an existing instruction may reduce code size.
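The effect of encoding the NOP as a field within an existing instruction can be sketched by folding each `NOP n` into the instruction that precedes it, then counting instruction words. This is an illustrative sketch only, not the encoding claimed here. `Inst`, `fold_nops`, and the 3-bit NOP field (`NOP_FIELD_MAX = 7`) are hypothetical. For simplicity, the sketch folds only into the immediately preceding instruction. A NOP count that would overflow the field is kept as a separate instruction.

```python
from dataclasses import dataclass

NOP_FIELD_MAX = 7  # hypothetical: a 3-bit NOP field carried by each instruction


@dataclass
class Inst:
    text: str
    nop_field: int = 0  # NOP cycles to issue after this instruction


def fold_nops(program: list[str]) -> list[Inst]:
    """Fold each 'NOP n' line into the NOP field of the preceding instruction
    when the field can hold the extra cycles; otherwise keep it separate."""
    out: list[Inst] = []
    for line in program:
        parts = line.split()
        if parts and parts[0].upper() == "NOP":
            n = int(parts[1]) if len(parts) > 1 else 1
            if out and out[-1].nop_field + n <= NOP_FIELD_MAX:
                out[-1].nop_field += n
                continue
        out.append(Inst(line))
    return out


example_1a = ["LD *a0, a5", "NOP 4", "ADD a5, 6, a7"]
folded = fold_nops(example_1a)
print(len(example_1a), "->", len(folded), folded)
```

Under these assumptions, Example 1a shrinks from three instruction words to two, and Example 2a shrinks from two words to one.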
Further, the need has arisen to reduce the cost of processors by reducing the memory requirements for such devices. Reducing code size reduces total system cost by lessening or minimizing the amount of physical memory required in the system. Reducing code size also may improve system performance by allowing more code to fit into on-chip memory, i.e., memory that is internal to the chip or device, which is a limited resource.
Moreover, the need has arisen to increase the performance and capabilities of existing processors by reducing the memory required to perform current operations. Reducing code size also may improve performance in systems that have program caches.
In addition, the need has arisen for methods for reducing the total power required to perform the signal processing operations on existing and new devices. Reducing code size also reduces the amount of power used by a chip, because the number of instructions that are fetched may be reduced.
In an embodiment, the invention also is a method for reducing total code size in a device having an exposed pipeline, e.g., in a processor. The method may comprise the steps of determining a latency between a defining instruction, e.g., a load instruction, and a using instruction and inserting a NOP field into the defining or using instruction or into an intervening instr
Granston Elana D.
Stotzer Eric J.
Ward Alan S.
Brady III W. James
Marshall, Jr. Robert D.
Pan Daniel H.
Telecky, Jr. Frederick J.
Texas Instruments Incorporated