Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
2000-06-14
2004-03-09
Tsai, Henry W. H. (Department: 2183)
C710S260000
active
06704863
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to computer processor operation, and more particularly to a method for optimizing the ability of a pipelined processor to respond to Direct Memory Access (DMA) interrupts. Described herein are means for reducing the time required for the processor to service a DMA request (or other exceptions or interrupts), without adversely impacting instruction flow in the processor's pipeline.
2. Description of the Related Art
Although nominally a computational device, the central processing unit (CPU) in a computing system is typically charged with a variety of other tasks. In addition to strictly computational functions, the CPU may be required to handle input/output from peripheral devices, manage memory, etc. Many of these activities are driven by external events, which may occur randomly with respect to the sequence of operations being carried out by the CPU. It is important that these event-driven functions be performed expediently by the CPU, and with minimal disruption of its computational activities. Polling external inputs to detect whether the event in question has occurred is an obvious, but very inefficient, way of doing this. Polling refers to the option of simply adding instructions to the main program sequence of the CPU to periodically test all of the event-driven inputs. However, since polling diverts the CPU from its main computational task, it presents a dilemma. If polling is done too infrequently, latency in responding to external events may become intolerable. On the other hand, polling too frequently, while improving the ability of the CPU to respond to external events, may add excessive overhead to the computational task.
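The polling dilemma described above can be put in rough numbers. The following is a minimal sketch, not part of the original disclosure, assuming a hypothetical model in which each poll costs a fixed number of cycles and is followed by a fixed stretch of main-program work. The function name and parameters are illustrative only.

```python
def polling_tradeoff(poll_cost_cycles, work_cycles_between_polls):
    """Return (overhead_fraction, worst_case_wait_cycles) for a polling loop.

    Assumed model: the CPU alternates between `poll_cost_cycles` cycles spent
    testing the event-driven inputs and `work_cycles_between_polls` cycles of
    main-program computation.
    """
    if poll_cost_cycles <= 0 or work_cycles_between_polls < 0:
        raise ValueError("poll cost must be positive and work must be non-negative")
    period = poll_cost_cycles + work_cycles_between_polls
    # Fraction of all cycles diverted from the computational task.
    overhead = poll_cost_cycles / period
    # An event arriving just after the inputs were tested waits roughly one
    # full period before the next poll notices it.
    worst_case_wait = period
    return overhead, worst_case_wait


frequent = polling_tradeoff(10, 90)     # poll often: high overhead, short wait
infrequent = polling_tradeoff(10, 990)  # poll rarely: low overhead, long wait
```

Under this model, polling every 100 cycles costs 10% of the CPU but bounds the wait at about 100 cycles, while polling every 1000 cycles costs 1% but lets an event wait about 1000 cycles.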
Interrupts provide a way out of this dilemma. An interrupt is a special type of input to the CPU. When an interrupt occurs, the CPU temporarily suspends whatever it is doing and executes special interrupt-related instructions in response to the external event responsible for the interrupt. The interrupt-related instructions are typically referred to as an Interrupt Service Routine (ISR), and may perform some function requested by an external device. For example, an interrupt from a keyboard can momentarily divert the processor from executing main program instructions to accept a typed character. An ISR is typically executed as promptly as possible after the interrupt is received. Prior to entering the ISR, the CPU makes preparations so that, upon completion of the ISR, it can resume the process that was suspended when the interrupt occurred. This may involve saving the current context (i.e., program counter, status register, etc.). The advantage of using interrupts is that no time is wasted in polling the external inputs, since the CPU is never diverted from its computational activities until an interrupt occurs. Furthermore, the worst-case response time to an external event is no longer based on the polling interval. The interval between the occurrence of an interrupt and the completion of the ISR (known as the interrupt latency) is now dependent on shorter times, such as the time required for the CPU to save the context.
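The save, service, and resume sequence described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the mechanism of any particular processor: the `ToyCPU` class, its fields, and `keyboard_isr` are hypothetical names, and the context is reduced to a program counter and a status register.

```python
class ToyCPU:
    """Toy processor whose context is just a program counter and a status register."""

    def __init__(self):
        self.pc = 0
        self.status = 0
        self.saved_contexts = []  # stack of (pc, status) pairs
        self.log = []             # record of what was executed, for inspection

    def step(self):
        """Execute one main-program instruction."""
        self.log.append(("main", self.pc))
        self.pc += 1

    def interrupt(self, isr):
        """Suspend the main program, run the ISR, then resume where we left off."""
        self.saved_contexts.append((self.pc, self.status))  # save the context
        isr(self)                                            # service the event
        self.pc, self.status = self.saved_contexts.pop()     # restore the context


def keyboard_isr(cpu):
    """Hypothetical ISR: it may freely clobber the program counter and status."""
    cpu.pc = 1000
    cpu.status = 1
    cpu.log.append(("isr", cpu.pc))


cpu = ToyCPU()
cpu.step()
cpu.step()
cpu.interrupt(keyboard_isr)  # main program is suspended here
cpu.step()                   # and resumes at the saved program counter
```

Because the context is saved before the ISR runs and restored afterward, the main program continues from the instruction it would have executed had no interrupt occurred.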
An architectural feature of many modern CPUs is the instruction pipeline. A pipeline consists of a sequence of stages through which instructions pass as they are executed, with partial processing of an instruction being performed in each stage. Each instruction typically comprises an operator and one or more operands. The operator represents a code designating the particular operation to be performed (e.g., MOVE, ADD, etc.), and the operand denotes an address or data upon which the operation is to be performed. Execution of the instruction requires several steps; e.g., the instruction must be decoded, the addresses of the operands computed, the operands fetched, and the operation executed. In a non-pipelined processor, only one instruction is processed at a time. Therefore, the instruction rate is based on the time required to perform all of these separate steps. However, in a pipelined processor, the steps are performed concurrently on multiple instructions, as they advance through the pipeline. An example of this is shown in FIG. 1, for a four-stage pipeline. The processing sequence for each instruction is from top to bottom. Each stage of processing is assumed to require one clock cycle, and the clock cycles are represented as time steps T1-T6. Instruction I1 enters the first stage of the pipeline at time T1, where it is decoded. One clock cycle later, at time T2, instruction I1 advances to the second stage of the pipeline, where the addresses of its operands are computed; simultaneously, a second instruction I2 enters the first stage of the pipeline to be decoded. This process continues to time T4, where instruction I1 is finally executed. By time T5, instruction I1 has fallen out of the pipeline and instruction I2 is executed. Note that once the pipeline is full, an instruction emerges from the pipeline for each clock cycle, four times faster than if each instruction had to be completed before processing the next one. In effect, the pipeline allows multiple instructions to be processed concurrently, and greatly enhances the bandwidth (i.e., instructions per second) of the CPU.
To operate efficiently, a pipeline must remain full to the extent possible. Anything that disrupts the flow of instructions into and out of the pipeline negates its benefits and diminishes bandwidth. In particular, if it becomes necessary to empty and refill the pipeline very frequently, performance may begin to approach that of a non-pipelined processor. This can potentially occur with an interrupt. As stated above, it is usually desirable to allow an interrupt to preempt the processor. To promptly respond to an interrupt, a pipelined processor typically discards unexecuted instructions from its pipeline, and then refills the pipeline as quickly as possible with the instructions required to service the interrupt (i.e., the ISR). After servicing the interrupt, the pipeline has to be refilled with the main program instructions that were pending when the interrupt took place. Obviously, emptying and refilling the pipeline reduces processor bandwidth. Moreover, the time required to refill the pipeline prior to executing the ISR adds to the interrupt latency.
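The cost of emptying and refilling the pipeline can be estimated with a simple cycle count. This is a rough sketch under assumptions not stated in the source: one cycle per stage, the pipeline is flushed completely when the interrupt is accepted, and main-program instructions re-enter only after the last ISR instruction has completed. The function and key names are hypothetical.

```python
def interrupt_flush_cost(depth, isr_len):
    """Estimate the cycles lost when an interrupt flushes a `depth`-stage pipeline.

    Assumed model: after a flush, no instruction completes for depth - 1
    cycles while the pipeline refills. This happens once on entry to the ISR
    and once more on return to the main program.
    """
    if depth < 1 or isr_len < 1:
        raise ValueError("depth and isr_len must be at least 1")
    refill_bubble = depth - 1
    return {
        # Cycle (counted from the flush) in which the first ISR instruction completes.
        "first_isr_completion": depth,
        # Cycles in which nothing completes: one refill for the ISR, one for the main program.
        "wasted_cycles": 2 * refill_bubble,
        # Cycles during which no main-program instruction completes.
        "cycles_until_main_resumes": isr_len + 2 * refill_bubble,
    }


cost = interrupt_flush_cost(depth=4, isr_len=10)
```

With a four-stage pipeline and a ten-instruction ISR, this model loses 6 cycles to refilling on top of the 10 cycles of useful ISR work, and the first ISR instruction does not complete until the fourth cycle after the flush.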
Direct Memory Access (DMA) transfers are a type of external event capable of interrupting a CPU. A DMA transfer is typically used to move a large amount of data into or out of memory (e.g., when an image file is read from a hard disk into memory). It may be inefficient for the CPU to directly transfer blocks of data, so a special DMA memory controller typically manages the transaction. To initiate a DMA transfer, the controller interrupts the CPU. In response, the CPU gives the controller a few key parameters, such as a target address, size of the data block, etc., and allows it to carry out the data transfer. Although the DMA controller relieves the processor of having to oversee the mass data transfer, the DMA interrupt still disrupts the instruction pipeline, as described in the preceding paragraph, resulting in a loss of efficiency. In systems in which there is a great deal of DMA activity, the impact on latency and bandwidth may be significant. Efficient handling of DMA interrupts may therefore be an important factor in overall system performance in applications such as graphics processing, for example.
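The hand-off described above, in which the CPU supplies a few parameters and the controller moves the block, can be sketched as follows. This is an illustrative model only: `DmaRequest`, `dma_transfer`, and the use of a `bytearray` as memory are assumptions, and a real controller operates in hardware on a bus rather than through a function call.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    """Parameters the CPU hands to the controller (names are hypothetical)."""
    target_address: int  # where in memory the block should land
    size: int            # number of bytes to transfer


def dma_transfer(memory, request, source):
    """Copy `request.size` bytes from `source` into `memory` in one block move.

    Stands in for the controller carrying out the transfer without the CPU
    handling each byte. Returns the number of bytes written.
    """
    end = request.target_address + request.size
    if request.target_address < 0 or request.size < 0:
        raise ValueError("address and size must be non-negative")
    if request.size > len(source) or end > len(memory):
        raise ValueError("transfer does not fit in source or memory")
    memory[request.target_address:end] = source[:request.size]
    return request.size


memory = bytearray(16)
written = dma_transfer(memory, DmaRequest(target_address=4, size=3), b"abc")
```

The CPU's involvement in this sketch is limited to building the `DmaRequest`; the cost the text is concerned with is the pipeline disruption caused by the interrupt that triggers that step.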
For a high-performance pipelined CPU, it would be desirable to avoid the above-mentioned disadvantages associated with responding to a DMA interrupt. It would be beneficial, in particular, to minimize the loss in CPU bandwidth and the increased interrupt latency that result from having to empty and refill the pipeline to service the interrupt. It would be especially desirable if this could be accomplished in a straightforward manner, without extensively modifying the CPU.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a method for minimizing latency and loss
Efland Gregory H.
Paul Somnath
Conley & Rose, P.C.
Cypress Semiconductor Corp.
Daffer Kevin L.