DMA exclusive cache state providing a fully pipelined...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

C710S022000, C711S140000, C711S141000, C711S145000


active

06785776

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to data processing systems and in particular to input/output (I/O) mechanisms of a data processing system. Still more particularly, the present invention relates to a method and system for providing fully pipelined I/O Direct Memory Access (DMA) write operations via utilization of a DMA Exclusive cache state.
2. Description of the Related Art
A standard data processing system comprises one or more central processing units (CPUs), one or more levels of caches, one or more memories, and input/output (I/O) mechanisms, all interconnected via an interconnect. Traditionally, the interconnects utilized consisted primarily of a system bus and an I/O bus. In newer processing systems, however, particularly those with large numbers of CPUs and distributed memory, a switch is often utilized as the interconnecting mechanism.
In addition to the major components, data processing systems today are often equipped with an I/O controller, which controls I/O operations for the various I/O devices. More than one I/O controller may be utilized, each supporting particular I/O devices via an I/O channel, and the I/O controllers may be coupled to the interconnect via an I/O bus. Further, new processing systems typically comprise a plurality of paths (buses) for routing transactions between the I/O controller and the memory or distributed memory. Each path includes a series of latches, etc., and may each have different transmit times/latency based on the distance to/from the memory and number of latches, etc. Data is transmitted along these paths in a packet-like manner and each data packet may have different access latencies. Thus, in operation, data A written to a first memory or memory location may have a different access latency than data B written to a second memory or memory location if data A travels on a different path than data B.
Computer systems typically provide at least one system bus and a system memory area that is predominantly used by one or more processors for computation and data manipulation. I/O is sometimes performed by the processor. However, utilization of the CPU to perform input/output (I/O) transfers for these peripheral devices and subsystems places a burden on the CPU and negatively affects the CPU's efficiency. Thus, Direct Memory Access (DMA) controllers have been provided in computer systems for off-loading transaction work from the CPU to a dedicated controller, in order to increase the availability of the CPU to perform computational and other tasks.
Each DMA operation is a specialized processor operation that transfers data between memory and I/O devices. The DMA transaction operates as a master on the I/O bus and is frequently a part of the I/O controller. When the I/O controller completes the DMA task, the I/O controller signals (i.e., sends an interrupt to) the processor that the task specified is complete.
The DMA controllers free the processor from I/O tasks and usually perform transfers more efficiently. DMA I/O transfers can also be performed by the devices themselves. This type of device is referred to as a “bus master” because it is capable of acquiring a bus and transferring data directly to and from memory or devices located on the bus.
The application software or device driver performs data communication with the device by writing or reading the data to or from memory and signaling the device or DMA controller to perform the transfer. A DMA transfer can also be performed from one device to another device using two discrete DMA transfers, one writing to memory, i.e., a DMA Write, and the second reading from memory, i.e., a DMA read. With a DMA Write, the input device data is transferred to memory from the input device by a DMA controller or by the input device if it is a bus master and the data is written to system memory.
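The device-to-device case described above (one DMA Write into memory followed by one DMA Read out of it) can be pictured with a minimal sketch. The `memory` buffer, the function names `dma_write` and `dma_read`, and the address used are all hypothetical stand-ins chosen for illustration; they do not correspond to any real controller interface.

```python
# Hypothetical model: system memory is a plain byte buffer.
memory = bytearray(64)

def dma_write(memory, addr, data):
    """DMA Write: move data from the input device into system memory."""
    memory[addr:addr + len(data)] = data

def dma_read(memory, addr, length):
    """DMA Read: move data from system memory out to the receiving device."""
    return bytes(memory[addr:addr + length])

# Device-to-device transfer expressed as two discrete DMA transfers.
payload = b"input-device data"
dma_write(memory, 8, payload)                 # first transfer: device -> memory
received = dma_read(memory, 8, len(payload))  # second transfer: memory -> device
print(received == payload)
```

The point of the sketch is only that system memory sits between the two devices, so the transfer is two memory operations rather than one.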
The I/O channels provide input and output commands to and from peripheral components, respectively. Standard, logical operation of current processing systems requires that operations to memory be completed in the order in which they are received (i.e., sequential program order). Thus, the I/O channels operate as First In First Out (FIFO) devices because the I/O writes to system memory from a device must be “ordered” to the system memory. That is, for example, an I/O DMA Write command of a 128 Byte cache line A that is sequentially followed by an I/O DMA Write command of a 4 Byte cache line B has to be completed (i.e., data written) before the write of cache line B can begin execution. The write data B request is placed in the FIFO queue at the I/O controller and waits on the receipt of a completion signal from the write data A operation. The processor begins execution of the write data B command only after receipt of a completion signal.
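The serialized FIFO behavior described above can be sketched as a small simulation. The function name `drain_fifo` and the latencies (20 cycles for the 128 Byte write A, 4 cycles for the 4 Byte write B) are assumptions made for illustration; the text gives no cycle counts.

```python
from collections import deque

def drain_fifo(writes):
    """Issue queued DMA writes strictly in order, one at a time.

    `writes` is a list of (name, latency_cycles) pairs with hypothetical
    latencies. Returns a list of (name, issue_cycle, complete_cycle).
    """
    queue = deque(writes)
    clock = 0
    timeline = []
    while queue:
        name, latency = queue.popleft()
        issue = clock                # issued only after the prior write completed
        complete = issue + latency   # completion signal arrives at this cycle
        timeline.append((name, issue, complete))
        clock = complete             # the next write must wait until here
    return timeline

timeline = drain_fifo([("A", 20), ("B", 4)])
for name, issue, complete in timeline:
    print(f"write {name}: issued at cycle {issue}, completed at cycle {complete}")
# write A: issued at cycle 0, completed at cycle 20
# write B: issued at cycle 20, completed at cycle 24
```

Under these assumed numbers, write B needs only 4 cycles of work but cannot start until cycle 20, which is the cost of the ordering requirement.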
FIG. 2A illustrates a sample timing diagram by which the writes of data A and data B are completed according to the prior art. As shown, DMA Write A 201 is issued at time 0 (measured in clock cycles) and a corresponding snoop response 203 is generated and received several cycles later. When the clean snoop response 203 is received, often after several retries of DMA Write A 201, the acquisition and transmission of data A to the memory block is undertaken over the next few cycles. Then, the actual writing (storage) of data A 205 is completed over several cycles. Following the completion of the write data A 205, an acknowledgment 207 is sent to the processor to indicate the completion of the write data A operation. Once the acknowledgment 207 is received, the DMA Write B data 209 commences and takes several cycles to complete (see snoop response 211 and B data to storage 213). Data B is then stored in memory. Since no operation is issued to the I/O bus while the DMA Write data A operation is completing, the bus remains idle for several cycles and write data B 209 is held in the FIFO queue.
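The prior-art sequence of FIG. 2A can be laid out phase by phase as a rough sketch. The phase labels reuse the reference numerals from the description, but every cycle count below is hypothetical, since the description only says "several cycles"; the helper name `schedule` is likewise an assumption.

```python
# Hypothetical cycle counts per phase; FIG. 2A gives no exact numbers.
PHASES_A = [
    ("DMA Write A 201 issue", 1),
    ("snoop response 203", 4),
    ("data A to storage 205", 8),
    ("acknowledgment 207", 2),
]
PHASES_B = [
    ("DMA Write B 209 issue", 1),
    ("snoop response 211", 4),
    ("B data to storage 213", 2),
]

def schedule(phases, start):
    """Place phases back to back from `start`; return (rows, end_cycle)."""
    rows = []
    t = start
    for label, cycles in phases:
        rows.append((label, t, t + cycles))
        t += cycles
    return rows, t

rows_a, end_a = schedule(PHASES_A, 0)
rows_b, end_b = schedule(PHASES_B, end_a)  # B starts only after acknowledgment 207

for label, begin, end in rows_a + rows_b:
    print(f"{label:<24} cycles {begin:>2}-{end:>2}")
print(f"write B held in the FIFO queue for {end_a} cycles")
```

With these assumed values, write A occupies cycles 0 to 15 and write B cannot issue until cycle 15, finishing at cycle 22.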
Once the write A command is issued, the processor waits for the return of a tag or interrupt generated by the successful completion of the write data A operation. When the tag or interrupt returns, this indicates that data A storage to memory is completed, and the CPU can then issue the write data B command.
The logical structure of processing systems requires that I/O operations be ordered in the I/O channel. Thus, the I/O channel must write the data to memory “in-order” and also must wait until the successful completion of the previous operation before issuing the next operation. This waiting/polling is required because, as in the above example, if write B is issued prior to the completion of write A in current systems, write B would be completed before write A because of the smaller size of data B. This would then cause corruption of data and the corrupted data would propagate throughout the execution of the application resulting in incorrect results being generated and/or possibly a stall in the processor's execution.
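The ordering hazard described above can be illustrated with a short sketch: if write B were issued right behind write A without waiting, the shorter write would finish first. The function name `completion_order`, the issue cycles, and the latencies are hypothetical values chosen to mirror the 128 Byte versus 4 Byte example.

```python
def completion_order(writes):
    """Return write names sorted by the cycle at which each completes.

    `writes` is a list of (name, issue_cycle, latency_cycles) tuples;
    all numbers are hypothetical.
    """
    done = sorted(writes, key=lambda w: w[1] + w[2])
    return [name for name, _, _ in done]

program_order = ["A", "B"]
# Write B issued one cycle after write A, without waiting for A to complete.
order = completion_order([("A", 0, 20), ("B", 1, 4)])
print(order)                   # ['B', 'A']
print(order == program_order)  # False: memory would see B before A
```

Here A completes at cycle 20 and B at cycle 5, so memory is updated out of program order, which is the situation the wait-for-completion rule exists to prevent.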
The long latency in completing some write operations, particularly those for large data such as data A, coupled with the requirement that the next operation cannot begin until after the completion of the previous write operation, significantly reduces overall processor efficiency. The present architectural and operational guidelines for processing systems, which require that order be maintained when completing operations, are proving to be a significant hurdle in the development of more efficient I/O mechanisms. Currently, system developers are looking for ways to streamline the write process for I/O operations. Pipelining, for example, one of the key implementation techniques utilized to make CPUs faster, has not been successfully extended to I/O transactions because of the requirement that the previous data operation be completed prior to the next operation beginning. Current DMA transactions operate as single threaded transactions (or in a serialized manner), and there is currently no known way to extend the benefits of pipelining to DMA operations. One method suggested to reduce t
