Method and apparatus for performing high bandwidth low...

Electrical computers and digital data processing systems: input/ – Input/output data processing – Data transfer specifying

Reexamination Certificate

Details

C710S005000, C710S022000, C710S119000, C710S305000, C711S145000

Reexamination Certificate

active

06434636

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to I/O operations in high performance computer systems. More specifically, the present invention relates to ordering I/O transactions by passing tokens between CPU agents and I/O agents in a multiprocessor computer system.
DESCRIPTION OF THE RELATED ART
In a modern high performance computer system having a plurality of CPUs, I/O drivers running on each CPU need to communicate with I/O cards to initiate and complete I/O requests. As is known in the art, it is common for the operating system to use semaphores to allow different processes (whether on the same CPU or different CPUs) to control access to an I/O resource. Once a semaphore has been acquired, a driver typically communicates with the card by performing write operations. These write operations are known in the art as programmed I/O (PIO) writes. Most programming models require that the CPU send a series of PIO writes to the card for each I/O transaction, and the PIO writes must be received by the card in order and without interleaving of PIO writes belonging to a different I/O transaction. PIO writes tend to be slow because the writes must typically travel from the CPU through a high latency interconnection fabric to the I/O card, and then an acknowledgment must be sent from the card back to the CPU through the same high latency interconnection fabric.
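The semaphore discipline described above can be illustrated with a minimal sketch. It is not taken from the patent: `FakeCard`, `run_transaction`, `demo`, and `is_uninterleaved` are hypothetical names, and threads stand in for drivers running on different CPUs. Each driver holds the semaphore for its whole series of PIO writes, so the card sees every transaction in order and without interleaving.

```python
import threading


class FakeCard:
    """Hypothetical stand-in for an I/O card; records writes in arrival order."""

    def __init__(self):
        self.log = []

    def pio_write(self, txn_id, word):
        self.log.append((txn_id, word))


def run_transaction(card, sem, txn_id, words):
    # Hold the semaphore for the entire series so that no other
    # transaction's writes can land between ours.
    with sem:
        for word in words:
            card.pio_write(txn_id, word)


def demo(num_drivers=4, words_per_txn=3):
    card = FakeCard()
    sem = threading.Semaphore(1)  # one holder at a time
    threads = [
        threading.Thread(
            target=run_transaction,
            args=(card, sem, txn_id, list(range(words_per_txn))),
        )
        for txn_id in range(num_drivers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return card.log


def is_uninterleaved(log, words_per_txn):
    # Every consecutive group of writes must belong to a single
    # transaction and carry that transaction's words in order.
    for i in range(0, len(log), words_per_txn):
        group = log[i:i + words_per_txn]
        if len({txn for txn, _ in group}) != 1:
            return False
        if [word for _, word in group] != list(range(words_per_txn)):
            return False
    return True
```

The order in which the transactions reach the card depends on thread scheduling, but each transaction's writes stay contiguous.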
At one point in the evolution of computer design, it was common to transfer all data from the CPU to an I/O card using PIO writes. More recently, it has become common for the driver to place data into host memory, and allow the I/O card to retrieve the data using direct memory access (DMA) operations.
Since the I/O card does not know directly when the driver has written data to host memory, the card can either poll host memory periodically via a DMA read, or the driver can perform a PIO write to the I/O card indicating that new data has been placed in host memory. The PIO write is still relatively slow because it must travel through the high latency interconnection fabric, as described above. On the other hand, polling by the I/O card wastes bandwidth if done frequently, and increases latency if done infrequently.
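The polling trade-off can be made concrete with a small arithmetic sketch. The model and the numbers are illustrative assumptions, not figures from the patent: the card polls at fixed multiples of `poll_interval`, data becomes ready at `ready_time`, and `polling_cost` (a hypothetical helper) reports how long the data waits and how many DMA-read polls are issued in an observation window.

```python
def polling_cost(ready_time, poll_interval, window):
    """Return (detection_delay, polls_in_window) for a card that polls
    at times poll_interval, 2 * poll_interval, and so on.

    All arguments are positive integers in the same arbitrary time unit.
    """
    # First poll at or after the moment the data became ready
    # (integer ceiling division).
    polls_until_seen = -(-ready_time // poll_interval)
    detection_delay = polls_until_seen * poll_interval - ready_time
    # Each poll in the window costs one DMA read of host memory.
    polls_in_window = window // poll_interval
    return detection_delay, polls_in_window
```

Under these assumed numbers, `polling_cost(95, 10, 1000)` gives a short delay at the price of many polls, while `polling_cost(95, 400, 1000)` issues few polls but leaves the data waiting much longer.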
To minimize PIO writes or polling, it is common for the driver to place a number of I/O requests into memory, and either link them together with a linked list or place pointers to the requests in a queue. This allows the I/O card to work on a number of requests before resorting to polling or waiting for a PIO write. Many card-driver programming models even allow the driver to extend the linked list or add to the queue after the card has started working on the requests, thereby further avoiding PIO writes or polling.
Unfortunately, these techniques cannot completely eliminate PIO writes or polling. Consider, for instance, the case where the card is able to service I/O requests faster than they are being supplied by the driver. The card will eventually catch up with the current batch of I/O requests and either need to poll or wait for a PIO write before it can work on subsequent requests. Since some PIO writes are needed even in the best programming models (for example, a PIO write is typically required to notify the card to start polling for DMA operations), the performance of PIO writes is critical to the overall I/O performance of the computer system.
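A minimal sketch of the queue-based model follows, under the assumption that the driver can tell when the card has run out of work. `RequestQueue`, `submit`, `card_service`, and the `doorbells` counter are hypothetical names for illustration. The driver issues a notifying PIO write only when the card is idle, so requests appended while the card is still busy cost no PIO write, but a card that drains the queue forces a new one.

```python
from collections import deque


class RequestQueue:
    """Toy model of a request queue shared between a driver and an I/O card."""

    def __init__(self):
        self.pending = deque()
        self.card_idle = True  # the card has nothing to work on
        self.doorbells = 0     # PIO writes used to notify the card

    def submit(self, request):
        # Driver side: place the request in (simulated) host memory.
        self.pending.append(request)
        if self.card_idle:
            # The card is not looking at the queue, so it must be
            # told that new work exists. This models the PIO write.
            self.doorbells += 1
            self.card_idle = False

    def card_service(self, max_requests):
        # Card side: fetch up to max_requests requests (the DMA reads).
        done = []
        while self.pending and len(done) < max_requests:
            done.append(self.pending.popleft())
        if not self.pending:
            # The card caught up with the driver and goes idle.
            self.card_idle = True
        return done
```

Submitting three requests in a row costs one notification. A fourth request added while the card is still working costs none. Once the card has emptied the queue, the next request needs a notification again.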
In the discussion above, it was assumed that a PIO write requires that the write travel from the CPU to the I/O card through a high latency interconnection fabric, and that an acknowledgment be sent from the card to the CPU through the same fabric. The I/O space into which such a write occurs is known in the art as “non-posted memory mapped I/O space”, and such writes will hereinafter be referred to as “non-posted PIO writes”. Note that non-posted PIO writes from multiple CPUs will remain ordered, since a write from one CPU will not be performed until a prior write from another CPU has been acknowledged. In essence, the ordering point of I/O transactions using non-posted PIO writes is the I/O card. Unfortunately, many CPU cycles are wasted waiting for each non-posted PIO write to complete, which results in a high cycle-per-instruction count and a slow non-posted PIO write completion rate.
The latency incurred by non-posted PIO writes is reduced somewhat in modern computer systems by moving the ordering point to a position in the high latency interconnection fabric that is closer to the CPUs. Consider that in a simple modern computer system, the high latency interconnection fabric is typically provided by a chipset. The chipset typically includes a CPU agent that is coupled to each CPU, and an I/O agent that is coupled to the I/O card. The I/O agent is typically coupled to the I/O card using a relatively low speed I/O bus, such as a PCI bus. PCI busses typically operate at speeds of 33-66 MHz. On the other hand, the bus between the I/O agent and the CPU agent (often referred to as a “front side bus”) is relatively fast. Front side busses typically operate at speeds greater than 100 MHz. The link between the I/O bus and the front side bus is known in the art as a bridge. For example, in a computer system having PCI card slots, a PCI bridge links the PCI bus to the front side bus. Typically the I/O agent is located at the bridge.
When the CPU issues PIO writes to the I/O agent, the writes are directed to a memory area known in the art as “posted memory mapped I/O space”. Such writes will hereinafter be referred to as “posted PIO writes”. Posted PIO writes also maintain ordering between multiple CPUs. However, in a posted PIO write the I/O agent generates the acknowledgment. Since this transaction occurs exclusively on the higher speed front side bus, the latency of the transaction is reduced. The I/O agent then communicates with the I/O card through the I/O bus, and guarantees that the ordering of the writes between the CPUs and the I/O agent is maintained.
In more complex modern computer systems, the I/O agent may be coupled to the CPU agent by a more complex high latency fabric, such as a crossbar or a ring. In such systems, posted I/O writes provide less of an advantage because the write must still traverse the high latency fabric.
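The effect of the ordering point on acknowledgment latency can be sketched with a simple round-trip model. The model is an assumption made for illustration, as are the link latencies: the path from the CPU outward is treated as a list of one-way link latencies (CPU to CPU agent, CPU agent to I/O agent, I/O agent to card), and the CPU waits for a round trip to whichever point generates the acknowledgment. `ack_latency` is a hypothetical helper.

```python
def ack_latency(hops_ns, ordering_point):
    """Round-trip time, in ns, until the CPU sees the acknowledgment.

    hops_ns        one-way latency of each link, ordered from the CPU outward
    ordering_point number of links the write crosses before it is acknowledged
    """
    return 2 * sum(hops_ns[:ordering_point])


# Assumed, illustrative one-way latencies in ns:
# [CPU -> CPU agent, CPU agent -> I/O agent, I/O agent -> card]
SIMPLE_SYSTEM = [5, 20, 300]    # fast front side bus between the agents
COMPLEX_SYSTEM = [5, 200, 300]  # crossbar or ring between the agents

NON_POSTED = 3  # acknowledged by the I/O card
POSTED = 2      # acknowledged by the I/O agent
```

With these assumed figures, posting cuts the wait from 650 ns to 50 ns in the simple system, but only from 1010 ns to 410 ns when the agents are joined by the slower fabric, because the posted write still has to cross that fabric.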
While not nearly as common in the art as non-posted and posted I/O space, “accelerated I/O space” reduces latency further by moving the ordering point to the CPU agent. Writes to accelerated I/O space will hereinafter be referred to as “accelerated PIO writes”. In an accelerated PIO write, the CPU issues a write and the write is immediately acknowledged by the CPU agent, resulting in a PIO write operation having a very low latency because the acknowledgment does not need to travel on the front side bus or a higher latency fabric. The problem with accelerated PIO writes is that the ordering point can only be located at one CPU agent at one time. Typically, it is the responsibility of drivers and operating system software to monitor whether accelerated PIO writes have at least gotten to the I/O agent before switching the ordering point from one CPU agent to another. This is typically done by issuing PIO reads and writes to status registers in the I/O agent. Note that unnecessary PIO reads and writes may occur if one CPU releases and reacquires the ordering point without the ordering point being switched to another CPU. Since the drivers and operating system software must be “aware” of accelerated I/O space to allow the ordering point to be switched between CPU agents, and implementations of accelerated I/O space can vary, this technique has not been widely used and is difficult to support using “shrink-wrapped off-the-shelf” operating systems, such as the Windows NT® operating system provided by Microsoft Corporation.
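The software burden of switching the ordering point can be sketched as follows. This is an illustrative model of the conventional, software-managed scheme described above, not an implementation from the patent: all class and method names are hypothetical, and `drain` stands in for the PIO reads and writes to status registers that confirm the writes have reached the I/O agent.

```python
class IOAgent:
    def __init__(self):
        self.received = []  # writes in the order they reached the I/O agent


class CPUAgent:
    def __init__(self, name, io_agent):
        self.name = name
        self.io_agent = io_agent
        self.in_flight = []  # acknowledged to the CPU, not yet delivered

    def accelerated_write(self, word):
        # Acknowledged immediately by the CPU agent; delivery comes later.
        self.in_flight.append((self.name, word))

    def drain(self):
        # Models software confirming that the writes reached the I/O agent.
        self.io_agent.received.extend(self.in_flight)
        self.in_flight.clear()


class OrderingPoint:
    def __init__(self):
        self.owner = None
        self.status_checks = 0

    def acquire(self, agent):
        if self.owner is not None:
            raise RuntimeError("ordering point is already held")
        self.owner = agent

    def write(self, agent, word):
        if self.owner is not agent:
            raise RuntimeError("agent does not hold the ordering point")
        agent.accelerated_write(word)

    def release(self):
        # Software cannot know who acquires next, so it always checks.
        self.owner.drain()
        self.status_checks += 1
        self.owner = None


def demo_switching():
    io_agent = IOAgent()
    cpu0 = CPUAgent("cpu0", io_agent)
    cpu1 = CPUAgent("cpu1", io_agent)
    point = OrderingPoint()
    for agent, word in ((cpu0, "a"), (cpu0, "b"), (cpu1, "c")):
        point.acquire(agent)
        point.write(agent, word)
        point.release()
    return io_agent.received, point.status_checks
```

In `demo_switching`, ordering is preserved across the two CPU agents, but three status checks are performed even though the ordering point changes hands only once. The check after the first release was unnecessary, because the same CPU agent reacquired the ordering point.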
In contrast, both posted and non-posted PIO writes provide a single ordering point at which PIO writes from multiple CPUs can be ordered. Accordingly, the drivers and operating system do not need to switch the ordering point. As a matter of fact, the drivers and software do not even need to be aware of whether a PIO write is be
