Electrical computers and digital data processing systems: input/ – Intrasystem connection – Bus interface architecture
Reexamination Certificate
2000-04-06
2003-10-07
Rinehart, Mark H. (Department: 2189)
C710S309000, C710S310000, C710S107000
active
06631437
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to communication between devices on different buses of a computer system, and, more particularly, to a method and apparatus for promoting memory read commands and advantageously prefetching data to reduce bus latency.
2. Description of the Related Art
Computer systems of the PC type typically employ an expansion bus to handle various data transfers and transactions related to I/O and disk access. The expansion bus is separate from the system bus or from the bus to which the processor is connected, but is coupled to the system bus by a bridge circuit.
A variety of expansion bus architectures have been used in the art, including the ISA (Industry Standard Architecture) expansion bus, an 8-MHz, 16-bit device, and the EISA (Extension to ISA) bus, a 32-bit bus clocked at 8 MHz. As performance requirements increased, with faster processors and memory, and increased video bandwidth needs, high-performance bus standards were developed. These standards included the Micro Channel architecture, a 10-MHz, 32-bit bus; an enhanced Micro Channel, using a 64-bit data width and 64-bit data streaming; and the VESA (Video Electronics Standards Association) bus, a 33-MHz, 32-bit local bus specifically adapted for a 486 processor.
More recently, the PCI (Peripheral Component Interconnect) bus standard was proposed by Intel Corporation as a longer-term expansion bus standard specifically addressing burst transfers. The original PCI bus standard has been revised several times, with the current standard being Revision 2.1, available from the PCI Special Interest Group, located in Portland, Oregon. The PCI Specification, Rev. 2.1, is incorporated herein by reference in its entirety. The PCI bus provides for 32-bit or 64-bit transfers at 33 or 66 MHz. It can be populated with adapters that require fast access to each other and/or to system memory, and that can be accessed by the host processor at speeds approaching that of the processor's native bus speed. A 64-bit, 66-MHz PCI bus has a theoretical maximum transfer rate of 528 MByte/sec. All read and write transfers over the bus may be burst transfers. The length of the burst may be negotiated between initiator and target devices, and may be any length.
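The 528 MByte/sec figure follows from moving one full-width data word on every clock. The short sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes the nominal clock values quoted above and an ideal burst with no wait states or turnaround cycles.

```python
def peak_transfer_rate_mbytes(bus_width_bits: int, clock_mhz: int) -> int:
    """Theoretical peak rate in MByte/sec: one data phase per clock cycle.

    Assumes an ideal, uninterrupted burst (no wait states, no address
    phases, no turnaround cycles), so real throughput will be lower.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


# The four width/clock combinations named in the text.
for width, clock in [(32, 33), (32, 66), (64, 33), (64, 66)]:
    print(f"{width}-bit @ {clock} MHz: {peak_transfer_rate_mbytes(width, clock)} MByte/sec")
```

For the 64-bit, 66-MHz case this gives 8 bytes x 66 million transfers per second, i.e. 528 MByte/sec.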
A CPU operates at a much faster clock rate and data access rate than most of the resources it accesses via a bus. In earlier processors, such as those commonly available when the ISA bus and EISA bus were designed, this delay in reading data from a resource on the bus was handled by inserting wait states. When a processor requested data that was not immediately available due to a slow memory or disk access, the processor merely marked time using wait states, doing no useful work, until the data finally became available. To make use of this delay time, a processor such as the Pentium Pro (P6), offered by Intel Corporation, provides a pipelined bus that allows multiple transactions to be pending on the bus at one time, rather than requiring one transaction to be finished before starting another. Also, the P6 bus allows split transactions, i.e., a request for data may be separated from the delivery of the data by other transactions on the bus. The P6 processor uses a technique referred to as “deferred transaction” to accomplish the split on the bus. In a deferred transaction, a processor sends out a read request, for example, and the target sends back a “defer” response, meaning that the target will send the data onto the bus, on its own initiative, when the data becomes available.
The PCI bus specification as set forth above does not provide for split transactions. There is no mechanism for issuing a “deferred transaction” signal, nor for generating the deferred data initiative. Accordingly, while a P6 processor can communicate with resources such as main memory that are on the processor bus itself using deferred transactions, this technique is not used when communicating with disk drives, network resources, compatibility devices, etc., on an expansion bus.
The PCI bus specification, however, provides a protocol for issuing delayed transactions. Delayed transactions use a retry protocol to implement efficient processing of the transactions. If an initiator initiates a request to a target and the target cannot provide the data quickly enough, a retry command is issued. The retry command directs the initiator to retry or “ask again” for the data at a later time. In the delayed transaction protocol, the target does not simply sit idly by, awaiting the renewed request. Instead, the target initially records certain information, such as the address and command type associated with the initiator's request, and begins to assemble the requested information in anticipation of a retry request from the initiator. When the request is retried, the information can be quickly provided without unnecessarily tying up the system's buses.
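The retry handshake described above can be modeled in a few lines. The following is a minimal behavioral sketch, not the circuit of the invention: the class `DelayedTarget`, the `latency` parameter, and the helper `read_with_retries` are all hypothetical names, and time is modeled as abstract ticks rather than PCI clock cycles.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RETRY = "RETRY"  # stand-in for the retry termination on the bus


@dataclass
class DelayedTarget:
    """Hypothetical target that needs `latency` ticks to assemble read data."""
    memory: dict
    latency: int = 3
    pending: Optional[Tuple[str, int]] = None  # latched (command, address)
    ticks_left: int = 0

    def request(self, command: str, address: int):
        key = (command, address)
        if self.pending != key:
            # First attempt: record the command and address, start
            # assembling the data, and tell the initiator to ask again.
            self.pending = key
            self.ticks_left = self.latency
            return RETRY
        if self.ticks_left > 0:
            return RETRY  # data still being assembled
        self.pending = None  # matching retry completes the transaction
        return self.memory[address]

    def tick(self) -> None:
        """Advance the target's internal fetch by one time step."""
        if self.ticks_left > 0:
            self.ticks_left -= 1


def read_with_retries(target: DelayedTarget, command: str, address: int):
    """Initiator side: repeat the identical request until data is returned."""
    attempts = 0
    while True:
        attempts += 1
        result = target.request(command, address)
        if result != RETRY:
            return result, attempts
        target.tick()  # the bus is free for other traffic during this time


target = DelayedTarget(memory={0x100: 0xDEADBEEF}, latency=3)
data, attempts = read_with_retries(target, "MR", 0x100)
print(hex(data), attempts)
```

The point of the model is that the target keeps working between attempts, so the initiator releases the bus after each retry instead of holding it while the data is assembled.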
Differentiated commands are used in accordance with the PCI specification to indicate, or at least hint at, the amount of data required by the initiator. A memory read (MR) command does not provide any immediate indication as to the length of the intended read. The read is terminated based on logic signals driven on the bus by the initiator. A memory read line (MRL) command, on the other hand, indicates that the initiator intends to read at least one cache line (e.g., 32 bytes) of data. A memory read multiple (MRM) command indicates that the initiator is likely to read more than one cache line of data. Based on the command received, the bridge prefetches data and stores it in a buffer in anticipation of the retried transaction. The amount of data prefetched depends on the amount the initiator is likely to require. Efficiency is highest when the amount of prefetched data most closely matches the amount of data required.
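To make the efficiency argument concrete, the sketch below maps each command to a prefetch size and reports how far that size misses the initiator's actual need. The specific amounts (4 bytes for MR, one line for MRL, two lines for MRM) are illustrative assumptions only; the text says merely that the amount depends on the command. The function names are likewise hypothetical.

```python
CACHE_LINE_BYTES = 32  # example cache line size used in the text

# Assumed, illustrative prefetch policy; real bridges choose their own amounts.
PREFETCH_BYTES = {
    "MR": 4,                       # assumption: a single 32-bit data phase
    "MRL": CACHE_LINE_BYTES,       # at least one cache line
    "MRM": 2 * CACHE_LINE_BYTES,   # assumption: two cache lines
}


def prefetch_mismatch(command: str, bytes_needed: int):
    """Return (wasted, shortfall) in bytes for one prefetch decision.

    wasted    -- bytes fetched that the initiator never reads
    shortfall -- bytes the initiator wants that were not prefetched
    """
    fetched = PREFETCH_BYTES[command]
    wasted = max(fetched - bytes_needed, 0)
    shortfall = max(bytes_needed - fetched, 0)
    return wasted, shortfall


# An initiator that actually wants two cache lines (64 bytes):
for cmd in ("MR", "MRL", "MRM"):
    print(cmd, prefetch_mismatch(cmd, 64))
```

Under these assumed amounts, an MR command issued by an initiator that really wants 64 bytes leaves a 60-byte shortfall, which is the mismatch the remainder of the background discusses.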
Prefetching in response to MRL and MRM commands is relatively uncomplicated, because, by the very nature of the command, the bridge knows to prefetch at least one, and likely more than one, cache line. The amount of data required by an initiator of an MR command, on the other hand, is not readily apparent. Initiators may issue MR commands even if they know they will require multiple data phases. For example, the PCI specification recommends, but does not require, that initiators use an MRL or an MRM command only if the starting address lies on a cache line boundary. Accordingly, a device following this recommendation would issue one or more MR commands until a cache line boundary is encountered, and would then issue the appropriate MRL or MRM command. Also, some devices, due to their vintage or their simplicity, are not equipped to issue MRL or MRM commands, and use MR commands exclusively.
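The alignment recommendation can be illustrated with a small planner that decides which commands an initiator would issue for a given read. This is a sketch under stated assumptions: the function `plan_read_commands` is hypothetical, the unaligned head is covered by a single MR for simplicity (a real device may issue several), and the rule for choosing between MRL and MRM by remaining length is an illustrative choice, not taken from the specification.

```python
def plan_read_commands(start: int, length: int, cache_line: int = 32):
    """Return a list of (command, address, nbytes) for a read request.

    MR is used until a cache line boundary is reached; the aligned
    remainder uses MRM if it spans more than one line, MRL if it is
    exactly one line, and MR otherwise (assumed selection rule).
    """
    plan = []
    offset = start % cache_line
    if offset and length > 0:
        # Unaligned start: plain memory read up to the next line boundary.
        head = min(cache_line - offset, length)
        plan.append(("MR", start, head))
        start += head
        length -= head
    if length > cache_line:
        plan.append(("MRM", start, length))
    elif length == cache_line:
        plan.append(("MRL", start, length))
    elif length > 0:
        plan.append(("MR", start, length))
    return plan


for cmd, addr, nbytes in plan_read_commands(0x1010, 80):
    print(f"{cmd:3} addr=0x{addr:04X} bytes={nbytes}")
```

In this example an 80-byte read starting at 0x1010 begins with an MR for the 16 bytes up to the boundary at 0x1020, so a bridge seeing only that first MR has no indication that 64 more bytes will follow.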
To illustrate the difficulties of anticipating the amount of data required by the initiator of an MR command, FIGS. 1A through 1D provide timing diagrams of exemplary MR transactions on a PCI bus. For clarity, only those PCI control signals useful in illustrating the examples are shown. The PCI bus uses shared address/data (AD) lines and shared command/byte enable (C/BE#) lines. In accordance with the PCI specification, a turnaround cycle is required on all signals that may be driven by more than one agent. In the case of the AD lines, the initiator drives the address and the target drives the data. The turnaround cycle is used to avoid contention when one agent stops driving a signal and another agent begins driving the signal. A turnaround cycle is indicated on the timing diagrams as two arrows, each pointing at the other's tail.
FIG. 1A illustrates an MR command in which the initiator requires multiple data phases to complete the transaction. In this illustration, the target and initiator reside on the same PCI bus, and the target is ready to supply the data when requested. The initiator asserts a FRAME# signal before the rising edge of a first clock cycle (CLK 1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During
Callison Ryan
Hausauer Brian
Lee Christopher E.
Rinehart Mark H.