Electrical computers and digital data processing systems: input/ – Input/output data processing – Input/output access regulation
Reexamination Certificate
2001-12-24
2004-06-22
Vo, Tim (Department: 2181)
C710S112000
active
06754737
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to transactions on a computer interconnect and, more specifically, to the ordering of read and write transactions on a computer bus.
DESCRIPTION OF THE RELATED ART
FIG. 1 shows the architecture of a typical computer system 8 in which a high-speed bus, such as the PCI bus 10, interconnects several I/O device adapters 12, 14. Each I/O device adapter 12, 14 is either an initiator or a target, and the PCI bus (or PCI-X bus) serves to carry read and write transactions between the I/O devices to which the adapters are connected. The CPU 16 for the computer system is connected to the bus 10 by means of a bridge device 18, which also provides a path between the CPU 16 and main memory 20. Another bridge device 22 connects a slower bus 24 to which devices, such as a printer adapter 26 and keyboard and mouse interfaces 28, are connected.
In one version of the PCI bus 10, an initiator (master) connects to a target (slave) via the bus to perform a transaction. FIG. 2 shows a typical PCI read transaction 40, a write transaction 42, and a retry request 44. Read transactions include an address phase 46, a command phase 48, one or more data phases 50a-d and attribute phases 52a-d. Each of the data phases 50a-d can be delayed by the target or initiator for a specific number of clocks in order to match the data transfer speed of the target to the initiator. Write transactions are similar, having an address phase 54, a command phase 56, one or more data phases 58a-d, and attribute phases 60a-d. The initiator or target can stall a data phase (via wait states) for up to seven clocks. (The target can stall the start of the first data phase for up to 15 clocks.) Before the initiator can connect to a target to perform a data transaction, the initiator must become the owner of the bus. This implies that the initiator must be the winner of an arbitration process.
In addition to data transactions, the PCI bus supports Delayed Transactions for reads and writes. A Delayed Transaction has two parts, the request part and the completion part. In the first part 44 in FIG. 2, the initiator performs an address phase 62 and command phase 64, and before the first data phase 66, the target responds with a disconnect 68, as shown in FIG. 2. The initiator interprets the target disconnect to be a retry request 44, which the initiator honors by ending the current transaction (FIG. 2), returning the bus to the idle state, re-arbitrating for ownership and re-initiating the transaction. If the initiator again receives a retry indication from the target, the initiator repeats the above sequence 44. Thus, the address phase 62 may be repeated several times (causing multiple re-arbitrations as well), until the target is ready to transfer data. In the completion part of the Delayed Transaction, the initiator performs an address phase and the target replies with a data transfer rather than a disconnect 68.
It is easily appreciated by one skilled in the art that the above-described operation of the PCI bus is exceedingly inefficient. Throughput on the PCI bus is lost for two reasons: the insertion of wait states and the use of the retry protocol.
Wait states cause a direct loss in throughput. Just one wait state inserted in each data phase is a 50% loss in throughput during the data burst. This means that for a 32-bit PCI bus clocked at 33 MHz, the throughput during the data phase is reduced from 132 Megabytes per second to 66 Megabytes per second. If the bus is clocked at 66 MHz, the throughput loss is even greater: a full 132 Megabytes per second. For devices that can sustain transfer rates of about 1 Gigabyte per second, the bus is simply unworkable.
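The figures above can be checked with a short calculation. The following is a minimal sketch, assuming that peak throughput is the bus width in bytes multiplied by the nominal clock rate, and that each data phase takes one clock plus its wait states. The function names are illustrative only and do not come from the PCI specification.

```python
def peak_mb_per_s(bus_bits, clock_mhz):
    """Peak burst throughput in Megabytes per second (one transfer per clock)."""
    return bus_bits // 8 * clock_mhz


def burst_mb_per_s(bus_bits, clock_mhz, wait_states):
    """Burst throughput when every data phase carries `wait_states` extra clocks."""
    return peak_mb_per_s(bus_bits, clock_mhz) / (1 + wait_states)


def loss_mb_per_s(bus_bits, clock_mhz, wait_states):
    """Throughput lost to wait states, relative to the peak rate."""
    return peak_mb_per_s(bus_bits, clock_mhz) - burst_mb_per_s(
        bus_bits, clock_mhz, wait_states
    )


if __name__ == "__main__":
    for mhz in (33, 66):
        print(mhz, "MHz:", peak_mb_per_s(32, mhz), "->",
              burst_mb_per_s(32, mhz, 1), "MB/s with one wait state")
```

Under these assumptions, one wait state per data phase halves the burst rate at any clock speed, so the absolute loss doubles when the clock doubles.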
The Delayed Transaction protocol also causes a significant loss in throughput because bus cycles that could be used for data transfers are used to support a high-overhead protocol. Bus cycles are wasted when the target replies with a disconnect and the initiator ends the current transaction, lets the bus go idle, re-arbitrates for the bus, and then re-performs the address phase of the disconnected transaction. Thus, the cost of each retry is at least 6 clocks: 4 clocks to return the bus to the idle state, at least one clock for arbitration, and at least one more clock for an address phase. During these 6 clocks an entire 4-dword burst could have occurred.
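The per-retry cost can be tallied in the same way. This is a minimal sketch that uses only the minimum clock counts quoted above (4 clocks to reach idle, 1 for arbitration, 1 for the repeated address phase). The constant and function names are hypothetical, and the efficiency figure deliberately ignores the address and command phases of the transaction that finally succeeds.

```python
# Minimum clock counts per retry, as quoted in the text above.
IDLE_CLOCKS = 4         # return the bus to the idle state
ARBITRATION_CLOCKS = 1  # at least one clock to re-arbitrate
ADDRESS_CLOCKS = 1      # at least one clock to repeat the address phase


def retry_overhead_clocks(retries):
    """Lower bound on the clocks consumed by `retries` retry sequences."""
    return retries * (IDLE_CLOCKS + ARBITRATION_CLOCKS + ADDRESS_CLOCKS)


def bus_efficiency(data_clocks, retries):
    """Fraction of clocks that carry data, counting only retry overhead as waste."""
    total = data_clocks + retry_overhead_clocks(retries)
    return data_clocks / total


if __name__ == "__main__":
    # A 4-dword burst (4 data clocks) that was retried once before completing.
    print(retry_overhead_clocks(1), bus_efficiency(4, 1))
```

Even a single retry ahead of a 4-dword burst means that, on this simplified accounting, fewer than half of the clocks involved move data.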
In both of these cases throughput was lost because the target was not ready to respond. Clocking the bus faster to improve the throughput only causes more clock cycles to be lost due to wait states and the inefficient Delayed Transaction protocol.
An updated version of the PCI bus, PCI-X, was developed to address these and other deficiencies. In the PCI-X specification, wait states are not permitted once data transfers have begun. A data burst, once started, must proceed at full speed on the bus. The read, write and split request transactions for PCI-X are shown for reference in FIG. 3. Each transaction type has an Address/Cmd phase, followed by an attribute phase and a response phase. After the response phase a data transfer ensues. Only the response and first data phase are extensible by adding a limited number of wait states. After the first data phase, the remaining data phases must proceed at one bus clock per data phase.
Additionally, in the PCI-X specification, the inefficient Delayed Transactions have been replaced by Split Transactions, as shown in FIG. 3.
In a Split Transaction, a Requester initiates a transfer 75 by performing an address/cmd phase 86, an attribute phase 88, a response phase 90, an unused data phase 92 and a surrender phase 94. Upon receiving a Split Response Request 96 from a Completer at the appropriate time in the transaction, the Requester removes itself from the bus, commits resources to the transaction and suspends the transaction until the Completer responds. This makes the bus available for use to other Requesters and Completers in the interim. When the data transfer is ready to occur at the Completer, the Completer acts as an Initiator, obtaining the bus and performing a Split Completion transaction 71, which includes an address phase 70, an attribute phase 72, a response phase 74 and one or more data phases 76a-d during which the requested data is transferred to the Requester.
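The request and completion halves of a Split Transaction can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: it omits arbitration and the individual bus phases, it always answers a read with a split response, and every class, method and string name in it is hypothetical rather than taken from the PCI-X specification.

```python
from collections import deque


class Completer:
    """Target that defers reads and later returns the data as an initiator."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()  # split requests awaiting completion, in order

    def handle_read(self, requester_id, address):
        # Data is not ready: record the request and signal a split response.
        self.pending.append((requester_id, address))
        return "SPLIT_RESPONSE"

    def complete_next(self, requesters):
        # Acting as an initiator, deliver the oldest pending read's data.
        requester_id, address = self.pending.popleft()
        requesters[requester_id].receive_completion(address, self.memory[address])


class Requester:
    """Initiator that suspends a read until its Split Completion arrives."""

    def __init__(self, requester_id):
        self.requester_id = requester_id
        self.outstanding = set()  # resources committed to suspended reads
        self.received = {}

    def read(self, completer, address):
        response = completer.handle_read(self.requester_id, address)
        if response == "SPLIT_RESPONSE":
            # Leave the bus and wait; other devices may use it meanwhile.
            self.outstanding.add(address)
        return response

    def receive_completion(self, address, data):
        self.outstanding.discard(address)
        self.received[address] = data


if __name__ == "__main__":
    completer = Completer({0x10: 0xAB, 0x20: 0xCD})
    a, b = Requester("A"), Requester("B")
    requesters = {"A": a, "B": b}
    a.read(completer, 0x10)
    b.read(completer, 0x20)  # the bus was free for B while A's read was pending
    completer.complete_next(requesters)
    completer.complete_next(requesters)
    print(a.received, b.received)
```

The point of the model is that a second Requester can issue its own request while the first read is still suspended, which the retry-based Delayed Transaction does not allow without repeated re-arbitration.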
To make it easier to conform to the newer PCI-X specification and to improve the performance of the older PCI protocol, it is best that both the Requester and Completer are implemented with read and write storage buffers so that when a write or read data burst is ready to occur, it can proceed at full bus speeds. Additionally, both the Requester and Completer are likely, in most implementations, to have Initiator and Target interfaces to carry out the Split Transaction protocol and each interface is required to be registered on both inputs and outputs.
However, the use of read and write buffers and Initiator and Target interfaces on the adapter increases the chances that PCI and PCI-X read/write ordering and deadlock avoidance rules may not be met.
PCI bus ordering rules require that if write data is posted to a write buffer (such as a posted-write buffer in a PCI-to-PCI or host/PCI bridge), the data must be flushed to its final destination (memory) before a read of that same data is allowed by the same or a different bus master. Also, a bridge must perform all posted writes in the same order in which they were originally posted and is only permitted to post writes to regular memory targets.
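The two PCI rules just stated, flush before read and preserve posting order, can be sketched with a FIFO write buffer. This is an illustrative model only: the class and method names are hypothetical, memory is a plain dictionary, and the sketch conservatively flushes every posted write before any read rather than tracking which addresses a read actually depends on.

```python
from collections import deque


class PostedWriteBridge:
    """Toy bridge with a posted-write FIFO in front of memory."""

    def __init__(self):
        self.memory = {}
        self.posted = deque()  # writes accepted but not yet delivered

    def post_write(self, address, data):
        # The write completes on the bus immediately; delivery is deferred.
        self.posted.append((address, data))

    def flush(self):
        # Deliver posted writes strictly in the order they were posted.
        while self.posted:
            address, data = self.posted.popleft()
            self.memory[address] = data

    def read(self, address):
        # Simplification: drain all posted writes before servicing any read,
        # so a read can never observe data older than a posted write.
        self.flush()
        return self.memory.get(address)


if __name__ == "__main__":
    bridge = PostedWriteBridge()
    bridge.post_write(0x100, 1)
    bridge.post_write(0x100, 2)
    print(bridge.read(0x100))
```

Because the buffer drains in FIFO order, two posted writes to the same address leave the later value in memory, and the read that follows sees that value rather than stale data.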
On the PCI-X bus, there are more extensive read-write ordering rules when buffers are involved because of Split Transactions. For example, for bridges between a PCI-X bus and a host bus or between two PCI-X busses, there are three sets of rules, as set forth, in summary, below. The rules are set forth in more detail on pages 573-577 of PCI-X System Architecture, Tom Shanley, ISBN 0-201-72682-3, which is incorporated by reference into the pr
Heynemann Tom A.
Knowles Michael W.
Sprouse Jeffrey A.
Method and apparatus to allow dynamic variation of ordering...