Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Reexamination Certificate
1997-12-29
2001-03-27
Ellis, Kevin L. (Department: 2751)
C711S141000
active
06209068
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an improved read line buffer for cache systems of a processor and to a communication protocol in support of such a read line buffer.
2. Related Art
In the electronic arts, processors are being integrated into multiprocessor designs with increasing frequency. A block diagram of such a system is illustrated in FIG. 1. There, a plurality of agents 10-40 are provided in communication with each other over an external bus 50. The agents may be processors, cache memories or input/output devices. Data is exchanged among the agents in a bus transaction.
A transaction is a set of bus activities related to a single bus request. For example, in the known Pentium Pro processor, commercially available from Intel Corporation, a transaction proceeds through six phases:
Arbitration, in which an agent becomes the bus owner,
Request, in which a request is made identifying an address,
Error, in which errors in the request phase are identified,
Snoop, in which cache coherency checks are made,
Response, in which the failure or success of the transaction is indicated, and
Data, in which data may be transferred.
Other processors may support transactions in other ways.
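The six phases above can be modeled as a simple ordered state machine. The following is an illustrative Python sketch (the class and names are hypothetical, not part of any Intel interface), showing a transaction advancing phase by phase without skipping or reordering:

```python
from enum import Enum


class Phase(Enum):
    """The six transaction phases, in bus order."""
    ARBITRATION = 0  # an agent becomes the bus owner
    REQUEST = 1      # a request is made identifying an address
    ERROR = 2        # errors in the request phase are identified
    SNOOP = 3        # cache coherency checks are made
    RESPONSE = 4     # failure or success of the transaction is indicated
    DATA = 5         # data may be transferred


class Transaction:
    """One bus transaction; it may only advance to the next phase in order."""

    def __init__(self, address):
        self.address = address
        self.phase = Phase.ARBITRATION

    def advance(self):
        if self.phase is Phase.DATA:
            raise RuntimeError("transaction already complete")
        self.phase = Phase(self.phase.value + 1)


t = Transaction(0x1000)
for _ in range(5):
    t.advance()
assert t.phase is Phase.DATA
```

Because a pipelined bus allows several transactions in flight only when they occupy mutually different phases, a model like this also makes it easy to check that no transaction "passes" another.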
In multiple agent systems, the external bus 50 may be a pipelined bus. In a pipelined bus, several transactions may progress simultaneously provided the transactions are in mutually different phases. Thus, a first transaction may be started at the arbitration phase while a snoop response of a second transaction is being generated and data is transferred according to a third transaction. However, a given transaction generally does not “pass” another in the pipeline.
Cache coherency is an important feature of a multiple agent system. If an agent is to operate on data, it must confirm that the data it will read is the most current copy available. In such multiple agent systems, several agents may operate on data from a single address. Oftentimes, when a first agent 10 desires to operate on data at an address, a second agent 30 may have cached a copy of the data that is more current than the copy resident in an external cache. The first agent 10 should read the data from the second agent 30 rather than from the external cache 40. Without a means to coordinate among agents, an agent 10 may perform a data operation on stale data.
In a snoop phase, the agents coordinate to maintain cache coherency. In the snoop phase, each of the other agents 20-40 reports whether it possesses a copy of the data or whether it possesses a modified (“dirty”) copy of the data at the requested address. In the Pentium Pro, an agent indicates that it possesses a copy of the data by asserting a HIT# pin in a snoop response. It indicates that it possesses a dirty copy of the requested data by asserting a HITM# pin. If dirty data exists, it is more current than the copy in memory. Thus, dirty data will be read by an agent 10 from the agent 20 possessing the dirty copy. Non-dirty data is read by an agent 10 from memory. Only an agent that possesses a copy of data at the requested address drives a snoop response; if an agent does not possess such a copy, it generates no response.
A snoop response is expected from all agents 10-40 within a predetermined period of time. Occasionally, an agent 30 cannot respond to another agent's request before the period closes. When this occurs, the agent 30 may generate a “snoop stall response” that indicates that the requesting agent 10 must wait beyond the period for snoop results. In the Pentium Pro processor, the snoop stall signal occurs when an agent 30 toggles outputs HIT# and HITM# from high to low in unison.
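The snoop-result semantics described above can be sketched as a small resolution function. This is an illustrative Python model (the function and the encoding of the stall as both pins asserted together are assumptions for the sketch, not the electrical behavior): each agent either drives no response, or reports its (HIT#, HITM#) pair, and the requester decides where to read from.

```python
def resolve_snoop(responses):
    """Resolve snoop results from the other agents.

    responses: dict mapping agent id -> (hit, hitm) booleans, or None when
    the agent holds no copy and therefore drives no response.

    Returns one of:
      ('stall', None)  -- some agent signaled a snoop stall
      ('agent', id)    -- agent `id` holds a dirty copy; read from it
      ('memory', None) -- no dirty copy exists; read from memory
    """
    dirty_owner = None
    for agent, resp in responses.items():
        if resp is None:
            continue  # no copy: the agent generates no response
        hit, hitm = resp
        if hit and hitm:
            # modeled stall: HIT# and HITM# driven in unison
            return ('stall', None)
        if hitm:
            dirty_owner = agent  # dirty data is more current than memory
    if dirty_owner is not None:
        return ('agent', dirty_owner)
    return ('memory', None)


# Agent 20 has a clean copy, agent 30 a dirty copy, agent 40 none:
assert resolve_snoop({20: (True, False), 30: (False, True), 40: None}) == ('agent', 30)
```

The key property the sketch captures is the priority: a stall preempts everything, a dirty copy preempts memory, and silence from every agent means memory holds the most current data.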
FIG. 2 illustrates components of a bus sequencing unit (“BSU”) 100 and a core 200 within a processor 10 as are known in the art. The BSU 100 manages transaction requests generated within the processor 10 and interfaces the processor 10 to the external bus 50. The core 200 executes micro operations (“UOPs”), such as the processing operations that are required to execute software programs.
The BSU 100 is populated by a bus sequencing queue 140 (“BSQ”), an external bus controller 150 (“EBC”), a read line buffer 160 and a snoop queue 170. The BSQ 140 processes requests generated within the processor 10 that must be referred to the external bus 50 for completion. The EBC 150 drives the bus to implement requests. It also monitors transactions initiated by other agents on the external bus 50. The snoop queue 170 monitors snoop requests made on the external bus 50, polls various components within the processor 10 regarding the snoop request and generates snoop results therefrom. The snoop results indicate whether the responding agent possesses non-dirty data, dirty data or is snoop stalling. Responsive to the snoop results, the EBC 150 asserts the result on the external bus.
As noted, the BSQ 140 monitors requests generated from within the processor 10 to be referred to the external bus 50 for execution. An example of one such request is a read of data from external memory to the core 200. “Data” may represent either an instruction to be executed by the core or variable data representing input to such an instruction. The BSQ 140 passes the request to the EBC 150 to begin a transaction on the external bus 50. The BSQ 140 includes a buffer memory 142 that stores the requests tracked by the BSQ 140. The number of registers 142a-h in memory 142 determines how many transactions the BSQ 140 may track simultaneously.
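The relationship between register count and outstanding transactions can be illustrated with a bounded queue. This is a hypothetical Python sketch (the class name and the depth of 8, matching entries a-h, are assumptions for illustration): once every register holds a tracked request, further requests must wait until one retires.

```python
from collections import deque


class BSQ:
    """Sketch of a bus sequencing queue whose register count bounds how
    many outstanding bus requests it can track simultaneously."""

    def __init__(self, depth=8):  # eight registers, as in entries a-h
        self.depth = depth
        self.entries = deque()

    def submit(self, request):
        """Track a new request; returns False when every register is in use."""
        if len(self.entries) >= self.depth:
            return False  # queue full: the request must wait
        self.entries.append(request)
        return True

    def retire(self):
        """Complete the oldest tracked request, freeing its register."""
        return self.entries.popleft()


bsq = BSQ()
assert all(bsq.submit(f"read {i}") for i in range(8))
assert not bsq.submit("read 8")  # ninth request is rejected until one retires
```

Deepening the queue in this model is just a larger `depth`; the point of the passage that follows is that the prior-art design could not do so cheaply, because every added queue entry dragged a read line buffer entry along with it.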
The EBC 150 tracks activity on the external bus 50. It includes a pin controller 152 that may drive data on the external bus 50. It includes an in-order queue 154 that stores data that is asserted on the bus at certain events. For example, snoop results to be asserted on the bus during a snoop phase may be stored in the in-order queue 154. The EBC 150 interfaces with the snoop queue 170 and BSQ 140 to accumulate data to be asserted on the external bus 50.
During the data phase of a transaction, data is read from the external bus 50 into the read line buffer 160. The read line buffer 160 is an intermediate storage buffer, having a memory 162 populated by its own number of registers 162a-h. The read line buffer 160 provides for storage of data read from the external bus 50. The read line buffer 160 stores the data only temporarily; the data is routed to another destination such as a cache 180 in the BSU 100, a data cache 210 in the core or an instruction cache 220 in the core. Data read into a read line buffer storage entry 162a is cleared when its destination becomes available.
There is a one-to-one correspondence between read line buffer entries 162a-h and BSQ buffer entries 142a-h. Thus, data from a request buffered in BSQ entry 142a will be read into buffer entry 162a. For each request buffered in the BSQ buffer 142, data associated with the request is buffered in the buffer memory 162 in the read line buffer 160.
The one-to-one correspondence between the depth of the BSQ buffer 142 and the read line buffer 160 is inefficient. Read line buffer utilization is very low. The read line buffer 160 operates at a data rate associated with the BSU 100 and the core 200, which is much higher than the data rate of the external bus 50. Thus, data is likely to be read out of the read line buffer 160 faster than the bus 50 can provide data to it. The one-to-one correspondence of BSQ buffer entries to read line buffer entries is therefore unnecessary. Also, the read line buffer storage entries 162a-h consume a significant amount of area when the processor is fabricated as an integrated circuit.
It is desired to increase the depth of buffers in the BSQ 140. In the future, latency between the request phase and the data phase of transactions on the external bus 50 is expected to increase. External buses 50 will become more pipelined.
Bachand Derek T.
Fisch Matthew A.
Hill David L.
Prudvi Chinna
Ellis Kevin L.
Intel Corporation
Kenyon & Kenyon