Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Reexamination Certificate
2001-06-14
2004-03-16
Gossage, Glenn (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Control technique
C711S155000, C710S052000
Reexamination Certificate
active
06708258
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to nodes of computer networks and, more specifically, to improving the efficiency of storing packets in the nodes' computer memories by eliminating the need for additional read-modify-write (RMW) operations.
BACKGROUND OF THE INVENTION
A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as intermediate nodes and end nodes. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router or switch, to extend the effective “size” of the computer network and increase the number of communicating nodes. Examples of the end nodes may include servers and personal computers. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Each node typically comprises a number of basic subsystems including a processor, a main memory and an input/output (I/O) subsystem. Data is transferred between the main memory (“system memory”) and processor subsystem over a memory bus, and between the processor and I/O subsystems over a system bus. Examples of the system bus may include the conventional lightning data transport (or HyperTransport) bus and the conventional peripheral component interconnect (PCI) bus. The processor subsystem may comprise a single-chip processor and system controller device that incorporates a set of functions including a system memory controller, support for one or more system buses and direct memory access (DMA) engines. In general, the single-chip device is designed for general-purpose use and is not heavily optimized for networking applications.
In a typical networking application developed using the single-chip device, packets are received from a framer, such as an Ethernet media access control (MAC) controller, of the I/O subsystem attached to the system bus. A DMA engine in the MAC controller is provided a list of addresses (e.g., in the form of a descriptor ring in the system memory) for buffers it may access in the system memory. As each packet is received at the MAC controller, the DMA engine obtains ownership of (“masters”) the system bus to access a next descriptor in the ring and thereby obtain a next buffer address in the system memory at which it may, e.g., store (“write”) data contained in the packet. The DMA engine may need to issue many write operations over the system bus to transfer all of the packet data.
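A minimal C sketch of such a descriptor-ring arrangement follows; the structure layout, field names and OWN_HW ownership flag are illustrative assumptions for exposition, not the format of any particular MAC controller.

/*
 * Minimal sketch of a receive descriptor ring, assuming hypothetical
 * field names; real MAC controllers define their own layouts.
 */
#include <stdint.h>

#define RING_SIZE 256            /* number of descriptors (assumed) */
#define OWN_HW    (1u << 31)     /* descriptor owned by the DMA engine */

struct rx_descriptor {
    uint32_t buffer_addr;        /* system-memory address of the packet buffer */
    uint32_t length_and_flags;   /* buffer length plus ownership/status bits */
};

/* Ring lives in system memory; the MAC controller is told its base address. */
static struct rx_descriptor rx_ring[RING_SIZE];

/*
 * Host side: hand a buffer to the DMA engine by filling in the next
 * descriptor and setting the ownership bit.
 */
static void post_rx_buffer(unsigned idx, uint32_t buf_addr, uint16_t buf_len)
{
    rx_ring[idx].buffer_addr      = buf_addr;
    rx_ring[idx].length_and_flags = buf_len | OWN_HW;
}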
For example, assume the system memory comprises double data rate (DDR) synchronous dynamic random access memory (SDRAM) devices and that a portion of the memory is organized into packet buffers at system initialization. These buffers are defined to start on certain binary boundaries (e.g., 32 bytes) in order to take advantage of the system bus burst size, bus alignment and cache line size. Assume also that the system (PCI) bus has a width of 32 (or 64) bits and that system memory accessed over the bus is also 32 (or 64, respectively) bits wide, thereby matching the bus. Moreover, each access (transfer) of data over the system bus to the memory comprises two cycles (or four half cycles). Therefore, at 32 (64) bits per half cycle, the minimum transfer size of packet data over the system bus is 16 (32) bytes (that is, 4 bytes per half cycle times 4 half cycles equals 16 bytes).
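This transfer-size arithmetic can be restated in a few lines of C; the example below simply reproduces the 16- and 32-byte minimums under the stated assumption of four half cycles per bus access.

/*
 * Illustrative arithmetic only: minimum transfer size for one burst,
 * assuming the two-cycle (four half-cycle) access described above.
 */
#include <stdio.h>

int main(void)
{
    const unsigned half_cycles_per_access = 4;
    const unsigned bus_widths_bits[] = { 32, 64 };

    for (unsigned i = 0; i < 2; i++) {
        unsigned bytes_per_half_cycle = bus_widths_bits[i] / 8;
        unsigned min_transfer = bytes_per_half_cycle * half_cycles_per_access;
        printf("%u-bit bus: %u bytes per half cycle -> %u-byte minimum transfer\n",
               bus_widths_bits[i], bytes_per_half_cycle, min_transfer);
    }
    return 0;
}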
Continuing the same example, in order to write 40 bytes of data to an address, e.g., 0x100038 in system memory, 10 half cycles are needed over a 32-bit (4-byte-wide) system bus. However, to write 40 data bytes to system memory address 0x100037, 11 half cycles would be required (i.e., the first half cycle with 1 byte of data, 9 half cycles with 4 bytes each and the last half cycle with 3 bytes). By aligning the buffer start address with the width of the system bus, efficient use of that bus is ensured. This same efficiency carries over to the actual system memory interface, where the data can be written into the system memory using the fewest cycles if the start of the buffer matches the granularity of accesses to the system memory. From a simplistic point of view with respect to the system memory, if a memory line is 32 bits or 4 bytes wide (herein a memory “line” includes the typical data storage word and any error correcting code extension to that word), usually the entire line must be fetched in order to overwrite only one byte while preserving the other three bytes. So, in the above example, to store the 40 bytes, the system will write to 10 full-width lines when starting at address 0x100038 because the memory line width and the message length match 10 complete, full memory lines exactly. However, if the starting address is 0x100037, the first byte of the 40 will be stored in the last byte of a 4-byte-wide first line, and the last three bytes of the 40 will be stored in the first three bytes of a 4-byte-wide eleventh line. To write those last three bytes, the entire four-byte eleventh line must be fetched and its first three bytes modified while keeping its last byte intact (assuming it is part of another message, etc.); that is, a read-modify-write (RMW) operation must be used to store the last three bytes. A similar operation must be used to store the first byte of the 40-byte message. In this case the inefficiencies of the non-aligned memory are seen in the need to access eleven, rather than ten, memory lines, and the need for RMW operations rather than simple write operations.
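The half-cycle counts and the RMW condition in this example can be expressed compactly; the following sketch assumes a 4-byte bus and 4-byte memory lines and merely restates the arithmetic above (the function names and layout are illustrative).

/*
 * Sketch of the alignment arithmetic above, assuming a 4-byte-wide bus
 * and 4-byte memory lines; addresses and lengths are illustrative.
 */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 4u

/* Number of half cycles (4 bytes each) needed to move 'len' bytes
 * starting at 'addr': a partially filled first or last line still
 * costs a full half cycle. */
static unsigned half_cycles(uint32_t addr, uint32_t len)
{
    uint32_t first_line = addr / LINE_BYTES;
    uint32_t last_line  = (addr + len - 1) / LINE_BYTES;
    return (unsigned)(last_line - first_line + 1);
}

/* A partial line at either end forces a read-modify-write to preserve
 * the neighbouring bytes in that line. */
static int needs_rmw(uint32_t addr, uint32_t len)
{
    return (addr % LINE_BYTES) != 0 || ((addr + len) % LINE_BYTES) != 0;
}

int main(void)
{
    printf("0x100038, 40 bytes: %u half cycles, RMW needed: %d\n",
           half_cycles(0x100038, 40), needs_rmw(0x100038, 40));   /* 10, 0 */
    printf("0x100037, 40 bytes: %u half cycles, RMW needed: %d\n",
           half_cycles(0x100037, 40), needs_rmw(0x100037, 40));   /* 11, 1 */
    return 0;
}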
The present invention becomes even more important for large packet memories incorporating error correction codes (ECC). In these memories, it is not feasible to provide byte-write capability since the ECC covers the entire width of the memory line. For example, assume the system memory, including a system memory bus interface, is arranged to accommodate a 64-bit memory “line” width. Eight (8) additional bits are needed for ECC computation by the system memory controller, such that the memory and memory bus interface are organized and aligned on 72-bit line widths. Therefore, a non-aligned start address could cause not only an extra write cycle but also the inefficient RMW operation discussed herein.
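The sketch below illustrates, under stated assumptions, why a sub-line write forces an RMW once the ECC covers the whole line: the controller must read the line, merge the new bytes, recompute the check bits over all 64 data bits, and write everything back. The ecc8() routine is a placeholder for whatever check-bit code a memory controller actually uses, not a real ECC algorithm.

#include <stdint.h>
#include <string.h>

struct ecc_line {
    uint64_t data;   /* 64-bit memory line */
    uint8_t  ecc;    /* 8 check bits covering the whole line */
};

static uint8_t ecc8(uint64_t d)          /* placeholder check-bit function */
{
    uint8_t e = 0;
    for (int i = 0; i < 8; i++)
        e ^= (uint8_t)(d >> (8 * i));
    return e;
}

/* Write 'len' bytes at byte offset 'off' within one line: RMW is
 * unavoidable because the ECC covers the entire 72-bit line. */
static void write_partial(struct ecc_line *line, unsigned off,
                          const uint8_t *src, unsigned len)
{
    uint8_t bytes[8];
    memcpy(bytes, &line->data, 8);       /* read   */
    memcpy(bytes + off, src, len);       /* modify */
    memcpy(&line->data, bytes, 8);
    line->ecc = ecc8(line->data);        /* recompute ECC, then write back */
}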
By starting the packet buffers on appropriate binary boundaries, the inefficiencies of writing packet data to the beginning of the buffers are avoided. However, there is no equivalent “work around” in conventional systems when writing the end of the packet buffers in system memory. For example, if the effective memory width is 8 bytes and the length of a packet is 63 bytes, the last transfer of the packet over the system bus requires that only 7 bytes be written to the appropriate packet buffer. As noted, the processor and system controller device is general-purpose and, accordingly, does not “know” that the portion of memory is reserved solely for packet buffers. Therefore, the processor and system controller device strictly interprets the system bus operation using an RMW operation to preserve the one byte location of the buffer that was not written with the packet data, rather than “padding out” (e.g., writing null values) to that location. This represents an inefficient use of system resources, and the present invention is directed to a technique that improves the efficiency of such resources.
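A minimal sketch of the “padding out” idea follows, assuming an 8-byte effective memory width; the helper name and layout are illustrative and do not represent the patented mechanism itself.

#include <stdint.h>
#include <string.h>

#define MEM_WIDTH 8u   /* effective memory line width in bytes (assumed) */

/*
 * Copy the tail of a packet into its buffer and pad the final line with
 * null bytes so the controller can issue a full-width write rather than
 * a read-modify-write. For a 63-byte packet, the last 7 data bytes plus
 * 1 pad byte fill the eighth line exactly.
 */
static void store_packet_tail(uint8_t *buf, const uint8_t *pkt, size_t len)
{
    size_t padded = (len + MEM_WIDTH - 1) / MEM_WIDTH * MEM_WIDTH;
    memcpy(buf, pkt, len);
    memset(buf + len, 0, padded - len);  /* pad to the next line boundary */
}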
As noted, the RMW operation is quite expensive and consumes substantial overhead with respect to “turning around” the memory bus when writing data into an allocated buffer. That is, not only does the RMW operation double the traffic over the memory bus (by both reading and writing the data block), it also consumes overhead with respect to gaining access/ownership of the memory bus in order to avoid collisions over that bus. Therefore, not only is the operation expensive in terms of resource consumption, but it also adversely (and substantially) impacts throughput over the memory bus.
Garner Trevor
Potter Kenneth H.
Cesari and McKenna LLP
Cisco Technology Inc.
Gossage Glenn