Non-coherent cache buffer for read accesses to system memory

Electrical computers and digital data processing systems: input/ – Input/output data processing – Input/output data buffering

Reexamination Certificate

Details

C710S007000, C710S039000, C712S225000, C711S141000, C711S146000

active

06564272

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to accesses to memory by input/output devices and more particularly to providing more efficient accesses to system memory.
2. Description of the Related Art
Computer systems rely on complex logic to communicate data within and between their various subsystems. Referring to FIG. 1, which illustrates relevant aspects of an exemplary prior art computer system, memory controller 10 resides on an integrated circuit 12 coupled between main memory 30 and a central processing unit (CPU) (not shown). Integrated circuit 12 typically includes an input/output (I/O) controller circuit 20 that interfaces to an I/O channel such as the Peripheral Component Interconnect (PCI) bus. The memory controller controls access to the main memory by various components of the computer system. For example, read requests from the I/O channel are provided to the memory controller circuit 10, which then accesses memory 30 to retrieve the requested data.
Note that, in addition to main memory, “system memory” in computer systems may also include cache memory. It is fairly typical for the CPU to have access to one or two levels of cache memory, which provide the CPU with local copies of data stored in the main memory. The availability of the local copies of data (and instructions) speeds up memory access by the CPU, since the CPU only has to access the local copy rather than main memory. A variety of techniques known in the art maintain coherency between the cache memories and the main memory. When a read request is received over the I/O channel for data in main memory, the cache memories are “snooped” to determine if an updated version of the data is available. The most up-to-date data is then provided in response to the read request.
Access to main memory is an important aspect of the computer system that requires a high level of efficiency to ensure good system performance. One problem with main memory is that it is almost always dynamic random access memory (DRAM). A read access to this type of memory requires a slow primary read access cycle, in which the entire memory ‘page’ is accessed. After that cycle, and before any other page is accessed, additional read accesses can occur very quickly as long as they are all addressed to that open ‘page’ (the page size depends on the DRAM type and size). That fact encourages designs that read as much data from a DRAM as possible at one time, in a contiguous burst, to reduce the average word access time.
Many CPU and CPU interface chips utilize this block read process to retrieve instructions for processors to execute, because most modern processors retrieve instructions in blocks as described above. Such accesses for blocks of instructions take a longer time (page access time) to access the first instruction in the block, e.g., several bus clock cycles, but the remaining accesses are each very quick, taking only, e.g., one additional clock per access. That feature is further used by current cache controller designs that access memory only in cache ‘line’ sizes, which are groups of data sized to the operation of the internal processor cache.
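The benefit of block reads described above can be illustrated with a short calculation. The following is only a sketch: the figure of five bus clocks for the initial page access is an assumed value standing in for "several bus clock cycles", and the function name is hypothetical.

```python
def average_cycles_per_word(words, first_access_cycles=5, burst_cycles=1):
    """Average bus clocks per word for one contiguous burst of `words` words.

    first_access_cycles is an assumed cost for the slow initial page access;
    burst_cycles is the cost of each additional access within the open page.
    """
    if words < 1:
        raise ValueError("words must be at least 1")
    total = first_access_cycles + (words - 1) * burst_cycles
    return total / words


# Longer bursts spread the initial page access cost over more words.
for n in (1, 4, 8):
    print(n, average_cycles_per_word(n))
```

Under these assumed timings, a single-word access costs 5 clocks per word, a four-word burst averages 2 clocks per word, and an eight-word burst averages 1.5.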
Because accesses to main memory can be of different types and sizes, some individual accesses and some block accesses as described above, the memory controller tailors its access to the type of request. It would be inefficient to read an entire cache line worth of memory (or even more) if only the first word was needed. Such an extraneous access, even if seemingly an efficient way to retrieve data, actually wastes memory and bus bandwidth. Also, non-processor accesses from mastering devices on external busses, such as the PCI bus, Accelerated Graphics Port (AGP) and Industry Standard Architecture (ISA), as well as from the DMA (direct memory access) controller in personal computer (PC) systems, access memory in different ways. Accordingly, memory controllers have typically accepted non-CPU accesses to memory as either all individual accesses or all block accesses. If the memory controller treats non-CPU accesses as all individual accesses, each access requires a re-issuance of each individual memory request. If the memory controller treats all non-CPU accesses as block accesses, then the memory controller may end up reading more data than is needed, thus wasting bandwidth and memory resources.
When certain I/O devices require access to main memory, especially when reading, their accesses are mostly to sequential words in memory. Slower I/O devices may frequently read sequential words of data but read them one at a time. Such slower I/O devices issue a new read request for each sequential word of read data requested by the I/O device. Such slower I/O based accesses therefore always require a ‘new’ access by the memory controller to service the I/O controller request (irrespective of the memory-access policy of the memory controller). To service such memory requests, the memory controller accesses the memory using an initial access type (primary slow read access) and then continues to use the slow initial access type for each subsequent read request even though each read cycle may access the same DRAM page. The re-requesting of the same memory page is costly and robs bandwidth from other peripherals and the CPU.
Additionally, since most memory controllers are already optimized to read a full CPU cache line of data (typically four 32-bit words) at once, the additional time used to access a full CPU cache line is further wasted for each single word read request, as the I/O device uses only the single word of data, and a whole cache line (maybe the same cache line) is re-requested for each of the contiguous words of data requested by the I/O device. Thus, reading eight sequential words may require reading the same cache line eight times.
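The waste described above can be counted directly. The sketch below assumes the four-word, 32-bit cache line mentioned in the text, a hypothetical starting address of 0x1000, and a controller that fetches one full line per single-word request with no reuse; the function name is illustrative only.

```python
WORD_BYTES = 4                       # 32-bit words
LINE_WORDS = 4                       # four words per cache line
LINE_BYTES = WORD_BYTES * LINE_WORDS


def line_fetches_without_reuse(start_addr, word_count):
    """Return the cache-line base address fetched for each single-word request."""
    fetches = []
    for i in range(word_count):
        addr = start_addr + i * WORD_BYTES
        fetches.append(addr - addr % LINE_BYTES)  # align down to the line base
    return fetches


fetches = line_fetches_without_reuse(0x1000, 8)
distinct_lines = len(set(fetches))
words_transferred = len(fetches) * LINE_WORDS
print(len(fetches), distinct_lines, words_transferred)
```

In this model, eight sequential single-word reads trigger eight full line fetches that touch only two distinct lines, so 32 words are read from memory to deliver 8.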
It would therefore be desirable to have an approach for I/O accesses that accounts for the type of I/O access being made (block or single word) and makes efficient use of data that has already been accessed.
SUMMARY OF THE INVENTION
In one embodiment, the invention provides an integrated circuit coupled between system memory of a computer system and an input/output channel such as the PCI bus. The integrated circuit includes an input/output request circuit that receives a read request for data in system memory from an input/output device over the input/output channel. A read ahead buffer, which is coupled to the input/output request circuit, stores data from a previous read access to system memory. The input/output request circuit is coupled to selectively provide data from either the system memory or the read ahead buffer in response to the read request. The read ahead buffer is maintained as non-coherent memory with respect to system memory.
In another embodiment, a method of operating a computer system is provided. The computer system includes a read ahead buffer coupled to a memory controller and an input/output controller coupled to an input/output channel. An I/O device provides an initial read request over the input/output channel which specifies an address in system memory. The memory controller retrieves an amount of data from system memory larger than specified by the read request and provides the requested data to the input/output channel and thus to the I/O device. At least a portion of the data retrieved from system memory is stored in the read ahead buffer. The read ahead buffer is marked as valid and identified by at least a portion of the address specified in the read request. When the same I/O device performs a subsequent read access, the I/O request circuit determines whether at least a portion of the address of the subsequent read request matches the portion of the address identifying the read ahead buffer and provides a tag match signal as an indication thereof. Data is then selectively provided from either the read ahead buffer or system memory to the input/output device in response to the subsequent read request according to the tag match signal and the valid indication.
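The method above can be modeled in software as a rough sketch. Everything in the model below is an assumption made for illustration and is not taken from the patent: the class and attribute names, a single buffer holding one four-word line, the use of the line-aligned upper address bits as the tag, and a dictionary standing in for system memory. The model does not track which I/O device issued a request and does no snooping.

```python
WORD_BYTES = 4
LINE_WORDS = 4                       # assumed read-ahead size: one four-word line
LINE_BYTES = WORD_BYTES * LINE_WORDS


class ReadAheadModel:
    """Toy model of an I/O request circuit with one read ahead buffer."""

    def __init__(self, memory):
        self.memory = memory         # dict: word-aligned address -> value
        self.valid = False           # valid indication for the buffer
        self.tag = None              # address portion identifying the buffer
        self.data = []               # words currently held in the buffer
        self.memory_accesses = 0     # number of accesses made to system memory

    def read_word(self, addr):
        tag = addr // LINE_BYTES
        index = (addr % LINE_BYTES) // WORD_BYTES
        tag_match = tag == self.tag
        if self.valid and tag_match:
            return self.data[index]  # served from the read ahead buffer
        # Miss: retrieve more data than requested and keep it for later reads.
        base = tag * LINE_BYTES
        self.data = [self.memory[base + i * WORD_BYTES] for i in range(LINE_WORDS)]
        self.tag = tag
        self.valid = True
        self.memory_accesses += 1
        return self.data[index]


# Eight sequential single-word reads starting at a hypothetical address.
memory = {0x1000 + i * WORD_BYTES: i for i in range(8)}
model = ReadAheadModel(memory)
values = [model.read_word(0x1000 + i * WORD_BYTES) for i in range(8)]
print(values, model.memory_accesses)
```

In this model, the eight reads return the eight stored words while system memory is accessed only twice, once per four-word line, rather than once per request.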

