Data transfer with highly granular cacheability control...

Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique

Reexamination Certificate


Details

711/129, 711/154, 345/562, 345/537


active

06598136

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention generally relates to data movement in a computer, and more particularly to a system and method of moving data to and from portions of memory with cacheability being controllable on an individual operational basis.
2. Description of Related Art
Reference is made to FIG. 1, which depicts a typical personal computer (PC) system with an x86 architecture for displaying graphics. A central processing unit (CPU) 50 having multiple registers (e.g. CS, DS, ES . . . ECX, EDI, ESI) is coupled through a CPU bus 52 to a memory controller 54. The memory controller 54 is coupled to system memory 56, typically DRAM, and to a relatively fast local or "mezzanine" bus 58, typically having a protocol in accordance with the Video Electronics Standards Association VL-bus or with the Peripheral Component Interconnect (PCI) bus. The local bus 58 is coupled to a relatively slow Industry Standard Architecture (ISA) bus 60 through a bus converter 62.
The local bus 58 couples a graphics adapter card 64 to the memory controller 54 and to the bus converter 62. The location and color for each pixel displayed on display 66 is stored in a frame buffer memory 68 on the graphics adapter card 64. A RAMDAC 70 on the graphics adapter card 64 converts the data stored in the frame buffer memory 68 to analog signals to drive the display 66, which is typically a cathode ray tube (CRT) or a liquid crystal display (LCD). Each time a change is made in the graphics on display 66, the location and color for each pixel must be recalculated and stored in the frame buffer memory 68.
The CPU 50 typically calculates the location and color definition of each changed pixel and sends the resulting information across the local bus 58 to the frame buffer memory 68 on the graphics adapter card 64. Alternatively, a graphics accelerator 72 reduces the burden on the CPU 50 by receiving certain graphic calls (e.g. fills and line draws) through a graphics driver executed by the CPU 50, to calculate the changes in the pixels and to fill the frame buffer memory 68 with updated graphics data.
The so-called BitBlt graphic call ("bit blit") performs an operation by transferring blocks of graphics data from: system memory 56 to frame buffer memory 68, frame buffer memory 68 to system memory 56, and between different portions within the frame buffer memory 68. The graphics accelerator 72 can effectively handle the BitBlt operation to the extent that the data is already stored in the frame buffer memory 68 and the destination is also in the frame buffer memory 68. The CPU 50, however, must still be involved to provide privilege and protection checks if the BitBlt operation requires bitmapped images to be moved from external system memory 56 to the frame buffer memory 68 and from the frame buffer memory 68 to the external system memory 56. The CPU 50 typically handles this through repetitive steps, which, in x86 architecture parlance, is often a repeat move string instruction of the form:
REP MOVS [ESI (source address), EDI (destination address)]
wherein a number of bytes, words, or Dwords of data specified by the ECX register, starting at an address pointed to by ESI, are moved to a block of memory pointed to by EDI.
The required intervention by the CPU 50 has a large latency associated with it since data must be read from the system memory 56 through the memory controller 54 over the CPU bus 52 into the internal registers of the CPU 50. The CPU 50 must then turn around and write the data from its registers over the CPU bus 52 through the memory controller 54 onto the local bus 58 to the frame buffer memory 68 on the graphics adapter card 64. Likewise, data must be read from the frame buffer memory 68 on the graphics adapter card 64 through the memory controller 54 over the CPU bus 52 into the internal registers of the CPU 50. The CPU 50 must then turn around and write the data from its registers over the CPU bus 52 through the memory controller 54 to the system memory 56.
The process just described is further complicated by the use of a cache 74. By way of background, a cache 74, simply put, is a relatively small but fast-access buffer area wherein a copy of previously accessed data, typically spatially or temporally related, is held in the hope that subsequent accesses will benefit from the spatial or temporal locality. In other words, the intent of the cache 74 is to reduce the latency associated with data accesses normally made to slow memory by keeping a copy of the most recent data readily available. However, in the case of reading bitmapped data from system memory 56 to update the display 66, a cache 74 is not significantly advantageous and, in fact, can actually hinder performance. To this end, the amount of display information which updates the display is overwhelming compared to the size of the cache 74, and caching the display information has little, if any, impact on performance. More importantly, however, by caching the display information, valuable instructions and data are evicted from the cache 74, requiring longer access times to retrieve them from the secondary cache or main memory.
By way of further background, known ways under the x86 architecture to designate data as non-cacheable include non-assertion of the cache enable (KEN# pin) by chipset logic circuitry or setting the page cache disable (PCD) bit in the directory and page table entries (DTE and PTE). A drawback of using the KEN# pin is that it requires external chipset logic circuitry to determine cacheability. A drawback of using the PCD bit is that the finest gradation of cacheability is made on a page-by-page basis.
In a related, but not entirely relevant, technique, direct memory access (DMA) transfers are known which can move the contents of one memory block directly to another memory block without substantial intervention by the CPU 50. However, these DMA techniques are ineffective, inter alia, for systems having protection or privilege check mechanisms.
Accordingly, there is a need for a system and a method of cacheability control on an individual operational basis, for moving data from a first block of memory to a second block of memory, in a system having protection and privilege check mechanisms, without substantial CPU intervention, without long bus turnaround times, and without polluting the cache.
SUMMARY OF THE INVENTION
To overcome the limitations of the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, in a processing system having a cache, of transferring blocks of data from a first block of memory to a second block of memory, employing signaling from a CPU core responsive to execution of a predetermined instruction, so that data is transferred directly from the first block of memory to the second block of memory without polluting the cache. The second block of memory is typically scratchpad memory, which is preferably, although not exclusively, a partitionable area of the cache. While a destination address is preferably generated from a programmable address register provided as part of control circuitry in the scratchpad memory, it is contemplated that an instruction in accordance with the present invention could also directly specify a destination address.
A feature of the present invention is transferring data from system memory to scratchpad memory without substantial CPU intervention while maintaining protection and privilege check mechanisms for memory address calculations.
Another feature of the present invention is transferring data from system memory to a scratchpad memory in large blocks to reduce the number of bus turnarounds while maintaining byte granularity addressability.
Another feature of the present invention is transferring data from system memory to scratchpad memory in a system
