Electrical computers and digital processing systems: memory – Address formation – Address mapping
Reexamination Certificate
1998-09-30
2002-02-05
Kim, Matthew (Department: 2186)
C711S207000, C711S219000
active
06345352
ABSTRACT:
BACKGROUND
The present invention relates generally to multiprocessor systems and, more particularly, to systems and techniques for maintaining translation lookaside buffers (TLBs) in multiprocessor systems.
As the performance demands on personal computers continue to increase at a meteoric pace, processors have been developed which operate at higher and higher clock speeds. The instruction sets used to control these processors have been pared down (e.g., RISC architecture) to make them more efficient. Processor improvements alone, however, are insufficient to provide the greater performance required by computer users. The other computer subsystems which support the processor, e.g., interconnects, I/O devices and memory devices, must also be designed to operate at higher speeds and support greater bandwidth. In addition to improved performance, cost has always been an issue with computer users. Thus, system designers are faced with the dual challenges of improving performance while remaining competitive on a cost basis.
Early personal computers typically included a central processing unit (CPU), some type of memory and one or more input/output (I/O) devices. One of the common cost/performance design tradeoffs referred to above involves the consideration of how much main memory to provide to a computer. Considering current consumer desire for multimedia applications, many personal computers are designed with large amounts of main memory, e.g., 32 MB RAM. However, RAM chips are expensive and, therefore, techniques have been developed to obtain greater performance from a given memory capacity.
One such technique, which is well known to those skilled in the art, is the use of virtual memory. Virtual memory is based on the concept that, when running a program, the entire program need not be loaded into main memory at one time. Instead, the computer's operating system loads sections of the program into main memory from a secondary storage device (e.g., a hard disk drive) as needed for execution. To make this scheme viable, the operating system maintains tables which keep track of where each section of the program resides in main memory and secondary storage. As a result of executing a program in this way, the program's logical addresses no longer correspond to physical addresses in main memory. To handle this situation the CPU maps the program's effective or virtual addresses into their corresponding physical addresses.
The sections of the program which are manipulated by the CPU in the manner described above are commonly referred to as “pages”. As part of the mapping process, the CPU maintains a page table which contains various information associated with the program's pages. For example, a page table entry can contain a validity bit, which indicates whether the page associated with this particular entry is currently stored in main memory, and a dirty bit which indicates whether the program has modified the page.
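As an illustration only, the mapping described above can be sketched in Python. The 4096-byte page size, the `PageTableEntry` fields, and the `translate` and `PageFault` names are assumptions introduced for this sketch; the text specifies only that an entry can carry a validity bit and a dirty bit.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size; the text does not specify one


@dataclass
class PageTableEntry:
    frame: int           # physical page (frame) number, meaningful only if valid
    valid: bool = False  # True when the page is currently in main memory
    dirty: bool = False  # True once the program has modified the page


class PageFault(Exception):
    """Raised when the referenced page is not in main memory (hypothetical name)."""


def translate(page_table, vaddr, write=False):
    """Map a virtual address to a physical address via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number, byte offset
    entry = page_table.get(vpn)
    if entry is None or not entry.valid:
        # The operating system would load the page from secondary storage here.
        raise PageFault(vpn)
    if write:
        entry.dirty = True  # record that the page has been modified
    return entry.frame * PAGE_SIZE + offset


# Page 0 is resident in frame 7; page 1 is known but not resident.
page_table = {
    0: PageTableEntry(frame=7, valid=True),
    1: PageTableEntry(frame=0, valid=False),
}
```

A read of virtual address `0x10` resolves into frame 7, while any access to page 1 raises `PageFault`, which stands in for the point at which the page would be brought in from secondary storage.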
Many systems store the page table in main memory. Thus, accessing a page potentially requires two main memory accesses: a first to determine the location of a particular page and a second to access that page. To reduce the overhead associated with this activity, some systems provide a special cache memory, known as a translation lookaside buffer (TLB), which holds page table entries for the most recently accessed pages that are currently stored in main memory. The CPU forwards virtual addresses to the TLB, which produces a physical page location indication if it holds an entry for the page of interest. Otherwise, the CPU consults the page table in main memory to obtain access information for this page. When a page is removed from main memory, for example, any TLB entry associated with that page is purged.
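The lookup, fallback, and purge behavior just described might be modeled as follows. This is a minimal sketch under stated assumptions: the `TLB` class, its capacity, the least-recently-used eviction policy, and the `translate_page` helper are all hypothetical, and the page table is reduced to a dictionary from virtual page number to physical frame for resident pages.

```python
from collections import OrderedDict


class TLB:
    """Small cache of virtual-page -> physical-frame translations."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # assumed size; real TLBs vary
        self.entries = OrderedDict()  # vpn -> frame, least recently used first

    def lookup(self, vpn):
        """Return the cached frame for vpn, or None on a miss."""
        frame = self.entries.get(vpn)
        if frame is not None:
            self.entries.move_to_end(vpn)  # mark as most recently used
        return frame

    def insert(self, vpn, frame):
        self.entries[vpn] = frame
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry

    def purge(self, vpn):
        """Drop the entry for vpn, if one exists."""
        self.entries.pop(vpn, None)


def translate_page(tlb, page_table, vpn):
    """Return (frame, hit): try the TLB first, then fall back to the page table."""
    frame = tlb.lookup(vpn)
    if frame is not None:
        return frame, True
    frame = page_table[vpn]  # the extra main-memory access (KeyError if absent)
    tlb.insert(vpn, frame)
    return frame, False
```

On a hit only the TLB is consulted; on a miss the page table supplies the frame and the translation is cached for later accesses. Calling `purge` when a page leaves main memory keeps the TLB from returning a stale frame.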
The advent of multiprocessor architectures for personal computers is a recent trend in the design of these systems, intended to satisfy consumers' demand for ever faster and more powerful personal computers. In a typical multiprocessor computer system each of the processors may share one or more resources. Note, for example, the multiprocessor system depicted in FIG. 1. Therein, an exemplary multiprocessor system 5 is illustrated having seven nodes including a first CPU 10, a bridge 12 for connecting the system 5 to other I/O devices 13, first and second memory devices 14 and 16, a frame buffer 18 for supplying information to a monitor, a direct memory access (DMA) device 20 for communicating with a storage device or a network and a second CPU 22 having an SRAM device 24 connected thereto. According to the conventional paradigm, these nodes would be interconnected by a bus 26. Caches can be provided as shown to isolate some of the devices from the bus and to merge plural, small bus accesses into larger, cache-line sized accesses.
As multiprocessor systems grow more complex, i.e., are designed with more and more nodes, adapting the bus-type interconnect to handle the increased complexity becomes problematic. For example, capacitive loading associated with the conductive traces on the motherboard which form the bus becomes a limiting factor with respect to the speed at which the bus can be driven. Thus, an alternative interconnect architecture is desirable.
One type of proposed interconnect architecture for multiprocessor personal computer systems replaces the bus with a plurality of unidirectional point-to-point links and uses packet data techniques to transfer information. FIGS. 2(a) and 2(b) conceptualize the difference. FIG. 2(a) depicts four of the nodes from FIG. 1 interconnected via a conventional bus. FIG. 2(b) illustrates the same four nodes interconnected via unidirectional point-to-point links 30, 32, 34 and 36. These links can be used to provide bus-like functionality by connecting the links into a ring (which structure is sometimes referred to herein as a “ringlet”) and having each node pass through packets addressed to other nodes. Ringlets overcome the aforementioned drawback of conventional bus-type interconnects since their individual links can be clocked at high speeds regardless of the total number of nodes which are linked together.
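The pass-through behavior of a ringlet can be sketched as a short Python function. The function name `send_on_ringlet`, the node labels, and the list-based ring are assumptions made for illustration; the text states only that the links are unidirectional and that each node forwards packets addressed to other nodes.

```python
def send_on_ringlet(nodes, src, dst, payload):
    """Forward a packet around a unidirectional ring from src to dst.

    nodes gives the ring order; each node's single outgoing link leads to
    the next entry, wrapping around at the end. Returns (hops, payload),
    where hops lists the nodes the packet reaches in order, ending at dst.
    """
    if src not in nodes or dst not in nodes:
        raise ValueError("unknown node")
    i = nodes.index(src)
    hops = []
    while True:
        i = (i + 1) % len(nodes)  # follow the one outgoing link
        hops.append(nodes[i])
        if nodes[i] == dst:
            return hops, payload  # the addressed node consumes the packet
        # Otherwise this node passes the packet through to its neighbor.


# Four hypothetical nodes joined into a ring by four links.
ring = ["A", "B", "C", "D"]
```

Because each link has exactly one sender and one receiver, adding a node lengthens the path a packet may travel but does not add electrical load to any existing link, which is the property the paragraph above attributes to ringlets.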
Like single processor systems, multiprocessor systems can use virtual memory techniques to enhance memory performance. Thus, each processor in the multiprocessor system may have its own TLB, which creates the potential for noncoherency between the various TLB caches. For example, if the first CPU 10 changes an entry, e.g., marks that entry invalid or changes a page address, in its TLB (not shown in FIG. 1), then it would be desirable to update the corresponding entry in the TLB of the second CPU 22.
Conventionally, multiprocessor systems have accomplished this task by broadcasting special TLB-purge instructions on the device interconnect which identify the virtual address that should be invalidated. This conventional mechanism for maintaining coherence between the various processors in a multiprocessor system has several drawbacks. For example, the broadcast TLB solution lacks robustness since no positive feedback is provided by the recipient CPUs that the TLB purge was received and performed. More specifically, these conventional solutions simply provided the recipient CPUs with a “wired-OR” busy signal line that was driven when the CPU was busy. If the broadcasting CPU did not see a busy signal, it presumed that the TLB purge was received and performed, an assumption that may be inaccurate.
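The weakness of the busy-line scheme can be made concrete with a small Python model. Everything here is a hypothetical sketch: the `Cpu` class, the `receives` flag (used to model a recipient that misses the broadcast), and the `broadcast_purge` function are illustrative names, not part of any described system.

```python
class Cpu:
    def __init__(self, name, tlb_pages, busy=False, receives=True):
        self.name = name
        self.tlb = set(tlb_pages)  # virtual pages with a cached translation
        self.busy = busy           # drives the shared busy line when True
        self.receives = receives   # hypothetical: False models a missed broadcast

    def on_purge(self, vpage):
        """Handle a broadcast purge; return True if the busy line is driven."""
        if not self.receives:
            return False           # nothing purged, and no busy signal either
        if self.busy:
            return True            # purge not performed; sender must rebroadcast
        self.tlb.discard(vpage)
        return False


def broadcast_purge(recipients, vpage):
    """Return True if the sender would presume the purge completed."""
    signals = [cpu.on_purge(vpage) for cpu in recipients]
    busy_line = any(signals)       # wired-OR: driven if any recipient drives it
    return not busy_line           # silence is taken as success
```

With one recipient that handles the purge and one constructed with `receives=False`, `broadcast_purge` returns `True` even though the second recipient still holds the stale page. The absence of a busy signal is indistinguishable from a completed purge, which is the lack of positive feedback noted above.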
A second drawback associated with these conventional TLB-purge solutions involves the manner in which read/write dependencies are handled, particularly in conjunction with bridges between different systems. Consider the situation where, for example, a CPU has a pending read transaction at the time that the TLB-purge command is broadcast. In this situation, the recipient CPU will assert a busy signal and complete its read transaction, whereupon the TLB-purge command is rebroadcast. This functionality becomes more complicated where the TLB-pu
James David V.
North Donald N.
Anderson Matthew D.
Apple Computer Inc.
Burns Doane Swecker & Mathis L.L.P.
Kim Matthew
Method and system for supporting multiprocessor TLB-purge...