Title: Extended translation lookaside buffer with fine-grain state...
Classification: Electrical computers and digital processing systems: memory – Address formation – Address mapping
Type: Reexamination Certificate
Filed: 1997-10-01
Issued: 2002-06-25
Examiner: Yoo, Do Hyun (Department: 2187)
Class: Electrical computers and digital processing systems: memory
Subclasses: Address formation; Address mapping
Other classes: C711S203000, C711S206000, C711S207000
Status: active
Patent number: 06412056
ABSTRACT:
FIELD OF THE INVENTION
The present invention generally relates to computer systems and methods and, more particularly, to computer systems and methods for memory state checking.
BACKGROUND OF THE INVENTION
Distributed computer systems typically comprise multiple computers connected to each other by a communications network. In some distributed computer systems, the networked computers can concurrently access shared information, such as data and instructions. Such systems are sometimes known as parallel computers. If a large number of computers are networked, the distributed system is considered to be “massively” parallel. As an advantage, massively parallel computers can solve complex computational problems in a reasonable amount of time.
In such systems, the memories of the computers are collectively known as a distributed shared memory. It is a problem to ensure that the information stored in the distributed shared memory is accessed in a coherent manner. Coherency, in part, means that only one computer can modify any part of the data at any one time; otherwise, the state of the information would be nondeterministic.
FIG. 1 shows a typical distributed shared memory system 100 including a plurality of computers 110. Each computer 110 includes a uniprocessor 101, a memory 102, and input/output (I/O) interfaces 103 connected to each other by a bus 104. The computers are connected to each other by a network 120. Network 120 may be a local area network, a wide area network, a nationwide or international data transmission network, or the like, such as the Internet. Here, the memories 102 of the computers 110 constitute the shared memory.
Some distributed computer systems maintain data coherency using specialized control hardware. The control hardware may require modifications to the components of the system such as the processors, their caches, memories, buses, and the network. In many cases, the individual computers may need to be identical or similar in design, e.g., homogeneous.
As a result, hardware controlled shared memories are generally costly to implement. In addition, such systems may be difficult to scale. Scaling means that the same design can be used to conveniently build smaller or larger systems.
More recently, shared memory distributed systems have been configured using conventional workstations or PCs (personal computers) connected by a conventional network as a heterogeneous distributed system. Shared memory distributed systems have also been configured as a cluster of symmetric multiprocessors (SMP).
In most existing distributed shared memory systems, logic of the virtual memory (paging) hardware typically signals when a process attempts to access shared information that is not stored in the memory of the local SMP or local computer on which the process is executing. When the information is not available locally, the functions of the page fault handlers are replaced by software routines which communicate messages with processes on remote processors.
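The patent does not show how such replaced page fault handlers are implemented; the following is a minimal sketch, assuming a Unix-like system on which shared pages that are not present locally are mapped without access permissions, so that a SIGSEGV handler stands in for the software page fault handler. fetch_page_from_home() is a hypothetical helper that exchanges messages with the page's remote home node.

/* Minimal sketch (not from the patent) of software interception of accesses
 * to pages that are not present locally: absent shared pages carry no access
 * permissions, and the signal handler plays the role of the software page
 * fault handler that communicates with remote processes. */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096UL

/* Hypothetical: asks the page's home node for its current contents. */
extern void fetch_page_from_home(void *page_base);

static void dsm_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)info->si_addr & ~(PAGE_SIZE - 1));

    fetch_page_from_home(page);                        /* message a remote process */
    mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE); /* make the page accessible */
    /* returning from the handler retries the faulting load or store */
}

void dsm_install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = dsm_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}

Because the protection hardware only works at page granularity, this style of interception is exactly where the coarse-grain problem described next comes from.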
With this approach, the main problem is that data coherency can only be provided in large (coarse) sized units, because typical virtual memory pages are 4K or 8K bytes. This size may be inconsistent with the much smaller data units accessed by many processes, for example, 32, 64 or 128 bytes; a single 4K-byte page spans thirty-two 128-byte units. Such coarse, page-sized granularity increases network traffic and can degrade system performance.
In existing software distributed shared memory systems, fine grain information access and coherency control are typically provided by software-implemented message passing protocols. The protocols define how fixed size information blocks and coherency control information are communicated over the network. Procedures which activate the protocols can be called by "miss check code." The miss check code is added to the programs by an automated process.
States of the shared data can be maintained in state tables stored in the memories of each processor or workstation. Prior to executing an access instruction, e.g., a load or a store instruction, the state table is examined by the miss check code to determine if the access is valid. If the access is valid, then the access instruction can execute; otherwise, the protocols define the actions to be taken before the access instruction is executed. The actions can be performed by protocol functions called by the miss handling code.
The calls to the miss handling code can be inserted into the programs before every access instruction by an automated process known as instrumentation. Instrumentation can be performed on executable images of the programs.
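The patent describes instrumenting executable images; purely as an illustration, the same effect at the source level is a call to a check routine placed immediately before each shared access. dsm_check_load() and dsm_check_store() are hypothetical names for the protocol entry points, not names used in the patent.

/* Illustration of the effect of instrumentation at the source level.
 * dsm_check_load()/dsm_check_store() are hypothetical miss check entry
 * points; the mechanism described above rewrites the executable image. */
extern void dsm_check_load(const void *addr);
extern void dsm_check_store(void *addr);

int fetch_and_increment(int *shared_counter)
{
    dsm_check_load(shared_counter);      /* inserted before the load  */
    int old = *shared_counter;

    dsm_check_store(shared_counter);     /* inserted before the store */
    *shared_counter = old + 1;
    return old;
}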
FIG. 2 shows an example miss check code 200 for a program which is to execute on a RISC (Reduced Instruction Set Computer) type of computer. In this implementation, all of the memories of the distributed computers are partitioned so that the addresses of the shared memory are always higher than the addresses of the non-shared memory. In addition, the implementation maintains coherency state information for fixed size quantities of information, for example, "blocks" or "lines" of 128 bytes. Obviously, the fixed size or granularity of the blocks used by any particular application can be set to be smaller or larger than 128 bytes. Partitioning the addresses of shared memory and using fixed-size blocks simplifies the miss check code, thereby reducing overhead.
First, in step 201, the contents of any registers that are going to be used by the miss check code 200 are saved on a stack. In step 202, the target address of the access instruction is determined using the offset and base specified in the operands of the instruction. The access instruction in this example is a store. A store access is valid if the processor modifying the data stored at the target address has exclusive ownership of the data.
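For concreteness, step 202 amounts to the usual RISC effective-address computation; a minimal sketch (the register and offset names are illustrative, not taken from the figure):

#include <stdint.h>

/* Step 202: the target of a RISC load/store is base register + offset,
 * e.g. for a store written as "sw r3, 8(r4)" the target address is r4 + 8. */
static inline uintptr_t effective_address(uintptr_t base, int32_t offset)
{
    return base + (uintptr_t)(intptr_t)offset;
}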
In steps 203-204, a determination is made as to whether the target address is in non-shared memory. If the target address is in the non-shared memory, the rest of the miss check code 200 is skipped, the registers are restored in step 231, and the memory access instruction is executed in step 232. In this case, the overhead is about seven instructions.
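Because shared addresses are always higher than non-shared addresses in this implementation, the test in steps 203-204 reduces to a single comparison against the partition boundary. A sketch, with SHARED_BASE as an assumed constant marking the bottom of the shared region:

#include <stdint.h>

/* Assumed partition boundary: every shared-memory address is >= SHARED_BASE. */
#define SHARED_BASE ((uintptr_t)0x100000000ULL)

/* Steps 203-204: non-shared targets take the cheap fast path and skip the
 * rest of the miss check. */
static inline int target_is_shared(uintptr_t target)
{
    return target >= SHARED_BASE;
}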
Otherwise, if the target address is in shared memory, then in step 205 the index of the line including the target address is determined. If the size of the block is an integer power of two, for example 128 bytes, the block index can be computed using a simple shift instruction.
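With a power-of-two block size, step 205 is a single shift; for 128-byte blocks the shift amount is log2(128) = 7. A sketch (whether the index is taken from the raw address or from its offset within shared memory is an implementation detail not fixed by the text):

#include <stdint.h>

#define BLOCK_SIZE  128u     /* power of two, as in the example */
#define BLOCK_SHIFT 7        /* log2(BLOCK_SIZE)                */

/* Step 205: block (line) index containing a shared target address. */
static inline uintptr_t block_index(uintptr_t target)
{
    return target >> BLOCK_SHIFT;
}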
As shown in step 206, the block index can be used to reference the corresponding entry of the state table. In the exemplary implementation, each entry in the state table is a byte. Obviously, if the number of different states is small, for example, the states can be indicated with two bits, then the size of the state table can be reduced. However, by making the entries smaller, it becomes more difficult to extract state information, since most computers do not conveniently deal with addressing schemes and data operations which are less than eight bits.
In steps 207-208, the table entry is loaded, and in step 209, a determination is made as to whether the state of the block containing the target address is, for example, EXCLUSIVE. If the state of the block containing the target address is EXCLUSIVE, the method skips step 220, and the registers are restored from the stack in step 231. In this case, the overhead is about 13 instructions. Otherwise, the miss handling code is called in step 220 to gain exclusive control over the data.
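Putting the steps of FIG. 2 together, a hedged C-level sketch of the store-side check follows. Register saving and restoring in steps 201 and 231 is left to the calling convention here, the constants and table mirror the sketches above, and all names are hypothetical rather than taken from the patent.

#include <stdint.h>

#define SHARED_BASE ((uintptr_t)0x100000000ULL)   /* assumed partition boundary */
#define BLOCK_SHIFT 7                             /* 128-byte blocks            */

enum { STATE_INVALID = 0, STATE_SHARED = 1, STATE_EXCLUSIVE = 2 };

extern uint8_t state_table[];                     /* one byte per block          */
extern void miss_handler_store(void *target);     /* step 220: protocol function */

/* Store-side miss check following FIG. 2:
 *   201/231  register save/restore (here: the calling convention)
 *   202      compute the target address
 *   203-204  skip everything for non-shared targets
 *   205      block index via a shift
 *   206-208  load the state table entry
 *   209      branch on EXCLUSIVE
 *   220      otherwise call the miss handler to gain exclusive ownership */
static inline void dsm_check_store(void *target)
{
    uintptr_t addr = (uintptr_t)target;

    if (addr < SHARED_BASE)                       /* steps 203-204 */
        return;

    uintptr_t index = addr >> BLOCK_SHIFT;        /* step 205      */
    if (state_table[index] == STATE_EXCLUSIVE)    /* steps 206-209 */
        return;

    miss_handler_store(target);                   /* step 220      */
}

void instrumented_store(int *target, int value)
{
    dsm_check_store(target);                      /* inserted check               */
    *target = value;                              /* step 232: the original store */
}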
As indicated above, code is inserted in the application executable to intercept each load and store of information to see if the information is available locally, is read only or is readable and writable. Numerous techniques have been employed to reduce the runtime overhead of these software checks as much as possible. Nonetheless, such systems still experience a 10-40% overhead due to these software state checks.
Thus, what is needed is a simple hardware support in a software distributed shared memory system that eff…
Inventors: Gharachorloo, Kourosh; Scales, Daniel J.
Assignee: Compaq Information Technologies Group, LP
Examiners: Yoo, Do Hyun; Encarnacion, Yamir
Attorney/Agent: Pennie & Edmonds LLP