Technique for referencing distributed shared memory locally...

Patent number: 06598130
Type: Reexamination Certificate (status: active)
Filed: 2001-07-30
Issued: 2003-07-22
Examiner: Kim, Hong (Department: 2187)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Other classes: C711S170000, C711S202000, C709S214000
FIELD OF THE INVENTION
The present invention pertains to memory management and utilization in large scale computing systems and, more particularly, to an improved technique for referencing distributed shared memory.
DESCRIPTION OF THE RELATED ART
Even as the power of computers continues to increase, so does the demand for ever greater computational power. In digital computing's early days, a single computer comprising a single central processing unit (“CPU”) executed a single program. Programming languages, even those in wide use today, were designed in this era, and generally specify the behavior of only a single “thread” of computational instructions. Computer engineers eventually realized that many large, complex programs typically could be broken into pieces that could be executed independently of each other under certain circumstances. This meant they could be executed simultaneously, or “in parallel.”
Thus arose parallel computing. Parallel computing typically involves breaking a program into several independent pieces, or "threads," that are executed on separate CPUs. Parallel computing is therefore sometimes referred to as "multiprocessing," since multiple processors are used. By allowing many different processors to execute different processes or threads of a given application program simultaneously, the execution speed of that application program may be greatly increased.
In the most general sense, multiprocessing is defined as the use of multiple processors to perform computing tasks. The term could apply to a set of networked computers in different locations, or to a single system containing several processors. As is well known, however, the term is most often used to describe an architecture in which two or more linked processors are contained in a single enclosure. Further, multiprocessing does not occur just because multiple processors are present. For example, a stack of PCs in a rack serving different tasks is not multiprocessing. Similarly, a server with one or more "standby" processors is not multiprocessing, either. The term "multiprocessing," therefore, applies only when two or more processors are working in a cooperative fashion on a task or set of tasks.
In theory, the performance of a multiprocessing system could be improved simply by increasing the number of processors in the system. In reality, the continued addition of processors past a certain saturation point serves merely to increase communication bottlenecks and thereby limit the overall performance of the system. Thus, although conceptually simple, the implementation of a parallel computing system is in fact very complicated, involving tradeoffs among single-processor performance, processor-to-processor communication performance, ease of application programming, and cost. Conventionally, a multiprocessing system is a computer system that has more than one processor and that is typically designed for high-end workstation or file server usage. Such a system may include a high-performance bus, large quantities of error-correcting memory, redundant array of inexpensive disks ("RAID") drive systems, advanced system architectures that reduce bottlenecks, and redundant features such as multiple power supplies.
Parallel computing embraces a number of computing techniques that can be generally referred to as “multiprocessing” techniques. There are many variations on the basic theme of multiprocessing. In general, the differences are related to how independently the various processors operate and how the workload among these processors is distributed.
Two common multiprocessing techniques are symmetric multiprocessing systems (“SMP”) and distributed memory systems. One characteristic distinguishing the two lies in the use of memory. In an SMP system, at least some portion of the high-speed electronic memory may be accessed, i.e., shared, by all the CPUs in the system. In a distributed memory system, none of the electronic memory is shared among the processors. In other words, each processor has direct access only to its own associated fast electronic memory, and must make requests to access memory associated with any other processor using some kind of electronic interconnection scheme involving the use of a software protocol. There are also some “hybrid” multiprocessing systems that try to take advantage of both SMP and distributed memory systems.
SMPs can be much faster, but at higher cost, and cannot practically be built to contain more than a modest number of CPUs, e.g., a few tens. Distributed memory systems can be cheaper and scaled arbitrarily, but program performance can be severely limited by the performance of the interconnect employed, since the interconnect (for example, Ethernet) can be several orders of magnitude slower than access to local memory. Hybrid systems are currently the fastest overall multiprocessor systems available on the market. Consequently, exposing the maximum available performance to the applications programmer is an interesting and challenging exercise. This problem is exacerbated by the fact that most parallel programming applications are developed either for pure SMP systems, exploiting, for example, the "OpenMP" ("OMP") programming model, or for pure distributed memory systems, using, for example, the Message Passing Interface ("MPI") programming model.
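As a rough illustration of the two pure models just mentioned (a sketch added here for clarity, not code from the patent; the function names and arguments are invented), the following C fragments compute the same sum first in the OpenMP style, where every thread reads a shared array directly, and then in the MPI style, where each process owns only a local slice of the data and partial results must be combined by explicit communication:

    #include <mpi.h>

    /* Pure SMP / OpenMP style: every thread reads the shared array "a"
       directly; the reduction clause combines per-thread partial sums. */
    double sum_shared(const double *a, int n)
    {
        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Pure distributed-memory / MPI style: each process owns only its
       local slice, and the partial sums travel over the interconnect. */
    double sum_distributed(const double *local_a, int local_n)
    {
        double s = 0.0, total = 0.0;
        for (int i = 0; i < local_n; i++)
            s += local_a[i];
        MPI_Allreduce(&s, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return total;
    }

The OpenMP version relies on hardware-supported shared memory, while the MPI version would run unchanged on machines with no shared memory at all, at the price of the explicit communication step.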
However, even hybrid multiprocessing systems have drawbacks, and one significant drawback lies in the bottlenecks encountered in retrieving data. In a hybrid system, multiple CPUs are usually grouped, or "clustered," into nodes, referred to as SMP nodes. Each SMP node includes some private memory for the CPUs in that node. The shared memory is distributed across the SMP nodes, with each SMP node including at least some of the shared memory. The shared memory within a particular node is "local" to the CPUs within that node and "remote" to the CPUs in the other nodes. Because of the hardware involved and the way it operates, data transfer rates between a CPU and its local memory can be 10 to 100 times faster than those between the CPU and remote memory.
This performance problem is exacerbated by the manner in which programming is performed on such computing systems. Typically, programming languages permit a programmer to specify which data items (e.g., arrays and scalars) are stored in local and shared memory. However, programming languages and operating systems strive to make the difference between local and remote shared memory transparent to the programmer. While this greatly simplifies the programming effort, it also masks from the programmer the performance difference between local and remote memory utilization for shared data items.
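For example, an OpenMP directive (shown here only to illustrate the point; the function and variable names are invented) lets the programmer declare which variables are shared among threads and which are private, but says nothing about whether the pages backing a shared array reside in memory local to the executing CPU or in another node's memory:

    /* Sharing is declared, placement is not: "a", "n", and "factor" are
       shared among all threads, while the loop index is private to each
       thread.  Which node's physical memory holds the pages behind "a"
       is left entirely to the operating system. */
    void scale(double *a, int n, double factor)
    {
        #pragma omp parallel for default(none) shared(a, n, factor)
        for (int i = 0; i < n; i++)
            a[i] *= factor;
    }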
An alternative technique designs the programming environment so that the programmer can distinguish, in the program source code, between accessing shared memory within an SMP node and accessing remote memory in the other nodes of a hybrid system. One such method is to use both the OpenMP and MPI programming models in the same program. The main drawback is that even simple programs become exceedingly complex and error prone when this technique is used.
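A minimal sketch of that mixed style (illustrative only and not taken from the patent; the computation and all names are invented) already shows why the approach is considered error prone: two runtimes must be initialized, two notions of worker identity coexist, and the thread-support level of the MPI library must be negotiated explicitly:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nprocs;

        /* Ask for an MPI library that tolerates OpenMP threads. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (provided < MPI_THREAD_FUNNELED && rank == 0)
            fprintf(stderr, "warning: limited MPI thread support\n");

        enum { N = 1000000 };
        double local = 0.0, global = 0.0;

        /* Within a node, OpenMP threads cooperate through shared memory... */
        #pragma omp parallel for reduction(+:local)
        for (int i = rank; i < N; i += nprocs)
            local += 1.0 / (double)(i + 1);

        /* ...while between nodes the partial results travel as messages. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("partial harmonic sum = %f\n", global);

        MPI_Finalize();
        return 0;
    }

Here MPI distributes loop iterations across nodes while OpenMP threads share memory within each node; a real application must additionally decide, for every data structure, which of the two models governs it.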
Thus, there is a strong design motivation to keep the allocation of memory for shared data items between local and remote memories beyond the programmer's reach. The allocation of shared data items is, in fact, frequently undertaken without regard to whether the allocated memory will be local or remote to the CPU that will be using the data item. Consequently, it is often difficult for a programmer of parallel applications to realize the potential performance gains that might result from tighter control over whether CPU accesses for shared data items are made to local, rather than remote, shared memory.
The present invention is directed to resolving, or at least reducing, one or all of the problems mentioned above.
SUMMARY OF THE INVENTION
The invention comprises a technique for allocating memory in a multiprocessing computing system.
Inventors: Bircsak, John A.; Harris, Kevin W.; Wibecan, Brian F.
Assignee: Hewlett-Packard Development Company, L.P.