Title: Method and apparatus for reducing false sharing in a...
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Type: Reexamination Certificate
Filed: 2000-02-28
Issued: 2002-09-24
Examiner: Nguyen, Than (Department: 2187)
U.S. Classes: 711/148; 711/147; 711/153; 711/163
Status: Active
Patent Number: 6,457,107
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention generally relates to improved distributed computing systems and in particular to improved memory management in distributed computing systems. Still more particularly, the present invention relates to a method, system, and program product for improving memory page sharing in a distributed computing environment.
2. Description of the Related Art
Multiprocessor computer systems are well known in the art, and provide increased processing capability by allowing processing tasks to be divided among several different system processors. In conventional systems, each processor is able to access all of the system resources; i.e., all of the system resources, such as memory and I/O devices, are shared among all of the system processors. Typically, some parts of a system resource may be partitioned among the processors; for example, while each processor can access a shared memory, that memory is divided so that each processor has its own workspace.
More recently, symmetric multiprocessor (SMP) systems have been partitioned to behave as multiple independent computer systems. For example, a single system having eight processors might be configured to treat each of the eight processors (or multiple groups of one or more processors) as a separate system for processing purposes. Each of these “virtual” systems would have its own copy of the operating system, and may then be independently assigned tasks, or may operate together as a processing cluster, which provides for both high-speed processing and improved reliability. Typically, in a multiprocessor system, there is also a “service” processor, which manages the startup and operation of the overall system, including system configuration and data routing on shared buses and devices, to and from specific processors.
Typically, when an SMP system is divided into multiple virtual systems, each of the virtual systems has its own copy of the operating system, and the same operating system is used for each virtual system. Since each processor is running the same operating system, it is relatively easy to provide for resource allocation among the processors.
The name “multiprocessor” is used to connote a parallel computer with a shared common memory; the name “multicomputer” is used to connote a parallel computer with unshared, distributed memories, also referred to as NO Remote Memory Access (NORMA).
Shared memory multiprocessors (often termed “tightly coupled computers”) are further classified into three categories: UMA, NUMA, and COMA. UMA machines feature “Uniform Memory Access”, which means that the latency of a memory access is uniform for all processors. NUMA machines, in contrast, feature “Non-Uniform Memory Access”, which means that the latency of a memory access depends on the relative locations of the processor and the memory. Note that a portion of the global shared memory of a NUMA machine may be uniformly accessible (i.e., part of a NUMA machine may be UMA). Several memory organizations are possible for NUMA machines. The most common is a distributed global memory, in which each processor maintains a “piece” of that memory locally. Access to “local memory” is quite fast, whereas access to “remote memory” (maintained by some other processor) is much slower, typically two orders of magnitude slower, as it requires traversing a communication network of some sort. In addition to local memory, a NUMA machine may have a cache memory. If the collective size of the local cache memories of all processors is large enough, it may be possible to dispense with main memory altogether. This results in a COMA (Cache-Only Memory Access) machine, also known as an ALLCACHE machine.
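As a concrete illustration of the local/remote distinction described above, the short sketch below uses the Linux libnuma interface to place one buffer on the node local to a given CPU and a second buffer on another node. The library calls are real, but the choice of CPU, node numbers, and buffer size are assumptions made purely for illustration; nothing here is drawn from the patent itself.

/* Illustrative sketch only: allocate one buffer on the local NUMA node and
 * one on a remote node using Linux libnuma (compile with -lnuma).
 * Accesses to 'local_buf' stay on the local node; accesses to 'remote_buf'
 * must cross the interconnect, which is the source of the latency noted above. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return EXIT_FAILURE;
    }

    size_t size = 4096;                                /* one typical hardware page */
    int local   = numa_node_of_cpu(0);                 /* node that owns CPU 0      */
    int remote  = (local + 1) % (numa_max_node() + 1); /* another node, if any      */

    void *local_buf  = numa_alloc_onnode(size, local);  /* fast, local accesses     */
    void *remote_buf = numa_alloc_onnode(size, remote); /* slower, remote accesses  */

    printf("local node %d, remote node %d\n", local, remote);

    numa_free(local_buf, size);
    numa_free(remote_buf, size);
    return EXIT_SUCCESS;
}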
UMA/NUMA/COMA multiprocessor machines are further classified as being either symmetric or asymmetric. A symmetric multiprocessor gives all processors “equal access” to the devices (e.g. disks, I/O) in the system; an asymmetric multiprocessor does not. In a symmetric system, executive programs (e.g. OS kernel) may be invoked on any processor.
Non-uniform memory access (NUMA) is a method of configuring a cluster of microprocessors in a multiprocessing system so that they can share memory locally, improving performance and the ability of the system to be expanded. NUMA is used in symmetric multiprocessing (SMP) systems. Ordinarily, a limitation of SMP is that as microprocessors are added, the shared bus or data path becomes overloaded and turns into a performance bottleneck. NUMA adds an intermediate level of memory shared among a few microprocessors so that not all data accesses have to travel over the main bus. To an application program running in an SMP system, all the individual processor memories look like a single memory.
There are two outstanding problems with Non-Uniform Memory Access (NUMA) computers: latency and coherency. Both of these problems are magnified when false sharing occurs.
In a distributed computing environment, including multiprocessor computers, each CPU has its own physical memory and cannot directly see the physical memory of another CPU. The virtual address space, or virtual memory, of the distributed environment is distributed across the physical memory of the CPUs that are participating in the environment. A CPU can claim ownership of an address range (typically the machine page size, such as 4 Kilobytes), which we will call a “page”, and that portion of the virtual address range is sent to that CPU for storage in its physical memory. Thus, only one CPU can view the contents of a particular page at any time.
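The ownership model just described can be pictured as a per-page directory that records which CPU currently holds each page. The sketch below is a minimal, hypothetical illustration of such a directory; the names (page_owner, claim_page), the page count, and the console output are all assumptions and do not come from the patent.

/* Minimal sketch of a page-ownership directory: each 4 KB virtual page is
 * owned by exactly one CPU at a time.  Names and sizes are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096u
#define NUM_PAGES  1024u           /* pages tracked in this toy directory  */
#define NO_OWNER   (-1)

static int page_owner[NUM_PAGES];  /* CPU id that owns each page, or -1    */

static unsigned page_of(uintptr_t vaddr)
{
    return (unsigned)((vaddr / PAGE_SIZE) % NUM_PAGES);
}

/* Claim the whole page containing 'vaddr' for 'cpu'. */
static void claim_page(uintptr_t vaddr, int cpu)
{
    unsigned p = page_of(vaddr);
    if (page_owner[p] != cpu) {
        printf("page %u moves from CPU %d to CPU %d\n", p, page_owner[p], cpu);
        page_owner[p] = cpu;
    }
}

int main(void)
{
    for (unsigned i = 0; i < NUM_PAGES; i++)
        page_owner[i] = NO_OWNER;

    claim_page(0x2000, 0);   /* CPU 0 touches bytes in page 2               */
    claim_page(0x2200, 1);   /* CPU 1 touches a different part of the same
                                page, so the entire page must change owner  */
    return 0;
}

In a real distributed shared memory system, the second claim would also ship the entire 4 Kilobyte page contents across the network, which is the cost the following paragraphs are concerned with.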
For example, if the requesting CPU needs to access only the first 512 bytes of a 4 Kilobyte page, it must still retrieve and claim ownership of the entire 4 Kilobyte page.
This introduces the problem of “False Sharing”, wherein multiple processors each require access to the same block simultaneously, even if they actually access unrelated parts of that block. In this example, the CPU has claimed 4 Kilobytes of storage when it only needs access to 512 bytes. False sharing leads to reduced cache utilization, increased network traffic, and delays while waiting for data to be retrieved.
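The same effect is commonly demonstrated at cache-line granularity on a single shared-memory machine, where two threads update unrelated counters that happen to occupy one block. The sketch below is such a demonstration, offered only as an analogy to the page-level false sharing described above; the 64-byte block size, iteration count, and padding trick are assumptions for illustration, not material from the patent.

/* Sketch of false sharing at cache-line granularity.  Two threads update
 * unrelated counters; without padding both counters share one 64-byte
 * block, so every update invalidates the other thread's cached copy.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000L

struct shared {
    volatile long a;                /* updated only by thread 1             */
    /* char pad[64]; */             /* uncomment to put 'b' in its own
                                       64-byte block and avoid the sharing  */
    volatile long b;                /* updated only by thread 2             */
};

static struct shared s;

static void *bump_a(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) s.a++; return NULL; }
static void *bump_b(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) s.b++; return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", s.a, s.b);
    return 0;
}

With the padding line commented out, every increment by one thread invalidates the block cached by the other; inserting the padding places the two counters in separate blocks and the interference disappears.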
If the page being shared is frequently used, thrashing can occur and performance will suffer. Thrashing is a behavior characterized by the extensive exchange of data between processors competing for the same data block, which occurs so frequently that it becomes the predominant activity. This will considerably slow down all useful processing in the system. It would therefore be desirable to provide a software-based memory management system which reduces thrashing and false sharing.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide improved distributed computing systems.
It is another object of the present invention to provide improved memory management in distributed computing systems.
It is yet another object of the present invention to provide a method, system, and program product for improving memory page sharing in a distributed computing environment.
The foregoing objects are achieved as is now described. The preferred embodiment provides a method, system, and computer program product for reducing false sharing in a distributed computing environment, and in particular in a multiprocessor data processing system. A method is proposed to define a virtual address range, within the system memory available to the processors, that has a finer granularity than the hardware page size. These smaller sections, called “sub-pages,” allow more efficient memory management. For example, a 64 Kilobyte range may be defined by the memory management software to have a 512 byte granularity rather than 4 Kilobytes, with each 512-byte sub-page capable of being separately managed.
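To make the granularity arithmetic concrete (a 64 Kilobyte range at 512-byte granularity yields 128 separately ownable sub-pages), the sketch below tracks ownership per sub-page rather than per 4 Kilobyte page. It is a minimal illustration under assumed names (subpage_owner, claim_subpage) and is not the patented implementation.

/* Minimal sketch of finer-grained bookkeeping: a 64 KB region managed as
 * 128 sub-pages of 512 bytes, each of which can be claimed independently.
 * All names and sizes are assumptions for illustration. */
#include <stdio.h>

#define REGION_SIZE   (64u * 1024u)
#define SUBPAGE_SIZE  512u
#define NUM_SUBPAGES  (REGION_SIZE / SUBPAGE_SIZE)   /* 128 */
#define NO_OWNER      (-1)

static int subpage_owner[NUM_SUBPAGES];

/* Claim only the 512-byte sub-page containing 'offset' (an offset into the
 * 64 KB region) instead of an entire 4 KB hardware page. */
static void claim_subpage(unsigned offset, int cpu)
{
    unsigned sp = (offset % REGION_SIZE) / SUBPAGE_SIZE;
    if (subpage_owner[sp] != cpu) {
        printf("sub-page %u (bytes %u..%u) moves to CPU %d\n",
               sp, sp * SUBPAGE_SIZE, (sp + 1) * SUBPAGE_SIZE - 1, cpu);
        subpage_owner[sp] = cpu;
    }
}

int main(void)
{
    for (unsigned i = 0; i < NUM_SUBPAGES; i++)
        subpage_owner[i] = NO_OWNER;

    /* Two CPUs touch different 512-byte pieces of the same 4 KB page; with
     * sub-page tracking, neither claim disturbs the other. */
    claim_subpage(0x0000, 0);
    claim_subpage(0x0200, 1);
    return 0;
}

Here two CPUs touch different 512-byte pieces of what would otherwise be a single 4 Kilobyte page, and neither claim disturbs the other, which illustrates the false-sharing reduction the summary describes.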
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
Inventors: Beadle, Bruce A.; Brown, Michael Wayne; Ullmann, Cristi Nesbitt; Wynn, Allen Chester
Attorney, Agent, or Firm: Bracewell & Patterson L.L.P.
Examiners: Dawkins, Marilyn Smith; Nguyen, Than