Process for allocating memory in a multiprocessor data processing system

Electrical computers and digital processing systems: memory – Address formation – Address mapping

Reexamination Certificate


Details

U.S. Classes: C711S216000, C711S209000, C711S170000, C711S171000, C711S172000, C711S173000, C711S147000, C711S148000, C711S153000

Type: Reexamination Certificate

Status: active

Patent number: 06272612

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a process for allocating memory in a multiprocessor data processing system, more specifically a process for allocating a memory having non-uniform access.
Within the scope of the invention, the term “non-uniform” is intended in a temporal sense, as will be shown. Likewise, the term “a memory” is intended in a general sense. It can indicate a distributed memory, a memory hierarchy (for example comprising banks of memories with different access times), or a set of memories of different types.
BACKGROUND OF THE INVENTION
As is well known in the data processing field, it is possible to increase the power of a machine by increasing the number of processors of which it is composed. A “Symmetrical Multiprocessor” (SMP) architecture allows the different processors in the same machine to access its memory symmetrically by means of a system bus. These are machines with uniform access memory, inasmuch as the access time to the memory is substantially the same for all of the data accessed. For this reason, the architecture is called “UMA” (for “Uniform Memory Access”).
FIG. 1 attached to the present specification schematically illustrates an example of the “UMA” type architecture.
The data processing system 1, which hereinafter will be called the “SMP” module, comprises a certain number of central processing units, or processors (“CPUs”). Four central processing units (CPUs) are represented in the example of FIG. 1: 10 through 13. Associated with these central processing units 10 through 13 is a main memory 14, accessible by all of them via an access line L.
Since all the accesses take place within the module 1, that is, locally, and since the total available memory space is homogeneous in terms of access time (which constitutes the initial hypothesis, since this is a “UMA” architecture), the access time remains substantially the same, no matter which central processor 10 through 13 has sent the request.
Although only four central processors 10 through 13 have been represented in FIG. 1, it should be clear that this number is completely arbitrary; it can be increased or decreased. However, the performance curve of machines of this type does not increase in linear fashion as a function of the number of processors: an increased number of processors causes the system to spend more time on problems of access to its shared resources, at the expense of the time available for running applications. The consequence is to considerably lower the performance curve when the number of processors exceeds an optimum value, often estimated at about four.
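This behavior can be pictured with a toy contention model, given purely as an illustration and not taken from the patent: each additional processor adds raw computing power but also pairwise contention for the shared resources (system bus, memory). The contention factor of 0.05 below is an arbitrary assumption, chosen so that the peak falls near four processors.

#include <stdio.h>

int main(void)
{
    /* Assumed cost charged for every pair of processors competing
       for the shared resources (system bus, main memory). */
    const double contention = 0.05;

    for (int n = 1; n <= 16; n++) {
        /* Useful work grows with n; contention grows with the number of pairs. */
        double relative_performance = n / (1.0 + contention * n * (n - 1));
        printf("%2d processors -> relative performance %.2f\n",
               n, relative_performance);
    }
    return 0;
}

With these assumed figures, the curve peaks between four and five processors and decreases beyond that point.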
The prior art proposes various solutions to this problem. One known solution consists of grouping a plurality of machines into clusters so as to have them communicate with one another through a network. Each machine has an optimal number of processors, for example four, and its own operating system. It establishes a communication with another machine every time it performs an operation on data maintained by that other machine. The time required for these communications and the need to work on consistent data cause latency problems for high-volume applications such as distributed applications, which require numerous communications. Latency is the time that separates the instant at which a request for access to the memory is sent from the instant at which the response to this request is received.
Another known solution is that of machines with a “Non-uniform Memory Access” (NUMA) architecture. These are machines with non-uniform access memory, inasmuch as the access time to the memory varies depending on the location of the data accessed. A “NUMA” type machine is constituted by a plurality of modules, each module comprising an optimal number of processors and a physical part of the total memory of the machine. A machine of this type has non-uniform memory access because a module generally has easier and faster access to a physical part of the memory that it does not share with another module than to a part that it shares. Although each module has a private system bus linking its processors with its physical memory, an operating system common to all the modules allows all of the private system busses to be considered as a single, unique system bus of the machine. A logical address assigns a place of residence to a given physical memory location of a module. For a specific processor, accesses to a local memory part physically located in the same module as the processor are distinguished from accesses to a remote memory part, physically located in one or more modules other than the one in which the processor is located.
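As an illustration only (this fragment is not part of the patent), the partitioning just described can be modeled by giving each module a slice of the machine's physical address space; the module count, boundaries, and sizes below are assumptions chosen for the example.

#include <stdbool.h>
#include <stdint.h>

#define NB_MODULES 2                  /* Ma and Mb, as in FIG. 2 */

struct module {
    uint64_t mem_base;                /* first physical address owned by the module */
    uint64_t mem_size;                /* size of its part of the total memory       */
};

/* Each module holds one contiguous slice of the machine's physical memory. */
static const struct module modules[NB_MODULES] = {
    { .mem_base = 0x000000000ULL, .mem_size = 0x100000000ULL },  /* Ma: 4 GB */
    { .mem_base = 0x100000000ULL, .mem_size = 0x100000000ULL }   /* Mb: 4 GB */
};

/* Module in which a physical address "resides". Returns -1 if unmapped. */
static int home_module(uint64_t phys_addr)
{
    for (int m = 0; m < NB_MODULES; m++) {
        if (phys_addr >= modules[m].mem_base &&
            phys_addr <  modules[m].mem_base + modules[m].mem_size)
            return m;
    }
    return -1;
}

/* An access is local when the requesting CPU's module owns the address. */
static bool is_local_access(int cpu_module, uint64_t phys_addr)
{
    return home_module(phys_addr) == cpu_module;
}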
FIG. 2 attached to the present description schematically illustrates an example of this type of architecture, that is, a “NUMA” architecture. To simplify the drawing, it has been assumed that the data processing system 1′ comprises only two modules, Ma and Mb, of the above-mentioned “SMP” type, and that the two modules are identical. It must be understood, however, that the data processing system 1′ can comprise a greater number of modules and that the modules Ma and Mb can be different (particularly in terms of the number of central processors).
The module Ma comprises four central processors 10a through 13a, and a main memory 14a. Likewise, the module Mb comprises four central processors 10b through 13b, and a main memory 14b. The two memories 14a and 14b (and more generally the n main memories) communicate with one another by means of what is called a “link” 2, generally via so-called remote cache memories 15a and 15b, respectively. The link 2 does not correspond to simple physical links, but comprises various standard electronic circuits (control circuits, interface circuits, etc.) that do not need to be described any further because they are well known in the prior art.
It is easy to understand that, in an architecture of this type, if an application is running in the module Ma, for example, the access time to the “near” memory 14a (local access) is, a priori, less than the access time to the “far” memory 14b located in the module Mb, no matter which central processor 10a through 13a is involved. It is specifically necessary to pass through the link 2 when the data are physically stored in another module, which substantially increases the transfer time.
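The difference can be summarized by a toy cost model, again given only as an illustration and not taken from the patent: the cycle counts are arbitrary assumptions, and the remote-cache behavior (a hit in the remote cache avoiding a full traversal of the link) is assumed for the example.

#include <stdbool.h>

#define LOCAL_ACCESS_CYCLES      100   /* access served within the module        */
#define REMOTE_CACHE_HIT_CYCLES  150   /* served by the remote cache 15a or 15b  */
#define LINK_TRAVERSAL_CYCLES    400   /* full round trip over the link 2        */

static unsigned access_cost(bool local, bool remote_cache_hit)
{
    if (local)
        return LOCAL_ACCESS_CYCLES;
    /* Remote data may be found in the module's remote cache...          */
    if (remote_cache_hit)
        return REMOTE_CACHE_HIT_CYCLES;
    /* ...otherwise the request must cross the link to the other module. */
    return LOCAL_ACCESS_CYCLES + LINK_TRAVERSAL_CYCLES;
}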
In modern data processing systems, the allocation of the memory for a given application is carried out on the basis of a virtual memory space. This allocation is placed under the control of the operating system or “OS.” A dynamic correspondence is then established between the virtual memory space and the physical memory. For this purpose, it is customary to use address correspondence tables called dynamic “mapping”. Various types of memory configurations have been proposed, including organization by regions or by segments. To explain the concepts without in any way limiting the scope of the invention, the case of a “segment” type configuration will be described below. In practice, a segment is defined as a space of contiguous virtual addresses, of fixed and predetermined length.
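By way of illustration (this fragment is not part of the patent), such a dynamic correspondence can be represented as follows; the segment and page sizes, the field names, and the lookup function are assumptions made for the example.

#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SIZE   (256u * 1024u * 1024u)   /* fixed, predetermined length */
#define PAGE_SIZE      (4u * 1024u)
#define PAGES_PER_SEG  (SEGMENT_SIZE / PAGE_SIZE)

struct page_entry {
    bool     valid;        /* false => no correspondence yet: page fault */
    uint64_t phys_frame;   /* physical frame backing this virtual page   */
};

struct segment {
    uint64_t          virt_base;              /* first virtual address of the segment */
    struct page_entry pages[PAGES_PER_SEG];   /* dynamic mapping, one entry per page  */
};

/* Translate a virtual address inside a segment. Returns true and fills *phys
   on success; returns false when no entry exists, which is the situation in
   which the operating system detects a page fault. */
static bool translate(const struct segment *seg, uint64_t vaddr, uint64_t *phys)
{
    uint64_t off = vaddr - seg->virt_base;
    if (off >= SEGMENT_SIZE)
        return false;
    const struct page_entry *pe = &seg->pages[off / PAGE_SIZE];
    if (!pe->valid)
        return false;                        /* missing entry: page fault */
    *phys = pe->phys_frame * PAGE_SIZE + (off % PAGE_SIZE);
    return true;
}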
More precisely, in the prior art, the above-mentioned dynamic correspondence or “mapping” is carried out in accordance with rules common to all the applications, no matter what their types, without taking into account the location of the physical memory. In practice, if a process intends to access a virtual address and no entry in the address correspondence table is found, an exception is generated, which is formalized by the detection of a page fault, according to the “UNIX” (registered trademark) terminology. The term “page” can be defined more generally as being a “range of contiguous addresses.” A page constitutes a subdivision of a segment. However, for purposes of simplification, the term “page” will be used below. After a detection of a page fault, a device called a handler allocates physical memory in accordance with the above-mentioned common rules. This simple allocation method is entirel
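Sketched below, purely as an illustration of the location-agnostic behavior just described (none of the names or rules come from the patent), such a handler simply installs the first free physical frame it finds, wherever in the machine that frame resides.

#include <stdbool.h>
#include <stdint.h>

#define NB_FRAMES 1024u

/* One entry of the address correspondence table (as in the previous sketch). */
struct page_entry {
    bool     valid;
    uint64_t phys_frame;
};

/* true when the corresponding physical frame is already allocated */
static bool frame_used[NB_FRAMES];

/* Common rule applied to every application: take the first free frame,
   without considering the module in which the faulting processor runs. */
static int64_t alloc_any_frame(void)
{
    for (uint64_t f = 0; f < NB_FRAMES; f++) {
        if (!frame_used[f]) {
            frame_used[f] = true;
            return (int64_t)f;
        }
    }
    return -1;                       /* no free physical memory left */
}

/* Called after detection of a page fault: install a correspondence
   for the missing page. */
static bool page_fault_handler(struct page_entry *pe)
{
    int64_t frame = alloc_any_frame();
    if (frame < 0)
        return false;
    pe->phys_frame = (uint64_t)frame;
    pe->valid = true;
    return true;
}

Under a rule of this kind, nothing prevents the chosen frame from residing in a module other than the one in which the faulting processor runs.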
