Instrumentation device for a machine with non-uniform memory...

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate


Details

C711S122000, C711S141000, C711S206000, C365S236000

active

06195731

ABSTRACT:

CROSS REFERENCE TO RELATED APPLICATION
The subject matter of this application is related to U.S. application Ser. No. 09/082,938, in the names of Thierry BORDAZ and Jean-Dominique SORACE, filed concurrently herewith and assigned to the Assignee of the present invention and corresponding to French application 97 06388 filed May 26, 1997.
1. Field of the Invention
The invention relates to an instrumentation device for a machine with non-uniform memory access, in the data processing field.
2. Background of the Invention
In the data processing field, it is possible to increase the power of a machine by increasing the number of processors of which it is composed. One type of machine, known as a symmetric multiprocessor (SMP), allows the various processors in the same machine to access its memory symmetrically by means of a system bus. These are machines with uniform memory access, in that the memory access time is substantially the same for all the data accessed. However, the performance of such machines does not increase linearly with the number of processors. With a high number of processors, the machine spends more of its resources managing contention for access to its resources than it has left for running applications. As a result, the performance curve drops considerably when the number of processors exceeds an optimum value, often estimated to be on the order of four. The prior art offers various solutions to this problem.
One known solution consists of grouping a plurality of machines into clusters, in order to have them communicate with one another through a network. Each machine has an optimal number of processors, for example four, and its own operating system. It establishes a communication with another machine every time it performs an operation on data maintained by this other machine. The time required for these communications and the need to work on consistent data cause latency problems for high-volume applications such as, for example, distributed applications which require numerous communications. Latency is the duration that separates the instant at which a request for access to the memory is sent from the instant at which a response to this request is received.
Another known solution is that of machines of the non-uniform memory access (NUMA) type. These are machines with non-uniform memory access, in that the memory access time varies according to the location of the data accessed. A NUMA type machine is constituted by a plurality of modules, each module comprising an optimal number of processors and a physical part of the total memory of the machine. A machine of this type has non-uniform memory access because it is generally easier for a module to access a physical part of the memory that it does not share with another module than to access a part that it shares. Although each module has a private system bus linking its processors and its physical memory, an operating system common to all the modules makes it possible to consider all of the private system busses as a single, unique system bus of the machine. A logical addressing assigns a place of residence to a predetermined physical memory location of a module. For a specific processor, accesses to a local memory part physically located in the same module as the processor are distinguished from accesses to a remote memory part, physically located in one or more modules other than that in which the processor is located.
One particular type of NUMA machine is the cache coherency non-uniform memory access (CCNUMA) type, that is, the type of machine having cache coherency. With a shared caching mechanism, at a given instant a valid (that is, up-to-date) copy of a block is not necessarily located in its physical memory location of residence. Thus, one or more valid copies of the block can migrate from one module to another in response to application requests and system requests. The performance of the machine depends directly on the speed with which a module accesses a valid copy of a block it is processing. It is advisable to set up the operating system of the machine and the applications run by this machine in such a way that each module processes, insofar as possible, copies of blocks located in its physical memory which, whenever possible, are valid. The accesses to these valid copies are the fastest, since they require the fewest transactions with other modules.
The design of an operating system, and subsequently of applications, requires properly taking into account the repercussions it has on the performance of the machine. It is possible to consider testing the operating system or the applications using programs that simulate the behavior of the machine with this operating system or with these applications. Thus it may be possible to learn how to adapt the operating system and/or the applications, for example by playing with the allocation of addresses, the creation of software tables or the sequencing of tasks. However, it is difficult to anticipate all the cases that will occur in the effective operation of the machine.
SUMMARY OF THE INVENTION
A first object of the invention is to provide a machine with non-uniform memory access constituted by a plurality of modules, each module comprising a unit with a table for managing local accesses to a memory part local to the module and a table for managing accesses to a memory part remote from the module, by means of a system bus, characterized in that the machine comprises:
a counter of hits in the local memory part not requiring a transaction with a remote module;
a counter of misses in the local memory part requiring at least one transaction with a remote module;
a counter of hits in the remote memory part not requiring a transaction with a remote module;
a counter of misses in the remote memory part requiring at least one transaction with a remote module.
This makes it possible to measure in real time the ratio of fast memory accesses, which do not require a transaction to ensure cache coherency, and slow memory accesses, which require at least one transaction to ensure cache coherency. However, a problem can always arise if the use of the resources required for the incrementation of these counters diminishes the performance of the machine.
A second object of the invention relates to a machine with non-uniform memory access, characterized in that the four counters are physically located in this unit. Thus, the incrementation of the counters does not require the use of any additional resource via the system bus.
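As an illustration only, the four counters and the ratio of fast to slow accesses that they make measurable might be modeled in software as follows. The class name `AccessCounters`, its field names and the `record` method are hypothetical and do not appear in the patent, which locates the counters physically in the unit of each module rather than in software.

```python
from dataclasses import dataclass


@dataclass
class AccessCounters:
    """Hypothetical software model of the four per-module access counters."""

    local_hits: int = 0     # hits in the local memory part, no remote transaction
    local_misses: int = 0   # misses in the local memory part, at least one remote transaction
    remote_hits: int = 0    # hits in the remote memory part, no remote transaction
    remote_misses: int = 0  # misses in the remote memory part, at least one remote transaction

    def record(self, local: bool, hit: bool) -> None:
        """Increment the one counter that matches a single memory access."""
        if local and hit:
            self.local_hits += 1
        elif local:
            self.local_misses += 1
        elif hit:
            self.remote_hits += 1
        else:
            self.remote_misses += 1

    def fast_accesses(self) -> int:
        """Accesses that needed no transaction with a remote module."""
        return self.local_hits + self.remote_hits

    def slow_accesses(self) -> int:
        """Accesses that needed at least one transaction with a remote module."""
        return self.local_misses + self.remote_misses

    def fast_fraction(self) -> float:
        """Share of all counted accesses that were fast (0.0 if none were counted)."""
        total = self.fast_accesses() + self.slow_accesses()
        return self.fast_accesses() / total if total else 0.0
```

In this sketch a hit in either memory part counts as a fast access and a miss in either part counts as a slow one, following the distinction drawn above between accesses that do and do not require a transaction to ensure cache coherency.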
In accordance with the present invention, there is provided a process for calculating the average memory access time, which comprises
multiplying the contents:
of the counter of hits in the local memory part by the average hit time in the local memory part,
of the counter of misses in the local memory part by the average miss time in the local memory part,
of the counter of hits in the remote memory part by the average hit time in the remote memory part,
of the counter of misses in the remote memory part by the average miss time in the remote memory part,
adding the four results thus obtained,
and dividing this sum by the sum of the contents of the four access counters.
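The calculation just described is a weighted mean, and can be sketched in a few lines. The function name, the dictionary keys and the numeric values below are assumptions chosen purely for illustration; the patent specifies neither a software interface nor any particular timings.

```python
def average_access_time(counts, avg_times):
    """Weighted mean memory access time over the four access types.

    counts    -- contents of the four counters, keyed by access type
    avg_times -- average time of each access type, keyed the same way
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no memory accesses have been counted")
    # Multiply each counter by its average time, then add the four results.
    weighted_sum = sum(counts[kind] * avg_times[kind] for kind in counts)
    # Divide by the sum of the contents of the four counters.
    return weighted_sum / total


# Made-up counter contents and average times (arbitrary time units).
counts = {"local_hit": 600, "local_miss": 100, "remote_hit": 200, "remote_miss": 100}
avg_times = {"local_hit": 100, "local_miss": 1000, "remote_hit": 150, "remote_miss": 1200}

print(average_access_time(counts, avg_times))  # 310.0
```

With these invented figures the sum of the four products is 310000 and the total number of accesses is 1000, which gives an average of 310.0 time units per access.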
It is possible to consider determining the average time for each type of access with the aid of a bus analyzer implemented during the testing of the machine. The average times determined in this way are then supplied as characteristic parameters with the machine. The times required for hits without a transaction with a remote module, whether in the local memory part or in the remote memory part, are practically constant since they depend only on the load of the bus local to the module. The average of these access times, calculated a priori by means of a standard bus analyzer, is therefore representative of the subsequent behavior of the machine in operation, with an acceptable level of reliability. However, the misses accompanied by transactions with remote modules result in latencies, which depend on the latencies of the transac
