Computer architecture with dynamic sub-page placement

Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area

Reexamination Certificate

active

06766424

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to high performance parallel computer systems and more particularly to dynamic page placement in cache coherent non-uniform memory architecture systems.
BACKGROUND ART
Many high performance parallel computer systems are built as a number of nodes interconnected by a general interconnect network (e.g., crossbar and hypercube), where each node contains a subset of the processors and memory in the system. While the memory in the system is distributed, several of these systems (called NUMA systems for Non-Uniform Memory Architecture) support a shared memory abstraction where all the memory in the system appears as a large memory common to all processors in the system.
These systems have to address the problem of where to place physical pages within the distributed memory system since the local memory is close to each processor. Any memory that is not local to the processor is considered remote memory. Remote memory has a longer access time than local memory, and different remote memories may have different access times. With multiple processors sharing memory pages and a finite size memory local to each processor, some percentage of the physical pages required by each processor will be located within remote physical memory. The chances that a physical page required by a processor is in local memory can be improved by using static page placement of physical memory pages.
Static page placement attempts to locate each physical memory page in the memory that causes the highest percentage of memory accesses to be local. Optimal physical memory page placement reduces the average memory access time and reduces the bandwidth consumed inside of the processor interconnect between processor nodes where there is uniform memory access time. The static page placement schemes include Don't Care, Single Node, Line Interleaved, Round Robin, First Touch, Optimal, etc., which are well known to those skilled in the art.
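As a rough illustration of two of the static schemes named above, the following sketch models Round Robin and First Touch placement and measures how many accesses end up local. The function names, the node and page numbering, and the access-trace format are assumptions made for this example; the text does not define them.

```python
def round_robin_placement(num_pages, num_nodes):
    """Assumed reading of Round Robin: page p is homed on node p mod num_nodes."""
    return {page: page % num_nodes for page in range(num_pages)}


def first_touch_placement(access_trace):
    """Home each page on the node that accesses it first.

    access_trace is a hypothetical list of (node, page) pairs in program order.
    """
    placement = {}
    for node, page in access_trace:
        placement.setdefault(page, node)  # only the first access decides
    return placement


def local_fraction(placement, access_trace):
    """Fraction of accesses in the trace that hit the page's home node."""
    local = sum(1 for node, page in access_trace if placement[page] == node)
    return local / len(access_trace)


# Hypothetical trace: node 1 uses page 0, node 0 uses page 1.
trace = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 0)]
rr = round_robin_placement(num_pages=2, num_nodes=2)
ft = first_touch_placement(trace)
print(local_fraction(rr, trace), local_fraction(ft, trace))
```

On this contrived trace, Round Robin homes every page on the wrong node while First Touch makes every access local. Real workloads fall between these extremes, which is why a later dynamic correction can be useful.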
Dynamic page placement may be used after the initial static page placement to replicate or migrate the memory page to correct the initial placement or change the location due to changes in the particular application's access patterns to the memory page. The page placement mechanism, which is involved in the decision and copying/movement of the physical pages, may be in the multi-processor's operating system (OS) or in dedicated hardware.
A replication is the copying of a physical page so that two or more processors have a local copy of the page. As long as the memory accesses are reads, multiple copies of data can be allowed without causing coherence difficulties. As soon as a write to the page is sent to the memory system, either all copies of the page must be removed or an update coherence algorithm must be in place to make sure all of the pages have the same data.
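The replication rule described above can be sketched as a toy model. This is a minimal illustration of the first option only (removing all copies on a write), not of an update coherence algorithm; the class and method names are hypothetical.

```python
class ReplicatedPage:
    """Toy model of a page with read-only replicas (all names are hypothetical)."""

    def __init__(self, home_cell, data):
        self.home_cell = home_cell
        self.copies = {home_cell: data}  # cell id -> that cell's copy of the page

    def replicate_to(self, cell):
        """Give `cell` its own local copy of the page."""
        self.copies[cell] = self.copies[self.home_cell]

    def read(self, cell):
        """Return (data, is_local); a cell without a copy reads the home copy."""
        if cell in self.copies:
            return self.copies[cell], True
        return self.copies[self.home_cell], False

    def write(self, data):
        """Invalidate on write: drop every replica, then update the home copy."""
        self.copies = {self.home_cell: data}


page = ReplicatedPage(home_cell=0, data="v1")
page.replicate_to(3)
before = page.read(3)   # served from the local replica
page.write("v2")        # replicas are removed
after = page.read(3)    # now served remotely from the home cell
print(before, after)
```

Reads stay local only while the page is not written, which is why replication suits read-mostly pages.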
A page migration is the movement of a physical memory page to a new location. The migration is usually permanent and does not require special handling as is required for writes to replicated pages.
An approach to dynamic page placement is described in the paper by Ben Verghese, Scott Devine, Anoop Gupta, and Mendel Rosenblum, “Operating System Support for Improving Data Locality on CC-NUMA Compute Servers”, In ASPLOS VII, Cambridge, Mass., 1996.
In the past, dynamic page placement has been implemented by using a group of history counters for each memory page to keep track of how many accesses are made from each processor. It should be noted that memory accesses from a processor can also be thought of as cache misses to the processor's lowest level of cache. When these counters reach the preset thresholds discussed later, the page placement mechanism is made aware that there is a page in memory that has enough data on how it is accessed for the page placement mechanism to determine the optimal page placement. Once the optimal page placement has been determined, the page in memory can be migrated or replicated to the optimal uniform memory access (UMA) cell. A UMA cell is a grouping of memories which can be accessed by processors in the multi-processor system with the same access latency.
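A minimal sketch of such history counters follows. The class name, the rule that a page is reported once its total access count reaches a single threshold, and the policy of preferring the most frequent accessor are all assumptions made for illustration; the text only states that preset thresholds exist.

```python
from collections import defaultdict


class PageHistory:
    """Per-page access counters, one per UMA cell (hypothetical structure)."""

    def __init__(self, num_cells, threshold):
        self.num_cells = num_cells
        self.threshold = threshold  # assumed: total accesses before a decision
        self.counters = defaultdict(lambda: [0] * num_cells)

    def record_access(self, page, cell):
        """Count one access (cache miss) from `cell`.

        Returns True once the page has enough history for the page
        placement mechanism to be notified.
        """
        counts = self.counters[page]
        counts[cell] += 1
        return sum(counts) >= self.threshold

    def preferred_cell(self, page):
        """Assumed policy: the cell that accessed the page most often."""
        counts = self.counters[page]
        return max(range(self.num_cells), key=lambda c: counts[c])


history = PageHistory(num_cells=4, threshold=3)
notifications = [
    history.record_access(page=7, cell=2),
    history.record_access(page=7, cell=2),
    history.record_access(page=7, cell=1),
]
target = history.preferred_cell(7)
print(notifications, target)
```

Once the threshold is reached, the placement mechanism would migrate or replicate page 7 toward the preferred cell.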
Dynamic page placement increases memory locality and therefore reduces latency and network traffic to improve performance. However, the technique performs best with a small page size. Even for an application like a database with a large data footprint, the standard buffer size is 2K bytes. With 2K byte data structures, there will be 512×1024 data structures in a 1-gigabyte page. With this many data structures on a single page, it is desirable to arrange the structures in such a way as to maximize locality to increase system performance. In the worst case, with poor locality, the fraction of memory accesses that are local will be 1 divided by the number of UMA cells in the distributed shared memory (DSM) system. For a 128-processor system with 2 processors per UMA cell, in the worst case only 1/64th of the memory accesses are local.
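The two figures in the preceding paragraph can be checked with a short calculation. The variable names are illustrative, and a binary gigabyte (2^30 bytes) is assumed.

```python
GIB = 1 << 30          # bytes in a 1-gigabyte page (binary gigabyte assumed)
BUFFER_SIZE = 2 << 10  # 2K-byte data structure

# Number of 2K-byte structures that fit in one 1-gigabyte page.
structures_per_page = GIB // BUFFER_SIZE

# Worst-case locality: accesses spread evenly over all UMA cells.
processors = 128
processors_per_cell = 2
uma_cells = processors // processors_per_cell
worst_case_local_fraction = 1 / uma_cells

print(structures_per_page)                   # 524288, i.e. 512 * 1024
print(uma_cells, worst_case_local_fraction)  # 64 0.015625
```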
In addition, with so many data structures located on a single memory page, it is likely that all processors will be accessing each physical page equally. Therefore, static and dynamic page placement techniques will be unable to find a UMA cell to place a page to maximize local memory accesses and therefore improve performance. Further, memory page hotspotting will occur. Hotspotting is the creation of a bandwidth bottleneck due to multiple processors attempting to access the same memory structure at the same time.
Working against small page sizes is the fact that many current processors only contain 96 to 128 entry translation look-aside buffers (TLBs), which are the processor caches that translate virtual to physical addresses and keep track of recently used translations of virtual page numbers. The small size of the TLB requires large pages for good performance when running multiple applications or applications with large data or instruction footprints, for example 1-gigabyte physical pages.
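The pressure toward large pages can be made concrete by computing how much memory a fully populated TLB can map, often called its reach. The sketch below assumes one page per TLB entry; the 4 KiB small-page size is an assumed comparison point that does not come from the text.

```python
def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered by a fully populated TLB: one page per entry."""
    return entries * page_size_bytes


KIB = 1 << 10
GIB = 1 << 30

for entries in (96, 128):
    small = tlb_reach_bytes(entries, 4 * KIB)  # 4 KiB page: assumed example size
    large = tlb_reach_bytes(entries, 1 * GIB)  # 1-gigabyte page from the text
    print(entries, "entries:", small // KIB, "KiB vs", large // GIB, "GiB")
```

With small pages the TLB covers only a few hundred kilobytes, so an application with a large footprint would take frequent TLB misses.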
To track the changes in the application's access patterns to the memory page, histories need to be maintained for every page in memory. A set of history counters is located close to the memory system for every physical page in memory and one counter is required for every UMA cell in the multi-processor system. Whenever a memory access is generated from a processor within a UMA cell, the counter representing the page and the cell for the processor performing the access is incremented.
There are two solutions for locating the counters: either within the memory itself or in a separate hardware structure, such as the memory controller or the directory controller. Placing the counters within the memory has the advantage of keeping the cost down by using the existing DRAM in memory, and the number of counters is automatically scaled with the installation of more memory. Unfortunately, this placement has the disadvantage of halving the memory bandwidth because of the accessing and updating of the counters. Placing the counters outside of memory adds a significant amount of hardware to the system because the hardware must be designed for the maximum amount of installable memory and also for the minimum physical page size.
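To see why counters outside of memory must be sized for the maximum memory and the minimum page size, the count can be estimated as one counter per physical page per UMA cell. The configuration below (64 GiB of installable memory, 64 UMA cells, 4 KiB minimum page) is hypothetical and chosen only to show the scaling.

```python
def history_counters_needed(memory_bytes, page_size_bytes, uma_cells):
    """One counter per (physical page, UMA cell) pair."""
    pages = memory_bytes // page_size_bytes
    return pages * uma_cells


GIB = 1 << 30
KIB = 1 << 10

# Hypothetical configuration, not taken from the text.
max_memory = 64 * GIB
cells = 64

small_pages = history_counters_needed(max_memory, 4 * KIB, cells)  # 4 KiB pages (assumed)
large_pages = history_counters_needed(max_memory, 1 * GIB, cells)  # 1-gigabyte pages
print(small_pages, large_pages)
```

Under these assumptions the small-page design needs about a billion counters, while 1-gigabyte pages need only a few thousand, which illustrates the hardware cost of fine-grained history tracking.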
Those skilled in the art currently teach that the future of multiprocessor systems lies in increasing the physical page size to offset the availability of only 96 to 128 TLB entries per processor rather than using a small page size to improve the percentage of local memory accesses. This is due to the long latency of handling TLB misses.
DISCLOSURE OF THE INVENTION
The present invention provides a multiprocessor system, where the latencies to access areas of memory have different values, with the capability of having the operating system use large page sizes while dynamic page placement manipulates subsets of the large pages without affecting the translation look-aside buffers of the processors. A sub-page support structure is inserted between the processor and the network interface to remote memory that on a remote memory access determines if a loca

