Using hardware counters to estimate cache warmth for...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

Classification: C709S241000
Type: Reexamination Certificate
Status: active
Patent number: 06615316

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to a computer system and method of estimating cache warmth for thread schedulers in a processor. More specifically, the invention determines how much of a thread's context is contained in a particular cache in order to enhance cache affinity and thread migration decisions in an operating system kernel.
2. Description of the Prior Art
Multiprocessor systems by definition contain multiple processors, also referred to herein as CPUs, that can execute multiple processes or multiple threads within a single process simultaneously, in a manner known as parallel computing. In general, multiprocessor systems execute multiple processes or threads faster than conventional uniprocessor systems, such as personal computers (PCs), that execute programs sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a multithreaded process and/or multiple distinct processes can be executed in parallel and the architecture of the particular multiprocessor system at hand. One critical factor is the caches that are present in modern multiprocessors. Cache memories store data frequently or recently used by their associated processors. A cache is said to be warm with respect to a particular process when it contains data required for the execution of the process. Conversely, a cache is said to be cold relative to a particular process when it contains little or no data required for the execution of that process. When a cache is cold, accesses to the cache will miss. Accordingly, performance can be optimized by running processes and threads on CPUs whose caches already contain the memory that those processes and threads are going to be using.
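To make this affinity principle concrete, consider the following minimal sketch (an illustration of ours, not a mechanism disclosed in the patent). It assumes a scheduler already holds a per-CPU warmth estimate for a thread, expressed as a count of resident cache lines; the CPU count, the cold threshold, and the function name are all hypothetical:

    /* Illustrative sketch only: choose the CPU whose cache is estimated
     * to be warmest for a thread; fall back to the least-loaded CPU when
     * every cache is effectively cold. NCPUS, COLD_THRESHOLD, and the
     * function names are assumptions, not taken from the patent. */
    #include <stdio.h>

    #define NCPUS 4
    #define COLD_THRESHOLD 64.0   /* below this many resident lines,
                                     treat the cache as cold */

    /* warmth[i]: estimated number of the thread's cache lines still
     * resident in CPU i's cache, however that estimate was produced. */
    static int pick_cpu(const double warmth[NCPUS], int least_loaded_cpu)
    {
        int best = least_loaded_cpu;
        double best_warmth = COLD_THRESHOLD;

        for (int i = 0; i < NCPUS; i++) {
            if (warmth[i] > best_warmth) {
                best_warmth = warmth[i];
                best = i;
            }
        }
        return best;
    }

    int main(void)
    {
        double warmth[NCPUS] = { 12.0, 900.0, 3.0, 0.0 };
        printf("run thread on CPU %d\n", pick_cpu(warmth, 2)); /* CPU 1 */
        return 0;
    }

A real kernel would weigh warmth against run-queue load and migration cost, but the basic decision, preferring the CPU that still holds the thread's working set, is the one motivated above.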
Shared memory multiprocessor systems offer a common physical memory address space that all processors can access. Multiple processes therein, or multiple threads within a process, can communicate through shared variables in memory, which allow the processes to read from or write to the same memory location in the computer system. Message passing multiprocessor systems, in contrast to shared memory systems, have a distinct memory space for each processor. Accordingly, message passing multiprocessor systems require processes to communicate with one another through explicit messages.
The architecture of shared memory multiprocessor systems may be classified by how their memory is physically organized. In distributed shared memory (DSM) machines, the memory is divided into modules physically placed near one or more processors, typically on a processor node. Although all of the memory modules are globally accessible, a processor can access local memory on its node faster than remote memory on other nodes. Because the memory access time differs based on memory location, such systems are also called non-uniform memory access (NUMA) machines. In centralized shared memory machines, on the other hand, the memory is physically in one location. Centralized shared memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time for each of the processors. Both forms of memory organization typically use high-speed caches in conjunction with main memory to reduce execution time.
The use of NUMA architecture to increase performance is not restricted to NUMA machines. A subset of processors in a UMA machine may share a cache. In such an arrangement, even though the memory is equidistant from all processors, data can circulate among the cache-sharing processors faster (i.e., with lower latency) than among the other processors in the machine. Algorithms that enhance the performance of NUMA machines can thus be applied to any multiprocessor system that has a subset of processors with lower latencies. These include not only the noted NUMA and shared-cache machines, but also machines where multiple processors share a set of bus-interface logic as well as machines with interconnects that “fan out” (typically in hierarchical fashion) to the processors.
The James et al. U.S. Pat. No. 6,073,225 teaches a method of optimizing memory and process assignments in NUMA computer systems, and describes collecting hardware statistics in hardware counters. The Hejna, Jr. et al. U.S. Pat. No. 5,287,508 teaches a method and apparatus for driving scheduling decisions from a cache miss counter; the disclosed system schedules processes according to both the priority of the process and the cache miss count associated with a particular processor.
However, there is a need for a computer system comprising multiple processors with an improved method of estimating cache warmth and of manipulating the state of the system based upon the lifetime of a cache line. Accordingly, an efficient yet accurate mathematical model is desirable, one that incorporates hardware counter information to estimate the cache warmth of a processor and uses that estimate to schedule processes on a processor.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide a method of operating a computer system comprising multiple processors. It is a further object of the invention to provide a method of creating a mathematical model to estimate cache warmth, utilizing information obtained from hardware counters to build the model more accurately. The system feeds hardware measurements of cache-related events, including miss, invalidation, hit, and rollout counters, into a model of cache behavior. The information gathered from the model is then used to estimate the amount of state that a given process or thread has in a given CPU or node cache. Based upon the information obtained from the hardware counters and generated by the model, changes in the use of computing resources are made, including scheduling processes on a processor, transferring processes between processors, moving memory among processor nodes, and retargeting interrupt handlers from one processor to another. Other objects of the invention include providing a computer system and article of manufacture for use with the model for estimating cache warmth.
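To give the flavor of such a counter-driven model, here is a minimal sketch under stated assumptions; it is an illustration of ours, not the formula claimed by the patent. Suppose a cache of CACHE_LINES lines with effectively random replacement, so that each miss by other work evicts any given line with probability 1/CACHE_LINES. A thread's estimated resident footprint then decays geometrically with the number of misses read from a hardware counter since the thread was descheduled:

    /* Illustrative decay model for cache warmth; CACHE_LINES and the
     * function name are hypothetical, not from the patent. */
    #include <math.h>
    #include <stdio.h>

    #define CACHE_LINES 8192.0   /* assumed cache size in lines */

    /* Estimated warm lines left for a thread after `misses` cache
     * misses (read from a hardware counter) have occurred on this CPU
     * since the thread was descheduled. Assumes each miss evicts any
     * given line with probability 1/CACHE_LINES. */
    static double estimate_warmth(double footprint_lines,
                                  unsigned long misses)
    {
        double survive = 1.0 - 1.0 / CACHE_LINES;
        return footprint_lines * pow(survive, (double)misses);
    }

    int main(void)
    {
        /* Thread left roughly 2000 lines cached; 5000 misses have
         * happened since, so about 54% of its state should survive. */
        printf("estimated warm lines: %.1f\n",
               estimate_warmth(2000.0, 5000));
        return 0;
    }

Invalidation and rollout counters could sharpen the decay term in the same way, while hit counters sampled as the thread runs could recalibrate its footprint estimate.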


REFERENCES:
patent: 5185861 (1993-02-01), Valencia
patent: 5287508 (1994-02-01), Hejna, Jr. et al.
patent: 6073225 (2000-06-01), James et al.
patent: 6243788 (2001-06-01), Franke et al.
Kim et al., A Virtual Cache Scheme for Improving Cache Affinity . . . (abstract), Proceedings of HPCS '98: 12th Annual International Symposium on High Performance Computing Systems.
Squillante et al., Using Processor-Cache Affinity Information . . . , Feb. 1993, pp. 131-143.
