Excavating
Patent
1996-11-14
1998-06-16
Lall, Parshotam S.
Excavating
395/704, 371/21.1, G06F 11/34
Patent
active
057685002
ABSTRACT:
Fueled by higher clock rates and superscalar technologies, growth in processor speed continues to outpace improvement in memory system performance. Reflecting this trend, architects are developing increasingly complex memory hierarchies to mask the speed gap, compiler writers are adding locality-enhancing transformations to make better use of those hierarchies, and application programmers are recoding their algorithms to exploit memory systems. All of these groups need empirical data on memory behavior to guide their optimizations. This paper describes how to combine simple hardware support and sampling techniques to obtain such data without appreciably perturbing system performance. A cache miss counter is augmented with a compare register and an interrupt line so that the processor is interrupted when the counter matches the compare value. At each interrupt, system state can be sampled to build cache miss profiles that associate cache misses with specific processes, procedures, call stacks, addresses, or user-defined aspects of system state. This idea is implemented in the Mprof prototype, which profiles data stall cycles, first-level cache misses, and second-level cache misses on the Sun SPARC 10/41. Simple case studies illustrate Mprof's features.
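The counter-and-compare mechanism described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware or the Mprof implementation: the names `MissCounter`, `record_miss`, and `profile`, the trace format, and the choice to reset the counter to zero after each interrupt are all hypothetical. Each "interrupt" attributes one sample to whatever procedure was running, and each sample stands for one compare period's worth of misses.

```python
from collections import Counter


class MissCounter:
    """Software model (hypothetical) of a miss counter with a compare register."""

    def __init__(self, compare_value, on_interrupt):
        self.count = 0
        self.compare_value = compare_value  # models the compare register
        self.on_interrupt = on_interrupt    # models the interrupt line

    def record_miss(self, state):
        """Count one cache miss; fire the handler when the counter matches."""
        self.count += 1
        if self.count == self.compare_value:
            self.count = 0  # assumption: the handler re-arms the counter at zero
            self.on_interrupt(state)


def profile(trace, period):
    """Estimate misses per procedure from (procedure, is_miss) events."""
    samples = Counter()

    def handler(procedure):
        # The interrupt handler samples system state; here, just the procedure name.
        samples[procedure] += 1

    counter = MissCounter(period, handler)
    for procedure, is_miss in trace:
        if is_miss:
            counter.record_miss(procedure)
    # Each sample represents `period` misses.
    return {procedure: n * period for procedure, n in samples.items()}


# Synthetic trace: 900 misses in "matmul", 100 in "init", 500 hits in "io".
trace = [("matmul", True)] * 900 + [("init", True)] * 100 + [("io", False)] * 500
print(profile(trace, period=10))  # prints {'matmul': 900, 'init': 100}
```

The estimate is exact here only because the synthetic trace is regular. On real workloads a fixed period gives a statistical estimate, and overhead scales with the number of interrupts rather than the number of misses, which is why a larger compare value perturbs the system less.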
REFERENCES:
patent: 3868480 (1975-02-01), Murgio et al.
patent: 4514835 (1985-04-01), Bottigheimer
patent: 4528553 (1985-07-01), Hasting et al.
patent: 4939755 (1990-07-01), Alcita et al.
patent: 5355487 (1994-10-01), Keller et al.
patent: 5446876 (1995-08-01), Levine et al.
Clements, "Microprocessor Systems Design", 1987 pp. 385-393.
Alan Mink et al.; Multiprocessor Performance-Measurement Instrumentation; Computer Magazine; Sep. 1990; vol. 23, Issue 9; pp. 63-75.
T. A. Cargill et al., "Cheap Hardware Support for Software Debugging and Profiling", 1987, pp. 82-83.
Margaret Martonosi et al., "MemSpy: Analyzing Memory System Bottlenecks in Programs", Performance Evaluation Review, vol. 20, No. 1, Jun. 1992, pp. 1-12.
Jack E. Veenstra et al., "A Performance Evaluation of Optimal Hybrid Cache Coherency Protocols", ASPLOS V Proceedings, Oct. 12-15, 1992, pp. 149-160.
Scott McFarling, "Program Optimization for Instruction Caches", ASPLOS-III Proceedings, Apr. 3-6, 1989, pp. 183-191.
Allen D. Malony et al., "Performance Measurement Intrusion and Perturbation Analysis", IEEE Transactions On Parallel and Distributed Systems, vol. 3, No. 4, Jul. 1992, pp. 433-450.
Aaron J. Goldberg et al., "Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications", IEEE Transactions on Parallel and Distributed Systems, vol. 4, No. 1, Jan. 1993, pp. 28-40.
Daniel Lenoski et al., "The DASH Prototype: Logic Overhead and Performance", 1993, IEEE.
Edward Rothberg et al., "Techniques for Improving the Performance of Sparse Matrix Factorization on Multiprocessor Workstations", Proceedings Supercomputing '90, Nov. 12-16, 1990, pp. 232-241.
David Callahan et al., "Analyzing and Visualizing Performance of Memory Hierarchies", Performance Instrumentation and Visualization, Chapter 1, pp. 1-26.
Michael E. Wolf et al., "A Data Locality Optimizing Algorithm", Sigplan Notices, vol. 26, No. 6, Jun. 1991, pp. 30-44.
Steven McCanne et al., "A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling", 1993 Winter USENIX, Jan. 25-29, 1993, pp. 387-394.
"Measurement Techniques", Chapter 2, pp. 26-99.
Robert J. Hall et al., "Call Path Profiling of Monotonic Program Resources in UNIX", 1993 Summer USENIX, Jun. 21-25, 1993, pp. 1-13.
Karl Pettis et al., "Profile Guided Code Positioning", Sigplan Notices, vol. 25, No. 6, Jun. 1990, pp. 16-27.
Anant Agrawal et al., "ATUM: A New Technique for Capturing Address Traces Using Microcode", Proceedings of 13th Annual International Symposium on Computer Architecture, 1986, pp. 119-127.
David R. Cheriton et al., "Restructuring a Parallel Simulation to Improve Cache Behavior in a Shared-Memory Multiprocessor: The Value of Distributed Synchronization", Proceedings of the International Symposium on Shared Memory Multiprocessing, 1991, pp. 109-118.
P. Magnusson et al.; Efficient Memory Simulation in SimICS; Proceedings of 28th Annual Simulation Symposium; pp. 62-73, Apr. 1995.
M. Martonosi et al.; Tuning Memory Performance of Sequential and Parallel Programs; pp. 32-40, Apr. 1995.
A. Goldberg et al.; Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications; IEEE Transactions on Parallel and Distributed Sys. vol. 4, No. 1; pp. 28-40, Jan. 1993.
Agrawal Prathima
Goldberg Aaron Jay
Trotter John Andrew
Coulter Kenneth R.
Lall Parshotam S.
Lucent Technologies, Inc.
Interrupt-based hardware support for profiling memory system per
Profile ID: LFUS-PAI-O-1737111