Reexamination Certificate
1999-05-04
2001-09-04
Teska, Kevin J. (Department: 2123)
Data processing: structural design, modeling, simulation, and em
Simulating electronic device or electrical system
C716S030000
active
06285974
ABSTRACT:
TECHNICAL FIELD
This invention relates generally to computer test systems, and more particularly to techniques for hardware verification in multiprocessor (“MP”) computer systems.
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
The present application is related to the following application:
METHOD AND APPARATUS FOR DETECTING COHERENCY VIOLATION ON TEST FLOOR, U.S. patent Application Ser. No. 08/762,902 filed Dec. 10, 1996.
BACKGROUND OF THE INVENTION
Most system designs are represented by a model written in some hardware description language (“HDL”) that can be later transformed into silicon. The presilicon model is extensively verified through simulation before it is fabricated (“taped-out”). Since the fabrication process is very expensive, it is necessary to keep the number of tape-outs to a minimum by exposing all bugs either in simulation or in early releases of the hardware. While software simulators are slow, they permit unrestricted use of checker probes into the model-under-test. As a result, any violation exposed during simulation can be detected via the probes. On the other hand, hardware exercise programs can run at a very high speed but their checking abilities are limited to the data observed in the testcase.
Various testing methods and background information are found in A. Saha, N. Malik, J. Lin, C. Lockett and C. G. Ward, “Test floor Verification of Multiprocessor Hardware”, IPCCC 1996; IBM, “PowerPC Architecture”, Morgan Kaufmann Publishers, 1993; L. Lamport, “How to make a multiprocessor computer that correctly executes multiprocessor programs”, IEEE Transactions on Computers, September 1979; W. W. Collier, “Reasoning about Parallel Architectures”, Prentice-Hall Inc, 1990; Kourosh Gharachorloo, et al., “Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors”, Proc. 17th Annual Symposium on Computer Architecture, May 1990; Janice Stone and Robert Fitzgerald, “An Overview of Storage in PowerPC”, Technical Report, IBM T.J. Watson Research Center, February 1993; A. Saha, N. Malik, B. O'Krafka, J. Lin, R. Raghavan and U. Shamsi, “A Simulation Based Approach to Architectural Verification of Multiprocessor Systems”, IPCCC 1995; J. T. Yen, et al., “Overview of PowerPC 620 Multiprocessor Verification Strategy”, Proc. International Test Conference, 1995; D. T. Marr, et al., “Multiprocessor Validation of the Pentium Pro”, IEEE Computer Magazine, November 1996; G. Cai, “Architectural and Multiprocessor Design Verification of the PowerPC 604 Data Cache”, IPCCC 1995; and Ram Raghavan et al., “Multiprocessor System Verification through Behavioral Modeling and Simulation”, IPCCC 1995. However, present methods suffer from a variety of drawbacks which will be described in greater detail herein.
Below is a set of definitions used throughout this application:
True Sharing: When two (or more) processors compete to access the same location within a cache block, the accesses are said to be true sharing. The outcome of these accesses is non-deterministic until runtime.
False-Sharing: When two (or more) processors access different locations within a cache block, the accesses are said to be false sharing. The outcome of these accesses is deterministic and can be computed a priori by sequentially running each processor's test stream on a uni-processor.
Non-Sharing: When two (or more) processors access different locations in different cache blocks, the accesses are called non-sharing accesses. The outcome of these accesses is deterministic and can be computed much the same way as the false sharing case.
Barrier: A barrier is a section of code (written using synchronization primitives) placed within each participating processor's stream. Its purpose is to ensure no participating processor continues past it until all participating processors have reached it in their respective streams. Also, when a processor reaches a barrier, all storage accesses initiated prior to the barrier must be performed with respect to all the other processors.
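The three sharing categories defined above can be told apart from the two addresses and the cache block size alone. The following is a minimal sketch, not part of the described system: the 64-byte block size, the byte-address notion of "location", and the names `Sharing` and `classify` are all assumptions made for illustration.

```python
from enum import Enum

# Assumed block size; the text does not specify one.
CACHE_BLOCK_BYTES = 64


class Sharing(Enum):
    TRUE = "true sharing"
    FALSE = "false sharing"
    NONE = "non-sharing"


def classify(addr_a: int, addr_b: int,
             block_bytes: int = CACHE_BLOCK_BYTES) -> Sharing:
    """Classify two accesses issued by two different processors.

    A "location" is modeled as a byte address (an assumption).
    """
    if addr_a == addr_b:
        # Same location: outcome is non-deterministic until runtime.
        return Sharing.TRUE
    if addr_a // block_bytes == addr_b // block_bytes:
        # Different locations inside one cache block.
        return Sharing.FALSE
    # Different locations in different cache blocks.
    return Sharing.NONE


if __name__ == "__main__":
    print(classify(0x100, 0x100).value)
    print(classify(0x100, 0x108).value)
    print(classify(0x100, 0x140).value)
```

Only the first category yields results that cannot be computed a priori; the other two can be checked against a sequential run.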
Every architecture defines ordering rules for storage accesses to memory locations. The most restrictive form of ordering, Sequential Consistency, limits the performance of programs by requiring all storage accesses to be strictly ordered. Several new techniques, like weak-ordering, have relaxed this requirement such that, under certain conditions, storage accesses may execute out-of-order. Any required ordering is enforced by synchronization primitives which are an integral part of these architectures.
Thus, it is important to understand some weak-ordering rules which are provided below:
Rule 1: dependent storage accesses from a processor must perform in order, and all non-dependent accesses may perform out-of-order. However, in the presence of synchronization primitives, all storage accesses initiated prior to the synchronization primitive must perform before it performs, and all storage accesses initiated after a synchronization primitive must perform after it performs. By dependent, it is meant that these accesses are to the same location (address dependency) or there is some explicit register-dependency among them.
Rule 2: accesses to the same location are said to compete with each other when at least one of them is a store operation. Competing storage accesses from different processors can perform in any order. As a result, these accesses must be made non-competing by enclosing them within critical sections which are governed by lock and unlock routines.
Rule 3: all accesses to a particular location are coherent if all stores to the same location are serialized in some order and no processor can observe any subset of those stores in a conflicting order. That is, a processor can never load a “new” value first and later load an “older” value.
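Rule 3 can be illustrated with a small check. The sketch below assumes that the serialized order of stores to one location is already known and that every stored value is unique, neither of which the text guarantees; the function name `is_coherent` is hypothetical.

```python
def is_coherent(store_order, observed_loads):
    """Return True if one processor's loads respect the store order.

    store_order:    values written to a single location, in the order
                    the stores were serialized (include the initial
                    value first; values are assumed to be unique).
    observed_loads: values that one processor loaded from that
                    location, in program order.
    """
    position = {value: index for index, value in enumerate(store_order)}
    newest_seen = -1
    for value in observed_loads:
        index = position[value]
        if index < newest_seen:
            # An "older" value was loaded after a "newer" one.
            return False
        newest_seen = index
    return True


if __name__ == "__main__":
    print(is_coherent([0, 1, 2, 3], [0, 1, 1, 3]))
    print(is_coherent([0, 1, 2, 3], [2, 1]))
```

Re-reading the same value is allowed; only moving backward in the serialized order counts as a violation.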
A commonly used hardware exerciser is shown in FIG. 1. The hardware exerciser 100 is a program consisting of a Random Test Generator (“RTG”) 102 and a simple Functional Simulator 108. It can be executed on either the hardware-under-test (“HUT”) 104 or on another machine known to function correctly.
The RTG 102 produces random streams of storage access instructions for each processor in the system. Due to buffering delays, the global order of storage accesses from different processors to a particular location is non-deterministic. Consequently, when two processors compete (as defined in Rule 2), the load operation may read the value held before the store operation or the value written by the store operation. This asynchronous nature of storage access ordering restricts the testcases generated by the RTG 102 to be false or non-shared such that the expected results are deterministic.
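The non-determinism described above can be made concrete by enumerating the possible global orders of two short streams. This is a simplified sketch: it treats each access as atomic and models the global order as an interleaving that preserves each processor's program order, which is stricter than the weak-ordering rules given earlier. The tuple encoding of accesses and both function names are assumptions.

```python
from itertools import combinations


def interleavings(stream_a, stream_b):
    """Yield every merge of two streams that keeps program order."""
    total = len(stream_a) + len(stream_b)
    for slots in combinations(range(total), len(stream_a)):
        slot_set = set(slots)
        a_iter, b_iter = iter(stream_a), iter(stream_b)
        yield [next(a_iter) if i in slot_set else next(b_iter)
               for i in range(total)]


def possible_load_values(stream_a, stream_b, initial):
    """Collect every (address, value) pair a load could observe.

    Each access is ("store", addr, value) or ("load", addr, None).
    """
    outcomes = set()
    for order in interleavings(stream_a, stream_b):
        memory = dict(initial)
        for op, addr, value in order:
            if op == "store":
                memory[addr] = value
            else:
                outcomes.add((addr, memory[addr]))
    return outcomes


if __name__ == "__main__":
    p0 = [("store", "A", 1)]
    # Competing load: it may see the old or the new value of A.
    print(sorted(possible_load_values(p0, [("load", "A", None)],
                                      {"A": 0, "B": 0})))
    # Load of a different location: a single possible outcome.
    print(sorted(possible_load_values(p0, [("load", "B", None)],
                                      {"A": 0, "B": 0})))
```

When the load competes with the store, two outcomes are possible; when the streams touch different locations, only one is, which is why restricting testcases to false or non-shared accesses keeps the expected results deterministic.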
The Functional Simulator 108 is a simple reference model of the architecture and not the actual system. Given a multiprocessor (“MP”) testcase, it computes a set of deterministic expected values. Since the storage accesses are false or non-shared, the expected results can be computed by sequentially executing each processor's stream on the Functional Simulator 108. After computing the expected results, the MP testcase is loaded on the HUT 104 and executed. When the testcase completes, the expected values from the Functional Simulator 108 are compared by the comparator 110 in the checker 106 with the ones obtained from the actual MP test run. If a mismatch occurs, a violation has been detected.
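The flow just described (sequential simulation followed by comparison) might be sketched as follows. This is an illustration under stated assumptions, not the Functional Simulator 108 or the comparator 110 themselves: the access encoding, the register names, and the functions `simulate_sequentially` and `find_mismatches` are hypothetical.

```python
def simulate_sequentially(streams, initial_memory):
    """Run each processor's stream one after another on one memory.

    Valid only when no location is accessed by more than one
    processor (false sharing or non-sharing), so the order in which
    the streams are run does not matter.
    Each access is ("store", addr, value) or ("load", addr, reg).
    """
    memory = dict(initial_memory)
    registers = []
    for stream in streams:
        regs = {}
        for op, addr, operand in stream:
            if op == "store":
                memory[addr] = operand
            elif op == "load":
                regs[operand] = memory[addr]
            else:
                raise ValueError(f"unknown operation: {op}")
        registers.append(regs)
    return memory, registers


def find_mismatches(expected, observed):
    """Return {key: (expected, observed)} for every differing entry."""
    return {key: (value, observed.get(key))
            for key, value in expected.items()
            if observed.get(key) != value}


if __name__ == "__main__":
    streams = [
        [("store", 0x100, 7), ("load", 0x100, "r1")],  # processor 0
        [("store", 0x108, 9)],                         # processor 1
    ]
    expected_memory, expected_regs = simulate_sequentially(
        streams, {0x100: 0, 0x108: 0})
    # Stand-in for values read back from the hardware-under-test.
    observed_memory = {0x100: 7, 0x108: 0}
    print(find_mismatches(expected_memory, observed_memory))
```

A non-empty result from `find_mismatches` corresponds to a detected violation; an empty one means the run agreed with the reference model for the observed data.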
A variation of the above exerciser supports some restricted true sharing with the use of barriers. If two processors execute competing accesses (true sharing), the RTG 102 identifies these accesses and partitions them across barriers. In the example shown in FIG. 2A, two processors, P0 and P1, compete for memory locations A and B. In FIG. 2B, the code sequence is modified so that the accesses to A and B are serialized using barriers.
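The effect of separating competing accesses with a barrier can be demonstrated with two threads standing in for P0 and P1. The figures are not reproduced here, so the access pattern below is hypothetical rather than the one in FIG. 2B, and Python's `threading.Barrier` is only an analogy for a barrier built from architectural synchronization primitives.

```python
import threading


def run_with_barriers():
    """Two threads share A and B, with a barrier between store and load."""
    memory = {"A": 0, "B": 0}
    loaded = {}
    barrier = threading.Barrier(2)  # both threads must arrive

    def p0():
        memory["A"] = 1                      # only P0 touches A here
        barrier.wait()                       # stores are done past this point
        loaded["P0 reads B"] = memory["B"]   # only P0 touches B here

    def p1():
        memory["B"] = 2                      # only P1 touches B here
        barrier.wait()
        loaded["P1 reads A"] = memory["A"]   # only P1 touches A here

    threads = [threading.Thread(target=fn) for fn in (p0, p1)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return loaded


if __name__ == "__main__":
    print(run_with_barriers())
```

Because each store happens before its thread reaches the barrier and each load happens after both threads have passed it, every load observes the stored value on every run, so the result can be predicted by a sequential reference model.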
If the stream between two barriers is defined as a barrier-window, only one processor is allowed to access a particular location within a barrier-window. As a result, the accesses to a location across all processors are the same as sequentially
Mandyam Sriram
O'Krafka Brian Walter
Raghavan Ramanathan
Ramirez Robert James
Tokugawa Miwako
Bracewell & Patterson L.L.P.
Broda Samuel
Emile Volel
International Business Machines - Corporation
Teska Kevin J.