Parallelism performance analysis based on execution trace...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C714S035000, C714S038110, C709S241000

Reexamination Certificate

active

06230313

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to analyzing the performance of the execution of a program, and more particularly to analyzing the degree and efficiency of the parallelism during the execution.
BACKGROUND OF THE INVENTION
Parallel computer architectures generally provide multiple processors that can each be executing different tasks simultaneously. One such parallel computer architecture is referred to as a multithreaded architecture (MTA). The MTA supports not only multiple processors but also multiple streams executing simultaneously in each processor. The processors of an MTA computer are interconnected via an interconnection network. Each processor can communicate with every other processor through the interconnection network.
FIG. 1
provides a high-level overview of an MTA computer system
100
. Each processor
101
is connected to the interconnection network and memory
102
. Each processor contains a complete set of registers
101
a
for each stream such that the register values at any given time indicate the current stream state. In addition, each processor also supports multiple protection domains, each with counters reflecting the current protection domain state
101
b
, so that multiple user programs can be executing simultaneously within that processor. Each processor may also have processor-specific counters reflecting the current processor state
101
c
. The computer system also includes various input devices
105
, a display device
110
, and a permanent storage device
120
.
Each MTA processor can execute multiple threads of execution simultaneously. Each thread of execution executes on one of the 128 streams supported by an MTA processor. Every clock cycle, the processor selects a stream that is ready to execute and allows it to issue its next instruction. Instruction interpretation is pipelined by the processor, the network, and the memory. Thus, a new instruction from a different stream may be issued in each cycle time period without interfering with other instructions that are in the pipeline. When an instruction finishes, the stream to which it belongs becomes ready to execute the next instruction. Each instruction may contain up to three operations (i.e., a memory reference operation, an arithmetic operation, and a control operation) that are executed simultaneously.
The state of a stream includes one 64-bit Stream Status Word (“SSW”), 32 64-bit General Registers (“R
0
-R
31
”), and eight 32-bit Target Registers (“T
0
-T
7
”). Each MTA processor has 128 sets of SSWs, of general registers, and of target registers. Thus, the state of each stream is immediately accessible by the processor without the need to reload registers when an instruction of a stream is to is be executed.
The MTA uses program addresses that are 32 bits long. The lower half of an SSW contains the program counter (“PC”) for the stream. The upper half of the SSW contains various mode flags (e.g., floating point rounding, lookahead disable), a trap disable mask (e.g., data alignment and floating point overflow), and the four most recently generated condition codes. The 32 general registers are available for general-purpose computations. Register R
0
is special, however, in that it always contains a 0. The loading of register R
0
has no effect on its contents. The instruction set of the MTA processor uses the eight target registers as branch targets. However, most control transfer operations only use the low 32 bits to determine a new PC. One target register (T
0
) points to the trap handler, which may be an unprivileged routine. When the trap handler is invoked, the trapping stream starts executing instructions at the program location indicated by register T
0
. Trap handling is thus lightweight and independent of the operating system (“OS”) and other streams, allowing the processing of traps to occur without OS interaction.
Each MTA processor supports as many as 16 active protection domains that define the program memory, data memory, and number of streams allocated to the computations using that processor. The operating system typically executes in one of the domains, and one or more user programs can execute in the other domains. Each executing stream is assigned to a protection domain, but which domain (or which processor, for that matter) need not be known by the user program. Each task (i.e., an executing user program) may have one or more threads simultaneously executing on streams assigned to a protection domain in which the task is executing.
The MTA divides memory into program memory, which contains the instructions that form the program, and data memory, which contains the data of the program. The MTA uses a program mapping system and a data mapping system to map addresses used by the program to physical addresses in memory. The mapping systems use a program page map and a data segment map. The entries of the data segment map and program page map specify the location of the segment in physical memory along with the level of privilege needed to access the segment.
The number of streams available to a program is regulated by three quantities slim, scur, and sres associated with each protection domain. The current numbers of streams executing in the protection domain is indicated by scur; it is incremented when a stream is created and decremented when a stream quits. A create can only succeed when the incremented scur does not exceed sres, the number of streams reserved in the protection domain. The operations for creating, quitting, and reserving streams are unprivileged. Several streams can be reserved simultaneously. The stream limit slim is an operating system limit on the number of streams the protection domain can reserve.
When a stream executes a CREATE operation to create a new stream, the operation increments scur, initializes the SSW for the new stream based on the SSW of the creating stream and an offset in the CREATE operation, loads register (T
0
), and loads three registers of the new stream from general purpose registers of the creating stream. The MTA processor can then start executing the newly created stream. A QUIT operation terminates the stream that executes it and decrements both sres and scur. A QUIT_PRESERVE operation only decrements scur, which gives up a stream without surrendering its reservation.
The MTA supports four levels of privilege: user, supervisor, kernel, and IPL. The IPL level is the highest privilege level. All levels use the program page and data segment maps for address translation, and represent increasing levels of privilege. The data segment map entries define the minimum levels needed to read and write each segment, and the program page map entries define the exact level needed to execute from each page. Each stream in a protection domain may be executing at a different privileged level.
Two operations are provided to allow an executing stream to change its privilege level. A “LEVEL_ENTER lev” operation sets the current privilege level to the program page map level if the current level is equal to lev. The LEVEL_ENTER operation is located at every entry point that can accept a call from a different privilege level. A trap occurs if the current level is not equal to lev. The “LEVEL_RETURN lev” operation is used to return to the original privilege level. A trap occurs if lev is greater than the current privilege level.
An exception is an unexpected condition raised by an event that occurs in a user program, the operating system, or the hardware. These unexpected conditions include various floating point conditions (e.g., divide by zero), the execution of a privileged operation by a non-privileged stream, and the failure of a stream create operation. Each stream has an exception register. When an exception is detected, then a bit in the exception register corresponding to that exception is set. If a trap for that exception is enabled, then control is transferred to the trap handler whose address is stored in register T
0
. If the trap is currently disabled, then control is transferred to the trap handler when th

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Parallelism performance analysis based on execution trace... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Parallelism performance analysis based on execution trace..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Parallelism performance analysis based on execution trace... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2528753

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.