Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-06-08
2002-06-11
Beausoleil, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C717S152000
Reexamination Certificate
active
06405326
ABSTRACT:
FIELD OF THE INVENTION
The present invention is in the general field of timing related bug detectors which aim at detecting data races in multi-threaded computer programs applications.
BACKGROUND OF THE INVENTION
A general computer program is a list of statements, instructions, and commands to be executed properly and in a well ordered fashion. The operating system (OS hereafter) is the computer software that manages all the activities taking place in the computer. The OS is responsible to run the program on the computer.
The task of the computer program is the results it generates during its execution.
The computer may have more than one processor available to fulfill the OS needs and requirements. The OS might allocate more than one processor to execute a given program. If one processor is allocated to run the program, then the program's instructions are executed one-by-one in a well ordered fashion to generate the expected results. This sequential run of the program generates the sequential results, which are the results that are designed to be generated by the program. The order the program's instructions are executed in the sequential run is the sequential order of the program's instructions. A computer program may be split into several structures each consisting of several instructions of the computer program. Each of the program's structures usually, but not necessarily, have a well defined task.
In general, a program can be described as a set of structures, along with their respective relationships and interconnections. In addition, due to the nature of these interconnections, a program can be also described as a hierarchy of several levels. In this case, the program set of structures, is distributed over these levels, where each structure is connected to one or more of the structures located in the level above it. This hierarchy is defined by the order that these structures are to be executed.
FIG. 1A
illustrates a naive example of hierarchical program (
90
) where each level consists of one structure and
FIG. 1B
illustrates another, more complex, hierarchical program (
20
) where level (
22
) contains two parallel structures (A (
24
) and B (
26
)), and level (
28
) which contains two parallel levels (C (
30
) and D (
32
)).
A general computer program may contain two or more parallel structures, as is exemplified in FIG.
1
B. In the more general case, a program's structure may include several levels each containing two or more parallel structures.
A thread is a sequence of structures that are to be executed one after the other in a sequential fashion. Thus, a thread may consist of a sequence of structures that belong to consecutive levels in the hierarchy, and which are connected to each other. The results that are generated during the execution of this well ordered sequence of a program's structures are the thread's results. Reverting now to
FIG. 1A
, the program consists of only one thread starting with a first structure called begin (
12
), and its last structure is the end structure (
14
) of the program. This thread is called a total thread seeing that it concerns only one total order.
In the other example of
FIG. 1B
, the program has two different threads {A, C} and {B, D}. A thread is a program segment defined to execute as a ‘light’ program, with its own local variables, possibly, but not necessarily, on a different processor. Thus, if a partition of the structures is given, a thread is an assignment of each partition structure to a processor. The partition should meet the requirement that its structures can be ordered in an order that does not contradict the order that is defined by the hierarchy. For example, with reference to
FIG. 1B
{A, C} and {B, D} is an adequate partition considering that A→C and B→D do not contradict the hierarchy of
FIG. 1B
, and accordingly the assignment of {A, C} to a first processor and {B, D} to a second processor is feasible.
In addition to the fact that the thread consists of several structures executed one after the other, the thread is also associated with a well defined memory domain. A cell is the smallest unit of the memory that the computer program refers to. The thread's memory domain is the part of the computer memory, which the thread writes to and/or reads to data.
Therefore, a thread is defined by the following three major components:
(1) The sequence order of thread's structures
(2) The thread's memory domain. This memory domain or parts of it may be used also by other threads
(3) The output domain where the thread writes its relevant results. The output domain is never used in a “read mode”
The thread's execution trace is a list of all its sequential structures' instructions that where executed during its full execution. Similarly, the program's execution trace is a list of all the program instructions that where executed during its full execution of the program. Here, each instruction is accompanied by:
(1) The appropriate time that it was executed (statement's execution time stamp)
(2) The ID of the thread that has executed this instruction, and
(3) The map of each of the thread's memory domain at each of the time stamps.
Part of the program's execution trace is the memory trace, which is the list of the memory maps, each taken in a different time, ordered sequentially.
In case the program contains at one of its points N parallel structures, then it can be split into at most N parallel threads. Therefore if a multi-processors computer is available to execute this program, then the OS can allocate each of the parallel threads to a different processor. Alternatively, in the case of single processor architecture, the OS can simulate the allocation of threads to respective processors.
Two parallel threads are connected to each other if parts of the memory domains overlap. These parts make-up the two-threads overlap memory domain or common memory. At a specific memory cell that belongs to the overlap memory domain of two threads, the following scenarios might happened:
(1) both threads write into this cell
(2) one writes into the cell and the other reads information out of it
(3) both read data from this memory cell
A data race between two parallel threads is the situation where the two threads are connected and both contain scenario (1) and/or scenario (2) on their overlap common memory. In this case, the two parallel threads compete, regardless of whether they are implemented in a single-processor or multi-processor architecture.
A competing point of two competing threads is the memory cell which belongs to their overlap memory domain and there is a data race on this cell. Two threads may have more then one competing point. For example, assume that structures A and B in
FIG. 1B
belong to two connected threads, TA and TB respectively. In case of scenario (1) if TA reads and TB writes to the same competing point, then TA can get the value of the contents of the competing point either before or after TB wrote values into this common cell, depends on the order of execution. Thus, when terminated, TA might contain different values at its memory domain for the different cases that might take place.
When parallel structures are allocated to different parallel processors, and if no synchronization exists, the parallel processors can start and end the execution of their allocated structures in some undetermined time, giving rise to different possible interleavings among the parallel structures and consequently to parallel threads. In the case that the parallel threads compete, the results of one or more of the threads may be different than that of sequential program results which is obviously undesired. Thus, in general, the existence of a competing point in a multi-threaded parallel program is a source for inconsistency in its results. Depending on the computer's OS's activities taking place at the same time that the program is executed, different resul
Azagury Alan C.
Factor Michael
Farchi Eltan
Talmor Varam
Beausoleil Robert
Browdy and Neimark
Chu Gabriel L.
International Business Machines Corporation Limited
LandOfFree
Timing related bug detector method for detecting data races does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Timing related bug detector method for detecting data races, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Timing related bug detector method for detecting data races will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2976541