Electrical computers and digital processing systems: multicomput – Computer-to-computer data routing – Least weight routing

Type: Reexamination Certificate
Filed: 2001-11-08
Issued: 2004-10-19
Examiner: Follansbee, John (Department: 2126)
Classifications: C702S182000, C714S039000, C717S124000
Status: active
Patent number: 06807583
FIELD OF THE INVENTION
This invention relates generally to process execution and more particularly to determining causality for information stored during concurrent and distributed software process execution.
BACKGROUND OF THE INVENTION
In application execution and analysis, tracing is a term having many similar but distinct meanings. Tracing implies following process execution, and often incorporates recording information about a process during execution. In essence, a process that executes and has information about it recorded is considered a traced process.
In the past, tracing of computer software application programs has been performed for two main purposes: debugging and optimisation. In debugging, the purpose of tracing is to trace back from an abnormal occurrence (a bug) to show a user the flow of execution that preceded it. This allows the user to identify an error in the executed program. Unfortunately, the commands executed immediately before an abnormality are often not the source of the error in execution. Because of this, much research is currently being conducted into better ways of viewing trace-related data in order to more easily identify potential sources of bugs.
Debuggers are well known in the art of computer programming and in hardware design. In commonly available debuggers, a user sets up a trace process to store a certain set of variables upon execution of a particular command while the program is in a particular state. Upon this state and command occurring, the variables are stored. A viewer is provided allowing the user to try to locate errors in the program that result in the bug. Usually, debuggers provide complex tracing tools which allow for execution of a program on a line by line basis and also allow for a variety of break commands and execution options. Some debuggers allow modification of parameters such as variable values or data during execution of the program. These tools facilitate error identification and location.
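The conditional trace point described above can be sketched with Python's standard `sys.settrace` hook. This is a minimal illustration, not the patent's method: the function name `update_balance` and the watched state are hypothetical.

```python
import sys

# Sketch of a debugger-style trace point: snapshot a function's
# arguments every time the watched function is entered.
trace_log = []

def tracer(frame, event, arg):
    # Fire only on entry to the (hypothetical) function being watched.
    if event == "call" and frame.f_code.co_name == "update_balance":
        trace_log.append(dict(frame.f_locals))  # capture current state
    return tracer

def update_balance(balance, delta):
    return balance + delta

sys.settrace(tracer)
update_balance(100, -30)
update_balance(70, 5)
sys.settrace(None)

# trace_log now holds the program state captured at each trace point,
# which a viewer could present to help locate the error.
```

A real debugger adds a condition on program state before recording, and offers line-by-line stepping and break commands on top of the same hook.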
Unfortunately, using multiprocessor or networked systems, it is difficult to ensure that a system will function as desired and also, it is difficult to ascertain that a system is actually functioning as desired. Many large, multiprocessor systems appear to execute software programs flawlessly for extended periods of time before bugs are encountered. Tracing these bugs is very difficult because a cause of a bug may originate from any of a number of processors which may be geographically dispersed. Also, many of these bugs appear intermittently and are difficult to isolate. Using a debugger is difficult, if not impossible, because multiple debugging sessions must be established and coordinated.
In contrast, for optimisation it is important to know which commands are executed most often in order to optimise a software program. For example, when an application during normal execution executes a first subroutine once, a second subroutine twice, and a third subroutine seventy times, each subroutine requiring a similar time span for execution, optimising the subroutine that runs seventy times is clearly most important. In system optimisation, tracing is not actually performed except insofar as statistics of routine execution and execution times are maintained. These statistics are very important because they allow for a directed optimisation effort at the points where the software executes slowest or where execution will benefit most. Statistics captured for program optimisation are often useful in determining execution bottlenecks and other unobvious problems. Examples of optimisation-based modelling or tracing include systems described in the following references:
P. Dauphin, R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, and F. Sotz. “ZM4/Simple: A general approach to performance measurement and evaluation of distributed systems.” In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 286-309. IEEE Computer Society Press, Los Alamitos, Calif., 1994;
M. Heath and J. Etheridge. “Visualizing the performance of parallel programs.” IEEE Software, 8(5):29-39, September 1991;
C. Kilpatrick and K. Schwan. “ChaosMON—application-specific monitoring and display of performance information for parallel and distributed systems.” Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, May 1991; and,
J. Yan. “Performance tuning with an automated instrumentation and monitoring system for multicomputers AIMS.” Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, January 1994.
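The execution statistics discussed above (the once/twice/seventy-times example) can be sketched with a counting decorator. This is a minimal sketch; the subroutine names are hypothetical, and real profilers also record execution times.

```python
from collections import Counter
from functools import wraps

# Per-subroutine execution counts, the basic statistic that directs
# an optimisation effort toward the hottest routine.
call_counts = Counter()

def counted(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper

@counted
def first_subroutine(): pass

@counted
def second_subroutine(): pass

@counted
def third_subroutine(): pass

first_subroutine()
for _ in range(2):
    second_subroutine()
for _ in range(70):
    third_subroutine()

# With similar per-call cost, the routine with the highest count is
# the best optimisation target.
hottest = call_counts.most_common(1)[0][0]
```

The same counts, joined with per-call timing, expose the bottlenecks mentioned above.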
Software performance models of a design prior to product implementation reduce the risk of performance-related failures. Performance models provide performance predictions under varying environmental conditions or design alternatives, and these predictions are used to detect problems. To construct a model, a software description in the form of a design document or source code is analysed and translated into a model format; examples of model formats are a simulation model, a queuing network model, or a state-based model such as a Petri net. The effort of model development makes this approach unattractive, so performance is usually addressed only in the final product. This has been termed the “fix-it-later” approach, and the seriousness of the problems it creates is well documented.
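As a minimal illustration of the kind of prediction a queuing network model yields, the mean response time of a single M/M/1 station is 1/(μ − λ). The rates below are invented for the example; a real model composes many such stations.

```python
# Sketch: predict mean response time of one M/M/1 queueing station.
# arrival_rate (lambda) and service_rate (mu) are in requests/second.
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    if arrival_rate >= service_rate:
        raise ValueError("utilisation >= 1: the station saturates")
    return 1.0 / (service_rate - arrival_rate)

# Comparing two design alternatives before implementation:
slow = mm1_response_time(arrival_rate=8.0, service_rate=10.0)  # 0.5 s
fast = mm1_response_time(arrival_rate=8.0, service_rate=12.0)  # 0.25 s
```

Such closed-form predictions are how a model detects a performance problem (here, a saturating station) before any code is written.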
Determining that a process is in fact executing as desired, or constructing a performance model for optimisation, requires an understanding of causality within a software application. Commonly, the only causal connection determined automatically is precedence. For example, in determining system statistics, it is easily recorded which subroutine was executed when. This yields knowledge of precedence when the entire process is executed on a single processor. Given this knowledge alone, however, it is difficult to determine anything other than precedence.
Time and Causality
For concurrent or distributed software computations, a common synchronised time reference is unavailable. A system operating on the earth and another system operating in space illustrate this problem. When the system on earth performs an activity and transmits a message to the system in space, an evident time delay occurs between message transmission and message reception. Once a system is in space, synchronising its time source precisely with that of an earth-bound system is difficult; when the system in space is moving, such synchronisation is unlikely. The same problem, though on a smaller scale, exists in earth-bound networks: each computer is bound to an independent time source, and synchronisation of time sources is difficult. With advances in computer technology and processing speeds, these synchronisation difficulties are becoming no less significant than those experienced with space-bound systems.
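A standard way to order events without a common time reference (a known technique, not one the passage itself claims) is Lamport's logical clock: each process counts its own events and advances past any timestamp it receives. The earth/space process names below are illustrative.

```python
# Sketch of Lamport logical clocks: event ordering consistent with
# message passing, with no synchronised physical time source.
class Process:
    def __init__(self, name: str):
        self.name = name
        self.clock = 0  # logical clock, not wall-clock time

    def local_event(self) -> int:
        self.clock += 1
        return self.clock

    def send(self) -> int:
        self.clock += 1
        return self.clock  # timestamp carried by the message

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp, then count the receive event.
        self.clock = max(self.clock, msg_time) + 1
        return self.clock

earth, space = Process("earth"), Process("space")
earth.local_event()   # earth clock becomes 1
stamp = earth.send()  # earth clock becomes 2; message stamped 2
space.receive(stamp)  # space clock becomes max(0, 2) + 1 = 3
```

The resulting timestamps respect precedence (a send always precedes its receive) even though the two clocks were never synchronised.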
The lack of a common time reference, as well as other problems with observing a distributed system, have led to a notion of causality that is probability based. This “probabilistic causality” is a probability estimate of an event having occurred. Probabilistic causality uses a database of information (e.g., application structure, network configuration), a sophisticated data reduction algorithm (i.e., expert system), and trace records to make an educated guess at the source of problems in a complex system based on observable events. Although probabilistic causality is useful for network fault diagnosis it should not be confused with the stricter definition of causality that is being espoused here which is not probability based. Examples of probabilistic causality are found in U.S. Pat. Nos. 5,661,668 and 5,483,637.
In order to determine causality, it is beneficial to determine which events happened before which other events, described here as precedence causality. Precedence is a commonly known form of causality; for example, given no branching instructions, an executable instruction is not executed until the previous instruction has executed. This precedence-based causality is used heavily for debugging. Often, on
Inventors: Hrischuk, Curtis; Woodside, Charles Murray
Assignee: Carleton University
Examiner: Follansbee, John
Attorney/Agent: Kraft, Clifford H.; Opie, G. L.
Title: Method of determining causal connections between events...