Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-10-28
2003-07-29
Beausoliel, Robert (Department: 2184)
C714S038110, C702S185000
active
06601188
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to methods of analyzing defects (debugging) in an operating system for computers. It is especially useful for debugging under a multitasking operating system that supports multiple concurrent processes but lacks support for registering a callback function with the operating system's exception handler.
BACKGROUND OF THE INVENTION
Debugging an application is an integral part of its development. Considerable resources are spent testing an application before it is considered ready for the market. During testing, it is common to run numerous test cases against the product in order to flush out as many bugs as possible. If a test case fails due to an application exception, it is often difficult to recreate the problem inside a debugger. To catch an exception in a debugger, the debugger must be attached to the running process and run along with it in memory. Also, if the application has multiple processes, a debugger must be in control of each of the processes in case one of them encounters an exception. When a large number of test cases are run, this quickly becomes a problem.
Unfortunately, program exceptions still sometimes occur on a customer's system. Because of this, remote debugging is also an important part of the software industry. Program exceptions can occur for a number of reasons including operating system bugs, hardware failures, and program defects.
When a program receives an exception at a customer site, it is often on a production system, and diagnosing and resolving the problem expeditiously is very important. One of the main obstacles to quick resolution is a lack of accurate information. Stopping a process in the exact state in which the exception occurred is therefore very useful, and it can be done for multiple processes.
Once a process has been stopped, information such as its virtual memory (including the stack and heap), register dumps and loaded libraries can be dumped into files, which can then be packaged up and sent back to the vendor of the product.
Windows® has a feature which allows a program to register itself with the operating system's exception handler. That program is then called whenever an exception occurs, and it can choose to attach a debugger or another program to the process that caused the exception. For operating systems that do not support registering a callback function, the same goal can be reached by other methods. In the Sun Microsystems Solaris Operating Environment (an operating system), a set of signals or a set of faults can be traced for a process, which has the effect of stopping the process if it encounters any signal or fault in the set. This can be done for any number of processes.
When the operating system detects an exception, it stops the process in the exact state that caused the exception. A monitor program, such as the one in the invention described below, can then detect that the process has stopped, dump relevant information and notify the user that a process has stopped. If the stopped process is on a development system, a debugger can be attached. If the process resides on a customer system, the monitor program can dump the process's information, which can then be sent to a developer for further study.
By using the operating system's ability to stop a process in the event that it hits an exception, a set of processes can be “watched” or monitored by a single external program. In the event that one of the monitored processes stops due to an exception, the monitor program can then take action. The resulting action of the monitor program is configurable and depends on the specific need of the user.
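A minimal Python sketch of one external monitor watching several processes follows, under stated assumptions. Portable POSIX offers no way to make the kernel stop a process on SIGSEGV without a debugger interface, so the hypothetical `spawn_worker` helper simulates the stopped-on-exception state by having a "faulting" child stop itself with SIGSTOP, and the monitor can only observe its own children through `os.waitpid` with `WUNTRACED`. On Solaris, a monitor would instead set the traced signal and fault sets through the proc(4) control interface (for example the PCSTRACE and PCSFAULT messages) and could watch processes it did not start.

```python
import os
import signal
from typing import Dict, Iterable


def spawn_worker(fault: bool) -> int:
    """Fork a hypothetical worker process and return its pid.

    A 'faulting' worker stops itself with SIGSTOP to simulate the state the
    operating system would leave it in after a traced fault.
    """
    pid = os.fork()
    if pid == 0:  # child
        if fault:
            os.kill(os.getpid(), signal.SIGSTOP)
        os._exit(0)
    return pid


def monitor(pids: Iterable[int]) -> Dict[int, int]:
    """Wait until every pid has either stopped or exited.

    Returns {pid: stop_signal} for the processes that stopped.
    """
    pending = set(pids)
    stopped: Dict[int, int] = {}
    while pending:
        # -1: report whichever child changes state first.
        # WUNTRACED: also report children that stopped, not only exited ones.
        pid, status = os.waitpid(-1, os.WUNTRACED)
        if pid not in pending:
            continue  # a child we were not asked to watch; its status is discarded
        pending.discard(pid)
        if os.WIFSTOPPED(status):
            stopped[pid] = os.WSTOPSIG(status)
    return stopped
```

Because `waitpid(-1, ...)` reports whichever child changes state first, one stopped worker is noticed even while others are still running. The stopped worker stays frozen until the monitor (or a developer's debugger) decides what to do with it.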
If the user of the monitor program is a developer, then dumping information for the trapped process may not be necessary. Instead, the monitor program can be run such that it does not dump any information but only notifies the user that a process has stopped. The developer can then attach a debugger to the process and gather information about the exception through the debugger.
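The configurable response can be sketched as a small policy function. `MonitorConfig`, `on_stopped` and the `dumper` callback are hypothetical names; the point is only that the same stop event leads either to a notification alone (the developer case) or to a dump followed by a notification (the customer case).

```python
import signal
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MonitorConfig:
    # Hypothetical switch: False for a developer who will attach a debugger,
    # True for a customer system where the data must be shipped back.
    dump: bool = True


def on_stopped(pid: int, signum: int, config: MonitorConfig,
               dumper: Callable[[int], str]) -> List[str]:
    """Return the actions taken for a process stopped by `signum`."""
    actions: List[str] = []
    if config.dump:
        actions.append(f"dumped to {dumper(pid)}")
    name = signal.Signals(signum).name
    actions.append(f"notify: process {pid} stopped by {name}")
    return actions
```

With `MonitorConfig(dump=False)` the only action is the notification, which leaves the process untouched for the developer's debugger.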
Debugging an exception remotely, such as at a customer's site, is often difficult. Typically, an application installs a signal handler in each process which can dump relevant information such as a stack trace and register dumps. This kind of signal handler is called when certain signals, usually indicating a program exception, are received. The information is then sent back to the vendor, where debugging is attempted. One problem with this method is that calling the signal handler changes the process from the original state in which it received the exception, making some of the information stale. Another problem is that the stack of a process is sometimes corrupted, so the process is not able to dump relevant information about itself. Since this form of debugging is remote, attaching a debugger is often not possible.
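The staleness problem can be observed even in Python, with two assumptions: SIGUSR1 stands in for a real fault signal (a genuine SIGSEGV would normally terminate the interpreter), and the Python-level stack stands in for the native one. Whatever the in-process handler records already includes the handler's own frame, because the process has moved on from the exact state in which the signal arrived. `crash_handler`, `install_crash_handler` and `captured` are hypothetical names.

```python
import signal
import traceback
from typing import List

# Stacks recorded by the handler; a real handler would write them to a file.
captured: List[List[str]] = []


def crash_handler(signum, frame):
    # This runs inside the very process being diagnosed. By the time it
    # executes, its own frame is on top of the stack, so the recorded stack
    # already differs from the state in which the signal arrived.
    captured.append(traceback.format_stack())


def install_crash_handler(signum: int = signal.SIGUSR1) -> None:
    # signal.signal must be called from the main thread.
    signal.signal(signum, crash_handler)
```

After `install_crash_handler()` and `signal.raise_signal(signal.SIGUSR1)`, the last entry of the recorded stack names `crash_handler` itself rather than the code that was running when the signal arrived.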
In the case of remote debugging, the monitoring process can initiate dumping of the process's virtual memory, call stack, current registers, etc. This information is dumped into files created under a directory structure, which can then be packaged up and sent back to the vendor. The information can include, but is not limited to, virtual memory, register dumps, the call stack, the reason for dumping, and the libraries that were loaded at the time of the exception.
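A sketch of the dump step follows, assuming a Linux-style /proc where `maps`, `status` and `cmdline` are text files (Solaris keeps comparable data in binary structures such as `map` and `psinfo`, which would need their own readers). `dump_process` and the `pid<pid>-<ns>` directory layout are hypothetical. The `maps` file lists every mapped shared library, which covers the "libraries loaded" item; virtual memory contents and registers are not captured here and would normally come from a core-dump tool such as gcore.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional

# Assumption: Linux-style text files under /proc/<pid>/.
PROC_FILES = ("maps", "status", "cmdline")


def dump_process(pid: int, reason: str, root: Optional[Path] = None) -> Path:
    """Copy what /proc exposes about `pid` into <root>/pid<pid>-<ns>/.

    Returns the directory, ready to be packaged and sent to the vendor.
    """
    base = Path(root) if root is not None else Path(tempfile.mkdtemp(prefix="crash-"))
    out = base / f"pid{pid}-{time.time_ns()}"
    out.mkdir(parents=True)
    (out / "reason.txt").write_text(reason + "\n")
    for name in PROC_FILES:
        src = Path("/proc") / str(pid) / name
        try:
            (out / name).write_bytes(src.read_bytes())
        except OSError as exc:
            # Process gone, permission denied, or no /proc on this system:
            # record the failure instead of aborting the whole dump.
            (out / f"{name}.error").write_text(f"{exc}\n")
    return out
```

Recording per-file failures rather than aborting keeps whatever partial information was available, which matters when the dump is the only evidence that will reach the vendor.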
Currently, debuggers are designed to control one process and its children. Controlling a related set of processes entails attaching multiple debuggers, one per process, or attaching to the first process and allowing the debugger to follow each child as needed. Even when the debugger follows each child, one debugger instance is needed for each process, which is expensive in system resources. This method can also change the environment enough to prevent the exception of interest from occurring.
If the system is remote and the starting of a debugger is not an option, the application has to rely on the signal handlers to dump as much information as possible. As previously mentioned, this method can be unreliable and the information for the process can be somewhat stale.
In contrast to the UNIX operating environment, Windows provides an operating system exception handler with which a debugging utility can register itself so that it is notified when an application running under the operating system encounters an exception. An exception can be caused by a hardware fault, an illegal machine instruction or an invalid memory access, any of which can interrupt the operation of the application program.
U.S. Pat. No. 5,526,485 assigned to Microsoft Corporation appears to disclose a debugging system in which the monitor program registers itself with the operating system, to be called by the operating system in response to exceptions generated by an application program running on the operating system. When called in this manner, the monitor program first checks whether a debugging program is already running. If one is, the monitor program returns and the operating system calls any remaining registered programs, such as the running debugging program. If no debugging program is running, the monitor program loads and starts a debugging program to debug the previously loaded application program. This is distinguishable from the present invention, which does not require the monitor program to register itself with the operating system and is not called by the operating system in response to an exception.
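For contrast, the registration scheme described for U.S. Pat. No. 5,526,485 can be modeled as a chain of callbacks. This is a toy model with hypothetical names, not the Windows API: the dispatcher calls each registered handler in turn, and the monitor declines (returns False) when a debugger is already running, so that the next registered handler receives the exception.

```python
from typing import Callable, List, Optional

# Toy model with hypothetical names; this is not the Windows API.
Handler = Callable[[int, str], bool]  # returns True if it took the exception


class ExceptionDispatcher:
    """Stands in for an OS exception handler that calls registered programs in order."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def register(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def dispatch(self, pid: int, reason: str) -> Optional[Handler]:
        for handler in self._handlers:
            if handler(pid, reason):
                return handler  # this handler claimed the exception
        return None  # nobody claimed it


def make_monitor(debugger_running: Callable[[], bool],
                 start_debugger: Callable[[int], None]) -> Handler:
    def monitor(pid: int, reason: str) -> bool:
        if debugger_running():
            return False  # step aside so the running debugger's handler is called
        start_debugger(pid)
        return True
    return monitor
```

The model makes the dependency explicit: nothing happens unless the operating system offers the `register` step, which is exactly what the UNIX-derived systems discussed next lack.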
Presently, for the UNIX operating system and UNIX-like or UNIX-derived operating systems such as SOLARIS, AIX, HP-UX and IRIX, there is no support for registering a monitor program utility with the operating system. Accordingly, if an exception occurs while an application or process is running under one of these operating systems, only that application or process is notified of the exception. No notifi
Beausoliel Robert
Duncan Marc