Debugging tool for linguistic applications

Data processing: software development, installation, and management – Software program development tool – Translation of code

Details

U.S. classes: C717S152000, C703S022000, C704S008000, C704S009000

Type: Reexamination Certificate

Status: active

Patent number: 06286131

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to debugging tools and, more particularly, to a debugging tool for use in connection with linguistic applications.
BACKGROUND OF THE INVENTION
Evaluation of linguistic or natural language processing (“NLP”) applications, e.g., spell checkers, grammar checkers, and the like, plays an increasingly important role in both the academic and industrial natural language communities. Specifically, the growing language technology industry needs measurement tools that allow researchers, engineers, managers, and customers to track development, evaluate and assure quality, and assess suitability for a variety of applications. Currently, two tools are used for evaluating and testing NLP applications, namely, test suites and test corpora. Test suites can generally be described as focused data sets constructed by researchers for testing a specific aspect of an NLP application, while test corpora can generally be described as naturally occurring sets of text.
One specific approach for evaluating NLP applications is discussed in a paper entitled “TSNLP-Test Suites for Natural Language Processing” by Lehmann et al., published on Jul. 15, 1996, which paper is incorporated herein by reference in its entirety. The TSNLP approach is based on the assumption that, in order to yield informative and interpretable results, any test items used for an actual test or evaluation must be specific to the application and the user, since every NLP application (whether commercial or under development) exhibits specific features which make it unique, and every user (or developer) of an NLP system has specific needs and requirements. The TSNLP approach is also guided by the need to provide test items that are easily reusable.
To achieve these two goals of specificity and reusability, the TSNLP paper suggests the abandonment of the traditional notion of test items as a monolithic set in favor of the notion of a database in which test items are stored together with a rich inventory of associated linguistic and non-linguistic annotations. The test item database thus serves as a virtual database that provides a means to extract relevant subsets of the test data suitable for some specific task. Using the explicit structure of the data and given TSNLP annotations, the database engine allows for the searching and retrieving of data from the virtual database, thereby creating a concrete database instance according to arbitrary linguistic and extra-linguistic constraints.
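By way of example and not limitation, such an annotated test item database and a constraint query over it can be sketched in a few lines of Python; the field names, annotation keys, and example items below are assumptions made for this sketch and are not taken from the TSNLP paper.
```python
# Illustrative sketch only: an annotated test item "virtual database" and a
# constraint query that extracts a concrete database instance. All field
# names, annotation keys, and items are assumptions of this sketch.
from dataclasses import dataclass, field

@dataclass
class TestItem:
    text: str                     # the test sentence itself
    phenomenon: str               # the single phenomenon the item exercises
    wellformed: bool              # grammatical, or deliberately ill-formed
    annotations: dict = field(default_factory=dict)  # e.g. domain, length

TEST_ITEMS = [
    TestItem("The dog barks.", "subject-verb agreement", True,
             {"domain": "general", "length": 3}),
    TestItem("The dog bark.", "subject-verb agreement", False,
             {"domain": "general", "length": 3}),
    TestItem("She gave him the book.", "double object", True,
             {"domain": "general", "length": 5}),
]

def query(items, **constraints):
    """Return the subset of items matching every given constraint,
    checking both top-level fields and the annotation dictionary."""
    def matches(item):
        merged = {**item.annotations,
                  "phenomenon": item.phenomenon,
                  "wellformed": item.wellformed}
        return all(merged.get(k) == v for k, v in constraints.items())
    return [item for item in items if matches(item)]

# A concrete database instance: ill-formed agreement items only.
agreement_errors = query(TEST_ITEMS,
                         phenomenon="subject-verb agreement",
                         wellformed=False)
```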
To provide control over the test data when performing an evaluation of an NLP application, the TSNLP paper emphasizes the value of using test suites in lieu of test corpora, since test suites provide the ability to focus on specific linguistic phenomena. This focus is achieved in particular by following the requirement that as many linguistic parameters as possible within the test suite be kept under control. For example, since vocabulary is a controllable linguistic parameter, the TSNLP approach requires that the vocabulary be restricted in size as well as domain. Additionally, the TSNLP approach attempts to control the interaction of phenomena by requiring that the test items be as small as possible.
The TSNLP paper also suggests the desirability of providing progressivity, that is, the principle of starting from the simplest test items and increasing their complexity. In the TSNLP approach, this aspect is addressed by requiring that each test item focus only on a single phenomenon that distinguishes it from all other test items. (For each phenomenon within a test item the application under test should generate a phenomenon response; e.g., for each misspelled word within a sentence a spell checker should generate a list of alternative word suggestions.) In this manner, test data users apply the test data in a progressive order, an ordering captured by the special “presupposition” attribute in the phenomenon classification.
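Purely as an illustration, progressivity can be pictured as a topological sort over an assumed dependency graph of phenomena; the graph below is hypothetical and not drawn from the TSNLP classification.
```python
# Illustrative sketch only: progressivity as a topological sort over an
# assumed "presupposes" graph of phenomena (requires Python 3.9+).
from graphlib import TopologicalSorter

# phenomenon -> the set of phenomena it presupposes (hypothetical graph)
PRESUPPOSES = {
    "tokenization": set(),
    "part-of-speech tagging": {"tokenization"},
    "subject-verb agreement": {"part-of-speech tagging"},
}

# Visit phenomena from the simplest to the most complex, so every test
# item is applied only after the phenomena it presupposes have been tested.
for phenomenon in TopologicalSorter(PRESUPPOSES).static_order():
    print("test phenomenon:", phenomenon)
```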
While the approach for evaluating NLP applications as taught in the TSNLP paper does work for its intended purpose, the above-noted requirements cause the TSNLP approach to suffer the disadvantage of not allowing for the efficient testing of real user sentences with multiple errors on a large scale. In addition, since the base TSNLP approach only provides for queries that tally failures, the TSNLP approach for evaluating NLP applications provides information which may not completely reflect the behavior of the NLP application. For example, a test suite comprising “This are a test.” may produce an actual result of “This is an test.” when utilized as an input to an NLP application which, utilizing the TSNLP approach, would result in a flagged Subject-Verb failure without alerting the developer that the NLP application had a failed A/An correction and a bad rewrite. This inability to track uncommon patterns in the behavior of an NLP application on a more granular level renders the TSNLP approach for evaluating NLP applications susceptible to minor changes in the output of the underlying NLP application. Accordingly, the TSNLP approach still requires an undesirably large amount of resources and time to identify and fix individual symptom bugs in an NLP application.
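To make the granularity problem concrete, the following hypothetical sketch contrasts a sentence-level tally with per-phenomenon checks for the example above; the substring tests stand in for real phenomenon responses and are assumptions of this sketch, not the output of any actual checker.
```python
# Illustrative sketch only: a sentence-level tally hides a regression that
# per-phenomenon checks expose. The substring tests below stand in for real
# phenomenon responses and are assumptions, not any actual checker's output.
test_input = "This are a test."
expected_rewrite = "This is a test."      # what the developer expects
actual_rewrite = "This is an test."       # what the checker produced

# Coarse, tally-style result: a single pass/fail flag for the sentence.
print("sentence-level:",
      "pass" if actual_rewrite == expected_rewrite else "fail")

# Granular, per-phenomenon comparison: the agreement fix succeeded, but a
# bad A/An rewrite was introduced -- exactly what the tally fails to report.
print("agreement corrected:", actual_rewrite.startswith("This is"))
print("A/An regression introduced:", " an test" in actual_rewrite)
```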
These deficiencies are also found in another tool for tracking problems found when evaluating NLP applications, dubbed “RAID”, which has been used internally within Microsoft. Specifically, RAID similarly requires that each test item focus only on a single phenomenon which distinguishes it from all other test items. This is required because the database scheme and associated simple querying method implemented in RAID do not allow for the tracking of complex relationships between system bugs, which the user sees, and underlying product bugs. Accordingly, RAID likewise suffers the disadvantage of not allowing for the efficient testing of real user sentences with multiple errors on a large scale. Furthermore, the base implementation of RAID is also limited to queries that tally failures which, as discussed previously, renders this method of evaluating NLP applications highly susceptible to minor changes in the output of the underlying NLP application(s).
SUMMARY OF THE INVENTION
To overcome these noted disadvantages and deficiencies, the present invention is directed to a tool which can be used to automatically generate information useful as an aid in debugging a computer-executable application. The computer-executable application, preferably in the form of an NLP application, accepts an input and generates an output as a function of the input. In particular, the tool is preferably used to initiate the execution of the computer-executable application and supply as an input thereto test objects selected from a database. The output generated as a result of the test objects is formed into actual result objects. The tool then initiates a comparison between the actual result objects and either expected result objects, which have been generated by the developer, or archived actual result objects, which were generated by a previous use of the tool.
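By way of example and not limitation, a minimal sketch of this flow follows; the callable `nlp_app` standing in for the application under test, the JSON archive, and all other names are assumptions of the sketch rather than details of the invention.
```python
# Illustrative sketch only: run the application under test over test objects,
# wrap the output in actual result objects, and compare against expected
# result objects and against results archived from a previous run. The
# callable `nlp_app`, the JSON archive, and all names are assumptions.
import json
from pathlib import Path

def run_tool(test_objects, nlp_app, expected, archive_path="results.json"):
    # 1. Initiate execution of the application over every test object.
    actual = {t: nlp_app(t) for t in test_objects}

    # 2. Compare actual result objects with developer-supplied expectations.
    new_failures = {t: out for t, out in actual.items()
                    if t in expected and out != expected[t]}

    # 3. Compare with archived actual result objects from a previous run.
    path = Path(archive_path)
    archive = json.loads(path.read_text()) if path.exists() else {}
    changes = {t: out for t, out in actual.items()
               if t in archive and out != archive[t]}

    # 4. Archive this run's actual result objects for the next comparison.
    path.write_text(json.dumps(actual, indent=2))
    return new_failures, changes
```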
By comparing the actual result objects with the expected result objects and/or with the archived actual result objects, meaningful information is generated which is valuable for use in evaluating the current state of the computer-executable application. For example, the comparisons will have the effect of notifying developers of new bugs and/or suspicious patterns in the behavior of the application which may be indicative of bugs. In addition, developers can utilize the comparison results to get a better sense of the magnitude of the user-perceived impact of a bug, because the tests can be run over a set of real-world test inputs that are balanced to represent real user corpora. Furthermore, the comparison results will provide developers with the ability to readily discern the impact of a fix to the application.
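Continuing the sketch above, the comparison output can be summarized into the kinds of information just described; the classification rule below (a change is a fix if it now matches the expected result, otherwise a suspected new bug) is an assumed heuristic for illustration only.
```python
# Illustrative sketch only: summarize behavior changes into fixes versus
# suspected new bugs and estimate user-perceived impact over a real-world
# test corpus. The classification heuristic below is an assumption.
def summarize(actual, archive, expected, corpus_size):
    fixed, suspicious = [], []
    for item, out in actual.items():
        if item in archive and out != archive[item]:
            # A change that now matches the expectation looks like a fix;
            # any other change is a suspicious pattern worth investigating.
            (fixed if out == expected.get(item) else suspicious).append(item)
    impact = len(suspicious) / corpus_size if corpus_size else 0.0
    return {"fixed": fixed, "suspected_new_bugs": suspicious,
            "user_perceived_impact": impact}
```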
The ability to track this information is further enhanced by providing the various objects with various tags to which labels may be dynamically mapped.
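A small sketch of such dynamic label mapping follows; the tag identifiers and label strings are hypothetical and chosen only to echo the earlier example.
```python
# Illustrative sketch only: result objects carry stable tags, and labels are
# mapped onto those tags dynamically, so presentation can change without
# touching the stored objects. Tag identifiers and labels are hypothetical.
result_object = {"text": "This is an test.", "tags": ["T017", "T042"]}

# The label map can be swapped at any time; the tags themselves stay fixed.
LABELS = {"T017": "A/An correction", "T042": "bad rewrite"}

print([LABELS.get(tag, tag) for tag in result_object["tags"]])
# -> ['A/An correction', 'bad rewrite']
```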
