System and method for lexing and parsing program annotations

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S009000

Reexamination Certificate

active

06353925

ABSTRACT:

The present invention relates generally to compilers and program analyzers, and more particularly to an improved system and method for lexing and parsing computer programs that include tool-specific annotations.
BACKGROUND OF THE INVENTION
A compiler or a source-level program analyzer is capable of parsing source programs, which are written in a particular programming-language. Compilers generally include a lexer and a parser. Similarly, other types of programming tools include a lexer and parser. The lexer reads the source-level program and generates tokens based upon the programming-language statements in the source-level program. The lexer passes the generated tokens to the parser, which assembles the tokens into an abstract syntax tree (AST). The abstract syntax tree is further processed by one or more tools, such as a compiler back-end or a program correctness tester.
Tool specific annotations are typically used in the source program to give the tools special instructions; for example, “generate the following machine code instruction at this point in the target code,” “generate code that uses a machine register for this program variable,” “ignore possible errors of type x in this program statement,” or “check that this parameter is always a non-zero integer.” As new tools are devised, and as new features are added to those tools, the lexer and parser used by the tools will often require corresponding revisions.
The present invention addresses the problem of revising the lexer and parser for a programming-language when new tools are created, or new annotation-based features are added to tools. In particular, using the present invention, tool-specific annotations are effectively separated from programming-language-specific statements. Further, the present invention makes it relatively simple to implement a wide range of tool-specific annotations, including annotations that employ a complex programming-language.
Two conventional approaches that allow tool-specific annotations are known. In a first approach, tool-specific annotations are recognized and processed by the lexer. In a second approach, tool-specific annotations are recognized and processed by the parser.
An example of the first conventional approach to supporting tool-specific annotations is the way a “#line N” tool-specific annotation may be handled by a C compiler. There, the C compiler lexer may keep track of the line number information of every token it recognizes. If the C compiler lexer reads the “#line N” annotation, then the C compiler lexer changes an internal counter to N, as if the next line were N, and proceeds to read the next token. Since the lexical structure of the “#line N” is so simple, a standard lexer, such as the C compiler lexer, can recognize the tool-specific annotation.
An example of the second conventional approach to supporting tool-specific annotations in a compiler is the way a compiler for the Modula-3 language handles an “<* ASSERT P *>” tool-specific annotation. It is treated as if it were a Modula-3 program statement. Although “P” is an expression, it can be parsed appropriately because the annotation is recognized by the Modula-3 parser.
The conventional methods for recognizing tool-specific annotations, while functional, are less than satisfactory in practice. If a new tool (such as a type-checker or an error-checker) is created for a particular programming-language, extensive recoding of the standard programming-language lexer and parser may be required to handle program annotations specific to that tool. Even a simple modification made to the syntax of the annotations used by an existing tool may require extensive modification of the lexer and parser of that tool.
SUMMARY OF THE INVENTION
In the system and methods of the present invention, tool-specific annotations are recognized by the lexer for the programming-language, but the lexing and parsing of the tool-specific annotations are handled by a separate, tool-specific annotation processor.
A compiler or other programming tool includes a lexer capable of detecting computer programming-language units present in a character stream. The lexer generates a stream of tokens based upon these units. The lexer is further capable of detecting the units of computer programming-language statements such as identifiers. As the lexer detects tool-specific annotations in the character stream, it passes them to the back-end annotation processor. The back-end annotation processor is designed to lex and parse the annotations for a specific tool (or set of tools). In a system having a plurality of tools that use different tool-specific annotations, the back-end of the system will have a corresponding set of tool-specific annotation processors.
When the back-end annotation processor receives a tool-specific annotation from the lexer, the annotation processor generates an annotation token based upon the tool-specific annotation and returns the annotation token to the lexer. The lexer in turn adds the annotation token to the end of a list of tokens it has generated so far. The lexer passes the mixed stream of tokens, some generated within the lexer, and some generated by the back-end annotation processor, to the parser. The parser assembles the stream of tokens and annotation tokens into an abstract syntax tree and passes the tree to the aforementioned tool. The tool processes the annotation tokens as well as the other tokens in the abstract syntax tree.
In a preferred embodiment, at least one of the annotation processors has the capability of generating an annotation token that includes an abstract syntax tree within the annotation token. The abstract syntax tree within the annotation token may be referred to as a secondary abstract syntax tree and the abstract syntax tree assembled by the parser may be referred to as the primary abstract syntax tree. In this embodiment, the annotation token including a secondary abstract syntax tree is incorporated into the primary abstract tree in a context-sensitive manner by the parser.
In a preferred embodiment, an annotation processor includes an annotation lexer and an annotation parser. Preferably, the annotation lexer is context-free and the annotation parser is context-sensitive.


REFERENCES:
patent: 5259766 (1993-11-01), Sack et al.
patent: 5325531 (1994-06-01), McKeeman et al.
patent: 5487000 (1996-01-01), Takahashi et al.
patent: 5748975 (1998-05-01), Van De Vanter
patent: 5752058 (1998-05-01), Van De Vanter
patent: 5761511 (1998-06-01), Gibbons et al.
patent: 5768564 (1998-06-01), Andrews et al.
patent: 5813019 (1998-09-01), Van De Vanter
patent: 5857212 (1999-01-01), Van De Vanter
patent: 6021286 (2000-02-01), Kay
patent: 6138272 (2000-10-01), Tonouchi
patent: 6149318 (2000-11-01), Chase et al.
patent: 6182281 (2001-01-01), Nackman et al.
Devanbu, Genoa—A Customizable, Front-End-Retargetable Source Code Analysis Framework, Apr. 1999, ACM, p. 177-212.*
Wile, Abstract Syntax from Concrete Syntax, 1997, ACM, P. 471-480.*
Blanche, Proof Nets for Controlling Ambiguity in Natural Language Processing, 1998, p. 459-465.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for lexing and parsing program annotations does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for lexing and parsing program annotations, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for lexing and parsing program annotations will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2881863

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.