Categorization based text processing

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate


Details

C706S045000, C706S048000

active

06618715

ABSTRACT:

DESCRIPTION
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to extracting formatted information from unformatted text files, where the appropriate formatting processor is determined by categorizing the textual input into one or more predefined categories.
2. Background Description
Natural language computer interfaces require a natural language analysis engine that can analyze user input text and extract and format the information that drives some back end application or process. User input text could be derived, for example, from the output of a speech recognizer or another system that generates text, e.g., an optical character recognition (OCR) system. There is no solution to the general problem of understanding natural language via a computer program. There are two basic approaches to the problem of computer-based natural language analysis:
(1) Use a general purpose grammar/parser of a particular language and then interpret the output of the parser with a semantic interpreter that uses domain specific knowledge to build an internal, formatted representation of the information needed by the back-end applications or processes. General English parsers are described, for example, by Michael C. McCord in “Slot Grammar: A system for simpler construction of practical natural language grammars”, pp. 118-145 in Natural Language and Logic: International Scientific Symposium, Lecture Notes in Computer Science, R. Studer, Editor, Springer Verlag, Berlin (1990). The problem with this approach is that general purpose natural language grammars/parsers will typically deliver a large number of parses or structures, representing high level syntactic information; e.g., subject-verb-object-modifier patterns, all but one or a few of which must then be eliminated by the post-parsing semantic interpretation process. This can be extremely computationally inefficient.
(2) Build special purpose so-called semantic grammars that are much less ambiguous than general grammars and support very simple semantic interpretation processes. Semantic grammars are discussed by J. S. Brown, R. R. Burton and J. De Kleer in “Pedagogical, Natural Language and Knowledge Engineering Techniques in Sophie I, II, and III”, in Intelligent Tutoring Systems, D. Sleeman and J. S. Brown, Editors, Academic Press, London (1982). The problem with semantic or domain-specific grammars is that a new one must be built for each domain; i.e., there is a portability issue.
There are significant practical problems with both approaches in many real world applications that use natural language interfaces for input. In applications such as electronic mail (e-mail) auto-response or auto-routing systems, or self-service product and services ordering applications on the World Wide Web (WWW) portion of the Internet (or simply “the Web”), a user input could be about a variety of topics and, even worse, a single input might refer to a number of topics. For a general purpose parser-based system, the issue is how to invoke the right semantic processing routines in an efficient manner. For a special-purpose semantic grammar-based system, the issue is how to invoke the right grammar(s) for interpretation. Running all the grammars on the data is in general extremely inefficient and can lead to errors in interpretation.
David D. Lewis and Richard M. Tong in “Text Filtering in MUC-3 and MUC-4”, pp. 51-66, in Fourth Message Understanding Conference (MUC-4), McLean, Va., Jun. 16-18, 1992, describe the emergence of text filtering as an explicit topic of discussion. The processes described, however, do not lend themselves to a solution to the problem of how to invoke the right semantic processing routines in an efficient manner. In the processes described, text documents are categorized into only two types: relevant versus non-relevant. Documents considered relevant are then processed by natural language processing algorithms. There is no suggestion of invoking non-linguistic processes based on categorization; e.g., invoking database queries to gather information for a back end application or for humans is not part of the message understanding work. Dynamically categorizing an input document into zero, one or more categories is also not suggested by the message understanding work, nor is the assignment of confidence labels.
What is needed is a configurable system that can efficiently and effectively determine for a given electronically represented text document (e-mail, Web form, scanned facsimile, output of speech recognition, etc.) which linguistic analysis and extraction processes, and even other application specific processes, should be invoked.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide a configurable system that can efficiently and effectively determine for a given electronically represented text document which linguistic analysis and extraction processes should be invoked.
It is another object of the present invention to provide a rules based system that can efficiently and effectively determine for a given electronically represented text document which application specific processes should be invoked to provide more accurate answers to a user's query.
Assuming a rules based classifier in which each category or topic is represented by a set of rules, the preferred embodiment of the invention allows, in applications such as routing, the categorization that effects the routing to be combined with processes that extract other information. For example, if a user sends an e-mail asking to “apply for new home mortgage”, the categorization component would identify the general topic for routing as “Home Mortgage” and also invoke extractors that extract the name and other information relevant to new home mortgage applications. Such information may include, for example, the amount of the desired mortgage, whether the person is a current bank customer, the location of the property, and the like. In contrast, if the person specifies an interest in “refinancing their current home mortgage”, the categorizer might also place this in the “Home Mortgage” category but invoke extractors specific to refinancing inquiries.
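The mortgage example can be illustrated with a minimal sketch in Python. It is not the claimed implementation. The rule patterns, the `Rule` structure, and the extractor functions (`extract_amount`, `extract_current_customer`, `extract_current_rate`) are hypothetical and chosen only to show how a matched rule can both assign a routing category and select which extractors to run.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Pattern


# Hypothetical extractors; each returns a dict of the fields it found.
def extract_amount(text: str) -> Dict[str, object]:
    m = re.search(r"\$\s?([\d,]+)", text)
    return {"amount": m.group(1).replace(",", "")} if m else {}


def extract_current_customer(text: str) -> Dict[str, object]:
    found = re.search(r"\b(current|existing) customer\b", text, re.I)
    return {"current_customer": bool(found)}


def extract_current_rate(text: str) -> Dict[str, object]:
    m = re.search(r"(\d+(?:\.\d+)?)\s?%", text)
    return {"current_rate": m.group(1)} if m else {}


@dataclass
class Rule:
    pattern: Pattern[str]  # condition tested against the input text
    category: str          # routing category assigned when the rule fires
    extractors: List[Callable[[str], Dict[str, object]]]


# Assumed rule set: both rules route to "Home Mortgage" but invoke
# different extractors.
RULES = [
    Rule(re.compile(r"\brefinanc\w*\b.*\bmortgage\b", re.I),
         "Home Mortgage", [extract_current_rate]),
    Rule(re.compile(r"\b(apply|application)\b.*\bnew home mortgage\b", re.I),
         "Home Mortgage", [extract_amount, extract_current_customer]),
]


def process(text: str) -> List[Dict[str, object]]:
    """Return one entry per rule that fires; an empty list means no category."""
    results = []
    for rule in RULES:
        if rule.pattern.search(text):
            extracted: Dict[str, object] = {}
            for extractor in rule.extractors:
                extracted.update(extractor(text))
            results.append({"category": rule.category, "extracted": extracted})
    return results


new_loan = process("I want to apply for new home mortgage of $250,000. "
                   "I am a current customer.")
refinance = process("I am interested in refinancing my current home mortgage at 6.5%.")
unrelated = process("What are your branch hours?")
```

In this sketch both inquiries are routed to the same category, yet only the first yields an amount and a customer flag and only the second yields a current rate. An input that matches no rule yields an empty list.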


REFERENCES:
patent: 5778157 (1998-07-01), Oatman et al.
patent: 6161130 (2000-12-01), Horvitz et al.
D. Lewis et al., “Text Filtering in MUC-3 and MUC-4”, pp. 51-66, in Fourth Message Understanding Conference (MUC-4), McLean, Virginia, Jun. 16-18, 1992.
M. McCord, “Slot Grammar: A system for simpler construction of practical natural language grammars”, pp. 118-145 in Natural Language in Computer Science, R. Studer, Editor, Springer Verlag, Berlin (1990).
J. S. Brown et al., “Pedagogical, Natural Language and Knowledge Engineering Techniques in Sophie I, II, and III”, in Intelligent Tutoring Systems, D. Sleeman and J. S. Brown, Editors, Academic Press, London (1982).
