Architecture of a framework for information extraction from...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C704S009000

active

06553385

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to knowledge information processing and, more particularly, to a general architecture of a framework for information extraction from natural language (NL) documents. The framework can be configured and integrated in applications and may be extended by user-built information extractors.
2. Background Description
Businesses and institutions generate many documents in the course of their commerce and activities. These are typically written for exchange between persons without any plan for machine storage and retrieval. The documents, for purposes of differentiation, are described as “natural language” documents as distinguished from documents or files written for machine storage and retrieval.
Natural language documents have for some time been archived on various media, originally as images and more recently as converted data. More specifically, documents available only in hard copy form are scanned, and the scanned images are processed by optical character recognition software to generate machine language files. The generated machine language files can then be compactly stored on magnetic or optical media. Documents originally generated by a computer, such as with word processor, spreadsheet or database software, can of course be stored directly to magnetic or optical media. In the latter case, the formatting information is part of the data stored, whereas in the case of scanned documents, such information is typically lost.
There is a significant advantage from a storage and archival standpoint to storing natural language documents in this way, but there remains a problem of retrieving information from the stored documents. In the past, this has been accomplished by separately preparing an index to access the documents. Of course, the effectiveness of this technique depends largely on the design of the index. A number of full-text search software products have been developed which will respond to structured queries to search a document database. These, however, are effective only for relatively small databases and are often application dependent; that is, capable of searching only those databases created by specific software applications.
The natural language documents of a business or institution represent a substantial resource for that business or institution. However, that resource is only as valuable as the ability to access the information it contains. Considerable effort is now being made to develop software for the extraction of information from natural language documents. Such software is generally in the field of knowledge-based or expert systems and uses such techniques as parsing and classifying. The general applications, in addition to information extraction, include classification and categorization of natural language documents and automated electronic data transmission processing and routing, including E-mail and facsimile.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a framework for information extraction from natural language documents which is application independent and provides a high degree of reusability.
It is another object of the invention to provide a framework for information extraction which integrates different Natural Language/Machine Learning techniques, such as parsing and classification.
According to the invention, there is provided an architecture of a framework for information extraction from natural language documents which is integrated in an easy-to-use access layer. The framework performs general information extraction, classification/categorization of natural language documents, automated electronic data transmission (e.g., E-mail and facsimile) processing and routing, and parsing.
Inside the framework, requests for information extraction are passed to the actual extractors. The framework can handle both pre- and post-processing of the application data, control the extractors, and enrich the information they extract. The framework can also suggest necessary actions the application should take on the data. To achieve the goal of easy integration and extension, the framework provides an integration (outside) application program interface (API) and an extractor (inside) API. The outside API is for the application program that wants to use the framework, allowing the framework to be integrated by calling simple functions. The extractor API is the API for doing the actual processing. The architecture of the framework allows the framework to be extended by providing new libraries exporting certain simple functions.
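The two-API arrangement described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names Framework, register_extractor, extract, Result and keyword_category, the whitespace normalization used as pre-processing, and the routing rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Extractor (inside) API, hypothetical signature: text in, extracted fields out.
Extractor = Callable[[str], Dict[str, str]]


@dataclass
class Result:
    """What the framework hands back to the calling application."""
    fields: Dict[str, str] = field(default_factory=dict)
    suggested_actions: List[str] = field(default_factory=list)


class Framework:
    """Illustrative stand-in for the framework; all names are assumptions."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register_extractor(self, name: str, extractor: Extractor) -> None:
        # Extension point: a new extractor only has to export one function.
        self._extractors[name] = extractor

    def extract(self, document: str) -> Result:
        # Integration (outside) API: one simple call for the application.
        text = " ".join(document.split())  # pre-processing: normalize whitespace
        result = Result()
        # Control of the extractors: pass the request to each one in turn.
        for name, extractor in self._extractors.items():
            for key, value in extractor(text).items():
                result.fields[f"{name}.{key}"] = value
        # Post-processing: enrich the extracted information.
        result.fields["meta.length"] = str(len(text))
        # Suggest an action the application could take (made-up routing rule).
        if result.fields.get("category.label") == "complaint":
            result.suggested_actions.append("route_to:customer_service")
        return result


def keyword_category(text: str) -> Dict[str, str]:
    """Toy user-built extractor: a keyword test standing in for a classifier."""
    label = "complaint" if "refund" in text.lower() else "general"
    return {"label": label}


framework = Framework()
framework.register_extractor("category", keyword_category)
result = framework.extract("Please   send me a refund\nfor my order.")
print(result.fields)
print(result.suggested_actions)
```

In this sketch the application touches only extract, while a new extractor is added by supplying a single function, which mirrors the separation between the outside and inside APIs.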

