System for automatically organizing data in accordance with...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06185560

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to information retrieval systems and, in particular, to “report mining” systems for processing report-based data so that the data is susceptible of electronic access and interrogation.
Much business information is contained in reports of many different types. For example, such reports may include external reports for use in communicating with the outside world, such as invoices, statements, purchase orders, financial reports and the like, and internal reports for use in management of the business. While such reports may be presented in printed media, they are typically generated with the aid of computers and are essentially page-based documents designed to present function-related information in a format easily understandable by the end user for satisfying the user's requirements. Thus, space-saving techniques are commonly used in the design of reports to fit them on a printed page. For example, headers are printed only at the beginning of a section or the top of the page; transactions of a particular type may be grouped together and labeled only once, etc. To make sense of the report, the end user mentally links these various pieces of information together as he or she reads.
Computer storage of such reports can be effected through a technology known as Computer Output to Laser Disk (“COLD”) storage, but this technique treats computer reports under the same paradigm as any scanned document, i.e., the page paradigm. Formerly, report pages were often converted to a picture format (such as TIFF), which takes up a great deal of storage space. Today, most COLD systems continue under the page paradigm despite the fact that the format restriction has lifted, since it requires much less space to store a page of a binary spool file than to store a picture format. When page-based COLD storage systems are asked to find a transaction that meets certain criteria, the computer retrieves either the line that relates to the header or the line that relates to the transaction. It is unable to link the two to put the information into the full context.
Furthermore, much information which is buried in reports is simply unavailable to computer access because, unlike a relational database, the report-based data is not organized in an easily searchable manner. It is possible to reorganize report information by rekeying it into other database-type systems for analysis, but this is an expensive and time-consuming process. Furthermore, the resulting database, while having many promising attributes for information retrieval, is designed to optimize the performance of on-line transaction processing systems, and not to support an end user's ad hoc problem-solving tasks. Also, relational databases typically lack the query tools necessary to empower end users, since they require a comprehension of the technical data schema of the database, which usually requires the services of a database expert.
Accordingly, report-mining systems have been provided which essentially process report-based data into a virtual database, which permits the data to be accessible for query by ordinary end users, as if the data were in a database, while retaining the inherent logic of the report design and the look and feel of the picture or image format of the report. One such report mining system is provided by Microbank Software, Inc. under the trade designation “STORQM 2.X”. This system is based on the premise that there exists an organizational hierarchy in a report, i.e., that all of the data in a report appear in a structured fashion and are related by being in the same report. Thus, related data fields typically appear together on a report in a pattern of fields. The system defines a pattern as being a set of contiguous data fields, i.e., a block of data that can be defined over many contiguous lines, a single line or a portion of a line. Patterns can also be defined at certain pre-specified and fixed locations on a report page. The system operates to identify and define these patterns and the organizational hierarchy, or “view”, that exists among them in a report, and then utilizes the pattern definitions and the hierarchy to create virtual “records” that can be derived in response to queries.
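The idea of deriving virtual records from patterns and a hierarchy can be illustrated with a minimal sketch. It is not the implementation of the system named above; the sample report, the two regular expressions and the function name `virtual_records` are all assumptions made for illustration. A header pattern sits above a detail pattern in the hierarchy, and each detail line inherits the fields of the most recent header.

```python
import re

# Hypothetical sample report: the customer header is printed once per group.
REPORT = """\
CUSTOMER: ACME CORP
  2024-01-05  INV-001   120.00
  2024-01-09  INV-002    75.50
CUSTOMER: GLOBEX
  2024-01-07  INV-003   310.25
"""

# Two assumed patterns: a parent (header) and a child (detail) in the hierarchy.
HEADER = re.compile(r"^CUSTOMER: (?P<customer>.+)$")
DETAIL = re.compile(
    r"^\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<invoice>\S+)\s+(?P<amount>\d+\.\d{2})$"
)


def virtual_records(text):
    """Combine each detail line with the fields of the header above it."""
    records = []
    context = {}  # fields of the most recently seen header
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            context = header.groupdict()
            continue
        detail = DETAIL.match(line)
        if detail:
            records.append({**context, **detail.groupdict()})
    return records
```

Each returned dictionary carries both the header field and the transaction fields, which is the link that a page-based store does not make.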
While that system is effective, it requires considerable user activity and entails certain inflexibilities and ambiguities. Thus, for example, the user must select from the report a collection of data blocks which the user believes should comprise a pattern, so that the user essentially manually initially determines the patterns before the system abstracts the patterns from the sample data blocks selected by the user.
Also, because of the way patterns are defined, the system can allow a region of text in a report to be matched by more than one pattern, which implies that the same region of text can have different semantic meanings. Furthermore, there is little indication in the system when a region that is meant to match a pattern does not, which impairs the confidence in the reliability of the extracted data. Also, there is no cross validation between patterns defined, i.e., many patterns can overlap in definition and/or be exactly identical. Also, the hierarchies or “views” of the overall report abstracted by the system can overlap or even be exactly identical, which can impair the querying function.
During the data extraction process using the prior system, the construction of virtual records can be upset by interrupting patterns, such as headers which are repeated for readability rather than because they carry useful information. Also, the system does not allow interruptions between lines in a multi-line pattern. Such interruptions can often occur in the form of page breaks and insignificant headers, thereby artificially forcing definition of separate patterns before and after the page break or header.
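The interruption problem can be made concrete with a short sketch. The noise rules, the two-line record layout and the function names below are hypothetical and chosen only for illustration; the point is that a multi-line record split by a page break can still be assembled if layout-only lines are set aside first.

```python
import re

# Hypothetical noise rules: a form feed (page break) and a repeated page header.
NOISE = (
    re.compile(r"^\f"),
    re.compile(r"^MONTHLY STATEMENT\s+PAGE \d+$"),
)


def significant_lines(lines):
    """Drop blank and layout-only lines, keeping the data lines in order."""
    return [
        ln for ln in lines
        if ln.strip() and not any(p.match(ln) for p in NOISE)
    ]


def pair_records(lines):
    """Treat every two consecutive significant lines as one two-line record."""
    kept = significant_lines(lines)
    return [
        (kept[i].strip(), kept[i + 1].strip())
        for i in range(0, len(kept) - 1, 2)
    ]
```

Without the filtering step, the first record in a page that breaks between its two lines would be paired with the page header instead of its own second line.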
SUMMARY OF THE INVENTION
It is a general object of the invention to provide an improved information retrieval system of the report mining type which avoids the disadvantages of prior systems while affording additional structural and operating advantages.
An important feature of the invention is the provision of a system of the type set forth, which permits fully automated abstraction of patterns inherent in report-based data streams.
In connection with the foregoing feature, another feature of the invention is the provision of a system of the type set forth, which is line-centered, in that it abstracts patterns existing as complete text lines in a report.
In connection with the foregoing feature, yet another feature of the invention is the provision of a system of the type set forth, which stipulates that each and every line of data in a report must match a pattern, and which addresses exceptions by either creating a new pattern or modifying an existing pattern to include the exception.
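The line-centered rule, under which every line must match a pattern and an unmatched line gives rise to a new one, can be sketched as follows. This is an assumption-laden illustration rather than the method claimed: the `signature` function, its two token classes (`N` for numeric-looking tokens, `W` for everything else) and the name `abstract_patterns` are invented for the example.

```python
import re
from itertools import groupby

# Assumed rule: a token made only of digits and common separators is numeric.
NUMERIC = re.compile(r"[\d.,/:-]+")


def signature(line):
    """Reduce a whole text line to a coarse shape such as 'N W N'."""
    kinds = ["N" if NUMERIC.fullmatch(tok) else "W" for tok in line.split()]
    # Collapse runs of the same kind so 'ACME CORP' and 'GLOBEX' share a shape.
    return " ".join(kind for kind, _ in groupby(kinds))


def abstract_patterns(lines):
    """Assign every line to exactly one pattern, creating patterns on demand."""
    patterns = {}      # signature -> pattern id
    assignments = []   # pattern id for each input line, in order
    for line in lines:
        sig = signature(line)
        if sig not in patterns:
            # Exception case: no existing pattern matches, so add a new one.
            patterns[sig] = len(patterns)
        assignments.append(patterns[sig])
    return patterns, assignments
```

Because each line maps to exactly one signature, the resulting patterns cannot overlap in this sketch. The alternative of widening an existing pattern to absorb an exception is not shown.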
Another feature of the invention is the provision of a system of the type set forth, which abstracts from a report a well-defined collection of non-overlapping patterns.
Another feature of the invention is the provision of a system of the type set forth, which effectively disregards page breaks and non-significant text blocks in defining patterns.
Still another feature of the invention is the provision of a system of the type set forth, which automatically generates virtual tables defining line patterns by type, based on location and frequency of occurrence in the report, and establishes links among those definitions to facilitate data extraction.
Another feature of the invention is the provision of a system of the type set forth, which creates a virtual database of structural patterns inherent in report-based data and generates virtual records from the virtual database in response to user queries.
In connection with the foregoing features, another feature of the invention is the provision of a method of processing report-based data to achieve the foregoing features.
Some of these and other features of the invention may be attained by providing, in a system of the type including a compu
