Reexamination Certificate
1995-06-30
2002-01-01
Trammell, James P. (Department: 2161)
Data processing: financial, business practice, management, or co
Automated electrical financial or business practice or...
active
06336094
ABSTRACT:
CROSS REFERENCE TO APPENDIXES
Appendixes A, B, C, and D, which are part of the present disclosure, consist of three sheets attached hereto and are listings of the software aspects of the preferred embodiment of the present invention.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to methods for recognizing and parsing information in a data file, in particular, a method for identifying information such as financial tables in a financial statement contained in an uncoded text file, and parsing and decomposing the information into its constituent parts.
2. Description of the Prior Art
Financial statements of a number of U.S. public corporations are now available electronically from a number of sources and can be obtained via the internet. In the future, all corporations will be required under the law to file their financial statements electronically. A financial statement is required to contain certain tables of information such as balance sheets, income statements, and cash flow statements, and there may be information explaining the tables and other pertinent information regarding the company.
In the electronic format, a file containing the financial statement is typically uncoded, meaning that there are no codes in the file specifically indicating the type of information represented by each line or column of text. Although the file is typically in plain ASCII text, and ASCII text is conducive to reading by a person, it is not conducive to processing by a computer. In order to have the computer extract the desired information from the file, the content of the file must be identified, meaning that the various tables in the file must be recognized and the content within each table must be parsed and broken down into its constituent parts. Once the data is recognized and broken down, it can be normalized and manipulated. For example, the normalized data can be placed in a spreadsheet program or a database program and the performance of the company can be illustrated and analyzed by various mathematical, statistical, or financial models. The relationship between various financial statement entries can be compared and hypothetical situations can be generated and tested. Furthermore, industry analysis can be performed as well by gathering and collating data from the financial statements of several companies. Thus, there is great incentive for identifying and parsing the content of a file containing a financial statement.
There are two important considerations in identifying and parsing a file containing a financial statement. The first consideration is speed; the second consideration is accuracy.
Once the financial statement of a company is released, it will have immediate impact upon the valuation of the stock of the company. It may also, when combined with information relating to other companies, impact the valuation of the industry. Thus, it is time-critical to have the financial statement available in a form that can be manipulated for analysis. Furthermore, if a large number of financial statements must be processed, a method for processing of the statements must have reasonable computational speed. The financial statement must also be accurately recognized and processed. Inaccurate financial information can have a disastrous impact on the decision making process. It is therefore important that means be available for facilitating timely and accurate analysis of the statements.
A method currently employed by a database company for processing financial statements requires that the information be categorized and manually entered. This is a labor-intensive process that is slow and prone to human error. Hence, there is a need for a fast and accurate method for recognizing and parsing of files containing financial statements.
There are several problems associated with the processing of a file containing a financial statement. First of all, a file containing a financial statement would include tables such as balance sheets, income statements, and cash flow statements. These tables and their locations must be identified and the line items that compose these tables must be identified as well. Referring to FIG. 1a, a portion of an ASCII file containing a balance sheet is illustrated. Within each table, there may be several years of information set out in column form with column headers. The column headers and boundaries for each column need to be identified in order to identify the content of each column for each line item. Note that although the ASCII files may contain some codes indicated in angle brackets, these codes are not always present and are not sufficient as indicators for a program to properly parse the information in the files.
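The column-boundary problem described above can be illustrated with a minimal Python sketch. It is not the method of the present disclosure; it merely assumes a fixed-width layout in which columns are separated by at least two character positions that are blank on every line. The function names find_column_spans and split_row, the min_gap parameter, and the sample figures are hypothetical.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans that hold text on at least one line.

    Assumption: columns are separated by at least `min_gap` positions that
    are blank on every non-empty line.
    """
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # occupied[i] is True when any row has a non-space character at position i.
    occupied = [any(r[i] != " " for r in padded) for i in range(width)]

    spans = []
    start = None  # start of the span currently being built
    gap = 0       # number of consecutive blank positions seen inside it
    for i, used in enumerate(occupied):
        if used:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last occupied position was i - gap; end is exclusive.
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def split_row(row, spans):
    """Cut one line of text into cells using the detected spans."""
    return [row[start:end].strip() for start, end in spans]


# Hypothetical excerpt of a fixed-width balance sheet (invented figures).
sample = [
    f"{'':<22}{'1994':>8}{'1993':>10}",
    f"{'Cash':<22}{'1,200':>8}{'1,000':>10}",
    f"{'Accounts receivable':<22}{'300':>8}{'250':>10}",
]
spans = find_column_spans(sample)
cells = [split_row(row, spans) for row in sample]
```

A single blank between the words of a label does not split the label column because it is narrower than min_gap. Real filings with ragged alignment would need a more tolerant rule.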
Another problem in the processing of the file is that each entry or line item in the table needs to be identified and recognized. Because the label of a line item in the table may be longer than one line of text, running over to two or more lines of text, the several lines of text need to be properly amalgamated to form the label.
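As a rough illustration of label amalgamation, the following Python sketch joins label-only lines with the next line that carries amounts. It is a simplified heuristic under stated assumptions, not the method of the present disclosure: the label and the amounts are assumed to be separated by two or more spaces, and a line ending in a colon is assumed to be a section heading. The names AMOUNT and amalgamate_labels and the sample text are hypothetical.

```python
import re

# Matches amounts such as 1,200  (300)  $45.50
AMOUNT = re.compile(r"\(?\$?\d[\d,]*(?:\.\d+)?\)?")


def amalgamate_labels(lines):
    """Return (label, values) pairs, merging labels that wrap over several lines."""
    items = []
    pending = []  # label fragments waiting for the line that holds the amounts
    for raw in lines:
        text = raw.strip()
        if not text:
            pending = []
            continue
        # Assumption: label and amounts are separated by two or more spaces.
        parts = re.split(r"\s{2,}", text)
        label = parts[0]
        values = [p for p in parts[1:] if AMOUNT.fullmatch(p)]
        if values:
            items.append((" ".join(pending + [label]), values))
            pending = []
        elif label.endswith(":"):
            # Assumption: a line ending in a colon is a section heading.
            pending = []
        else:
            pending.append(label)
    return items


# Hypothetical excerpt with a label that runs over three lines of text.
sample = [
    "Current assets:",
    "  Cash and cash equivalents          1,200     1,000",
    "  Accounts receivable, less",
    "    allowance for doubtful",
    "    accounts of $50 and $40            300       250",
]
items = amalgamate_labels(sample)
```

The dollar figures inside the wrapped label stay part of the label because they are separated from their neighbours by single spaces only.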
After the entries for a table have been identified, the components of the table and the relationship among the components need to be identified. One approach to this problem is to parse the mathematical structure of the table. In the prior art, parsing typically starts from the top of the table and proceeds to the bottom of the table. This approach proves to be time-consuming and the results produced are unsatisfactory. If there is a mistaken assumption made at the beginning of the parsing process, the mistaken assumption may not be discovered until further down the table, wasting previous efforts. In addition, the number of permutations of parsing path possibilities for this approach is quite large.
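To make the notion of a table's mathematical structure concrete, the following Python sketch checks whether a given entry equals the sum of a contiguous run of the entries immediately above it. It only illustrates the kind of relationship a parser has to discover and does not represent the parsing strategy of the present disclosure or of the prior art. The function name find_addends, the labels, and the figures are hypothetical, and amounts are assumed to have already been converted to integers.

```python
def find_addends(values, total_index):
    """Return the indices of the contiguous entries directly above
    values[total_index] that sum to it, or None when no such run exists.

    Assumption: amounts are integers, so equality comparison is exact.
    """
    target = values[total_index]
    running = 0
    # Walk upward from the entry just above the candidate total.
    for start in range(total_index - 1, -1, -1):
        running += values[start]
        if running == target:
            return list(range(start, total_index))
    return None


# Hypothetical balance sheet fragment (invented figures).
labels = [
    "Cash",
    "Accounts receivable",
    "Inventory",
    "Total current assets",
    "Property and equipment",
    "Total assets",
]
values = [1200, 300, 500, 2000, 3500, 5500]

current_assets = find_addends(values, 3)
total_assets = find_addends(values, 5)
not_a_total = find_addends(values, 4)
```

Here "Total current assets" resolves to the three entries above it, and "Total assets" resolves to "Total current assets" plus "Property and equipment". A coincidental match is possible with this naive check, which is one reason a real parser needs more than simple arithmetic.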
After the components making up the table are verified by the parsing process, the components composing the table must be identified and categorized so that the computer can properly process the data.
SUMMARY OF THE INVENTION
It is therefore an objective of the present invention to provide an automated method for identifying financial statements stored in uncoded electronic format such as an ASCII file.
It is another objective of the present invention to provide an automated method for identifying financial tables such as balance sheets, income statements, and cash flow statements of a financial statement stored in uncoded format.
It is yet another objective of the present invention to provide an automated method for identifying the line items that compose a financial table.
It is still another objective of the present invention to provide an automated method for amalgamating several lines of text to form the label of a line item.
It is still another objective of the present invention to provide an automated method for parsing the mathematical structure of a financial table.
It is still another objective of the present invention to provide an automated method for recognizing the components of the tables.
Briefly, a preferred embodiment of the present invention provides a process for processing a file containing a financial statement in uncoded format such as a financial statement stored in an ASCII file. Referring to FIG. 2, the starting locations of the tables in the financial statement as indicated by their table titles are identified (block 10). When all the table titles are identified, a table title is then selected for processing (block 12). Typically after the table title, there are the associated column headers for the table, and they are analyzed and determined (block 14). After the column headers, there are lines of text that need to be differentiated into line item
Ferguson Don Carl
Kornfeld William
Elisca Pierre E.
Johnson John M.
Kaye Scholer LLP
Price Waterhouse World Firm Services BV. Inc.
Silberman Gregory