Machine learning of document templates for data extraction

Image analysis – Learning systems – Trainable classifiers or pattern recognizers

Reexamination Certificate

Details

C382S209000

active

07149347

ABSTRACT:
The present system performs machine learning of prototypical descriptions of data elements for extraction from machine-readable documents. From sets of training documents, it creates document templates that can be used to extract data from form documents, such as: fill-in forms used for taxes; flex-form documents having many variants, such as bills of lading or insurance notifications; and some context-form documents having a description or graphic indicator in proximity to a data element. In response to training documents, the system performs an inductive reasoning process to generalize a document template so that the location of data elements can be predicted for the training examples. The automatically generated document template can then be used to extract data elements from a wide variety of form documents.
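The idea of generalizing a template from training documents and then using it to locate data elements can be illustrated with a small sketch. This is not the patented method; it is a deliberately simplified illustration. It assumes each document is a list of words with page coordinates, that each training document labels which word holds a field's value, and that a "template" is just a context word plus an average offset to the data element. The names `Word`, `learn_template`, and `extract` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Word:
    """A recognized word and its position on the page (hypothetical model)."""
    text: str
    x: float
    y: float


def learn_template(training_docs):
    """Generalize a template from (words, labels) pairs.

    labels maps a field name to the index of the word holding its value.
    Returns field -> (anchor_text, dx, dy): the context word whose offset
    to the data element varies least across the training documents.
    """
    if not training_docs:
        return {}
    template = {}
    fields = set.intersection(*(set(labels) for _, labels in training_docs))
    for field in fields:
        # Candidate anchors: texts occurring exactly once (besides the
        # target word) in every training document.
        candidates = None
        for words, labels in training_docs:
            texts = [w.text for i, w in enumerate(words) if i != labels[field]]
            unique = {t for t in texts if texts.count(t) == 1}
            candidates = unique if candidates is None else candidates & unique
        best = None
        for anchor in sorted(candidates):
            offsets = []
            for words, labels in training_docs:
                target_idx = labels[field]
                a = next(w for i, w in enumerate(words)
                         if i != target_idx and w.text == anchor)
                t = words[target_idx]
                offsets.append((t.x - a.x, t.y - a.y))
            dx = mean(o[0] for o in offsets)
            dy = mean(o[1] for o in offsets)
            spread = max(abs(o[0] - dx) + abs(o[1] - dy) for o in offsets)
            if best is None or (spread, anchor) < (best[0], best[1]):
                best = (spread, anchor, dx, dy)
        if best is not None:
            template[field] = best[1:]
    return template


def extract(template, words):
    """Predict each field's location from its anchor and take the nearest word."""
    result = {}
    for field, (anchor, dx, dy) in template.items():
        a = next((w for w in words if w.text == anchor), None)
        if a is None:
            continue  # anchor missing: the template does not apply here
        px, py = a.x + dx, a.y + dy
        others = [w for w in words if w is not a]
        if others:
            nearest = min(others, key=lambda w: (w.x - px) ** 2 + (w.y - py) ** 2)
            result[field] = nearest.text
    return result


if __name__ == "__main__":
    train = [
        ([Word("Invoice", 10, 10), Word("Total:", 10, 100), Word("42.00", 60, 100)],
         {"total": 2}),
        ([Word("Invoice", 10, 10), Word("Total:", 10, 140), Word("17.50", 60, 140)],
         {"total": 2}),
    ]
    tpl = learn_template(train)
    unseen = [Word("Invoice", 10, 10), Word("Total:", 10, 200), Word("99.95", 61, 201)]
    print(extract(tpl, unseen))  # {'total': '99.95'}
```

In this toy setup the learner prefers "Total:" over "Invoice" as the context word because its offset to the value is the same in both training documents, which loosely mirrors the context-form case described in the abstract, where a description sits in proximity to a data element.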

REFERENCES:
patent: 5140650 (1992-08-01), Casey et al.
patent: 5159667 (1992-10-01), Borrey et al.
patent: 5258855 (1993-11-01), Lech et al.
patent: 5369508 (1994-11-01), Lech et al.
patent: 5416849 (1995-05-01), Huang
patent: 5625465 (1997-04-01), Lech et al.
patent: 5692073 (1997-11-01), Cass
patent: 5721940 (1998-02-01), Luther et al.
patent: 5748809 (1998-05-01), Hirsch
patent: 5768416 (1998-06-01), Lech et al.
patent: 5848186 (1998-12-01), Wang et al.
patent: 5852676 (1998-12-01), Lazar
patent: 5950196 (1999-09-01), Pyreddy et al.
patent: 6009196 (1999-12-01), Mahoney
patent: 6064497 (2000-05-01), Hannah
patent: 6094505 (2000-07-01), Lech et al.
Kopec et al. Document Image Decoding Approach to Character Template Estimation, IEEE 0-7803-3258, pp. 213-216.
Li et al. A Document Classification and Extraction System with Learning Ability, New Jersey Institute of Technology, pp. 1-4.
Dengel, A., “ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents,” in Structured Document Image Analysis, H.S. Baird, H. Bunke, K. Yamamoto (Eds.), Springer-Verlag, Berlin, 1992.
Esposito, F., Malerba, D., and Semeraro, G., “Multistrategy Learning for Document Recognition,” Applied Artificial Intelligence, vol. 8, pp. 33-94, 1994.
Summers, K., “Near-Wordless Document Structure Classification,” in Proc. of the 3rd Int. Conf. On Document Analysis and Recognition, IEEE Computer Society Press, Los Alamitos, CA, 1995.
Wnek, Janusz, “High Performance Inductive Document Classifier,” SAIC Science and Technology Trends II, 5 pp., Abstract [retrieved on Oct. 5, 2001], Retrieved from the Internet: http://publications.saic.com/satt.nsf/06.../6cae12cebbf7494f88256665007e88c2?Open Document (1 p.).
“Unstructured Document Processing,” © 1995-2001 [retrieved on Aug. 29, 2001], Retrieved from the Internet: http://www.miteksys.com/dynafind.html, 2 pages.
“Mitek Announces New Product Strategy Focused on E-Commerce Solutions—New Partnerships and Products Expected to Contribute to Future Revenue,” Release Date: Feb. 23, 2000 [retrieved on Mar. 27, 2001], Retrieved from the Internet: http://www.miteksys.com/press49.html, 6 pages.
“How to Automatically Process Unstructured Forms,” techinfocenter.com, © 2000, 16 pages.
“Processing Unstructured Documents,” A Technical White Paper, May 6, 1999, 8 pages.
“A Perspective on Document Understanding,” A Technical White Paper, Apr. 26, 1999, 7 pages.
Hurst, Matthew Francis, “The Interpretation of Tables in Text,” PhD Thesis, the University of Edinburgh, 2000, 325 pp.
Lopresti, Daniel and Nagy, George, “A Tabular Survey of Automated Table Processing,” in Graphics Recognition: Recent Advances, Springer-Verlag, Berlin, 2000, vol. 1941 of Lecture Notes in Computer Science, pp. 93-120.
Wnek, Janusz, “InTeGen v.1.2: Inductive Template Generator,” User's Guide, SAIC CM No.: SAIC-99/1346&-InTeGen-UG-01-U-R0C2, SAIC, Vienna, VA, Nov. 24, 1999, 23 pp.
Kieninger, Thomas G., “Table Structure Recognition Based on Robust Block Segmentation,” 1998, Proceedings of SPIE, vol. 3305, Document Recognition V, pp. 22-32.
Hori, Osamu and Doermann, David S., “Robust Table-Form Structure Analysis Based on Box-Driven Reasoning,” 1995, ICDAR-95 Proceedings, pp. 218-221.

Profile ID: LFUS-PAI-O-3666413
