Extraction of information from documents

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07469251

ABSTRACT:
An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
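The begin/end labeling described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say what a unit is, which format features are used, or how the model scores them, so everything concrete here is an assumption made for illustration: units are taken to be lines, the features are relative font size, boldness and centering, and the hand-set weight vectors `BEGIN_W` and `END_W` stand in for a model that would actually be trained on labeled documents. The names `Unit`, `features`, `score` and `extract` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One unit of a document (assumed here to be a line) with format attributes."""
    text: str
    font_size: float
    bold: bool
    centered: bool


def features(unit, max_font):
    # Hypothetical format features: relative font size, bold, centered, bias term.
    return [unit.font_size / max_font, float(unit.bold), float(unit.centered), 1.0]


# Hand-set weights standing in for a trained model (assumption, not from the source).
BEGIN_W = [2.0, 1.0, 1.0, -2.5]
END_W = [2.0, 1.0, 1.0, -2.5]


def score(weights, feats):
    # Linear score of one unit under one label.
    return sum(w * f for w, f in zip(weights, feats))


def extract(units):
    """Label one unit as begin and one as end, then return the text between them."""
    max_font = max(u.font_size for u in units)
    feats = [features(u, max_font) for u in units]
    # Begin label: the highest-scoring unit (the earliest one on ties).
    begin = max(range(len(units)), key=lambda i: score(BEGIN_W, feats[i]))
    # End label: the highest-scoring unit at or after begin (the latest one on ties).
    end = max(range(begin, len(units)), key=lambda i: (score(END_W, feats[i]), i))
    return " ".join(u.text for u in units[begin:end + 1])


DOC = [
    Unit("Technical Report 42", 10, False, False),
    Unit("Extraction of Information", 18, True, True),
    Unit("from Documents", 18, True, True),
    Unit("A. Author", 11, False, True),
    Unit("Body text starts here.", 10, False, False),
]

print(extract(DOC))
```

In this toy document the two large, bold, centered lines receive the begin and end labels, so the extracted span is the two-line title. A trained model would replace the fixed weights, and the extracted text could then feed a downstream task such as ranking.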

REFERENCES:
patent: 6189002 (2001-02-01), Roitblat
patent: 6651057 (2003-11-01), Jin et al.
patent: 7062485 (2006-06-01), Jin et al.
patent: 2006/0224605 (2006-10-01), Marcy et al.
Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, & J. Kandola. The Perceptron Algorithm with Uneven Margin. In Proceedings of the Nineteenth International Conference on Machine Learning, pp. 379-386, 2002.
S. Robertson, H. Zaragoza, & M. Taylor. Simple BM25 Extension to Multiple Weighted Fields. In Proceedings of ACM Thirteenth Conference on Information and Knowledge Management, CIKM 2004.
