Table of contents extraction with improved robustness
Reexamination Certificate
Filed: 2006-02-23
Issued: 2010-06-22
Examiner: Hutton, Doug (Department: 2176)
Data processing: presentation processing of document, operator i
Presentation processing of document
Edit, composition, or storage control
Status: active
Patent number: 07743327
ABSTRACT:
In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. The following are then identified (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of contents entries, and (ii) a different group of text fragments as linked text fragments linked with corresponding table of contents entries. During identification, the number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to the distribution of the linked text fragments.
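The abstract names the steps of the method but not the concrete criteria. The Python sketch below shows one way the steps could fit together: every contiguous window of extracted fragments is tried as a candidate table of contents, each entry is linked to a matching fragment later in the text, candidates are reduced before linking, and the result is validated by how the linked fragments are distributed. Everything concrete here is a hypothetical choice for illustration, not the reduction criterion (130) or validation criterion (162) of the patent: the Jaccard token similarity, the assumption that the table of contents precedes the body, and the parameters `threshold`, `max_len_ratio`, `min_link_ratio` and `min_spread`. Fragments are assumed to be plain strings in reading order, i.e. the output of the extraction step (12).

```python
import re
from dataclasses import dataclass
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic tokens; page numbers and dot leaders are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of token sets (a hypothetical choice of measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class TocResult:
    start: int              # index of the first ToC entry
    end: int                # index one past the last ToC entry
    links: dict[int, int]   # ToC entry index -> linked fragment index


def candidates(fragments, entry_idx, end, threshold, max_len_ratio):
    """Linked-fragment candidates for one entry, after candidate reduction.

    Hypothetical reduction criteria: a candidate must follow the ToC window
    and must not be much longer than the entry, since headings are short.
    """
    entry = fragments[entry_idx]
    limit = max_len_ratio * max(len(tokens(entry)), 1)
    return [
        j for j in range(end, len(fragments))
        if len(tokens(fragments[j])) <= limit
        and similarity(entry, fragments[j]) >= threshold
    ]


def link_entries(fragments, start, end, threshold=0.6, max_len_ratio=2.0):
    """Link each entry to the earliest candidate after the previous link,
    so that linked fragments appear in the same order as the entries."""
    links, last = {}, -1
    for i in range(start, end):
        for j in candidates(fragments, i, end, threshold, max_len_ratio):
            if j > last:
                links[i], last = j, j
                break
    return links


def is_valid(fragments, start, end, links, min_link_ratio=0.8, min_spread=0.5):
    """Hypothetical validation based on the distribution of linked fragments:
    most entries must be linked, and the links must spread over a large part
    of the text after the ToC instead of clustering in one place."""
    n_entries = end - start
    if n_entries < 2 or not links or len(links) < min_link_ratio * n_entries:
        return False
    positions = sorted(links.values())
    body_length = max(len(fragments) - end, 1)
    return (positions[-1] - positions[0]) / body_length >= min_spread


def find_toc(fragments, min_entries=2, max_entries=30) -> Optional[TocResult]:
    """Scan every contiguous window of fragments as a candidate ToC and keep
    the valid one with the most links, preferring the tighter window on ties."""
    best, best_key = None, None
    n = len(fragments)
    for start in range(n):
        for end in range(start + min_entries, min(start + max_entries, n) + 1):
            links = link_entries(fragments, start, end)
            if not is_valid(fragments, start, end, links):
                continue
            key = (len(links), -(end - start))
            if best_key is None or key > best_key:
                best, best_key = TocResult(start, end, links), key
    return best
```

For a fragment list such as "Contents", "Introduction ..... 1", "Market Overview ..... 3", followed later in the text by the matching headings, the sketch selects the dotted entries as the table of contents and links each one to its heading. A window that also includes "Contents" links the same number of fragments but loses the tie-break to the tighter window. The exhaustive window scan is quadratic in the number of fragments, which is acceptable for a sketch; a practical system would prune windows much earlier.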
Inventors: Dejean, Herve; Meunier, Jean-Luc
Attorney, agent, or firm: Fay Sharpe LLP
Examiners: Hutton, Doug; Smith, Tionna
Assignee: Xerox Corporation