Section extraction tool for PDF documents

Image analysis – Image transformation or preprocessing – Selecting a portion of an image

Reexamination Certificate

Details

C707S793000, C715S252000

active

06801673

ABSTRACT:

FIELD OF THE INVENTION
The invention is generally related to electronic data files. More particularly, the invention is related to extraction of a section of a portable document format document.
BACKGROUND OF THE INVENTION
Electronic files may be created using a variety of techniques. It may therefore be desirable to store data from an electronic file in a format that is independent of the process used to create it, so that the data is accessible to a range of users. One format that allows such access is the portable document format (“pdf”), a file format for representing documents in a manner independent of the application software, hardware, and operating system used to create them, and independent of the output device on which they are displayed or printed.
A pdf workflow assumes a one-way production process in which the pdf file contains a rendition laid out for final presentation; that is, no logical structural information is preserved. Consequently, one problem with storing documents in pdf format is that parts of documents are difficult to reuse, because elements with semantic affinity are not stored as one logical group. Although the original editable document can be stored as an attribute in the pdf file, this is not generally done, either because the original program used to create the document is unavailable anyway or because doing so introduces a vulnerability to computer viruses. Without the original editable document, removing a portion of the pdf document for use in another document or file is not easily accomplished. For example, a user may wish to insert a graph or chart from a pdf document into a document of the user's own creation, or to make a slide presentation with the graph or chart. The pdf specification allows structural information to be included; however, very few pdf documents are created with such information, owing to size constraints and/or creation processes. Thus, most pdf documents do not support sharing or repurposing of their content, and it is generally not possible to extract a figure, an illustration, or a paragraph from a chapter as an integrated object.
There are a few techniques available for reusing pdf document content. However, some of these processes are complicated and require extensive user interaction, while others extract a raster rendition of the selected document portion from the display bitmap, thereby losing all original document structure and attribute information, as well as resolution, which is usually limited to the 72 dpi screen resolution.
SUMMARY OF THE INVENTION
An aspect of an embodiment of the invention is to provide a method for extracting a section of a portable document format (“pdf”) document.
In one embodiment, the method may include receiving an indication of a user-defined region on a pdf file page, determining whether each element on the pdf page is within the user-defined region, designating an extraction region that includes all elements determined to be within the user-defined region, and placing the extraction region into a new pdf file.
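The selection steps of this embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each page element already carries an axis-aligned bounding box, that "within" means fully contained in the user-defined region, and that the extraction region is the union of the selected bounding boxes. The names `Rect`, `PageElement`, `select_elements`, and `extraction_region` are hypothetical, and the final step of writing the region into a new pdf file is omitted because it depends on the pdf library in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (x0 <= x1, y0 <= y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        # Assumption: an element is "within" a region only if fully enclosed.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle covering both rectangles.
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


@dataclass(frozen=True)
class PageElement:
    kind: str   # hypothetical label, e.g. "text", "image", "path"
    bbox: Rect


def select_elements(elements: List[PageElement],
                    user_region: Rect) -> List[PageElement]:
    """Keep every element whose bounding box lies inside the user region."""
    return [e for e in elements if user_region.contains(e.bbox)]


def extraction_region(selected: List[PageElement]) -> Optional[Rect]:
    """Union of the selected bounding boxes, or None if nothing was selected."""
    if not selected:
        return None
    region = selected[0].bbox
    for element in selected[1:]:
        region = region.union(element.bbox)
    return region


# Illustrative page: a heading, a chart with its caption, and body text.
page = [
    PageElement("text", Rect(72, 700, 540, 720)),    # heading
    PageElement("image", Rect(100, 400, 300, 600)),  # chart
    PageElement("text", Rect(100, 380, 300, 395)),   # caption
    PageElement("text", Rect(72, 100, 540, 350)),    # body text
]
user_region = Rect(90, 370, 320, 610)  # rough rectangle drawn by the user

selected = select_elements(page, user_region)
region = extraction_region(selected)
```

Here the loosely drawn user rectangle picks up the chart and its caption, and the extraction region tightens to the box that exactly covers those two elements. A different containment rule (for example, accepting elements that merely overlap the region) would only change `Rect.contains`.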
Those skilled in the art will appreciate these and other advantages and benefits of various embodiments of the invention upon reading the following detailed description of preferred embodiments with reference to the below-listed drawings.
Another aspect of the invention includes checking the extracted region for accuracy. In one embodiment, both the extracted region and the region in the original document may be converted to bitmap images and compared bit by bit.
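The bit-by-bit comparison can be sketched as follows. This is an illustrative example, not the patented implementation. It assumes both regions have already been rendered to 1-bit bitmaps of identical size, stored as one `bytes` object per row with eight pixels packed into each byte. The `Bitmap` alias and the function `count_differing_bits` are hypothetical names.

```python
from typing import List

# Assumption: one bytes object per row, eight 1-bit pixels packed per byte.
Bitmap = List[bytes]


def count_differing_bits(a: Bitmap, b: Bitmap) -> int:
    """Return the number of bit positions at which two bitmaps differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    # XOR sets a bit wherever the two inputs disagree; count those bits.
    return sum(bin(byte_a ^ byte_b).count("1")
               for row_a, row_b in zip(a, b)
               for byte_a, byte_b in zip(row_a, row_b))


# Rendering of the region in the original document (two rows of 8 pixels).
original = [bytes([0b11110000]), bytes([0b00001111])]
# A faithful extraction and one in which a single pixel was lost.
extracted_ok = [bytes([0b11110000]), bytes([0b00001111])]
extracted_bad = [bytes([0b11110000]), bytes([0b00001011])]

matches = count_differing_bits(original, extracted_ok) == 0
errors = count_differing_bits(original, extracted_bad)
```

A count of zero indicates that the extracted region renders identically to the region in the original document; any nonzero count flags the extraction for review.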


REFERENCES:
patent: 5896462 (1999-04-01), Stern
patent: 5963669 (1999-10-01), Wesolkowski et al.
patent: 6035061 (2000-03-01), Katsuyama et al.
patent: 6044375 (2000-03-01), Shmueli et al.
patent: 6073148 (2000-06-01), Rowe et al.
patent: 6583890 (2003-06-01), Mastie et al.
patent: 6633890 (2003-10-01), Laverty et al.
patent: 6654758 (2003-11-01), Teague
patent: 6708309 (2004-03-01), Blumberg
patent: 6732102 (2004-05-01), Khandekar
patent: 0890898 (1999-01-01), None
Hui Chao et al.; “PDF Document Layout Study with Page Elements and Bounding Boxes”; Hewlett-Packard Labs, Imaging Systems Laboratory; 3 pages, Sep., 2001.
Hui Chao et al: “PDF Document Layout Study with Page Elements and Bounding Boxes” Workshop on Document Layout Interpretation and its Applications, Online! Sep. 9, 2001, XP002249458 http://www.science.uva.nl/events/dlia retrieved on Jul. 28, 2003.
“Copying and Pasting text and graphics to another application” ADOBE ACROBAT V3.0 Helpfile 1997, XP002249459.
Liang J et al: “Document layout structure extraction using bounding boxes of different entities” Applications of Computer Vision, 1996. WACV '96. Proceedings 3rd IEEE Workshop on Sarasota, FL USA Dec. 2-4, 1996, Los Alamitos, CA USA XP010206444.
