User-enclosed region extraction from scanned document images

Image analysis – Image segmentation – Separating document regions using preprinted guides or markings

Reexamination Certificate


Details

C382S306000, C382S203000


active

06351559

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to document processing. More particularly, the invention relates to the extraction of user-enclosed text or non-text regions from bitmap images.
People like to scribble marks and notes on documents while reading. For example, when reading a book or a magazine, a person might draw circles with a pen around the parts that are of particular interest. When a person underlines a few lines, circles a paragraph, or writes notes on a page, these notes and marks often convey important cues to the content of the document and may serve as keys or references for communicating with other people. As more and more paper documents are converted and archived in electronic media, it is useful if underlines, highlights, circles, and handwritten or Post-It notes on a paper document can be automatically identified and located, and their associated contents extracted and preserved in a document management system.
The present invention describes a technique for locating and extracting one type of user-drawn mark from a scanned document: the user-enclosed region. The invention is based on bi-connected component analysis from graph theory. The invention first represents the content of the input image as run-length segments. It then constructs line adjacency graphs from the segments. Finally, it detects user-enclosed circles as bi-connected components of the line adjacency graphs.
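As a rough illustration of the first step, the sketch below encodes each row of a binary bitmap as run-length segments of foreground pixels. It is a minimal Python sketch, assuming the image is given as a list of rows of 0/1 values with 1 meaning ink; the names `Run` and `extract_runs` are hypothetical and do not come from the patent.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    """A maximal horizontal run of foreground (1) pixels on one image row."""
    row: int
    start: int  # first column of the run (inclusive)
    end: int    # last column of the run (inclusive)


def extract_runs(bitmap: List[List[int]]) -> List[Run]:
    """Encode every row of a binary bitmap as run-length segments of 1-pixels."""
    runs: List[Run] = []
    for r, row in enumerate(bitmap):
        c, width = 0, len(row)
        while c < width:
            if row[c]:
                start = c
                while c < width and row[c]:  # extend the run to its last ink pixel
                    c += 1
                runs.append(Run(r, start, c - 1))
            else:
                c += 1
    return runs


# A 3x5 hollow rectangle: the middle row yields two separate runs.
ring = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(extract_runs(ring))
```

Runs are a compact representation for sparse scanned pages, and they become the nodes of the line adjacency graph in the next step.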
The present invention is useful in applications such as electronic filing systems or storage management for document databases. Currently, the burden of cutting and pasting a selected region from a page (for example, an article from a newspaper) for archival falls on the user. However, a user is sometimes interested in only specific portions of a page. The circled-region extraction technique lets a user simply mark the regions of interest and have the imaging process identify those regions and save them alone. Alternatively, different compression strategies may be applied to user-enclosed regions to preserve image quality in those regions.
The present invention analyzes the image of the scanned document in its native bitmap format using a connected component module. The invention is independent of the writing system: it can extract user-enclosed regions from document images regardless of the character set, alphabet, or even font style used. The connected component module identifies the connected components of the image and stores them in a data structure in the form of a line adjacency graph, which expedites further processing of the connected component data.
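A line adjacency graph can be sketched by treating each run as a node and linking runs on consecutive rows whose column spans touch; the connected components of that graph are then the connected components of the image. This is a hypothetical Python sketch, not the patented module: it assumes 8-connectivity by default (diagonally touching runs are linked) and uses breadth-first search for the component labeling. The function names are illustrative only.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Run = Tuple[int, int, int]  # (row, start_col, end_col), columns inclusive


def extract_runs(bitmap: List[List[int]]) -> List[Run]:
    """Run-length encode each row of a 0/1 bitmap."""
    runs: List[Run] = []
    for r, row in enumerate(bitmap):
        c = 0
        while c < len(row):
            if row[c]:
                s = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((r, s, c - 1))
            else:
                c += 1
    return runs


def build_lag(runs: List[Run], eight_connected: bool = True) -> Dict[int, Set[int]]:
    """Line adjacency graph: node i is runs[i]; an edge joins touching runs on adjacent rows."""
    slack = 1 if eight_connected else 0  # allow diagonal contact under 8-connectivity
    by_row: Dict[int, List[int]] = defaultdict(list)
    for i, (r, _, _) in enumerate(runs):
        by_row[r].append(i)
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(runs))}
    for i, (r, s, e) in enumerate(runs):
        for j in by_row.get(r + 1, []):
            _, s2, e2 = runs[j]
            if s2 <= e + slack and s <= e2 + slack:  # column spans overlap or touch
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Label components of the LAG with breadth-first search."""
    seen: Set[int] = set()
    comps: List[List[int]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps


# A small ring on the left and an isolated dot in the top-right corner.
bitmap = [
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
print(connected_components(build_lag(extract_runs(bitmap))))
```

Storing runs rather than pixels keeps the graph small, and the same adjacency map can be reused by the later traversal and bi-connected component steps.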
The connected component data is then analyzed by a graph traversal module, which extracts the geometric properties of each connected component and stores them in a data structure. Only the geometric features needed for subsequent analysis by the invention are extracted.
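Geometric properties can be gathered while walking each component of the line adjacency graph. The sketch below is a hypothetical illustration that assumes the runs and an adjacency map (run index to neighbor indices) are already available, and computes only a bounding box, a foreground pixel count, and a run count; the patent does not list which features it actually keeps, and `traverse_component` and `ComponentFeatures` are invented names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Run = Tuple[int, int, int]  # (row, start_col, end_col), columns inclusive


@dataclass
class ComponentFeatures:
    top: int
    left: int
    bottom: int
    right: int
    pixel_count: int
    run_count: int

    @property
    def width(self) -> int:
        return self.right - self.left + 1

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1


def traverse_component(seed: int, runs: List[Run], adj: Dict[int, Set[int]]) -> ComponentFeatures:
    """Iterative depth-first walk of one LAG component, accumulating geometric features."""
    r0, s0, e0 = runs[seed]
    feats = ComponentFeatures(r0, s0, r0, e0, 0, 0)
    stack, seen = [seed], {seed}
    while stack:
        u = stack.pop()
        r, s, e = runs[u]
        feats.top, feats.bottom = min(feats.top, r), max(feats.bottom, r)
        feats.left, feats.right = min(feats.left, s), max(feats.right, e)
        feats.pixel_count += e - s + 1  # each run contributes its length in pixels
        feats.run_count += 1
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return feats


# Runs and LAG of a 3x3 hollow square.
runs = [(0, 0, 2), (1, 0, 0), (1, 2, 2), (2, 0, 2)]
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
print(traverse_component(0, runs, adj))
```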
The invention then uses a bi-connected component module to separate out the largest bi-connected component, which marks a candidate user-enclosed region of the document image. The bi-connected component module detects enclosures of any shape. Furthermore, a user-drawn enclosure can cross lines of text or graphics on the document and still be recognized as a bi-connected component. The module uses a depth-first search, which allows the largest bi-connected component to be detected efficiently.
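The standard depth-first technique for bi-connected components is the Hopcroft-Tarjan algorithm, which tracks discovery times and low-link values and pops a component off an edge stack whenever an articulation point is found. The sketch below is an assumption about how such a module might look, not the patented code. It returns the vertex set of each bi-connected component, and "largest" is taken here, hypothetically, to mean the component with the most vertices (runs).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def biconnected_components(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Hopcroft-Tarjan DFS returning the vertex set of each bi-connected component.

    Bridges (single edges) come out as two-vertex components; isolated vertices yield none.
    """
    disc: Dict[int, int] = {}  # discovery time of each vertex
    low: Dict[int, int] = {}   # earliest discovery time reachable from the vertex's subtree
    edge_stack: List[Edge] = []
    components: List[Set[int]] = []
    timer = 0

    def dfs(u: int, parent: int) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v in adj[u]:
            if v not in disc:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: pop one component
                    comp: Set[int] = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for root in adj:
        if root not in disc:
            dfs(root, -1)
    return components


def largest_component(adj: Dict[int, Set[int]]) -> Set[int]:
    """Hypothetical selection rule: the bi-connected component with the most runs."""
    return max(biconnected_components(adj), key=len, default=set())


# A closed loop 0-1-2-3 with a text stroke (run 4) touching it only at run 3.
lag = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3}}
print(largest_component(lag))
```

In this sketch a stroke that touches the loop at a single run is split off at that articulation point, so the loop survives as its own component. The recursive search is kept for readability; very large graphs would need a higher recursion limit or an iterative rewrite.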
Following the bi-connected component module, a detection analysis filter further refines the extraction process by qualifying each user-enclosed candidate. These additional heuristics eliminate candidate bi-connected components that do not exceed a minimum size or that are photographic images.
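The filter's heuristics can be sketched as simple predicates on each candidate. The thresholds and the fill-ratio test for photographic content below are hypothetical: the source states only that small components and photographic images are rejected, not how either is measured. `Candidate`, `is_user_enclosure`, and `filter_candidates` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical summary of one bi-connected component (an enclosure candidate)."""
    width: int        # bounding-box width in pixels
    height: int       # bounding-box height in pixels
    pixel_count: int  # foreground pixels belonging to the component


# Hypothetical thresholds; sensible values depend on the scanning resolution.
MIN_WIDTH = 50
MIN_HEIGHT = 50
MAX_FILL_RATIO = 0.25  # assumed: a drawn loop covers little of its box, a halftone photo much more


def is_user_enclosure(c: Candidate) -> bool:
    """Apply the two heuristics: minimum size, then a density test for photographs."""
    if c.width <= MIN_WIDTH or c.height <= MIN_HEIGHT:
        return False  # not above the minimum size, e.g. a printed 'o' or '0'
    fill_ratio = c.pixel_count / (c.width * c.height)
    return fill_ratio <= MAX_FILL_RATIO  # dense candidates are treated as photographic


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    return [c for c in candidates if is_user_enclosure(c)]


candidates = [
    Candidate(width=200, height=150, pixel_count=1400),   # thin hand-drawn loop: kept
    Candidate(width=10, height=12, pixel_count=40),       # too small: rejected
    Candidate(width=300, height=300, pixel_count=60000),  # dense, photo-like: rejected
]
print(filter_candidates(candidates))
```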
After the user-enclosed regions have been selected, the extraction module separates the portion of the bitmap that lies within each user-enclosed region and stores that extracted portion of the document image in a storage medium for future reference and manipulation by the user. By extracting only the portions of the document image that interest the user, the present invention can save a large amount of disk storage space.
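The source does not say how the bitmap inside an enclosure is separated. One plausible approach, swapped in here as an assumption, is to flood-fill the area outside the loop starting from a one-pixel margin around its bounding box, then keep every pixel the fill did not reach. A 4-connected fill cannot slip through the diagonal steps of an 8-connected stroke, so the interior stays sealed. The function name `extract_enclosed` and the pixel-set input format are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def extract_enclosed(image: List[List[int]], loop: Set[Pixel]) -> List[List[int]]:
    """Return the bounding-box crop of `image` with every pixel outside `loop` cleared."""
    top, bottom = min(r for r, _ in loop), max(r for r, _ in loop)
    left, right = min(c for _, c in loop), max(c for _, c in loop)

    # Flood-fill 'outside' from a one-pixel margin around the bounding box, stepping
    # 4-connectedly through any pixel that is not part of the loop stroke.
    seed = (top - 1, left - 1)
    outside: Set[Pixel] = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (top - 1 <= nr <= bottom + 1 and left - 1 <= nc <= right + 1
                    and (nr, nc) not in loop and (nr, nc) not in outside):
                outside.add((nr, nc))
                queue.append((nr, nc))

    # Keep the loop and its interior; zero everything the fill reached.
    return [
        [0 if (r, c) in outside else image[r][c] for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


# An L-shaped loop; (2, 2) is ink inside it, (0, 4) is ink outside it but inside its box.
image = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
loop = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (1, 3), (1, 4),
        (2, 0), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4)}
for row in extract_enclosed(image, loop):
    print(row)
```

The loop stroke itself is kept in the crop; clearing it as well would only require also zeroing pixels that belong to `loop`.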
For a more complete understanding of the invention, its objects and advantages, reference may be made to the following specification and to the accompanying drawings.


