User drawn circled region extraction from scanned documents

Image analysis – Image segmentation

Reexamination Certificate


Details

Classification: C382S203000
Type: Reexamination Certificate
Status: active
Patent number: 06597808

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to electronic documents and imaging. More particularly, the invention relates to a system and method for identifying a user-circled region on a scanned document or bitmapped image, so that information within the circled region can be further analyzed, for example by optical character recognition.
With the rapid development of computer technologies, more and more documents have been archived in electronic form. The electronic form raises an entirely new set of issues concerning information retrieval.
Unlike alphanumeric text files, which can be keyword searched easily, files containing bitmapped images or scanned information cannot be so readily searched. This is because bitmapped images and scanned documents represent printed words as graphical pictures made up of tiny black, white or colored dots that are not directly recognizable by a computer as characters or letters of the alphabet. Fortunately, in many cases these graphical pictures of characters and letters can be converted into computer-recognizable text characters by employing optical character recognition (OCR) software. Even with fast computers, however, the optical character recognition conversion process is slow. In many cases it is simply not practical to convert entire bitmapped images or scanned documents using optical character recognition software.
In cases where it is not practical to employ optical character recognition conversion, document image databases may be indexed by the user manually assigning a character-based text name or label to each bitmapped image or scanned image. The computer then associates the text name or label with the image data, so that the image data can be later retrieved by searching upon the text name or label.
Manual entry of appropriate text names and labels is labor intensive and often requires additional computer hardware, such as a keyboard or numeric keypad connected to the document imaging station. Taking a fresh look at the indexing problem, some have suggested embedding technology within the document imaging system that would recognize user-circled regions. Such regions, if identified, could serve to localize, and thus minimize, the optical character recognition process. The user would circle the graphical image of a desired keyword, and the system would then perform optical character recognition on the image data within the circle, converting it into a label that would then be associated with the image data file.
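As a rough illustration of this localized-OCR idea (not the patented implementation), the sketch below crops a previously detected circled region out of a scanned page and runs only that crop through OCR. The file name, bounding-box coordinates, and the use of the Pillow and pytesseract libraries are assumptions made for the example.

# Hypothetical sketch: OCR only the user-circled region, not the whole page.
# Assumes the circled region's bounding box has already been detected.
from PIL import Image
import pytesseract


def label_from_circled_region(page_path, bbox):
    """Crop the circled region (left, upper, right, lower) and OCR it."""
    page = Image.open(page_path)
    region = page.crop(bbox)                    # only the circled area
    text = pytesseract.image_to_string(region)  # OCR just that crop
    return text.strip()


# Example usage with made-up coordinates:
# label = label_from_circled_region("scanned_page.png", (120, 340, 480, 420))
# The resulting label could then be stored with the image file for indexing.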
Unfortunately, while the idea has merit, it has proven quite difficult to accurately and reliably detect the user-circled region, particularly given the wide variation in handwriting quality from one user to the next. The problem is further compounded by the fact that some user-drawn circles may overlap or intersect with other circles (whether user-drawn or not), and that some user-drawn “circles” may not be fully closed. The latter problem can arise where the user simply does not fully finish drawing a circle, or where the user-drawn circle is broken by image dropout. Image dropout can occur, for example, when a light or thin circle is drawn and then processed by low-resolution scanning equipment.
The present invention addresses these problems with a system and method that “traces” a circle candidate, identifies key geometric features of the traced shape, and then generates a substitute circle using those features as a guide. The substitute circle, being computer generated, forms a fully closed structure that can be relied upon more dependably for extracting the text contained within the circled region.
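The patent text above does not disclose specific routines, so the following is only a minimal OpenCV-based sketch of the general trace-then-substitute idea: trace a candidate shape, then replace it with a computer-generated closed curve. The Otsu thresholding, largest-contour selection, and ellipse fit are illustrative assumptions, not the claimed method.

# Hypothetical sketch, not the patented algorithm: trace a circle candidate
# and generate a fully closed substitute shape from its traced geometry.
import cv2
import numpy as np


def substitute_circle_mask(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Binarize so pen strokes become foreground (Otsu threshold is an assumption).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 4.x return signature: (contours, hierarchy).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Treat the largest outer contour as the circle candidate ("tracing").
    candidate = max(contours, key=cv2.contourArea)
    if len(candidate) < 5:      # fitEllipse needs at least five points
        return None
    # Fit an ellipse to the traced points; this stands in for the patent's
    # substitute circle generated from key geometric features.
    ellipse = cv2.fitEllipse(candidate)
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.ellipse(mask, ellipse, 255, thickness=-1)  # filled, fully closed region
    return mask  # use the mask to select pixels for subsequent OCR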
In instances where the user does not draw a fully closed circle, the system can employ one of a variety of endpoint proximity algorithms to ascertain whether the user's intent was to draw a fully closed circle.
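One simple, hypothetical form of such an endpoint proximity test is to compare the gap between the first and last points of the traced stroke with the stroke's overall size; the 20% threshold below is an arbitrary illustration, not a value taken from the patent.

# Hypothetical endpoint proximity test: decide whether an open stroke was
# probably intended to be a closed circle.
import math


def intended_as_closed(stroke_points, gap_ratio=0.2):
    """stroke_points: ordered list of (x, y) points along the drawn stroke."""
    xs = [p[0] for p in stroke_points]
    ys = [p[1] for p in stroke_points]
    # Size of the stroke, measured by its bounding-box diagonal.
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    if diagonal == 0:
        return False
    # Gap between the stroke's two endpoints.
    x0, y0 = stroke_points[0]
    x1, y1 = stroke_points[-1]
    gap = math.hypot(x1 - x0, y1 - y0)
    # If the endpoints nearly meet relative to the stroke size, treat the
    # stroke as an intentionally closed circle and bridge the gap.
    return gap <= gap_ratio * diagonal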
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.


REFERENCES:
patent: 5048099 (1991-09-01), Lee
patent: 5201011 (1993-04-01), Bloomberg et al.
patent: 5579407 (1996-11-01), Murez
patent: 5619592 (1997-04-01), Bloomberg et al.
patent: 5680470 (1997-10-01), Moussa et al.
patent: 5848413 (1998-12-01), Wolff
patent: 5873077 (1999-02-01), Kanoh et al.
patent: 6351559 (2002-02-01), Zhou et al.
Tarjan, Robert, Depth-First Search and Linear Graph Algorithms, SIAM Journal on Computing, Jun. 1972, pp. 146-160.
Theo Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, 1982, pp. 1-3.
Aho et al., The Design and Analysis of Computer Algorithms, pp. 179-187.
Aho et al., Data Structures and Algorithms, Jun. 1983, pp. 244-246, 252, 417.
http://www.rightfax.com/Products/ocr.htl, printed Aug. 17, 1998.
http://www.panasonic.co.jp/mgcs/fax/dx1000/dx1000.html, printed Jul. 12, 1999.
http://www.copiers-phones.com/cleveland/copiers/ifax/internet-fax.htm, printed Jul. 26, 1999.
Dr. Dobb's Essential Books on Graphic & Programming, Dr. Dobb's Journal, Miller Freeman, Inc., 1995.
