Image analysis – Image segmentation – Distinguishing text from other regions
Reexamination Certificate
2000-06-19
2004-02-10
Chang, Jon (Department: 2623)
C382S218000, C382S294000, C382S135000
active
06690824
ABSTRACT:
The invention relates to a process for the automatic recognition, by electronic means, of characters, even if printed in a variable position on a highly contrasted structured background drawing. The process consists in firstly producing a model of the background, obtained by capturing with an electronic camera the images of several samples on which only the background is present. Thereafter, the models of the symbols (for example alphanumeric characters) to be recognized are produced, either by capturing the images of a set of characters printed on a white background, or by using the commercially available computer files of the characters of the chosen fonts.
At the time of recognition, the position of each of the characters to be recognized is first measured with respect to the position of the printing of the background drawing. Each character to be recognized is thereafter compared with models obtained by combining the models of the symbols with the model of the background, at the same relative position as the unknown character. Recognition of the character together with its background is therefore achieved by comparison with models of the characters combined with the same background in the same position, using any well-known recognition technique.
BACKGROUND OF THE INVENTION
The present invention relates to a process for the automatic recognition of characters printed on any medium, even if the background exhibits highly contrasted structures, which therefore interfere considerably with the structure of the characters. There are several well-known character recognition techniques, as described in L. Stringa, “Procedure for Producing A Reference Model”, U.S. Pat. No. 5,778,088, the content of which is hereby incorporated by reference. The great majority of known systems approach the problem by trying to separate the characters from the background by means of sometimes very ingenious and sophisticated thresholds. Unfortunately, this technique fails when the contrast of the structures of the background is very considerable, especially if the position of the characters can vary with respect to those structures. Consequently, the images of the characters sometimes contain some signs of the background (those which exceeded the threshold), or sometimes they are incomplete, since a part of the structure of the characters has not exceeded the threshold. Such is the case, for example, with bank notes, whose serial numbers are printed in a phase separate from (usually following) the printing of the remainder, and generally with a different printer. The registration cannot therefore be perfect, and consequently the serial numbers “move” with respect to the background: if they are printed on a structured area of the note, that is to say on a drawn area, they move with respect to the structure (the drawing) of the background. Moreover, in the cases cited, even the search for and the segmenting of the characters are at risk of failing on account of the structures of the background.
Indeed, albeit with many variations, the extraction and recognition procedure almost always involves the following stages:
capture of the images of the document or, more generally, of the object containing the characters to be recognized. Capture is achieved by means of an electronic camera, and is usually followed by computations aimed at improving the contrast and reducing the noise;
search over the image (henceforth electronic) for the position of the characters to be recognized. The search is often based on an analysis of the abrupt changes of illumination (such as switching from white to black), in particular of their spatial distributions;
segmentation of the area identified into subareas, each containing a single character. Segmentation is achieved for example by analyzing the projection of the density of black onto a segment parallel to the base of the line of characters: the minima of this density can be correlated with the white space between characters;
comparison of each character thus isolated with prototypes (models) of all the letters and/or of all the numerals, either in terms of superposability (techniques known as “template-matching”), or in terms of sequences of characteristic structures, such as vertical, horizontal or oblique lines, etc. (techniques known as “feature extraction” or structural analysis).
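The segmentation stage listed above can be illustrated with a minimal sketch. It assumes the line of characters has already been located and binarized (1 for black, 0 for white), and it treats any column whose black density does not exceed a chosen level as white space. The function name `segment_columns` and the parameter `max_ink` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def segment_columns(binary_line, max_ink=0):
    """Split a binarized text line into per-character column ranges.

    binary_line: 2-D array, 1 where a pixel is black (ink), 0 elsewhere.
    max_ink: columns whose ink count is <= max_ink count as white space
             (hypothetical parameter; 0 means "completely white").
    Returns a list of (start, stop) column indices, stop exclusive.
    """
    # Projection of the black density onto the baseline: ink count per column.
    profile = binary_line.sum(axis=0)
    has_ink = profile > max_ink

    segments = []
    start = None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, col))    # a density minimum ends it
            start = None
    if start is not None:                    # character touching the right edge
        segments.append((start, len(has_ink)))
    return segments

# Three synthetic "characters" separated by white columns.
line = np.zeros((5, 12), dtype=np.uint8)
line[1:4, 1:3] = 1
line[0:5, 5:8] = 1
line[2:4, 10:12] = 1
segments = segment_columns(line)
```

On a structured background the white columns between characters may carry background ink, so the minima of the projection are no longer reliable; this is the failure mode the text describes.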
In any case it is obvious that if the part of the image segmented as a character contains structures which are foreign to the shape of the actual character (for example lines belonging to the structure of the background), the risk of failure of the comparison with said prototypes is very high. The same risk arises when discriminating parts of the structure of the character are lost through overly drastic thresholding in the characters/background separation phase.
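The following sketch illustrates the risk just described, under simplifying assumptions: characters are 3×3 binary images, the comparison is a plain normalized correlation (one possible form of template matching, not necessarily the one used in any cited system), and the names `normalized_correlation` and `best_match` are hypothetical. A single background stroke across the top of an “I” makes it coincide with the prototype of a “T”.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two same-sized images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(char_img, prototypes):
    """Return the label of the prototype most similar to char_img, and all scores."""
    scores = {label: normalized_correlation(char_img, proto)
              for label, proto in prototypes.items()}
    return max(scores, key=scores.get), scores

# Toy prototypes (illustrative only).
prototypes = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
}

# A clean character is matched to its own prototype.
clean_label, _ = best_match(prototypes["L"], prototypes)

# An "I" overlaid with a horizontal background line along the top row.
cluttered = prototypes["I"].copy()
cluttered[0, :] = 1
cluttered_label, _ = best_match(cluttered, prototypes)
```

Here `clean_label` is `"L"`, while the cluttered “I” is assigned to `"T"`, because the foreign stroke is indistinguishable from part of the character.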
This is why the previous approaches to the automatic recognition of characters printed on highly structured, high-contrast backgrounds are not sufficiently effective.
SUMMARY OF THE INVENTION
According to the present invention, the objects on which the characters to be recognized are printed are analyzed optically by well-known optoelectronic means, such as for example a CCD camera (linear or matrix type, black and white or color), with the desired resolution for producing electronic images of the characters to be recognized. In what follows, the term “image” will be used in the sense of electronic image, in particular a discrete set of density values, in general organized as a rectangular matrix. Each element of the matrix, the so-called pixel, is a measure of the intensity of the light reflected by the corresponding part of the object. For color images, the description generally consists of three matrices corresponding to the red, green and blue components of each pixel. For simplicity, the following description relates to the black and white case: the extension to color is achieved by repeating the same operations on the three matrices.
The aim of the invention is the automatic recognition in electronic images of characters printed on a highly structured background whose contrast may even be comparable with the contrast of the structures of the characters, as in the example of FIG. 2c. The first step of the process underlying the present invention consists in producing a model of the background, which can be obtained by capturing images of one or more samples on which only the drawing of the background is present, without any character (see for example FIG. 2b).
In particular, it is possible to use as the model the average of the images of said samples: in the case of black and white images there will be a single average matrix, whilst in the case of color images there will be three average matrices, for example red, green and blue. The models of the symbols (for example letters and/or numerals) to be recognized are produced subsequently, either by capturing the images of a set of characters printed on a white background, or by using directly the electronic images from computer files which are nowadays commercially available for most “fonts”. In the first case, the model of each symbol to be recognized can be constructed as the average of the images of a certain number of specimens of the same symbol printed on a white background.
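The averaging described above can be sketched as follows. The sketch assumes that all sample images have the same size and are already aligned with one another; the function name `average_model` is hypothetical. The same function covers the color case, since averaging an array of shape (height, width, 3) along the sample axis yields the three average matrices at once.

```python
import numpy as np

def average_model(samples):
    """Pixel-wise average of equally sized, pre-aligned sample images.

    samples: iterable of arrays, each (H, W) for black and white
             or (H, W, 3) for color (red, green, blue).
    Returns a float array of the same shape as one sample.
    """
    stack = np.stack([np.asarray(s, dtype=float) for s in samples])
    return stack.mean(axis=0)  # average over the sample axis only

# Black and white: three synthetic background samples.
bw_samples = [np.full((2, 2), 10), np.full((2, 2), 20), np.full((2, 2), 30)]
bw_model = average_model(bw_samples)

# Color: the same call produces one average matrix per channel.
color_samples = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 2.0)]
color_model = average_model(color_samples)
```

The same call can be applied to specimens of a single symbol printed on a white background to obtain that symbol's model.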
Once the models of the symbols and the model of the background have been constructed, the first phase of the process, which might well be called the “learning phase”, is complete.
During the recognition phase, the following steps are carried out:
capturing of the image of the sample to be recognized, which contains the unknown characters printed on the background in a position which is itself also unknown (see for example FIG. 3a);
registering of the model of the background with the image captured, by means of any of the well-known techniques for registering images, for example using the method of maximum correlation;
subtraction of the (registered) model from the image captured: the difference image, where the background will be alm
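The registration and subtraction steps can be sketched as follows. The text names only “the method of maximum correlation”; the exhaustive search over small integer shifts shown here is one simple way to realize it and is an assumption of this sketch, as are the function names, the `max_shift` parameter, and the use of `np.roll`, which wraps pixels around the image edges (a practical system would crop or pad instead).

```python
import numpy as np

def register_by_max_correlation(image, model, max_shift=3):
    """Find the integer (dy, dx) shift of `model` that best matches `image`."""
    img = image.astype(float) - image.mean()
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Assumption: circular shift; edges wrap around.
            shifted = np.roll(model, (dy, dx), axis=(0, 1)).astype(float)
            score = float((img * (shifted - shifted.mean())).sum())
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def subtract_background(image, model, shift):
    """Subtract the registered background model from the captured image."""
    registered = np.roll(model, shift, axis=(0, 1)).astype(float)
    return image.astype(float) - registered

# Synthetic test: a random "background drawing", displaced and overprinted.
rng = np.random.default_rng(0)
background = rng.random((16, 16))
captured = np.roll(background, (2, -1), axis=(0, 1)).copy()
captured[5:8, 5:8] -= 0.5          # a dark printed mark standing in for a character

shift = register_by_max_correlation(captured, background)
difference = subtract_background(captured, background, shift)
```

In this synthetic case the recovered shift is `(2, -1)`, and the difference image is zero everywhere except on the printed mark.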
Bugnion S.A.
Chang Jon
KBA-Giori S.A.
Moetteli John