Image analysis – Image segmentation – Distinguishing text from other regions
Reexamination Certificate (active)
Patent number: 06519362
Dates: 2000-02-15; 2003-02-11
Examiner: Chang, Jon (Department: 2623)
Classifications: C382S164000, C358S464000
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates, in general, to image analysis and, in particular, to segmenting individual characters or words.
BACKGROUND OF THE INVENTION
Extracting text from a color image, especially from a color image where the text is integrated with a graphic, is useful for optical character recognition and for conducting a text search. In the present invention, text integrated with a graphic means that the text and the graphic are not located in separate regions of the image but are combined somehow (e.g., overlaid). Color images that integrate text and graphics communicate in an immediate and effective manner and are widely used. However, such images are often a complex mixture of shapes and colors arranged in unpredictable ways which make it difficult to automatically extract or separate the text from the rest of the color image.
Systematic attempts to understand notions of color go back at least to Newton's light experiments with prisms in the 17th century, and have been considerably formalized and expanded since, most notably by Huygens, Young, von Helmholtz, and Maxwell. Numerically, the notion of color has largely been reduced to a wavelength measurement of an electromagnetic light wave. In addition, the structure of color receptors in the human eye has motivated the choice of three colors, namely red, green, and blue (RGB), from which most other colors may be obtained by suitable weighted combinations. In 1931, the International Commission on Illumination (CIE) standardized the wavelengths of these three colors as: Red = 700 nm, Green = 546.1 nm, and Blue = 435.8 nm, where 1 nm = 10^−9 m. The fundamental RGB system is still prevalent today, used for color monitors, color scanners, video cameras, and digital file formats such as TIFF, JPEG, and BMP.
Mathematically, the RGB system may be thought of as a basis, from which other color coordinate systems may be derived via linear or non-linear transformations, depending on the particular application at hand. The general intention has been to define color specifications in standard, well-accepted terminology, resulting in reproducible results across many media. Some examples of color coordinate systems derived via linear transformations of RGB are the Luminance (usually called grayscale), In-phase chrominance, and Quadrature chrominance (YIQ), a system used for TV signals in the United States; the Cyan, Magenta, Yellow (CMY) subtractive system used in color printing; and several XYZ Cartesian systems defined by the CIE. Some non-linear examples include the Luminance, a-chrominance, and b-chrominance (Lab) system; the Luminance, u-chrominance, and v-chrominance (Luv) system; and the Hue, Saturation, and Brightness (HSB) system and its variants: hue, saturation, and value (HSV); hue, saturation, and intensity (HSI); and hue, lightness, and saturation (HLS).
In the digital domain (where image components are doubly-indexed pixel matrices with integer entries) a typical 24-bit RGB image is an overlay of three eight-bit images; thus each pixel component (e.g., R in RGB) ranges in value from 0 to 255, and is represented by an eight-bit binary number. The value of a pixel's color component represents the intensity of that color component. Therefore, a color component may be assigned one of 256 values when represented by an eight-bit binary number. A color component value of zero means that none of that color component contributes to the color of the pixel. For example, if R in RGB has a value of 0 then no red color is present in the pixel.
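As a brief illustration of the 24-bit representation described above (not part of the patent itself), the three eight-bit components of a pixel can be unpacked from a single packed integer; the function name and packing order are assumptions for the sketch:

```python
def unpack_rgb(pixel24):
    """Split a packed 24-bit integer 0xRRGGBB into (R, G, B) components,
    each an eight-bit value in the range 0 to 255."""
    r = (pixel24 >> 16) & 0xFF  # red component
    g = (pixel24 >> 8) & 0xFF   # green component
    b = pixel24 & 0xFF          # blue component
    return r, g, b

# A pixel with full red, half-intensity green, and no blue:
r, g, b = unpack_rgb(0xFF8000)  # → (255, 128, 0)
```

A component value of 0, as the text notes, means that color contributes nothing to the pixel.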
A digital color image may be easily converted from one color coordinate system to another. As a linear example, an image in RGB may be converted to an image in YIQ by converting each pixel in the RGB image to a pixel in YIQ as follows:
Y_ij = 0.299 R_ij + 0.587 G_ij + 0.114 B_ij,
where Y_ij is the ij pixel entry of the resulting Luminance channel, and R_ij, G_ij, and B_ij are the ij pixel values of the red, green, and blue components, respectively; similarly,
I_ij = 0.596 R_ij − 0.274 G_ij − 0.322 B_ij,
Q_ij = 0.211 R_ij − 0.523 G_ij + 0.312 B_ij
for the in-phase and quadrature chrominance components I_ij and Q_ij.
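As a rough sketch (not the patent's implementation), the linear RGB-to-YIQ transform above can be applied pixel-wise as follows:

```python
def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel to YIQ using the linear transform
    given in the text (coefficients per the NTSC convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance (grayscale)
    i = 0.596 * r - 0.274 * g - 0.322 * b  # in-phase chrominance
    q = 0.211 * r - 0.523 * g + 0.312 * b  # quadrature chrominance
    return y, i, q
```

For a pure gray pixel (equal R, G, B) the luminance coefficients sum to 1.0, so Y equals the input value and both chrominance channels are approximately zero.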
Conversion from RGB to Lab provides an example of a non-linear coordinate transformation. It is given as
L_ij = 10 (G_ij)^(1/2),
a_ij = (R_ij − G_ij) / (R_ij + 2 G_ij + B_ij), and
b_ij = 0.4 (G_ij − B_ij) / (R_ij + 2 G_ij + B_ij).
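A minimal sketch of the non-linear RGB-to-Lab formulas above (the zero-denominator guard is an assumption, in the spirit of the bias-by-one device described later in the summary):

```python
def rgb_to_lab(r, g, b):
    """Convert one RGB pixel to (L, a, b) using the non-linear
    formulas given in the text."""
    denom = r + 2 * g + b
    if denom == 0:
        denom = 1  # guard against division by zero for a pure-black pixel (assumption)
    L = 10 * g ** 0.5                # luminance, non-linear in G
    a = (r - g) / denom              # a-chrominance
    b_chroma = 0.4 * (g - b) / denom # b-chrominance
    return L, a, b_chroma
```

Note that both chrominance channels vanish for gray pixels (R = G = B), which is consistent with their role as chrominance measures.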
U.S. Pat. No. 5,802,203, entitled “IMAGE SEGMENTATION USING ROBUST MIXTURE MODELS,” discloses a method of modeling an image, which may include text, as a compilation of layers having different brightness functions to prevent corruption of the image due to noise or to compress the image. The method of U.S. Pat. No. 5,802,203 does not extract text from a color image that includes graphics as does the present invention. U.S. Pat. No. 5,802,203 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,956,468, entitled “DOCUMENT SEGMENTATION SYSTEM,” discloses a method of identifying those regions in a document that contain an image and regions that contain text so that dithering techniques for printing may be used only on images. U.S. Pat. No. 5,956,468 requires an image to be separate from text. Therefore, U.S. Pat. No. 5,956,468 cannot extract text that is an integral part of an image as does the present invention. U.S. Pat. No. 5,956,468 is hereby incorporated by reference into the specification of the present invention.
SUMMARY OF THE INVENTION
It is an object of the present invention to extract text from a color image.
It is another object of the present invention to extract text from a color image, where the text is integrated with a graphic.
The present invention is a method of extracting text from a color image in three steps: image reception, grayscale conversion, and binarization.
The first step is receiving a color image, where the image includes pixels, where each pixel is represented by a color component system selected from the color component systems consisting of RGB, YIQ, CMY, XYZ, Lab, Luv, HSB, HSV, HSI, and HLS.
The second step is converting the color image received into a grayscale image in a manner that maximizes the contrast between any text in the image and the rest of the image. It is this second step that is crucial to the text extraction process. Five different conversion methods (i.e., C1, C2, C3, C4n, and C5n) are used to generate one or more grayscale images. C1, C2, and C3 each generate one grayscale image, while C4n and C5n each generate six grayscale images if a pixel is made up of three color components. To avoid division by zero, the corresponding value of the color component of the received image is biased upward by one. The five conversion methods apply pixel-wise as follows:
C1_ij = avg(R_ij, G_ij) = (R_ij + G_ij)/2;
C2_ij = min(R_ij, G_ij, B_ij) / avg(R_ij, G_ij, B_ij);
C3_ij = min(R_ij, G_ij, B_ij) / max(R_ij, G_ij, B_ij);
C4n_ij = one(R_ij, G_ij, B_ij) / another(R_ij, G_ij, B_ij); and
C5n_ij = comb(R_ij, G_ij, B_ij) / sum(R_ij, G_ij, B_ij).
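The five pixel-wise conversions can be sketched as below. Since C4n and C5n each denote a family of six ratio images, only one representative member of each family is shown, and the choice of representatives is an assumption of this sketch:

```python
def grayscale_conversions(r, g, b):
    """Apply the five pixel-wise grayscale conversions C1..C5 described above.
    For the C4n and C5n families, one representative ratio is computed."""
    # Bias each component upward by one to avoid division by zero,
    # as described in the summary.
    r, g, b = r + 1, g + 1, b + 1
    c1 = (r + g) / 2                      # average of two components
    c2 = min(r, g, b) / ((r + g + b) / 3) # min over average
    c3 = min(r, g, b) / max(r, g, b)      # min over max
    c4 = r / g                            # one component over another (one of six ratios)
    c5 = (r + g) / (r + g + b)            # a combination over the sum (one of six)
    return c1, c2, c3, c4, c5
```

For a gray pixel the ratio-based conversions C2, C3, and C4 all evaluate to 1, which is why they tend to suppress achromatic regions while responding to colored ones.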
The third step of the method is comparing each pixel value in each grayscale image resulting from the second step to a threshold T, and setting the value of the grayscale pixel to a first value (e.g., 0) if the value of the grayscale pixel is not greater than T; otherwise, setting the value of the grayscale pixel to a second value (e.g., 1). The third step results in the binarization of the grayscale images resulting from the second step. The threshold is defined by the following equation:
T = min(Cm_ij) + k*std(Cm_ij), or alternatively as
T = max(Cm_ij) − k*std(Cm_ij),
where * denotes multiplication, and where m = 1, 2, 3, 4n, and 5n. The result of the third step is a set of black and white images where any text in the color image is separated from the rest of the image.
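The binarization step can be sketched as follows; the choice of k = 1.0 as a default and the use of the min-plus-k-standard-deviations form of the threshold are assumptions of this sketch:

```python
def binarize(gray, k=1.0):
    """Binarize a grayscale image (here, a flat list of pixel values)
    using the threshold T = min + k*std described above.
    Pixels not greater than T become 0; all others become 1."""
    n = len(gray)
    mean = sum(gray) / n
    std = (sum((v - mean) ** 2 for v in gray) / n) ** 0.5  # population std
    t = min(gray) + k * std
    return [0 if v <= t else 1 for v in gray]
```

Tying the threshold to the minimum (or maximum) plus a multiple of the standard deviation adapts it to each grayscale image's own contrast range rather than using one fixed cutoff.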
Inventors: Chang, Jon; LaRose, Colin; Morelli, Robert D.
Assignee: The United States of America as represented by the National Secu