Reexamination Certificate
2000-10-05
2004-09-28
Chang, Jon (Department: 2623)
Image analysis
Applications
Mail processing
C382S177000, C382S180000, C382S190000
active
06798895
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to image processing methods and apparatus used, for example, for an automatic mailing address reader, and in particular to image processing methods and apparatus that extract an irregular character string, such as an address that has been handwritten in Japanese.
BACKGROUND OF THE INVENTION
The postal service, for example, must process a large amount of mail each day within a short period of time. Because until recently all mail had to be sorted visually and by hand, the work load borne by postal service employees has been very large. To reduce this work load, mechanization, represented by automatic readers, has been introduced, and this has led to the development of an advanced procedure whereby mail to be delivered is sorted by district. This way of handling mail has been successful in Japan because postal codes are entered in boxes provided for them on mailing matter; for overseas mail, however, no clearly defined spaces for postal code entries are provided, so it is difficult to determine mechanically where on mailing matter postal codes are located. Moreover, since the amount of overseas mail processed is expected to keep increasing, there is a demand for further development of present techniques so that address information carried by mail, including postal codes, can be read immediately and the mail sorted for delivery by district.
Although there is a strong demand for the extraction of postal codes carried by mailing matter for which postal code entry areas are not defined, and for the reading of address information, it is difficult to use current techniques for these purposes. This is primarily because on the exterior surfaces of the various items that constitute the mail, not only are there areas provided for recipient addresses, but there may be other areas in which sender addresses are entered, areas in which advertising material is presented, and areas in which various patterns, such as drawings and photographs, are displayed. Further compounding the problem are the many ways in which entries are made, including the use of handwritten and mechanically printed characters, and of vertical and horizontal writing styles. As a result, in order to correctly separate postal codes and other address data from the various coexisting information entries carried by the mail, a very complicated process must be employed, such as one that provides for the examination of all external surfaces. For addresses that have been handwritten in Japanese, in particular, not only does a problem exist because both vertical and horizontal writing styles are used, but there is a further problem in that the handwritten characters used in the addresses are irregularly shaped, which makes the extraction of appropriate addresses especially difficult.
Conventional methods for extracting address information include techniques by which address areas and actual addresses can be identified and read. One method presumes that labels bearing printed addresses are attached to mail: the reflection attributes of the labels, or the shadows thrown by the edges of the labels, are detected, and the address areas are thereafter extracted. Another method obtains horizontal and vertical projections of the mail and uses them to select probable address areas, with zones containing high entry densities defined as character rows. In addition, Japanese Unexamined Patent Publication No. Hei 7-265807 discloses a technique whereby connected components that are near each other are joined together to extract character rows, and the character rows are later combined to define probable address areas.
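The projection-based conventional method mentioned above can be illustrated with a minimal sketch. It assumes a binarized image stored as a list of rows of 0/1 values; the function names `projection_profiles` and `dense_runs` and the `min_count` threshold are hypothetical and are not taken from the cited prior art.

```python
def projection_profiles(image):
    """Return (row_sums, col_sums) for a binary image given as rows of 0/1."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def dense_runs(profile, min_count):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i                      # a dense zone begins
        elif value < min_count and start is not None:
            runs.append((start, i))        # the dense zone ends
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Toy image: two bands of dark pixels separated by blank rows.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
rows, cols = projection_profiles(image)
# Zones whose horizontal projection is dense are treated as character rows.
candidate_rows = dense_runs(rows, min_count=3)
```

Because the profile sums every dark pixel on a line, a patterned background raises the counts everywhere, so dense zones no longer correspond only to text.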
However, when addresses are printed directly on external surfaces, the method that extracts address areas by presuming that printed address labels are attached to mail cannot be used, so the types of mail to which this method applies are very limited. It is particularly difficult to apply this technique to addresses that are handwritten in Japanese, since labels are seldom used in that case. Also, with the method that selects probable mailing address areas by defining zones having high entry densities as character rows, it is difficult to separate address areas from their backgrounds if the backgrounds contain a large amount of image data, and if complicated patterns are displayed on the mail. As a result, with this method correct address areas cannot be detected precisely.
In particular, the technique disclosed in Japanese Unexamined Patent Publication No. Hei 7-265807 presumes that characters are comparatively regularly aligned and that character strings are arranged relatively near each other. Thus, although this technique can more or less be applied to printed addresses, satisfactory results cannot be expected when it is used for irregular handwritten addresses. Furthermore, according to this technique, a complicated joining process must be performed for all pixels; that is, portions in which pixels are accumulated are detected by examining all pixels, and this detailed examination must be repeated for each portion involved. As a result, the logic operations are very difficult, processing speeds are greatly reduced, and the configuration is much too complicated. This method is therefore not realistic as a system configuration.
SUMMARY OF THE INVENTION
It is, therefore, one object of the present invention to avoid the use of a complicated process to combine pixels, and to quickly and precisely extract character strings such as are contained in handwritten addresses.
It is another object of the present invention to extract character strings that are irregularly arranged but that can, for practical purposes, be assumed to constitute an address.
It is an additional object of the present invention to extract character strings by employing a flexible and simple process for mail images, such as ones whereon vertical and horizontal writing styles coexist.
It is a further object of the present invention to employ different algorithms for regularly arranged character strings, such as printed character strings, and for irregularly arranged character strings, such as addresses that are handwritten in Japanese, so as to flexibly and accurately extract character strings.
To achieve the above and other objects, according to the present invention, a character string extraction method comprises the steps of: extracting connected components from an input image; comparing the sizes of the connected components with a predetermined threshold size, and extracting connected components occupying a range within the predetermined threshold size; extending vertically or horizontally the extracted connected components occupying a range within the predetermined threshold size, and connecting the extended connected components to form and extract long connected components; and extracting probable character strings based on the connection state of the extracted long connected components.
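The steps recited above can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: connected components are represented only by their bounding boxes `(x0, y0, x1, y1)`, overlap of the extended boxes stands in for pixel-level connection, and the names `filter_by_size`, `merge_extended`, `margin` and `min_members` are hypothetical.

```python
def filter_by_size(boxes, max_w, max_h):
    """Keep boxes whose width and height fall within the threshold size."""
    return [b for b in boxes
            if (b[2] - b[0]) <= max_w and (b[3] - b[1]) <= max_h]


def extend_horizontally(box, margin):
    """Stretch a box to the left and right by `margin` pixels."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0, x1 + margin, y1)


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_extended(boxes, margin):
    """Join extended boxes that touch; return (long_box, member_count) pairs."""
    groups = []
    for box in boxes:
        merged, count = extend_horizontally(box, margin), 1
        changed = True
        while changed:                      # repeat until nothing else joins
            changed = False
            remaining = []
            for group_box, n in groups:
                if overlaps(group_box, merged):
                    merged, count = union(group_box, merged), count + n
                    changed = True
                else:
                    remaining.append((group_box, n))
            groups = remaining
        groups.append((merged, count))
    return groups


# Three character-sized boxes on one line, one large blob, one isolated mark.
boxes = [(10, 10, 18, 20), (24, 11, 32, 21), (38, 9, 46, 19),
         (0, 40, 100, 90), (200, 10, 206, 18)]
small = filter_by_size(boxes, max_w=15, max_h=15)
long_components = merge_extended(small, margin=4)
# Assumption: a long component joining at least `min_members` pieces is a
# probable character string (the box still includes the extension margin).
min_members = 2
candidates = [box for box, n in long_components if n >= min_members]
```

Working on a handful of boxes rather than on every pixel keeps the joining step simple; a vertical variant would extend `y0` and `y1` instead.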
The step of extracting the connected components may employ 8-connectivity, in which pixels adjoining vertically, horizontally or obliquely are treated as connected, or 4-connectivity, in which only pixels adjoining vertically or horizontally are treated as connected, to extract connected components occupying a range within a predetermined threshold size.
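The difference between the two connectivity rules can be seen in a small sketch. It uses a generic breadth-first flood fill over a binary image stored as rows of 0/1 values; the function name `connected_components` and the bounding-box output format are assumptions made for illustration.

```python
from collections import deque

NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def connected_components(image, connectivity=8):
    """Return bounding boxes (left, top, right, bottom) of dark-pixel groups."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = NEIGHBORS_8 if connectivity == 8 else NEIGHBORS_4
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill one component, tracking its bounding box.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes


# A diagonal stroke is one component under 8-connectivity but three
# separate components under 4-connectivity.
diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
boxes_8 = connected_components(diagonal, connectivity=8)
boxes_4 = connected_components(diagonal, connectivity=4)
```

The returned bounding boxes give the vertical and horizontal size of each component, which is what a size comparison against a threshold would use.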
At the step for extracting the connected components occupying a range within the predetermined threshold size, the connected components occupying the range within the predetermined threshold size may be extracted by comparing the vertical and/or horizontal size of the connected components with a predetermined vertical and/or horizontal size for an assumed character. According to this method, since the comparison is performed by defining a threshold value while taking into account the size of a handwritten character and an assumed character size, the connected components can be detected, while n
Chang Jon
Dang Thu Ann
International Business Machines Corporation