E-mail signature block analysis

Image analysis – Pattern recognition – Context analysis or word recognition

Reexamination Certificate


C704S009000


active

06373985

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the analysis of signature blocks, and more particularly, to the analysis of signature blocks of e-mail messages, combining geometrical layout features and language constraints using finite state transducers.
2. Description of the Related Art
The rapidly increasing usage of the Internet in recent years has made electronic mail (e-mail) one of the most common forms of business and personal communication. How to manage the large and dynamic collection of e-mail documents for efficient storage and information retrieval, and how to convert between e-mail and other forms of messages (e.g., voice mail and fax) so that users can access their messages conveniently whenever and wherever they need to, are two of the most important research areas in multimedia messaging.
The content of modern-day e-mail has expanded beyond text to include encoded documents, images, and even audio and video clips. However, unmarked text is still the prevailing format for e-mail communications due to its simplicity and its sufficiency for conveying ideas, conducting discussions, making announcements, etc. One of the most common structured elements in text e-mail is the signature block. The signature block contains information about the sender, such as e-mail address, web address, phone/fax number, personal name, postal address, etc., and is usually separated from the rest of the message by some sort of border. Accurate identification and parsing of signature blocks is important for many multimedia messaging applications such as e-mail text-to-speech rendering, automatic construction of personal address databases, and interactive message retrieval.
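Identifying the block is the first step. As a minimal sketch, assuming the border is either the conventional "-- " delimiter line or a line of three or more repeats of one punctuation character (both are assumptions; the text above only says "some sort of border"), a bottom-up scan can split a plain-text message into body and signature block. The name `split_signature` and the `BORDER` pattern are hypothetical.

```python
import re

# Assumed border: "--" or "-- " alone on a line, or 3+ repeats of one
# non-word, non-space character (e.g. "-----", "=====", "*****").
BORDER = re.compile(r"^(?:-- ?|([^\w\s])\1{2,})\s*$")


def split_signature(message):
    """Split a plain-text message at its last border line.

    Returns (body, signature_block); if no border is found, the whole
    message is returned as the body and the signature block is empty.
    """
    lines = message.splitlines()
    for i in range(len(lines) - 1, -1, -1):  # scan from the bottom up
        if BORDER.match(lines[i]):
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return message, ""


msg = "Hi all,\nSlides are attached.\n-- \nVinod Anupam\nBell Labs, Lucent Tech."
body, signature = split_signature(msg)
print(signature)
```

Scanning from the bottom means a horizontal rule quoted earlier in the message body does not cut the block too early, although a border-like line inside the signature itself would still end the scan there.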
Automatic conversion of e-mail into speech is one of the most important commercial applications of text-to-speech technology, and is one technological component of the growing interest in media conversion.
However, parsing of signature blocks is a very challenging task due to the fact that signature blocks often appear in complex two-dimensional layouts which are guided only by loose conventions. Table 1 shows one example of such a layout.
TABLE 1
An exemplary signature block
 _/∥      Vinod Anupam                    email: anupam@research.bell-labs.com
‘0.o’     Bell Labs, Lucent Tech.         WWW: http://www.tempo.lucent.com/~anupam
=(---)=   700 Mountain Ave., Rm 2C-236A   phone: (908) 582-7366
   U      Murray Hill, NJ 07974-0636      fax: (908) 582-5809
A straightforward line-by-line analysis using conventional text analysis methods is unable to extract fields such as the postal address. Traditional text analysis methods designed to deal with sequential text cannot handle two-dimensional structures, while the highly unconstrained nature of signature blocks makes the application of two-dimensional grammars very difficult.
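The failure of line-by-line reading is visible in Table 1 itself: its third line runs a street address into a phone number. A minimal sketch, assuming monospaced text with aligned columns and leaving out the ASCII art, splits each line at runs of two or more spaces and groups the pieces by left edges shared by every line. This whitespace heuristic and the names `segments` and `columns` are hypothetical illustrations, not the layout analysis of the invention, and real signature blocks are rarely this regular.

```python
import re
from bisect import bisect_right

# Text columns of Table 1 (ASCII art omitted), padded so that the second
# column starts at the same offset on every line, as in a monospaced reader.
LEFT = ["Vinod Anupam", "Bell Labs, Lucent Tech.",
        "700 Mountain Ave., Rm 2C-236A", "Murray Hill, NJ 07974-0636"]
RIGHT = ["email: anupam@research.bell-labs.com",
         "WWW: http://www.tempo.lucent.com/~anupam",
         "phone: (908) 582-7366", "fax: (908) 582-5809"]
BLOCK = [left.ljust(32) + right for left, right in zip(LEFT, RIGHT)]


def segments(line, min_gap=2):
    """Return (start_column, text) pairs for runs of text separated by at
    least `min_gap` spaces (min_gap must be 2 or more)."""
    run = re.compile(r"\S+(?: {1,%d}\S+)*" % (min_gap - 1))
    return [(m.start(), m.group()) for m in run.finditer(line)]


def columns(lines, min_gap=2):
    """Group segments into columns whose left edges occur on every line."""
    segs = [segments(line, min_gap) for line in lines]
    edges = sorted(set.intersection(*({start for start, _ in s} for s in segs)))
    cols = {}
    for line_segs in segs:
        for start, text in line_segs:
            # The number of shared edges at or left of `start` picks the column.
            cols.setdefault(bisect_right(edges, start), []).append(text)
    return [cols[key] for key in sorted(cols)]


print(BLOCK[2])           # line by line: street address run into phone number
print(columns(BLOCK)[0])  # column-wise: the postal address lines stay adjacent
```

Here the shared left edges are columns 0 and 32, so the street and city lines come out consecutively in the first column; a practical analyzer would need tolerance for ragged alignment and for art that shares no edges.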
In particular, conventional techniques in the document analysis field, such as those described in “A document understanding method for database construction of an electronic library,” A. Takasu et al., In Proc. 12th CVPR, pp. 463-466, 1994, and “A matrix grammar for document processing,” A. Takasu et al., In Proc. 6th Int. Conf. on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, pp. 197-200, 1993, have used two-dimensional grammars or array grammars for logical layout analysis of printed documents. Other conventional techniques, such as those described in “High level document analysis guided by geometric aspects,” A. Dengel et al., International Journal of Pattern Recognition and Artificial Intelligence, 2(4):641-655, 1988, have applied geometric trees. However, these methods are applicable only to known document types with rigid layout rules, which is not the case for signature blocks, where the layout design is highly individualized and unconstrained.
Further, as illustrated in Table 1, the signature block includes several fields, one of which is the e-mail address. Because the personal name is almost never explicitly labeled, it is very difficult to distinguish it from other elements such as street or city names and organization names. As a result, it is difficult to automatically determine the originator of the e-mail message.
SUMMARY OF THE INVENTION
The present invention solves the above-identified problems with the analysis of highly unconstrained text blocks, such as e-mail signature blocks, by combining two-dimensional structural (layout) analysis with one-dimensional grammatical (language) constraints. The information obtained from the layout and language analyses is integrated in the form of weighted finite state transducers (WFSTs), and the final solution is the optimal interpretation under both analyses.
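In the common tropical-semiring reading of WFSTs, weights are costs (for example, negative log probabilities) that add along a path, and the optimal interpretation is the cheapest path through the composition of the two analyses. The sketch below is a plain-Python stand-in, not a WFST library and not the invention's actual transducers: it assumes layout analysis yields a per-line cost for each field label and that the language constraints form a bigram automaton over labels, in which case composition followed by shortest path reduces to a Viterbi search. The labels, costs, and the name `best_labeling` are hypothetical.

```python
import math

# Hypothetical layout costs for the middle column of Table 1, one dict per line
# ("Vinod Anupam", "Bell Labs, Lucent Tech.", "700 Mountain Ave., ...",
# "Murray Hill, NJ 07974-0636"). Lower cost = more plausible label.
LAYOUT = [
    {"name": 0.2, "org": 1.5, "addr": 2.0},
    {"name": 1.0, "org": 0.3, "addr": 1.2},
    {"name": 2.0, "org": 1.8, "addr": 0.2},
    {"name": 0.6, "org": 1.6, "addr": 0.8},  # "Murray Hill" also looks like a name
]

# Hypothetical language constraints: a bigram automaton over field labels.
# Missing pairs (e.g. an address followed by a name) are disallowed.
TRANSITIONS = {
    ("<s>", "name"): 0.1, ("<s>", "org"): 1.0,
    ("name", "org"): 0.2, ("name", "addr"): 0.5,
    ("org", "org"): 1.0, ("org", "addr"): 0.2,
    ("addr", "addr"): 0.1,
}


def best_labeling(layout_costs, transitions, start="<s>"):
    """Return (labels, cost) of the cheapest path through the combined
    layout lattice and label automaton (tropical semiring: costs add,
    the minimum wins). Returns ([], inf) if no path is allowed."""
    best = {start: (0.0, [])}  # automaton state -> (cost so far, labels)
    for costs in layout_costs:
        nxt = {}
        for prev, (score, path) in best.items():
            for label, layout_cost in costs.items():
                arc = transitions.get((prev, label))
                if arc is None:
                    continue  # transition not in the label automaton
                total = score + arc + layout_cost
                if label not in nxt or total < nxt[label][0]:
                    nxt[label] = (total, path + [label])
        best = nxt
    if not best:
        return [], math.inf
    score, path = min(best.values(), key=lambda v: v[0])
    return path, score


print([min(c, key=c.get) for c in LAYOUT])    # ['name', 'org', 'addr', 'name']
print(best_labeling(LAYOUT, TRANSITIONS)[0])  # ['name', 'org', 'addr', 'addr']
```

Taken alone, the assumed layout costs label “Murray Hill, NJ 07974-0636” as a personal name; because the label automaton has no arc from an address to a name, the combined search labels it as the second address line instead.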
The present invention also solves the above-identified problems in identifying a personal name from an e-mail signature block, by analyzing the e-mail user name. In particular, for each candidate personal name, the present invention constructs a finite state transducer (FST) which summarizes all e-mail user names that can be derived from the personal name following common conventions. A confidence score is then assigned to the candidate based on whether the corresponding FST contains the actual e-mail user name and through which particular path.
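The user-name check can be sketched in the same spirit. Because the set of user names derivable from one personal name is finite, the sketch below enumerates the paths such a transducer would contain, each tagged with a naming convention and a cost, rather than building an actual FST. The conventions, their costs, the exp(-cost) confidence, and the names `username_paths` and `score_candidate` are all hypothetical assumptions for illustration.

```python
import math
import re

# Hypothetical naming conventions with hypothetical costs (lower = more common).
CONVENTIONS = [
    ("first.last", 0.5, lambda fn, ln: fn + "." + ln),
    ("first_last", 1.0, lambda fn, ln: fn + "_" + ln),
    ("firstlast", 1.0, lambda fn, ln: fn + ln),
    ("initial+last", 1.0, lambda fn, ln: fn[0] + ln),
    ("last", 1.5, lambda fn, ln: ln),
    ("first", 1.5, lambda fn, ln: fn),
    ("first+initial", 2.0, lambda fn, ln: fn + ln[0]),
]


def username_paths(personal_name):
    """Map each derivable user name to its cheapest (convention, cost) path."""
    tokens = re.findall(r"[a-z]+", personal_name.lower())  # ASCII letters only
    if len(tokens) < 2:
        return {}
    first, last = tokens[0], tokens[-1]  # middle names are ignored here
    paths = {}
    for convention, cost, build in CONVENTIONS:
        user = build(first, last)
        if user not in paths or cost < paths[user][1]:
            paths[user] = (convention, cost)
    return paths


def score_candidate(personal_name, email):
    """Return (convention, confidence) if the e-mail user name is derivable
    from the candidate name, else (None, 0.0)."""
    user = email.split("@", 1)[0].lower()
    match = username_paths(personal_name).get(user)
    if match is None:
        return None, 0.0
    convention, cost = match
    return convention, math.exp(-cost)


for candidate in ["Vinod Anupam", "Murray Hill", "Bell Labs"]:
    print(candidate, score_candidate(candidate, "anupam@research.bell-labs.com"))
```

With these assumed costs, only “Vinod Anupam” receives a nonzero confidence, through the `last` path, which is the kind of evidence that separates the personal name from a city name such as “Murray Hill”.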


REFERENCES:
patent: 4423287 (1983-12-01), Zeidler
patent: 5418717 (1995-05-01), Su et al.
patent: 5806032 (1998-09-01), Sproat
patent: 6021202 (2000-02-01), Anderson et al.
Atsuhiro Takasu et al., “A Document Understanding Method for Database Construction of an Electronic Library,” 1994 IEEE, pp. 463-466.
Andreas Dengel et al., “High Level Document Analysis Guided by Geometric Aspects,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, No. 4, 1988, pp. 641-655.
