Data processing: presentation processing of document – operator i – Presentation processing of document – Layout
Reexamination Certificate
2000-01-21
2004-11-16
Hong, Stephen S. (Department: 2178)
active
06820237
ABSTRACT:
FIELD OF THE INVENTION
The invention pertains to the field of text reduction by selecting the key content thereof and, more particularly, to an apparatus and method for intelligently analyzing and highlighting key words/phrases, key sentences and/or key components of an electronic document by recognizing and utilizing the context of both the electronic document (which may be any type of electronic message such as e-mail, converted voice, fax or pager message or other type of electronic document) and the user.
BACKGROUND OF THE INVENTION
The volume of information in the form of text, particularly electronic information, being communicated to users is increasing at a very high rate, and such information can take many forms, ranging from simple voice or electronic messages to full document attachments such as technical papers, letters, etc. Because of this, there is a growing need in the communications, database management and related industries for means to intelligently condense electronic text information for purposes of assisting the user in handling such communications and for effective storage and retrieval of the information.
The known document condensers (sometimes also referred to as key word/phrase “extractors” or as “summarizers”), which typically function to identify a set of key words/phrases by utilizing various statistical algorithms and/or pre-set rules, have had limited success and limited scope for application. One such known method of condensing text is described in Canadian Patent Application No. 2,236,623 by Turney, which was laid open on 23 Dec. 1998. The Turney method disclosed by this reference relies upon a preliminary teaching procedure in which a number of pre-set teaching modules, directed to different document categories or academic fields, are provided. A selected module is run prior to using the text condenser in order to revise and tune a set of rules used by the condenser so as to produce the best results for documents of a selected category or within the selected academic field. However, such prior condensers do not advance the art appreciably because they are primarily statistically based and do not meaningfully address semantic factors. As such, they are directed to producing lengthy indices of key words and phrases per se, with the result that the relationships or concepts between those key words and phrases are often lost. They also ignore the intent of the electronic document and, hence, treat news, papers, discussions, journal papers, etc. generically.
The inventors herein have identified that the difficulty faced by any means of generating a summary of the key content of a given body of text of an electronic document, which must be overcome, is in recognizing and accommodating the specific context of the text. This is because electronic documents of various types are typically not authored in a structured or consistent manner. In addition, in some cases the context of the user may be an important factor to be accommodated because the interpretation of the meaning of a given body of text by one reader is personal to that reader and may not be the same interpretation made by another reader.
For example, by recognizing that a given electronic document is a discussion email, as distinguished from a technical paper or a news item, a particular structure can be assigned to that text for purposes of analysis. This is because email messages are typically informal (colloquial), less structured, shorter, have less redundancy and are often continuations of earlier email messages. By contrast, technical papers typically comprise a formal language format and are themselves structured according to a standard format (such as having a title and section headings, an opening summary, a background section, etc.). Similarly, news items have associated with them a pyramid-type format, usually providing the key content within the first paragraph or two (see Mittal V. et al “Selecting Text Spans for Document Summaries: Heuristics and Metrics”, American Association of Artificial Intelligence 1999 Conference Proceedings).
It has been found that the specific type of the electronic document which is to be processed, referred to herein as the “application context”, can be determined from the document text and format and from the environment of the text, which is referred to herein as the “envelope” of the electronic document. For example, it can be determined whether the text has an ASCII or HTML format and whether it arrived as an email, as an attachment or otherwise. Text which is correspondence will typically have an opening salutation such as “Dear John”, a main body of text and a signature block with one of the words “regards”, “truly”, “sincerely”, etc. Email discussions of an on-going nature may have been forwarded or may be part of a reply message, and some of the content thereof may be set off by the de facto standard quoting character “>”. Once the application context of the electronic document has been determined, the highlighting process can be assisted by differentiating between the envelope and the text components of the document; for example, on the basis of this information any superfluous information such as the salutation and signature block may be identified and removed. The particular application context may also dictate the handling of certain information which is typically relevant to that context.
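The cues described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names `detect_application_context` and `strip_envelope`, the regular expressions, and the choice to look for a closing word only in the last three non-empty lines are all assumptions made for illustration.

```python
import re

# Hypothetical patterns built from the cues named in the text above.
SALUTATION = re.compile(r"^\s*dear\b.*$", re.IGNORECASE)
CLOSING = re.compile(r"\b(regards|truly|sincerely)\b", re.IGNORECASE)
HTML_TAG = re.compile(r"<\s*(html|body|p|div|br)\b", re.IGNORECASE)


def detect_application_context(text):
    """Collect simple cues about the document type (illustrative only)."""
    lines = text.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    return {
        "format": "html" if HTML_TAG.search(text) else "ascii",
        "has_salutation": bool(non_empty) and bool(SALUTATION.match(non_empty[0])),
        # Assumption: a signature block sits within the last three lines.
        "has_signature": any(CLOSING.search(ln) for ln in non_empty[-3:]),
        "is_reply": any(ln.lstrip().startswith(">") for ln in lines),
    }


def strip_envelope(text):
    """Drop the salutation and signature block, returning the body lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines and SALUTATION.match(lines[0]):
        lines = lines[1:]
    # Cut everything from the first closing word found near the end.
    for i in range(max(0, len(lines) - 3), len(lines)):
        if CLOSING.search(lines[i]):
            lines = lines[:i]
            break
    return lines
```

For a short reply that opens with “Dear John,” and closes with “Best regards,” followed by a name, the sketch reports an ASCII reply with a salutation and a signature, and `strip_envelope` returns only the quoted line and the body sentence. A body sentence that happens to contain a closing word near the end of the message would also be cut, which is one reason a real system would need stronger cues.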
Additional context information relating to an electronic document, referred to herein as the “user context”, which can be useful to infer the meaning of the text of that document, may be obtained from knowledge of the user. That is, knowledge of the specific user context might, in some cases, assist in a determination as to which components of a given body of text are relevant. One example, which would apply to the optimal automation of a personal text highlighter used, say, for processing one's received electronic messages, concerns an electronic document which has been recognized (for example, from the envelope) to be a product/service advertisement of a type which the user normally deletes. Such a document could simply be truncated without any analysis applied to it; this would occur where it has been learned from the user context that the particular user is not interested in the content of such a document. On the other hand, advertisements which are targeted to the user through pre-selected identifiers could instead be highlighted for the user. Further examples in which the user context may be effectively utilized include the situation where correspondence received from one sender may be more important to the user than correspondence from another sender, where the time of receipt of certain correspondence may determine a particular importance level to the user, and where specific words may be used more frequently by the user and these might be associated with a particular degree of relevance. Thus, the behaviour pattern and the situation of the user provide additional context parameters on which a process for highlighting the key components of the text of an electronic document may be based.
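As a rough sketch of how such user context parameters might feed a relevance decision, the following assumes a hypothetical `UserContext` record and `relevance` function. The field names, the default weight of 0.5 for unknown senders and the 0.1 bonus per frequent word are illustrative assumptions, not values taken from the specification; the time-of-receipt factor is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    # All fields are hypothetical stand-ins for learned user behaviour.
    sender_importance: dict = field(default_factory=dict)  # sender -> weight
    ignored_types: set = field(default_factory=set)        # types the user deletes
    frequent_words: set = field(default_factory=set)       # words the user uses often


def relevance(user, sender, doc_type, text):
    """Return None to mean "truncate without analysis", else a score."""
    if doc_type in user.ignored_types:
        return None
    # Assumed default weight for senders the user context knows nothing about.
    score = user.sender_importance.get(sender, 0.5)
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    # Assumed bonus for each of the user's frequent words found in the text.
    score += 0.1 * len(words & user.frequent_words)
    return score
```

A document of a type the user normally deletes yields `None`, so a caller can skip analysis entirely, while correspondence from a highly weighted sender that mentions the user's frequent words scores above correspondence from an unknown sender.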
Reference herein to “highlighting” means an electronic process of selecting the key components of a given body of electronic text (e.g. in the form of key words/phrases, key sentences or parts thereof and/or key elements thereof, and not simply a string of disjointed keywords), the result appearing analogous to that which would be obtained by the commonly used manual method of highlighting a printed copy of the text using a fluorescent ink marker.
SUMMARY OF THE INVENTION
In accordance with the invention there is provided computer-readable apparatus for highlighting the content of a user's electronic input document and producing therefrom an electronic output highlight document. An application context module is provided for determining with respect to the input document the type of document it is. A user context module determines the context of the user with respect to the input document. A highlighter module determines at least a portion of the
Abu-Hakima Suhayya
McFarland Connie P.
AmikaNow! Corporation
Hong Stephen S.
Hutton Doug
Maclean Cassan