Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-06-02
2001-12-25
Amsbury, Wayne (Department: 2171)
C707S793000
active
06334132
ABSTRACT:
RELATED APPLICATIONS
The application is related to EP97302616.4 filed on Apr. 16, 1997; and PCT/GB98/01119 filed on Apr. 16, 1998.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention lies in the field of methods and apparatus for analysing data and finds particular application in summarising data.
2. Related Art
Recent advances in technology, such as CD-ROMs, Intranets and the World Wide Web have provided a vast increase in the volume of information resources that are available in electronic format.
A problem associated with this increase in resources is that of locating and identifying sets of data (i.e. data sets, examples of which include magazine articles, news articles, technical disclosures and other information) of interest to individual users of these systems.
Information retrieval tools such as search engines and Web guides are one means of assisting users to locate data sets of interest. Proactive tools and services (e.g. news groups, broadcast services such as the POINTCAST™ system available at www.pointcast.com, or tools like the JASPER agent detailed in the applicant's co-pending, published international patent application PCT GB96/00132) may also be used to identify information that may be of interest to individual users.
Once data sets of interest have been located by the information retrieval tool, the user is commonly provided with a summary of the data set. “Patterns of Lexis in Text (Describing English Language Series)” Michael Hoey, Oxford University Press, 1991 ISBN 0194371425 details one approach to summarising data sets.
A typical summary produced by a prior-art method will detail the primary subject matter (i.e. the main topic) of the data set. However, target data items in which the user is actually interested are often not the main topic of the data set located. Under these circumstances, a summary which only gives the main topic will not identify how or why the target data items are relevant to the data set, or the location of these target data items within the data set.
By way of example, the target information may be the birth date of the author “D. H. Lawrence”. A search engine may locate this information in an article whose primary subject matter is a critique of his novel “Sons and Lovers”. An information retrieval tool, having found the birth date, would select the critique and produce a summary. This summary, however, will not contain the birth date of D. H. Lawrence, as the author's birth date would be of almost no importance to the main topic in a critique of “Sons and Lovers”. Nor would the summary identify where in the critique the information about the author's birth date appears.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided apparatus for summarising data sets, the apparatus having:
an input for receiving a data set to be summarised;
sectioning means for dividing said received data set into one or more sections according to pre-determined criteria;
ranking means operable for each said section to compare data within the said section with one or more target data items and for calculating a ranking value for the said section, said ranking value being dependent on the outcome of said comparisons for the said section; and
selecting means for compiling a customised summary of the data set by selecting one or more of said one or more sections according to their respective ranking values.
For instance, sections having a ranking value which is above (or below, depending on the circumstances) a preselected threshold might be selected.
According to a second aspect of the present invention there is provided a method for generating a customised summary of a data set, the method including the steps of:
i) receiving, as input, a data set to be summarised;
ii) dividing said data set into sections according to predetermined criteria;
iii) comparing data items in each said section against one or more target data items;
iv) calculating a ranking value for each said section in dependence upon the outcome of the respective said comparisons; and
v) compiling a customised summary of said data set by selecting one or more of said one or more sections according to their respective ranking values.
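Steps i) to v) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the blank-line sectioning rule, the occurrence-count ranking and the threshold of 1 are all hypothetical choices made for the example.

```python
import re


def split_into_sections(data_set):
    """Step ii: divide the data set into sections (assumed rule: blank lines)."""
    return [s.strip() for s in re.split(r"\n\s*\n", data_set) if s.strip()]


def rank_section(section, target_items):
    """Steps iii-iv: count case-insensitive occurrences of each target item."""
    text = section.lower()
    return sum(text.count(item.lower()) for item in target_items)


def summarise(data_set, target_items, threshold=1):
    """Step v: keep, in original order, sections ranked at or above threshold."""
    sections = split_into_sections(data_set)
    return [s for s in sections if rank_section(s, target_items) >= threshold]


# Step i: an illustrative data set with three sections.
article = (
    "Sons and Lovers is often read as an autobiographical novel.\n\n"
    "D. H. Lawrence was born in 1885 in Eastwood.\n\n"
    "The critique then turns to the novel's structure."
)
summary = summarise(article, ["born", "1885"])
print(summary)  # ['D. H. Lawrence was born in 1885 in Eastwood.']
```

Only the second section mentions the target data items, so it alone is selected, even though the author's birth is not the main topic of the article.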
Preferably, target data items can be loaded into a target data item store by a user, for instance either directly or via a user profile. An advantage of such embodiments of the invention is that they enable a summarising tool to generate a summary of a data set that includes target data items specified by a user for whom the summary is generated.
There are many additional features which may be provided, separately or in combination, by preferred embodiments of the present invention and at least some of these are discussed as follows.
Data sets may be divided into sections according to punctuation, for example at sentence or paragraph boundaries. Alternatively, other structural features such as pages, chapters and headings may form section boundaries.
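Two of these sectioning criteria might be sketched as below. The regular expressions and function names are assumptions chosen for illustration; the description does not prescribe how boundaries are detected.

```python
import re


def split_paragraphs(text):
    """Treat one or more blank lines as a paragraph boundary."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text):
    """Treat '.', '!' or '?' followed by whitespace as a sentence boundary."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


text = "First sentence. Second one!\n\nNew paragraph here?"
print(split_paragraphs(text))
print(split_sentences(text))
```

A naive sentence rule of this kind also splits after abbreviations and initials (for example inside "D. H. Lawrence"), so a practical sectioning means would likely need extra handling for such cases.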
Within the context of summarising data sets, a key data item is a data item that forms a substantive component of the information contained within the data set. For example, in a document consisting of written prose, articles and conjunctions (for instance words such as ‘it’, ‘are’, ‘as’, ‘the’, ‘when’, ‘they’, ‘by’ etc.) are typically not considered to be key data items. This is because they do not identify subject matter contained within the data set.
According to preferred features of the present invention, the apparatus includes:
means for identifying one or more key data items in each said section according to a pre-determined stop list;
calculating means operable for each said section to calculate one or more distribution values, each said distribution value representing a different pre-determined measure of the distribution, in said data set, of key data items identified in the said section; and
adjustment means for adjusting said ranking value for each said section according to the respective said one or more distribution values.
Preferably the method includes the steps of:
a) identifying key data items within each said section from step ii) according to a pre-determined stop list;
b) calculating, for each said section, one or more distribution values each representing a pre-determined measure of the distribution of the key data items of the said section in said data set; and
c) adjusting said ranking value from step iv) for each said section in dependence upon the respective said one or more distribution values.
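Step a) could be sketched as follows. The stop list here is hypothetical, built only from the example words quoted earlier in this description, and the tokenising rule is likewise an assumption.

```python
import re

# Hypothetical stop list built from the example words given in the description.
STOP_LIST = {"it", "are", "as", "the", "when", "they", "by"}


def key_data_items(section, stop_list=STOP_LIST):
    """Return the words of a section, in order, that are not on the stop list."""
    words = re.findall(r"[a-z0-9']+", section.lower())
    return [w for w in words if w not in stop_list]


print(key_data_items("They are read by the critics as fiction"))
# ['read', 'critics', 'fiction']
```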
Refining ranking values according to the distribution of key data items within the data set allows the summary to detail target data items within the context of the main topic of the data being summarised. This increases the user's ability to determine how relevant a particular data set is for their intended purpose.
Preferably the apparatus and method calculate the distribution value for each section by: determining a first score for each key data item in each section; and for each section, summing said first scores for each key data item, wherein said first score of each key data item is calculated as the number of times the key data item of consideration occurs in the data set less the number of times the key data item of consideration occurs in the section of consideration.
This feature of the invention is a measure of how frequently the key data items of a particular section occur throughout the remainder of the data set being analysed. It is one measure of the distribution of key data items throughout the data set.
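The first-score calculation described above can be sketched as follows. The function name and input shape are assumptions, as is the choice to sum over the distinct key data items of each section; the description does not say whether repeated items within a section are counted once or several times.

```python
from collections import Counter


def distribution_values(sections):
    """Compute one distribution value per section.

    `sections` is a list of lists of key data items, one inner list per
    section. Each value is the sum, over the distinct key data items of the
    section, of (occurrences in the whole data set) minus (occurrences in
    that section).
    """
    totals = Counter(item for section in sections for item in section)
    values = []
    for section in sections:
        local = Counter(section)
        values.append(sum(totals[item] - n for item, n in local.items()))
    return values


sections = [
    ["lawrence", "novel", "critique"],
    ["lawrence", "born", "eastwood"],
    ["novel", "structure", "novel"],
]
print(distribution_values(sections))  # [3, 1, 1]
```

The first section scores highest because its key data items ("lawrence", "novel") recur most often elsewhere in the data set.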
Preferably said apparatus and method calculate a second score for each key data item and either calculate or modify said distribution value dependent on said second scores, said second scores being calculated by: assigning a position value to each section of the data set corresponding to the position of the section within the data set; and for each key data item of the data set, performing the calculation of subtracting the position value of the first section in which the key data item of consideration occurs from the position value of t
Amsbury Wayne
British Telecommunications PLC
Nixon & Vanderhye P.C.
Pardo Thuy