Similar document retrieval method using plural similarity...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000, C706S014000

active

06301577

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for retrieving similar documents of a reference document from a plurality of retrieval target documents, and a recommended article notification service system utilizing the similar document retrieval method.
2. Description of the Background Art
The well-known retrieval models for retrieving similar documents include a vector space model such as tf·idf and a probabilistic model, in which the similarity to a retrieval-requested document is expressed as the ratio of the relevant-document probability to the non-relevant-document probability with respect to a retrieval request. An example of the probabilistic model is disclosed in Iwayama et al.: “Hierarchical Bayesian Clustering for Automatic Text Classification”, Proceedings of IJCAI-95, pp. 1322-1327, 1995. When the vector space model and the probabilistic model are compared, the value of the similarity (distance) has a clearer meaning in the probabilistic model, and the probabilistic model is expected to give superior precision in clustering, as shown by Iwayama et al., so the probabilistic model is considered superior.
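As a minimal illustration of the vector space model mentioned above, the following Python sketch weights terms by tf·idf and compares documents by cosine similarity. The function names, the particular idf formula (logarithm of the number of documents over the document frequency) and the toy documents are assumptions made for illustration; they are not taken from the patent or from Iwayama et al.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (assumed idf = log(N / df))."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy documents (hypothetical), already split into words.
docs = [
    "document retrieval using similarity".split(),
    "similar document retrieval method".split(),
    "image compression codec".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # shares "document" and "retrieval"
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no terms
```

In this toy setting the first two documents share terms and so score above zero, while the third shares none and scores exactly zero.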
FIG. 1 is a graph showing a distribution of similar documents and dissimilar documents in the case where the similarities of a plurality of target documents are calculated in order to retrieve similar documents with respect to a given reference document, using the probabilistic model of Iwayama et al. In FIG. 1, the horizontal axis represents the similarity while the vertical axis represents a relative frequency, and black rectangle marks indicate a plot of the similarities of similar documents while white rectangle marks indicate a plot of the similarities of dissimilar documents. Note that this distribution was calculated using 10,000 target documents extracted from the Published Japanese Patent Applications between 1993 and 1999, with respect to 21 retrieval requests. Also, the comprehensive similarity judgment for these 10,000 Published Japanese Patent Applications with respect to each retrieval request was made by experts.
As shown in FIG. 1, in the high similarity region, such as a region with the similarity not greater than −1.0, there are hardly any dissimilar documents, so the similar documents and the dissimilar documents can be separated almost completely. It can also be seen that the distribution of the similar documents is flatter and more widespread than the distribution of the dissimilar documents. For this reason, there are many portions where the similar documents are not well separated from the dissimilar documents, because some similar documents have low similarities.
In similar document retrieval using the probabilistic model of Iwayama et al., which is considered a superior probabilistic model, a detailed analysis of retrieval experiments using Iwayama et al.'s similarity measure reveals that, as can be seen in the graph of FIG. 5, the similar documents with relatively high similarities can be appropriately separated from the dissimilar documents, but there are many dissimilar documents at somewhat lower similarities, so the similar documents and the dissimilar documents coexist there. This lowers the overall retrieval precision, making it difficult to obtain sufficient retrieval precision.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a similar document retrieval method capable of realizing an improved retrieval performance by combining two or more similarities calculated by two or more different methods.
It is another object of the present invention to provide a recommended article notification service system utilizing this similar document retrieval method.
According to one aspect of the present invention there is provided a similar document retrieval method for retrieving similar documents of a reference document from a plurality of retrieval target documents, comprising the steps of: calculating similarities of each one of the plurality of retrieval target documents with respect to the reference document by using each one of two or more similarity calculation methods separately; and retrieving the similar documents of the reference document by carrying out a discrimination analysis with respect to each one of a plurality of similarities calculated by using each one of the two or more similarity calculation methods separately.
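The two steps of this aspect can be sketched as follows, assuming that each document is already described by a pair of similarities from two different calculation methods and that the discrimination analysis is Fisher's linear discriminant. This passage does not say which form of discrimination analysis is used, so that choice, the function names and the labeled training pairs below are all hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of (s1, s2) similarity pairs."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    """2x2 scatter matrix of the points around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_fisher(similar, dissimilar):
    """Fit a linear rule w.x + bias > 0 separating similar from dissimilar pairs."""
    m1, m0 = mean(similar), mean(dissimilar)
    s1, s0 = scatter(similar, m1), scatter(dissimilar, m0)
    sw = [[s1[i][j] + s0[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant direction: inverse within-class scatter times mean difference.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold at the midpoint between the two class means.
    mid = [(m1[0] + m0[0]) / 2, (m1[1] + m0[1]) / 2]
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias

def is_similar(w, bias, scores):
    """Classify one document from its pair of similarities."""
    return w[0] * scores[0] + w[1] * scores[1] + bias > 0

# Hypothetical labeled pairs: (similarity by method A, similarity by method B).
similar = [(0.9, 0.7), (0.8, 0.9), (0.6, 0.8), (0.7, 0.6)]
dissimilar = [(0.2, 0.3), (0.3, 0.1), (0.1, 0.4), (0.4, 0.2)]
w, bias = fit_fisher(similar, dissimilar)
```

With these toy numbers, a document scoring (0.45, 0.9) is still classified as similar although its first similarity alone is mediocre, which is the kind of case that combining two similarities is meant to help with.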
According to another aspect of the present invention there is provided a recommended article notification service system for delivering mails describing recommended articles to users through the Internet, comprising: a profile generation unit configured to generate a profile of each user; a recommended article selection unit configured to select the recommended articles from a plurality of articles in accordance with the profile of each user, by utilizing a similar document retrieval method, where the similar document retrieval method retrieves the recommended articles by calculating similarities of each one of the plurality of articles with respect to the profile of each user by using each one of two or more similarity calculation methods separately, and carrying out a discrimination analysis with respect to each one of a plurality of similarities calculated by using each one of the two or more similarity calculation methods separately; and a delivery unit configured to deliver a Web mail containing information of the recommended articles to each user through the Internet.
According to another aspect of the present invention there is provided a method for providing a recommended article notification service for delivering mails describing recommended articles to users through the Internet, comprising: generating a profile of each user; selecting the recommended articles from a plurality of articles in accordance with the profile of each user, by utilizing a similar document retrieval method, where the similar document retrieval method retrieves the recommended articles by calculating similarities of each one of the plurality of articles with respect to the profile of each user by using each one of two or more similarity calculation methods separately, and carrying out a discrimination analysis with respect to each one of a plurality of similarities calculated by using each one of the two or more similarity calculation methods separately; and delivering a Web mail containing information of the recommended articles to each user through the Internet.
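The three stages of the notification service (profile generation, recommended article selection, delivery) might be wired together roughly as in the Python sketch below. Everything in it is a hypothetical stand-in: the word-set profile, the two placeholder similarity measures, the fixed weights (which in practice would come from a discrimination analysis on labeled data) and the plain-text mail body are assumptions, and no mail is actually sent.

```python
def generate_profile(read_articles):
    """Profile generation: the set of words in articles the user has read (assumption)."""
    return {w for text in read_articles for w in text.lower().split()}

def coverage(profile, article):
    """Placeholder similarity 1: share of the article's distinct words found in the profile."""
    words = set(article.lower().split())
    return len(words & profile) / len(words) if words else 0.0

def jaccard(profile, article):
    """Placeholder similarity 2: Jaccard overlap of profile words and article words."""
    words = set(article.lower().split())
    union = words | profile
    return len(words & profile) / len(union) if union else 0.0

# Hypothetical linear decision rule standing in for a fitted discriminant.
WEIGHTS, BIAS = (1.0, 1.0), -0.6

def select_recommended(profile, articles):
    """Selection: keep articles whose combined similarity score is positive."""
    chosen = []
    for article in articles:
        scores = (coverage(profile, article), jaccard(profile, article))
        if WEIGHTS[0] * scores[0] + WEIGHTS[1] * scores[1] + BIAS > 0:
            chosen.append(article)
    return chosen

def compose_mail(address, recommended):
    """Delivery stand-in: build the text of the mail instead of sending it."""
    body = "\n".join(f"- {title}" for title in recommended)
    return f"To: {address}\nSubject: Recommended articles\n\n{body}"

profile = generate_profile(["similar document retrieval", "patent document search"])
articles = ["document retrieval for patent search", "seasonal cooking recipes"]
picked = select_recommended(profile, articles)
mail = compose_mail("user@example.com", picked)
```

Keeping selection separate from delivery means the decision rule can be refitted or replaced without touching how the mail is composed.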
Other features and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings.


REFERENCES:
patent: 5023912 (1991-06-01), Segawa
patent: 5550928 (1996-08-01), Lu et al.
patent: 5625748 (1997-04-01), McDonough et al.
patent: 5689584 (1997-11-01), Kobayashi
patent: 5781663 (1998-07-01), Sakaguchi et al.
patent: 5812998 (1998-09-01), Tsutumi et al.
patent: 5907836 (1999-05-01), Sumita et al.
patent: 5977964 (1999-11-01), Williams et al.
patent: 5999893 (1999-12-01), Lynch, Jr. et al.
patent: 6157921 (2000-12-01), Barnhill
Schroder et al., “Interactive Learning and Probabilistic Retrieval in Remote Sensing Image Archives”, IEEE 2000, pp. 2288-2298.*
Vasconcelos et al., “A Bayesian Framework for Semantic Content Characterization”, IEEE 1998, pp. 566-571.*
Leistensnider et al., “A Simple Probabilistic Approach to Classification and Routing”, IEEE 1997, pp. 750-754.*
Iwayama, Makoto and Tokunaga, Takenobu, “Hierarchical Bayesian Clustering for Automatic Text Classification”, Proceedings of IJCAI-95, pp. 1322-1327, 1995.
