Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2002-09-03
2003-12-16
Rones, Charles (Department: 2175)
C707S793000
active
06665667
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a document retrieving and delivering technique in which an electronic document is retrieved according to a retrieval condition registered by a user in advance and documents satisfying the condition are delivered to the user.
Recently, a large number of electronic documents (to be referred to as texts herebelow) are delivered to users at every moment through electronic mail (e-mail), electronic news, and the like. Information sources which transmit information through the World Wide Web (WWW) are rapidly increasing, and hence an immense number of texts are collected from such information sources using an information collecting robot or the like. There consequently arises a need for a document retrieving and delivering system in which texts containing information requested by a user are retrieved from the collected texts and are delivered to the user.
JP-A-10-27182 (to be referred to as prior art 1) describes such a document or text retrieving and delivering system. In this system, the retrieval condition expressions of a plurality of users are combined with each other so that all of them are processed through one text scanning operation.
However, in prior art 1, the user is required to generate retrieval condition expressions, which leads to two problems as follows.
First, when a rarely used word is specified in a retrieval condition, or when generally used words are combined in a complicated manner in a retrieval condition, there appear texts which should be retrieved but are not (retrieval leakage).
Second, in contrast with the first problem, when a simple retrieval condition expression containing only generally used words is specified, many documents or texts not suitable for the object of the retrieval (to be referred to as retrieval noise) may be retrieved. This leads to a problem that documents desired by the user cannot be easily obtained.
In short, it is difficult for the user to appropriately generate a retrieval condition expression that yields retrieval results in which both the retrieval leakage and the retrieval noise are minimized.
Japanese Patent Application Serial No. 10-148721 (to be referred to as prior art 2) describes a technique to address the two problems above in a document retrieval system in which documents containing desired information are retrieved from documents (to be referred to as registered documents herebelow) registered in a text database.
In this technique, keywords (called “feature character strings” in prior art 2) are extracted from a text exemplified as a retrieval condition (to be referred to as a seed document), and the similarity of each registered document to the seed document is calculated.
In prior art 2, the user needs only to exemplify a seed document containing the desired information. Namely, the user is relieved of the troublesome job of selecting appropriate retrieval terms for a retrieval condition expression. The user then instructs execution of retrieval to view the retrieval results sorted according to the similarity. Therefore, even when the retrieval results include some retrieval noise, the user can easily obtain the necessary information.
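The idea of ranking registered documents against a seed document can be illustrated with a minimal sketch. The similarity measure of prior art 2 is not given in this description, so the sketch swaps in a plain cosine similarity over whitespace-separated word counts; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Count whitespace-separated, lower-cased words (an assumed stand-in for keyword extraction)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_similarity(seed_text, registered):
    """Return registered document ids sorted from most to least similar to the seed."""
    seed = term_vector(seed_text)
    scored = [(cosine_similarity(seed, term_vector(doc)), doc_id)
              for doc_id, doc in registered.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: p[0], reverse=True)]

# Hypothetical registered documents.
registered = {
    "a": "new car price list",
    "b": "weather report for today",
    "c": "used car price",
}
print(rank_by_similarity("price of a new car", registered))
```

Because the results are sorted by score, documents that share few words with the seed document fall to the end of the list rather than being hidden from the user.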
Next, description will be given of an outline and problems of the prior arts above.
Referring to FIG. 2, an outline of prior art 1 will be described.
In this example, three users, i.e., users 1 to 3, have registered retrieval condition expressions to a document retrieving and delivering system, i.e., document containing “new” and “car”, document containing “USA”, and document containing “used” and “car”, respectively. Under this condition, a collected text “price of this new car is . . . ” is scanned to determine whether or not the three conditions are satisfied.
The retrieval condition expressions registered by the users are analyzed to extract retrieval terms “new”, “car”, “USA”, and “used”.
The number of retrieval terms extracted is stored for each user in a retrieval term count table. For example, from the retrieval condition expression registered by user 1, i.e., document containing “new” and “car”, two retrieval terms “new” and “car” are extracted and hence “2” is stored in the associated field of the table. In a similar fashion, “1” and “2” are stored in the associated fields of the table for users 2 and 3, respectively.
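The term extraction and the retrieval term count table described above can be sketched as follows. This is only an illustration: representing each user's condition as a list of terms that must all appear, and the names `conditions`, `extract_terms`, and `build_term_count_table`, are assumptions not found in prior art 1.

```python
# Assumed representation: user id -> terms that must all appear in a text.
conditions = {
    1: ["new", "car"],   # user 1: document containing "new" and "car"
    2: ["USA"],          # user 2: document containing "USA"
    3: ["used", "car"],  # user 3: document containing "used" and "car"
}

def extract_terms(conditions):
    """Return the distinct retrieval terms in first-seen order."""
    terms = []
    for user_terms in conditions.values():
        for term in user_terms:
            if term not in terms:
                terms.append(term)
    return terms

def build_term_count_table(conditions):
    """Map each user to the number of distinct terms in that user's condition."""
    return {user: len(set(user_terms)) for user, user_terms in conditions.items()}

print(extract_terms(conditions))           # ['new', 'car', 'USA', 'used']
print(build_term_count_table(conditions))  # {1: 2, 2: 1, 3: 2}
```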
Next, the system creates a finite automaton to collate all retrieval terms extracted.
In the finite automaton in FIG. 2, a circle indicates a state of the automaton and an arrow denotes a state transition. A character next to an arrow represents the input character which causes the transition of the arrow. A numeral in a circle designates the state number of the state. This example does not include the arrows to the initial state that are used when a character not indicated in the automaton is inputted (to be called a failure herebelow).
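As an illustration of such an automaton, the sketch below builds the states and transitions as a trie. It is a simplified sketch under assumptions: states are numbered in the order they are created, and the terms are inserted in the order “car”, “new”, “USA”, “used”, which happens to reproduce end states 3 and 6 for “car” and “new”. Like the example in the figure, it omits failure transitions.

```python
def build_automaton(terms):
    """Build a trie-shaped automaton.

    Returns (transitions, accepting):
      transitions[state][char] -> next state (state 0 is the initial state)
      accepting[state]         -> retrieval term detected at that state
    """
    transitions = [{}]
    accepting = {}
    for term in terms:
        state = 0
        for ch in term:
            if ch not in transitions[state]:
                transitions.append({})                        # create a new state
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting[state] = term                               # detection state of the term
    return transitions, accepting

transitions, accepting = build_automaton(["car", "new", "USA", "used"])
print(accepting)          # {3: 'car', 6: 'new', 9: 'USA', 13: 'used'}
print(transitions[0])     # {'c': 1, 'n': 4, 'U': 7, 'u': 10}
```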
The system then forms user lists, each element of which includes the user identifier of a user having specified a retrieval term. Each list is linked with the associated retrieval term detection state of the automaton. In this example, when “car” is collated, the system refers to the associated user list according to the last state “3”. This indicates that users 1 and 3 have specified “car”.
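The link between detection states and user lists can be sketched as a mapping from state number to user identifiers. States 3 (“car”) and 6 (“new”) follow the description; the state numbers 9 and 13 for “USA” and “used”, and all names in the code, are assumptions for illustration.

```python
from collections import defaultdict

conditions = {1: ["new", "car"], 2: ["USA"], 3: ["used", "car"]}

# Detection state of each term. 3 and 6 come from the description;
# 9 and 13 are assumed state numbers.
detection_state = {"car": 3, "new": 6, "USA": 9, "used": 13}

def build_user_lists(conditions, detection_state):
    """Map each detection state to the users who specified the term detected there."""
    user_lists = defaultdict(list)
    for user, terms in conditions.items():
        for term in set(terms):
            user_lists[detection_state[term]].append(user)
    return dict(user_lists)

user_lists = build_user_lists(conditions, detection_state)
print(user_lists[3])  # [1, 3]: users 1 and 3 have specified "car"
```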
Description will next be given of the scanning of a text “price of this new car is” by the automaton shown in FIG. 2. In this example, it is detected that the text includes the partial character strings “car” and “new”. In this automaton, a retrieval term having a small circle at an end thereof means that a partial character string matching the term exists in the text. Since partial character strings matching “car” and “new” appear in the text in FIG. 2, end states 3 and 6 are marked with a small circle.
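A single-pass scan of this kind can be sketched with the standard Aho-Corasick construction, which is swapped in here because the failure transitions are not shown in the example. Whether prior art 1 builds its failure transitions in exactly this way is not stated, so the code is an assumption-laden illustration with hypothetical names.

```python
from collections import deque

def build_matcher(terms):
    """Build goto, failure, and output tables (standard Aho-Corasick)."""
    goto = [{}]        # goto[state][char] -> next state
    output = [set()]   # output[state] -> terms detected on entering the state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(term)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the initial state
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def scan(text, matcher):
    """Scan the text once and return the set of retrieval terms found in it."""
    goto, fail, output = matcher
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]       # failure: fall back toward the initial state
        state = goto[state].get(ch, 0)
        found |= output[state]
    return found

matcher = build_matcher(["car", "new", "USA", "used"])
print(sorted(scan("price of this new car is", matcher)))  # ['car', 'new']
```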
For the text, the number of retrieval terms matching partial character strings in the text is counted for each user and is stored in a retrieval term appearance count table. For example, since the matching state is detected for “new” and “car” for user 1, “2” is set to the count value. Since only “car” matches for user 3, “1” is counted. For user 2, the matching state does not occur for any partial character string, and hence the counting is not achieved and “0” is kept unchanged for the count value.
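The per-user counting can be sketched as below. The set of matched terms is given directly here rather than produced by an automaton, and the names `conditions` and `count_appearances` are assumptions for illustration.

```python
conditions = {1: ["new", "car"], 2: ["USA"], 3: ["used", "car"]}

def count_appearances(conditions, matched_terms):
    """For each user, count how many of the user's terms were detected in the text."""
    return {
        user: sum(1 for term in set(terms) if term in matched_terms)
        for user, terms in conditions.items()
    }

# Terms detected in "price of this new car is".
matched_terms = {"new", "car"}
print(count_appearances(conditions, matched_terms))  # {1: 2, 2: 0, 3: 1}
```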
The retrieval term count table, in which the number of retrieval terms extracted from each retrieval condition expression is stored, is compared with the retrieval term appearance count table, in which the number of retrieval terms appearing as partial character strings in the text is stored. When the values for a user match each other, it is assumed that the retrieval condition expression of the user is satisfied and hence the text is delivered to the user. In FIG. 2, the retrieval term count is “2” for user 1 in both tables and hence the text is delivered to user 1. The retrieval term counts differ between the two tables for users 2 and 3 and hence the text is not delivered to users 2 and 3.
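The comparison of the two tables can be sketched as a per-user equality check. The function name `select_recipients` is hypothetical, and the sketch assumes conditions of the kind used in the example, in which every specified term must appear.

```python
def select_recipients(term_count_table, appearance_count_table):
    """Return the users whose appearance count equals their retrieval term count."""
    return [
        user
        for user, needed in term_count_table.items()
        if appearance_count_table.get(user, 0) == needed
    ]

term_count_table = {1: 2, 2: 1, 3: 2}        # terms extracted per user
appearance_count_table = {1: 2, 2: 0, 3: 1}  # terms found in the text per user
print(select_recipients(term_count_table, appearance_count_table))  # [1]
```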
Prior art 1 has been briefly described.
In accordance with prior art 1, it is possible to implement a document retrieving and delivering system in which a text matching the given retrieval condition expressions can be delivered to the user through one scanning operation.
However, the user must generate retrieval condition expressions in prior art 1. There consequently arises a problem that it is not easy for the user to appropriately generate retrieval condition expressions.
Prior art 2 has been proposed to address the problem above in a document retrieval system.
Referring now to FIG. 20, an outline of prior art 2 will be described.
Prior art 2 is a technique to extract keywords from a sentence of a language that does not use a separation code between words, e.g., Japanese.
FIG. 20 shows an example to extract keywords (to be described in accordance with a name “tokuchomojiretsu (feature character string)” in prior art 2 herebelow) from a seed document “. . . . Keitaidenwa no
Inaba Yasuhiko
Matsubayashi Tadataka
Okamoto Takuya
Sugaya Natsuko
Tada Katsumi
Mattingly Stanger & Malur, P.C.
Rones Charles
Veillard Jacques