Key word deriving device, key word deriving method, and...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06836772

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a key word deriving device, a key word deriving method, and a storage medium containing a key word deriving program. More particularly, it relates to a key word deriving device and a key word deriving method for deriving, from a large amount of data, a key word that characterizes the data by performing a statistical process on a partial region of the data, and to a storage medium containing a program for deriving the key word.
2. Description of the Related Art
Japanese Unexamined Patent Publication No. HEI 08(1996)-202737 discloses a method for deriving a key word from a large amount of data by performing a statistical process on a partial region of the data. This publication uses a patent specification as an example. The whole data are divided into individual paragraphs by referring to preliminarily prepared index words such as “Title of the Invention” or “What is claimed is”. Three numbers are then determined for each word: the number of co-occurrences of the word with another word in the same sentence, the number of co-occurrences of the word with another word in the same paragraph, and the number of appearances of the word in the whole data. These numbers are multiplied by suitable coefficients and summed to determine an importance of each word, from which a key word is derived.
In other words, the key word is not determined simply by using a frequency of appearances of each word, but words that appear concurrently with each other in the same sentence or in the same paragraph are regarded as having a greater importance (more relevance as a key word).
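The weighted sum described for the related art can be illustrated with a minimal Python sketch. The publication's actual coefficients and counting rules are not given here, so the weights `w_sentence`, `w_paragraph`, and `w_total`, the function name, and the choice to count co-occurrences (concurrences) as the number of distinct other words sharing a sentence or paragraph are all assumptions made for illustration.

```python
from collections import Counter


def related_art_importance(paragraphs, w_sentence=3.0, w_paragraph=2.0, w_total=1.0):
    """Score words by a weighted sum of three counts.

    paragraphs: list of paragraphs, each a list of sentences,
                each sentence a list of word strings.
    The weights are hypothetical; the source only says that
    "suitable coefficients" are used.
    """
    sentence_co = Counter()   # co-occurrences within the same sentence
    paragraph_co = Counter()  # co-occurrences within the same paragraph
    total = Counter()         # appearances in the whole data

    for paragraph in paragraphs:
        paragraph_words = [word for sentence in paragraph for word in sentence]
        total.update(paragraph_words)

        for sentence in paragraph:
            distinct = set(sentence)
            for word in distinct:
                # Assumed rule: one co-occurrence per distinct other word.
                sentence_co[word] += len(distinct) - 1

        distinct_in_paragraph = set(paragraph_words)
        for word in distinct_in_paragraph:
            paragraph_co[word] += len(distinct_in_paragraph) - 1

    return {
        word: w_sentence * sentence_co[word]
        + w_paragraph * paragraph_co[word]
        + w_total * total[word]
        for word in total
    }


if __name__ == "__main__":
    sample = [[["key", "word"], ["key"]]]
    print(related_art_importance(sample))
```

With larger weights on the sentence and paragraph counts, a word that co-occurs with many other words outranks a word that is merely frequent, which is the behavior the passage above attributes to the related art.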
However, according to the key word deriving method disclosed in the above-mentioned Japanese Unexamined Patent Publication, the paragraphs are divided by using preliminarily prepared index words (such as “Title of the Invention”) on the basis of the special characteristics of the target data, so that the method of dividing the paragraphs is fixed. Also, the derived key word is a key word for the whole target data, so that the key word for each paragraph is not derived.
Therefore, no great problem arises if the target data are such that each paragraph has its own fixed meaning, as in a patent specification, and the contents are complete within a single document. However, the key word deriving method disclosed in this publication cannot be applied when the target data are a set of data divisible by various parameters such as the sender/receiver or the time of occurrence (date and time), for example a set of (electronic) mail sentences received/sent by an individual person or a set of news sentences in a day or in a month, because it is difficult to grasp the contents of the whole target data.
SUMMARY OF THE INVENTION
The present invention provides a key word deriving device comprising: a document data acquiring section for acquiring document data each having a parameter previously added thereto; a document data dividing section for dividing the acquired document data for each type of the parameter by distinguishing the types of parameters of the document data; a document table registering section for assigning the type of the parameter to the divided document data as divided data and for registering, in a document table, words contained in the divided data and their statistical amounts; a word table registering section for calculating and registering, in a word table, the statistical amounts of the words in the divided data having the same type of the parameter added thereto by referring to the document table; an importance table registering section for calculating an importance of each word in accordance with a preliminarily prepared importance calculation formula by referring to the word table and for registering the importance of each word in an importance table; and a key word deriving section for deriving a word having a higher importance as a key word by referring to the importance table.
According to the present invention, various and numerous document data can be divided appropriately by using a parameter added to each document data. An importance of each word is calculated from the words contained in the divided data and their statistical amounts, and a word having a high importance is derived as a key word. This enables derivation of key words that show more accurately the characteristics of each of the divided data in various and numerous document data.
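The flow through the document table, word table, and importance table can be sketched in Python as follows. This is only an illustration under stated assumptions: the summary refers to a "preliminarily prepared importance calculation formula" without stating it, so the formula used below (a word's count within one parameter type, multiplied by the share of its overall occurrences that fall in that type) is a hypothetical stand-in. The function name, the whitespace tokenizer, and the `(parameter, text)` input shape are likewise assumptions.

```python
from collections import Counter, defaultdict


def derive_key_words(documents, top_n=3):
    """documents: iterable of (parameter_value, text) pairs,
    e.g. the sender of a mail and its body (hypothetical input shape)."""
    # Document table: each divided datum with its parameter and word counts.
    document_table = [
        (parameter, Counter(text.lower().split()))
        for parameter, text in documents
    ]

    # Word table: word statistics aggregated per parameter type.
    word_table = defaultdict(Counter)
    for parameter, counts in document_table:
        word_table[parameter].update(counts)

    # Overall counts, needed only by the stand-in formula below.
    overall = Counter()
    for counts in word_table.values():
        overall.update(counts)

    # Importance table: ASSUMED formula, not the one in the patent.
    importance_table = {
        parameter: {word: n * n / overall[word] for word, n in counts.items()}
        for parameter, counts in word_table.items()
    }

    # Key word derivation: highest importance first, ties broken alphabetically.
    return {
        parameter: [
            word
            for word, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for parameter, scores in importance_table.items()
    }


if __name__ == "__main__":
    mails = [
        ("alice", "meeting budget budget"),
        ("bob", "meeting lunch"),
        ("alice", "budget report"),
    ]
    print(derive_key_words(mails, top_n=1))
```

Because the division is driven by the parameter attached to each document rather than by fixed index words, the same function could group by date or receiver simply by passing a different parameter value with each text.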


REFERENCES:
patent: 5293552 (1994-03-01), Aalbersberg
patent: 5642502 (1997-06-01), Driscoll
patent: 5708825 (1998-01-01), Sotomayor
patent: 5724571 (1998-03-01), Woods
patent: 5857179 (1999-01-01), Vaithyanathan et al.
patent: 5907840 (1999-05-01), Evans
patent: A8202737 (1996-08-01), None
H.P. Edmundson, “New Methods in Automatic Extracting”, Journal of the Association for Computing Machinery, vol. 16, No. 2, Apr., 1969, pp. 264-285.
