Text mining method and apparatus allowing a user to analyze...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000

active

06757676

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a document processing technique, intended for a database of registered documents, for acquiring various kinds of information about a specified document set.
With the spread of word processors and personal computers in recent years, the amount of computerized information they generate is increasing. Computerized information available from the WWW (World Wide Web), electronic mail, electronic news, and so on is also increasing rapidly. Analyzing the contents of this computerized information and making effective use of it has therefore become an important problem for enterprises.
In many cases, large quantities of computerized information are written as text, i.e., in prose form. Text information of this kind, such as free-answer questionnaires, is difficult to analyze mechanically and has therefore been analyzed by hand. Manual analysis has the following problems.
(1) Every document to be processed must be read. As the number of documents grows, manual analysis becomes impractical.
(2) Because the analysis rests on subjective judgment, the result varies with the analyst's knowledge and degree of skill.
As a technique for supporting such manual analysis, demand for text mining is growing. The processing procedure of text mining is described concretely in “Text mining—Knowledge finding by automatic analysis of massive document data—”, Nasugawa et al., Journal of Information Processing Society of Japan, Vol. 40, No. 4, April 1999, pp. 358-364, and in “Text mining based on keyword association”, Watanabe et al., Information Processing Society of Japan, Meeting of Information Learning Foundation 55-8, Jul. 16, 1999, pp. 57-64. Hereafter, this is referred to as related art 1. Text mining operates on text information registered beforehand and finds new knowledge on the basis of the co-occurrence relations and appearance tendencies of the words and/or phrases in the information to be processed. Concretely, for a set of documents to be processed, an axis serving as the viewpoint of the analysis is set, and words and/or phrases representing a feature of the document set are acquired in association with the components of the axis. Here, “words and/or phrases are acquired in association with components of the axis” means “words and/or phrases co-occurring with components of the axis within a predetermined range are acquired”. By referring to these words and/or phrases, the user can grasp the tendency of the document set.
FIG. 2 shows an example in which a set of newspaper articles concerning “pathogenic colon bacilli O157” is analyzed with the publication month as the axis. With this axis, the words and/or phrases “infection, patient, symptoms, hospitalization, . . . ” are acquired in association with “July”, a component of the axis; “shock, school lunch, hospitalization, mass infection, . . . ” in association with “August”; and “sales, minus, foodstuffs, perishables, . . . ” in association with “September”. By referring to these words and/or phrases, the user can grasp that the topic “patients infected with O157 are hospitalized” appears in the documents from “July”, the topic “mass infection with O157 is caused by school lunch” in those from “August”, and the topic “the sales of perishables have fallen under the influence of O157” in those from “September”.
The processing procedure of related art 1 is shown in the PAD (Problem Analysis Diagram) of FIG. 3. First, at step 300, a document set that becomes the processing subject of text mining is defined.
For a database of documents collected beforehand from a particular viewpoint, such as questionnaires, the database is used as the document set to be processed as it is. For a database of documents covering diverse viewpoints, such as newspaper articles on politics, economy, sports, and so on, full text search is conducted according to the user's analysis objective to define the document set. Full text search is a technique in which the full text of the documents to be processed is input to a computer system to build a database at registration time, and at retrieval time the database is searched for all documents containing a character string specified by the user. Full text search is described in detail in “Present situation and future of index processing fast full text search technique which holds the key”, Majima, Nikkei Byte, October 1996, pp. 158-167. Hereafter, this is referred to as related art 2.
Next, at step 301, words and/or phrases distinctive of the contents (hereafter referred to as distinctive words and/or phrases) are extracted from the document set defined at step 300. The distinctive words and/or phrases may be extracted by referring to a dictionary or by using statistical information.
At step 302, an axis serving as the viewpoint of the analysis is set. Here, bibliographic information of the documents, such as date, age, or sex, is set as the analysis axis, and specified words and/or phrases are set as components of the analysis axis. For example, to learn from questionnaires how attitudes differ by age, age is set as the analysis axis; numerical values representing age, such as “20” and “30”, then become the components of the axis.
Finally, at step 303, words and/or phrases co-occurring with a component of the axis within a predetermined range are acquired. The predetermined range may be the same document, the same paragraph, the same sentence, m words, n characters (where m and n are integers), or the like. In this way, related art 1 supports the user in grasping the tendency of the document set by acquiring words and/or phrases in association with the components of the analysis axis.
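Of the predetermined ranges listed for step 303, the “m words” range is easy to make concrete when the axis components are specified words that occur in the text. The following minimal Python sketch counts the words found within m words on either side of each occurrence of a component. The function name `window_cooccurrence`, the optional `keep` filter, and the naive tokenizer are assumptions made for illustration; they are not taken from related art 1.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Set


def tokenize(text):
    # Naive lowercase word splitter (illustrative only; assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def window_cooccurrence(texts: Iterable[str], component: str, m: int,
                        keep: Optional[Set[str]] = None) -> Counter:
    """Count words within m words on either side of each occurrence of
    `component` (assumed to be a single word). If `keep` is given, only
    words in it are counted, e.g. distinctive words from step 301."""
    component = component.lower()
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok != component:
                continue
            # Window [i - m, i + m], clipped to the token list.
            lo, hi = max(0, i - m), min(len(tokens), i + m + 1)
            for j in range(lo, hi):
                if j != i and (keep is None or tokens[j] in keep):
                    counts[tokens[j]] += 1
    return counts
```

For example, with the two sentences “School lunch caused a mass infection in August.” and “Sales of perishables fell after the lunch scare.”, calling `window_cooccurrence(texts, "lunch", 2, keep={"school", "scare"})` counts “school” and “scare” once each.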
As described above, related art 1 automatically acquires words and/or phrases distinctive of the document set to be processed in association with the components of the analysis axis. This lightens the burden on the analyst and reduces differences in analysis results caused by the analysts' knowledge and degree of skill.
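The overall flow of related art 1 (steps 300 to 303 of FIG. 3) can be illustrated with a minimal Python sketch. It assumes a toy corpus in which each document carries a publication month as bibliographic information, uses a plain substring scan in place of the indexed full text search of related art 2, and uses a small hypothetical term list in place of a real dictionary for step 301. The predetermined range is the same document. All names (`define_document_set`, `SAMPLE_DOCS`, `DISTINCTIVE_TERMS`, and so on) are illustrative, not taken from the source.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document has a publication month and a body text.
SAMPLE_DOCS = [
    {"month": "July", "text": "O157 infection: patient shows symptoms, hospitalization follows."},
    {"month": "August", "text": "O157 mass infection linked to school lunch; shock and hospitalization."},
    {"month": "September", "text": "O157 hits sales of perishables and other foodstuffs."},
    {"month": "September", "text": "Stock market report."},
]

# Hypothetical term list standing in for the dictionary used at step 301.
DISTINCTIVE_TERMS = {
    "infection", "patient", "symptoms", "hospitalization", "shock", "school",
    "lunch", "mass", "sales", "foodstuffs", "perishables",
}


def tokenize(text):
    # Naive lowercase word splitter; Japanese text would need word segmentation.
    return re.findall(r"[a-z0-9]+", text.lower())


def define_document_set(docs, query):
    # Step 300: keep every document whose text contains the query string.
    # A linear scan, not the indexed full text search of related art 2.
    return [d for d in docs if query in d["text"]]


def extract_distinctive(doc, terms):
    # Step 301 (dictionary option): keep only tokens found in the term list.
    return [t for t in tokenize(doc["text"]) if t in terms]


def associate(docs, axis, terms):
    # Step 302: the axis is a bibliographic field; its values are the components.
    # Step 303: count distinctive words co-occurring with each component,
    # using "the same document" as the predetermined range.
    table = defaultdict(Counter)
    for d in docs:
        table[d[axis]].update(extract_distinctive(d, terms))
    return table


if __name__ == "__main__":
    subset = define_document_set(SAMPLE_DOCS, "O157")
    for component, counts in associate(subset, "month", DISTINCTIVE_TERMS).items():
        print(component, [w for w, _ in counts.most_common(3)])
```

Multi-word phrases such as “school lunch” are counted here as separate words; phrase extraction is outside the scope of this sketch.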
However, related art 1 has the problems described below. As shown in FIG. 3, related art 1 performs the analysis only on the basis of co-occurrence with the individual components of a single analysis axis. To analyze co-occurrence from several different viewpoints, i.e., combinations of several analysis axes, text mining must be conducted separately for each axis, and the user must then combine and analyze the results. Moreover, a user begins an analysis without knowing the contents of the document set, so it is difficult to settle on a single viewpoint from the start. Because of these problems, related art 1 cannot analyze a document set from combinations of a wide variety of viewpoints.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a document processing method and system, and a computer readable storage medium which provide a text mining function allowing the user to analyze the contents of a document set from a plurality of visual p
