Data display method and apparatus for use in text mining

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

active

06738786

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to a data display method and a data display apparatus in which various data is acquired, for a set of specified documents, from a database of previously registered documents, and the acquired data is displayed.
With the recent development of word processors, personal computers, and the like, the amount of electronic information generated by such word processors and personal computers is increasing. Moreover, the amount of electronic information available via the World Wide Web (WWW), e-mail, newswire, and the like is rapidly increasing. In firms and companies, it is quite important to analyze the contents of such electronic information for efficient use thereof.
In general, most electronic information is described in texts, that is, in a format of statements. Text information, for example the contents of a free-answer questionnaire, cannot be easily analyzed by computers or the like and hence has heretofore been analyzed manually. However, manual information analysis is attended with the following problems. (1) The person in charge of analysis must read all documents to be processed. Therefore, when the number of documents increases greatly, this method is not practical. (2) The information analysis is carried out according to the subjective judgement of the user. Therefore, the results of information analysis vary depending on the knowledge and skill of the user.
Therefore, an increasing need exists for a text mining technique as a technique to support manual information analysis. Agrawal et al., U.S. Pat. No. 6,006,223, entitled “Mapping Words, Phrases Using Sequential-Pattern To Find User Specific Trends In a Text Database”, issued on Dec. 21, 1999, concretely describes a processing procedure of text mining. This will be referred to as prior art 1 herebelow. In text mining, a search or retrieval is made through previously registered text information to detect new knowledge according to, for example, the coincidence of words and phrases or a tendency of occurrence of words and phrases contained in the information to be processed. Specifically, for a set of processing objective documents, an analysis axis representing points of view for analysis is set to acquire words and phrases representing features or characteristics of the set of documents according to a correspondence to constituent components of the analysis axis. In this expression, “to acquire words and phrases according to a correspondence to constituent components of the analysis axis” means, for example, “to acquire words and phrases which cooccur in a predetermined range with constituent components of the analysis axis.” By referring to the words and phrases, the user can recognize a tendency of a set of documents.
FIG. 2 shows an example of analysis in which a set of news items on “0157” in newspapers is analyzed using “the month of report or publication of the pertinent news item” as the analysis axis. That is, the analysis condition is expressed as “news item reported in ‘July’”, “news item reported in ‘August’”, and the like. In the analysis using the publication month as the analysis axis, words “infection, patient, symptom, hospitalization, etc.” are acquired in association with “July” as a component of the analysis axis; words “damage, provision of meals, hospitalization, group infection, etc.” are acquired in association with “August” as a component of the analysis axis; words “sales amount, minus, foods, perishable, etc.” are acquired in association with “September” as a component of the analysis axis; and so on. By referring to the words, the user can see that the set of documents tends to contain the following topics: “Patients infected with ‘0157 disease-causing bacteria’ are hospitalized” in “July”, “Group infection with ‘0157 bacteria’ through provision of meals” in “August”, and “Sales amount of perishable foods and the like lowered due to influence of 0157” in “September”.
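The month-by-month analysis of FIG. 2 can be illustrated with a minimal Python sketch. It assumes that each document carries its publication month as bibliographical information and that raw word counts are an acceptable stand-in for whatever scoring prior art 1 actually uses. The toy corpus, the stopword list, and the function name `words_by_axis` are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (publication month, text). Not data from the patent.
documents = [
    ("July", "patient infection symptom hospitalization patient"),
    ("July", "infection patient hospitalization"),
    ("August", "group infection damage provision of meals"),
    ("August", "group infection hospitalization damage"),
    ("September", "sales amount minus perishable foods"),
]

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"of"}


def words_by_axis(docs, top_n=3):
    """Group documents by analysis-axis component (here, the month) and
    return the top_n most frequent words for each component."""
    counts = defaultdict(Counter)
    for month, text in docs:
        counts[month].update(w for w in text.split() if w not in STOPWORDS)
    return {
        month: [word for word, _ in counter.most_common(top_n)]
        for month, counter in counts.items()
    }


result = words_by_axis(documents)
for month, words in result.items():
    print(month, words)
```

Reading the printed words per month gives the user the same kind of overview as FIG. 2: each component of the analysis axis is paired with the words that occur most often in the documents belonging to it.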
FIG. 3 shows an example of a processing procedure of prior art 1 in a problem analysis diagram (PAD). In step 300, a set of documents is specified as an object of the text mining. In the case of a questionnaire, in which the pertinent document database contains documents collected according to predetermined points of view, the database is directly specified as the objective document set. In the case of newspaper items, in which the database contains documents gathered according to various points of view such as politics, economy, sports, and the like, a full text search is conducted according to the analysis purpose of the user to specify a set of documents. “A full text search” is a technique in which all texts of the documents as the processing objects are inputted to a pertinent computer system to thereby generate a database in a registration stage. In a retrieval stage, in response to a character string specified by the user, all documents containing the character string are retrieved from the database. For example, Kato et al., U.S. Pat. No. 6,094,647, entitled “Presearch Type Document Search Method and Apparatus” and assigned to the present assignee, describes the full text search in detail. This technique will be referred to as prior art 2 herebelow.
In step 301, characteristic words and phrases, namely, words and phrases which characterize the contents, are extracted from the set of documents specified in step 300. The characteristic words and phrases may be extracted by referring to a dictionary or by using statistical information. The characteristic words and phrases are not limited to single words. For example, when the dictionary contains a complex word including two or more words, for example, “disease-causing colon bacillus”, the characteristic words and phrases extracted in step 301 may include two or more words. Conversely, the characteristic words and phrases to be extracted may be limited to a single word.
In step 302, an analysis axis is set as points of view for the analysis. In this example, “date”, “age”, “sex”, and the like assigned as bibliographical information items of a document are specified as the analysis axis, or words and phrases specified by the user are set as constituent components of the analysis axis. For example, when it is desired to acquire the difference of awareness or consciousness by age from a questionnaire, the age is set as the analysis axis. In this situation, values representing ages such as “20” and “30” are specified as components of the analysis axis.
Finally, in step 303, the processing of step 304 is repeatedly executed for the components of the analysis axis set in step 302. In step 304, a search is made through the characteristic words and phrases extracted in step 301 to extract words and phrases strongly related to the components of the analysis axis, for example, a cooccurrence word/phrase which cooccurs in a predetermined range. The predetermined range is specified as, for example, “within one document”, “within one paragraph”, “within one sentence” or “within m or n words (m and n are integers).”
In prior art 1, words and phrases are obtained by establishing correspondence to the components of the analysis axis to thereby help the user recognize a tendency of the set of documents. As above, since the words and phrases characterizing the pertinent set of documents are automatically obtained by establishing correspondence to the components of the analysis axis in prior art 1, the load imposed on the user can be reduced and the difference in the analysis results between users can be minimized.
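Steps 300 to 304 described above can be sketched in Python as follows. This is only an illustrative reading of the procedure, not the implementation of prior art 1 or prior art 2: the full text search is reduced to a plain substring match, characteristic words are found by dictionary lookup only, the analysis axis consists of user-specified words, and the predetermined range is assumed to be “within one sentence”. The sample database, the dictionary, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical document database and dictionary (not taken from the patent).
DATABASE = [
    "0157 infection spread at a school. The hospital admitted one patient.",
    "A patient with 0157 was found at the school. "
    "The hospital reported disease-causing colon bacillus.",
    "The election results were announced. Sales rose.",
]
DICTIONARY = ["disease-causing colon bacillus", "infection", "patient"]


def full_text_search(database, query):
    """Step 300: keep every document that contains the query string."""
    return [doc for doc in database if query in doc]


def extract_characteristic(text, dictionary):
    """Step 301: dictionary lookup; entries may span two or more words."""
    return [term for term in dictionary if term in text]


def split_sentences(doc):
    """Naive sentence splitter used to define the cooccurrence range."""
    return [s.strip() for s in re.split(r"[.!?]", doc) if s.strip()]


def cooccurring_terms(docs, dictionary, axis_components):
    """Steps 302-304: for each axis component, count the characteristic
    terms that appear in the same sentence as that component."""
    result = {}
    for component in axis_components:  # step 303: loop over components
        counts = Counter()
        for doc in docs:
            for sentence in split_sentences(doc):
                if component in sentence:  # range: within one sentence
                    counts.update(
                        term
                        for term in extract_characteristic(sentence, dictionary)
                        if term != component
                    )
        result[component] = counts
    return result


docs = full_text_search(DATABASE, "0157")
result = cooccurring_terms(docs, DICTIONARY, ["school", "hospital"])
for component, counts in result.items():
    print(component, dict(counts))
```

Changing the range to “within one document” would only require testing `component in doc` instead of iterating over sentences; a word-window range would need tokenization, which this sketch leaves out.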
SUMMARY OF THE INVENTION
According to prior art 1, the words and phrases characterizing the pertinent set of documents are automatically obtained by establishing correspondence to the components of the analysis axis. Therefore, it is possible to minimize the load imposed on the user described above, and the fluctuation or dispersion of the analysis results arising from the respective knowledge and skill of users can be minimized.
However, prior art 1 is attended with a problem as below. As can be seen from an analysis
