Data mining apparatus and storage medium storing therein...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000

active

06671680

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a data mining apparatus that discovers unknown rules hidden in data by mathematical methods such as clustering or classification, and to a storage medium storing a data mining processing program. More particularly, the invention relates to a data mining apparatus that displays an unknown rule discovered by data mining in a form the user can easily understand and that enables the rule to be used externally, and to a storage medium storing a data mining processing program.
2. Description of the Related Arts
In recent years, attention has been paid to data mining, which uses mathematical methods to automatically discover unknown rules in large amounts of data, on the order of gigabytes or terabytes, accumulated over long periods. Data mining takes two approaches: a “discovery-like approach”, which classifies and refines information on the basis of some hidden rule and thereby automatically finds information that could not be found manually; and a “verificative approach”, which analyzes uncertain known information and adds certainty to it.
Conventionally, data mining is performed by calling an engine through an application interface, and the engine reports a result. There are various ways of reporting the result, but no highly legible display format has yet been established for each analysis algorithm. Consequently, although data mining engines offer advanced intelligent functions and high performance, data mining has seldom been introduced into general-purpose systems.
Data mining includes clustering and classification. Clustering classifies data having similar characteristics into clusters (classes) and extracts an unknown rule from them. Classification targets a group of data having a plurality of analysis items and extracts an unknown rule by expressing the characteristics of a specific analysis item as a function or a profile, using the other analysis items as condition values.
Clustering automatically collects similar data into the same group by using a conventional algorithm such as Ward's method, and the data can be divided into any number of groups that the user designates. In JP-A-11-15897, data are clustered with a designated division number, the results are plotted on the axes of a parallel coordinate graph, one axis per analysis item, and the polygonal line of each record is superimposed on those axes. Although clustering divides the data into the designated number of groups, the optimum division number cannot be found quickly even when the clustering result is shown on a parallel coordinate graph. To obtain the optimum division number, the user must examine the axes of the analysis items, analyze trends in the data, and judge which division number is best before finally arriving at the proper one. When the division number is large or the range of candidate division numbers is wide, deciding on the proper division number becomes extremely laborious.
Classification, on the other hand, generally uses a decision tree or a regression tree. A rule extracted by a decision-tree or regression-tree algorithm is usually visualized as a tree diagram that branches on automatically generated condition values.
However, a tree diagram expressing a classification result tends to become a complicated multilayer structure: it starts at a root, branches at nodes over many levels, and each branch finally ends at a leaf. It is difficult to grasp a significant rule from such a diagram. Moreover, the tree diagram obtained as the classification result exists merely as drawing information, from which the user must discover a significant rule by inspection.
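To make the tree structure described above concrete, the following minimal sketch represents a classification tree as nested nodes and flattens it into one IF-THEN rule per root-to-leaf path, the form in which a rule becomes readable text rather than drawing information. It rests on stated assumptions: the `Node` class, the `extract_rules` function, and the attribute names `age` and `income` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rule = Tuple[Tuple[str, ...], Optional[str]]  # (conditions on the path, predicted value)

@dataclass
class Node:
    """A node of a classification tree (hypothetical structure for illustration)."""
    condition: Optional[str] = None             # test on the edge from the parent; None at the root
    children: List["Node"] = field(default_factory=list)
    prediction: Optional[str] = None            # class value, set only on leaves

def extract_rules(node: Node, path: Tuple[str, ...] = ()) -> List[Rule]:
    """Flatten the tree into one rule per root-to-leaf path."""
    if node.condition is not None:
        path = path + (node.condition,)
    if not node.children:                       # a leaf closes one rule
        return [(path, node.prediction)]
    rules: List[Rule] = []
    for child in node.children:
        rules.extend(extract_rules(child, path))
    return rules

# Hypothetical example tree; attribute names and thresholds are illustrative only.
tree = Node(children=[
    Node("age < 40", children=[
        Node("income < 300", prediction="no purchase"),
        Node("income >= 300", prediction="purchase"),
    ]),
    Node("age >= 40", prediction="purchase"),
])
for conditions, outcome in extract_rules(tree):
    print("IF " + " AND ".join(conditions) + " THEN " + str(outcome))
```

Each rule is the conjunction of the conditions met on the way from the root to one leaf, so a tree with L leaves yields exactly L rules however deeply it branches.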
SUMMARY OF THE INVENTION
According to the invention, there is provided a data mining apparatus that improves the display of rules discovered by data mining, enabling the user to understand them easily and to discover significant rules easily.
According to the invention, there is provided a data mining apparatus in which a rule discovered by data mining can be used by an external application.
According to the invention, there is provided a data mining apparatus for discovering an unknown rule included in a data group, comprising a clustering processing unit and a classification processing unit which function as a data mining engine.
1. Clustering
First, the clustering process of the invention has the following features.
(Simultaneous Display of the Classification Result and the Division Number)
The data mining apparatus of the invention comprises: a division number designating unit for designating a range of division numbers from 2 up to an arbitrary division number N; a clustering processing unit that, for a group of data having a plurality of analysis items, classifies data having similar characteristics into a plurality of clusters (classes) once for every division number from 2 to the designated N; and a display processing unit for simultaneously displaying the plurality of processing results obtained by the clustering processing unit.
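A minimal sketch of such a clustering processing unit, assuming Ward's agglomerative method, a conventional algorithm of the kind the background section mentions; the source does not say which algorithm the unit actually uses. `cluster_every_division` and `ward_cost` are hypothetical names. An agglomerative method suits the 2-to-N range well: a single pass of merges, from individual records down to two clusters, yields the partition for every division number along the way.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# A cluster is (size, centroid, member record indices).
Cluster = Tuple[int, List[float], List[int]]

def ward_cost(a: Cluster, b: Cluster) -> float:
    """Increase in within-cluster sum of squares if a and b are merged (Ward's criterion)."""
    na, ca, _ = a
    nb, cb, _ = b
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2

def cluster_every_division(records: Sequence[Sequence[float]], n_max: int) -> Dict[int, List[int]]:
    """Return {k: cluster label per record} for every division number k = 2..n_max."""
    if not 2 <= n_max <= len(records):
        raise ValueError("n_max must lie between 2 and the number of records")
    clusters: Dict[int, Cluster] = {i: (1, [float(v) for v in r], [i]) for i, r in enumerate(records)}
    next_id = len(records)
    results: Dict[int, List[int]] = {}
    while True:
        k = len(clusters)
        if k <= n_max:
            # Number clusters by their lowest record index so labels are deterministic.
            labels = [0] * len(records)
            ordered = sorted(clusters.values(), key=lambda c: min(c[2]))
            for label, (_, _, members) in enumerate(ordered):
                for m in members:
                    labels[m] = label
            results[k] = labels
        if k == 2:
            return results
        # Merge the pair whose union increases within-cluster variance the least.
        a, b = min(combinations(clusters, 2), key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        na, ca, ma = clusters.pop(a)
        nb, cb, mb = clusters.pop(b)
        n = na + nb
        clusters[next_id] = (n, [(na * x + nb * y) / n for x, y in zip(ca, cb)], ma + mb)
        next_id += 1

records = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
for k, labels in sorted(cluster_every_division(records, 3).items()):
    print(k, labels)
```

The exhaustive pair search costs O(n³) overall and is meant only to show the idea; a real engine would need a faster agglomerative implementation.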
Specifically, the display processing unit draws a parallel coordinate graph in which the classification result for the designated division number N is plotted on the axis of each analysis item as a polygonal line, and it also arranges dividing axes for each division number from 2 to N (for example, N = 5). The graph thereby shows, at the same time, the transition of the division and the connection between the classification results as polygonal lines. Because the transition of the division across the dividing axes from 2 up to the designated division number (for example, 5) is displayed together with the clustering result at the designated division number, the user can re-analyze, from another viewpoint, why each record was classified into a particular group, and can therefore determine the proper division number easily. In other words, comparing a plurality of analysis items simultaneously reveals which grouping is best when, for example, customer information is grouped. Clustering can thus be put to use in specific business fields.
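The data behind such a display can be sketched as one polyline per record whose vertices lie first on the dividing axes (2-division up to N-division) and then on the analysis-item axes. This is a hedged sketch: the axis order, the even spacing of clusters along each dividing axis, and the min-max scaling of the item axes are assumptions, and `polylines` and the item names `x` and `y` are hypothetical. The cluster labels are given as a literal for illustration; in practice they would come from the clustering step.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[str, float]  # (axis name, height scaled to [0, 1])

def polylines(records: Sequence[Sequence[float]],
              labels_by_k: Dict[int, List[int]],
              item_names: Sequence[str]) -> List[List[Vertex]]:
    """One polyline per record: dividing axes 2..N first, then one axis per analysis item."""
    lows = [min(col) for col in zip(*records)]
    highs = [max(col) for col in zip(*records)]
    lines: List[List[Vertex]] = []
    for i, record in enumerate(records):
        line: List[Vertex] = []
        for k in sorted(labels_by_k):
            # Spread the k clusters evenly over the "k-division" axis (label 0 at the bottom).
            line.append((f"{k}-division", labels_by_k[k][i] / (k - 1)))
        for name, value, lo, hi in zip(item_names, record, lows, highs):
            # Min-max scale each analysis item; a constant item sits mid-axis.
            line.append((name, (value - lo) / (hi - lo) if hi > lo else 0.5))
        lines.append(line)
    return lines

records = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
labels_by_k = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
for line in polylines(records, labels_by_k, ["x", "y"]):
    print(line)
```

Following one record's polyline across the dividing axes shows which cluster it joins at each division number, which is the "transition of the division" the display is meant to expose.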
(Annual Ring Display of the Classification Results and the Division Numbers)
The display processing unit converts the classification results for each division number from 2 to the designated division number N into an annual ring diagram and displays it. In the annual ring diagram, the division numbers increase from the innermost ring toward the outermost ring, and the data distance between the clusters at each division number is expressed as the width (thickness) of the corresponding ring in the radial direction, so the division number of the widest ring can be recognized as the proper division number. In clustering, a unique algorithm divides a large amount of data into groups with similar tendencies, but the user must designate the division number and must also judge whether that number is proper. The annual ring diagram of the present invention presents the proper division number to the user by displaying the significance of the division at each division number. Consequently, grouping based on a plurality of analysis items, such as customer information, can be performed meaningfully.
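The source does not define the data distance that sets each ring's width. One plausible reading, assumed here, uses the merge costs of Ward's agglomerative clustering: if h(k) is the cost of the merge that reduces k clusters to k - 1, the ring for division number k gets width h(k) - h(k + 1), that is, how much more clearly separated the k clusters are than the k + 1 clusters. The sketch lays the rings out from the inside outward and reports the widest ring's division number as the proper one; `ward_merge_heights`, `annual_rings`, and the `core_radius` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[int, List[float]]  # (size, centroid)

def _ward(a: Cluster, b: Cluster) -> float:
    """Increase in within-cluster sum of squares caused by merging a and b."""
    (na, ca), (nb, cb) = a, b
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))

def ward_merge_heights(records: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Return {k: cost of the merge that reduces k clusters to k - 1}."""
    clusters: List[Cluster] = [(1, [float(v) for v in r]) for r in records]
    heights: Dict[int, float] = {}
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: _ward(clusters[p[0]], clusters[p[1]]))
        heights[len(clusters)] = _ward(clusters[i], clusters[j])
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return heights

def annual_rings(records: Sequence[Sequence[float]], n_max: int,
                 core_radius: float = 1.0) -> Tuple[List[Tuple[int, float, float]], int]:
    """Rings (k, inner radius, outer radius) for k = 2..n_max, innermost first, plus the proper k."""
    if not 2 <= n_max < len(records):
        raise ValueError("n_max must lie between 2 and len(records) - 1")
    h = ward_merge_heights(records)
    rings = []
    inner = core_radius
    for k in range(2, n_max + 1):
        width = h[k] - h[k + 1]  # assumed width rule: gap between successive merge costs
        rings.append((k, inner, inner + width))
        inner += width
    proper = max(rings, key=lambda r: r[2] - r[1])[0]
    return rings, proper

records = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
rings, proper = annual_rings(records, 3)
for k, r_in, r_out in rings:
    print(f"{k}-division ring: radius {r_in:.1f} to {r_out:.1f}")
print("proper division number:", proper)
```

Ward merge costs never decrease as clusters are merged, so every ring width under this rule is non-negative; any other between-cluster distance could be substituted in `annual_rings` without changing the layout logic.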
2. Classification
Second, the classification process of the invention has the following features.
(Folding of the Node)
The data mining apparatus of the invention comprises: a classification processing unit for forming characteristics of a specific analysis item among a plurality of