Method and apparatus for determining rule in database

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

active

06317735

ABSTRACT:

TECHNICAL FIELD
The present invention relates to the prediction of an objective numeric attribute value of data in a database, and more particularly to such prediction using a decision tree or a regression tree. A decision tree is a tree constructed to predict whether a true-false attribute value of data is true or false, and a regression tree is a tree constructed to predict a numeric attribute value of data.
BACKGROUND
Japanese Published Unexamined Patent Application No. 09-179883 discloses a method comprising the steps of: preparing a plane having two axes corresponding to two predicative numeric attributes of data in a database and divided into a plurality of rectangular buckets; storing, for each bucket, the number of data included in the bucket and the number of data included in the bucket whose true-false attribute value is true; segmenting from the plane, according to predetermined conditions, a bucket region which is convex to one axis on the plane; and deriving an association rule among the data using the segmented region. The object of this gazette is to derive the association rule among the data in the database. Since the region is constituted by a group of buckets connected to each other, the region is squarish in shape.
A paper (paper 1: “Computing Optimized Rectilinear Regions for Association Rules,” K. YODA, T. FUKUDA, Y. MORIMOTO, S. MORISHITA, and T. TOKUYAMA, in KDD-97 Proceedings Third International Conference on Knowledge Discovery and Data Mining, pp. 96-103, The AAAI Press, ISBN 0-1-57735-027-8) discloses a method which, unlike the above-mentioned gazette, segments from a plane, according to predetermined conditions, a rectilinear convex region made up of rectangular buckets. This paper is also intended to derive an association rule among data in a database. Since the region is defined by a group of rectangular buckets connected to each other, the region is squarish in shape.
Moreover, a paper (paper 2: “Efficient Construction of Regression Trees with Range and Region Splitting,” Y. MORIMOTO, H. ISHII and S. MORISHITA, in Proceedings of the Twenty-third International Conference on Very Large Data Bases, pp. 166-175, August 1997) discloses a method for a regression tree which comprises the steps of: preparing a plane having two axes corresponding to two predicative numeric attributes of data in a database and divided into a plurality of rectangular buckets; storing, for each bucket, the number of data included in the bucket and the sum of the objective numeric attribute values of those data; segmenting from the plane a bucket region which minimizes the mean-squared error of the objective numeric attribute value; and generating a node concerning the data included in the segmented region and a node concerning the data outside the region. The regression tree itself can be used for predicting a numeric attribute value in unknown data. However, since the bucket region which minimizes the mean-squared error of the objective numeric attribute value is either convex to one axis on the plane or rectilinear convex, and is defined by a group of rectangular buckets connected to each other, the region is squarish in shape.
OBJECTS OF THE INVENTION
In the background art described above, owing to the properties of the region segmentation algorithm, the segmented region is squarish in shape because it is a collection of rectangular buckets. The two numeric attributes corresponding to the two axes on the plane are continuous, and the two corresponding numeric attribute values of the data to be predicted are given as continuous values. Nevertheless, the region is segmented in units of rectangular buckets, so it is unclear whether the boundary line of the region is reliable enough to support a prediction. For this reason, prediction results may occasionally differ. Moreover, in a decision tree or a regression tree, the number of data included in a node decreases as the tree grows, so the size of each rectangular bucket becomes larger and the segmented region becomes irregular in shape. In this case too, the appropriateness of the boundary line of the region is questionable. However, it is impossible to segment a region defined by a smooth curve directly from the foregoing plane because of the large volume of computation.
From the point of view described above, the object of the present invention is to segment a region defined by a smooth boundary line from a plane mapped by two axes corresponding to two predicative attributes of data, and to utilize the region for prediction of an objective attribute of data.
Another object of the present invention is to constitute a node of a decision tree or a regression tree using the segmented region.
Still another object of the present invention is to enhance the accuracy of the prediction.
SUMMARY OF THE INVENTION
The present invention is a method for determining a rule associated with an objective attribute of data in a database to predict the objective attribute value of data, comprising the steps of: storing, for each bucket in a plane, values relative to the data belonging to that bucket, wherein the plane has two axes respectively corresponding to two predicative numeric attributes of data and is divided into N×M buckets; segmenting from the plane a bucket region that satisfies a predetermined condition; performing a smoothing processing on the boundary of the segmented bucket region; and determining a rule for predicting the objective attribute value of the data by the smoothed region. As a result, an objective attribute value can be predicted by means of the smoothed bucket region. The values relative to data belonging to a bucket, which are described above, may be the number of data belonging to each bucket and values relative to the objective attribute of data belonging to a bucket.
If the objective attribute is a numeric attribute (in the case of a regression tree), the value relative to said objective attribute may be the sum of the values of said objective attribute of the data belonging to a bucket. If the objective attribute is a true-false attribute (in the case of a decision tree), the value relative to said objective attribute may be the number of data belonging to a bucket whose objective attribute value is true.
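The per-bucket values described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `make_edges` and `bucket_statistics` are hypothetical, and equal-width buckets are assumed because the text here does not say how the bucket boundaries are chosen. A true-false objective attribute is accumulated as 1 or 0, so the same sum yields the number of true data in a bucket.

```python
from bisect import bisect_right


def make_edges(lo, hi, parts):
    """Interior cut points splitting [lo, hi] into `parts` equal-width buckets.

    Equal-width bucketing is an assumption made for this sketch.
    """
    width = (hi - lo) / parts
    return [lo + width * i for i in range(1, parts)]


def bucket_statistics(records, x_edges, y_edges):
    """Accumulate per-bucket counts and objective-attribute sums.

    records: iterable of (x, y, target) tuples, where x and y are the two
    predicative numeric attributes and target is the objective attribute
    (a number, or True/False for the decision-tree case).
    Returns (count, total), each an N x M list of lists.
    """
    n, m = len(x_edges) + 1, len(y_edges) + 1
    count = [[0] * m for _ in range(n)]
    total = [[0.0] * m for _ in range(n)]
    for x, y, target in records:
        i = bisect_right(x_edges, x)  # bucket index along the first axis
        j = bisect_right(y_edges, y)  # bucket index along the second axis
        count[i][j] += 1
        total[i][j] += float(target)  # True counts as 1.0, False as 0.0
    return count, total
```

For example, `bucket_statistics(records, make_edges(0.0, 1.0, 2), make_edges(0.0, 1.0, 2))` builds a 2×2 grid over the unit square. Only these two grids, not the raw records, are needed by the later segmentation step.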
The predetermined condition in the segmenting step may be (a) to minimize the mean squared error of the objective attribute value (or to maximize an interclass variance), (b) to maximize an entropy gain of discrete values of the objective attribute, (c) to maximize a GINI index function value of discrete values of the objective attribute, (d) to maximize a χ-square (chi-square) value of discrete values of the objective attribute, (e) if the objective attribute value is a true-false attribute, to maximize the number of included data when a ratio of data whose objective attribute value is true is more than a predetermined value, or (f) to maximize a ratio of data whose objective attribute value is true when the minimum number of included data is defined. Condition (a) is preferred for a regression tree.
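As a rough illustration of condition (a), the following Python sketch scores one candidate bucket region by its interclass variance, using only per-bucket counts and sums. The function name `interclass_variance`, the representation of a region as a collection of `(i, j)` bucket indices, and the unnormalized form of the score are assumptions made for this example. The sketch only evaluates a given region; it does not perform the search for the best region.

```python
def interclass_variance(count, total, region):
    """Score a candidate region by the (unnormalized) interclass variance.

    count, total: N x M grids of per-bucket data counts and objective sums.
    region: collection of (i, j) bucket indices forming the candidate region.
    For a fixed data set, a larger score corresponds to a smaller squared
    error when the inside and outside are each predicted by their own mean.
    """
    n_all = sum(map(sum, count))
    s_all = sum(map(sum, total))
    n_in = sum(count[i][j] for i, j in region)
    s_in = sum(total[i][j] for i, j in region)
    n_out, s_out = n_all - n_in, s_all - s_in
    if n_in == 0 or n_out == 0:
        return 0.0  # a split with an empty side separates nothing
    mean_all = s_all / n_all
    return (n_in * (s_in / n_in - mean_all) ** 2
            + n_out * (s_out / n_out - mean_all) ** 2)
```

With one datum per bucket in a 2×2 grid and sums `[[1.0, 1.0], [5.0, 5.0]]`, the region made of the second row of buckets separates the low values from the high values and scores higher than a region that mixes them.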
The smoothing processing described above may be a processing to make the boundary of said region a spline curve. In addition, the smoothing processing may comprise a step of defining control points on sides of N stripes in the region, wherein the stripes are parallel to an axis corresponding to a first predicative numeric attribute, and the sides are parallel to an axis corresponding to a second predicative numeric attribute. At this time, the smoothing processing may further comprise a step of setting a curve passing through the control points (for example, see FIG. 11), or a step of setting a curve passing through the middle points of the lines between adjacent control points (for example, see FIG. 12). By making the boundary of the region a spline curve, the accuracy of the prediction is improved.
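The two kinds of curve mentioned above can be illustrated with standard spline formulas. The text here does not name a spline family, so both choices below are assumptions swapped in for illustration: a uniform Catmull-Rom spline, which passes through every control point, and a uniform quadratic B-spline, which passes through the middle point of each line between adjacent control points. The function names and the closed-loop treatment of the control points are likewise hypothetical.

```python
def catmull_rom_closed(points, samples=8):
    """Closed curve through every control point (uniform Catmull-Rom).

    points: list of (x, y) control points ordered around the region.
    Returns samples * len(points) curve points; curve[k * samples] equals
    points[k].
    """
    n = len(points)
    curve = []
    for k in range(n):
        p0, p1, p2, p3 = (points[(k + off) % n] for off in (-1, 0, 1, 2))
        for s in range(samples):
            t = s / samples
            curve.append(tuple(
                0.5 * (2 * b + (c - a) * t
                       + (2 * a - 5 * b + 4 * c - d) * t ** 2
                       + (-a + 3 * b - 3 * c + d) * t ** 3)
                for a, b, c, d in zip(p0, p1, p2, p3)))
    return curve


def quadratic_bspline_closed(points, samples=8):
    """Closed curve through the midpoints between adjacent control points.

    Uniform quadratic B-spline: curve[k * samples] is the midpoint of
    points[k - 1] and points[k].
    """
    n = len(points)
    curve = []
    for k in range(n):
        p0, p1, p2 = points[k - 1], points[k], points[(k + 1) % n]
        for s in range(samples):
            t = s / samples
            curve.append(tuple(
                0.5 * ((1 - t) ** 2 * a
                       + (1 + 2 * t - 2 * t ** 2) * b
                       + t ** 2 * c)
                for a, b, c in zip(p0, p1, p2)))
    return curve
```

Either function turns the corner points of a stair-stepped bucket region into a densely sampled smooth outline. The first keeps the curve pinned to the control points, while the second rounds the corners off more strongly because it only touches the midpoints.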
In addition, the determining s
