Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-06-26
2004-04-06
Metjahic, Safet (Department: 2171)
active
06718338
BACKGROUND OF THE INVENTION
1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, systems, and products for storing data mining clustering results in a relational database for querying and reporting.
2. Description of Related Art
Data mining is an analytic technique to dynamically discover patterns in historical data records and to apply properties associated with these records to production data records that exhibit similar patterns. Based on historical data, a data mining algorithm first generates a data mining model that captures the discovered patterns and produces data mining results that describe the statistical distribution of the historical data along the discovered patterns. These data mining results are then used to visualize the model quality so that an analyst can either tune the data mining model to improve its quality or, after some amount of tuning, certify that it is of good quality. Once the model is certified, the next step is to apply the certified data mining model to production data.
Data mining tools such as IBM's Intelligent Miner currently produce data mining results in a proprietary representation. Intelligent Miner also provides a converter from this proprietary representation to an XML (eXtensible Markup Language) format known as PMML (Predictive Modeling Markup Language). Both representations are difficult to query because no querying tools match them (a new query language called XQuery is being proposed to query XML data, but it will be quite some time before this technology takes hold). Both representations are therefore still too low-level for most visualization tools, which are inherently designed to read data from a relational database rather than from an internal representation. What is needed is a relational representation of data mining results so that the data mining results can be stored directly in a relational database. Obtaining a relational representation of data mining results is non-obvious due to the complexity of their internal representations.
SUMMARY
In summary, this specification discloses methods, systems, and products for storing data mining clustering results in a relational database for querying and reporting. Embodiments are typically applied to data mining results from scoring of data items in operational data. Embodiments typically include reading, from a hierarchical clustering node embodied in a hierarchical representation of data mining results, clustering data describing a clustering, and recording the clustering data in a relational clustering record, wherein the relational clustering record includes a clustering identification field. Embodiments typically include reading, from a hierarchical cluster node embodied in the hierarchical representation of data mining results, cluster data describing a cluster, and recording the cluster data in a relational cluster record, wherein the hierarchical cluster node is embodied in a position in the hierarchy below the hierarchical clustering node; the relational cluster record is related to the relational clustering record through a foreign key comprising the clustering identification field; and the relational cluster record includes a cluster identification field.
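The first two steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the XML tag and attribute names, the table and column names, and the helper `store_clusterings` are all hypothetical, and the XML is neither PMML nor Intelligent Miner's format. It reads clustering nodes and the cluster nodes beneath them, then records each in a relational table, with the cluster rows carrying the clustering identification field as a foreign key.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical hierarchical results; tag and attribute names are illustrative
# only and do not follow PMML or any Intelligent Miner format.
RESULTS_XML = """
<results>
  <clustering id="1" description="customer segments">
    <cluster id="10" description="young urban"/>
    <cluster id="11" description="retired rural"/>
  </clustering>
</results>
"""


def store_clusterings(conn, xml_text):
    """Copy clustering nodes and their child cluster nodes into related tables."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE clustering ("
        " clustering_id INTEGER PRIMARY KEY,"
        " description TEXT)"
    )
    conn.execute(
        "CREATE TABLE cluster ("
        " cluster_id INTEGER PRIMARY KEY,"
        " clustering_id INTEGER NOT NULL REFERENCES clustering(clustering_id),"
        " description TEXT)"
    )
    root = ET.fromstring(xml_text)
    for clustering in root.iter("clustering"):
        clustering_id = int(clustering.get("id"))
        conn.execute(
            "INSERT INTO clustering VALUES (?, ?)",
            (clustering_id, clustering.get("description")),
        )
        # Cluster nodes sit one level below their clustering node.
        for cluster in clustering.findall("cluster"):
            conn.execute(
                "INSERT INTO cluster VALUES (?, ?, ?)",
                (int(cluster.get("id")), clustering_id, cluster.get("description")),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_clusterings(conn, RESULTS_XML)

# Once stored, the results can be queried with an ordinary join.
rows = conn.execute(
    "SELECT c.cluster_id, g.description"
    " FROM cluster c JOIN clustering g ON g.clustering_id = c.clustering_id"
    " ORDER BY c.cluster_id"
).fetchall()
```

The parent-child position in the hierarchy becomes a foreign key column, so a reporting tool needs only SQL to navigate from a cluster back to its clustering.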
Embodiments typically include reading, from a hierarchical cluster attribute node embodied in the hierarchical representation of data mining results, cluster attribute data describing a cluster attribute, and recording the cluster attribute data in a relational cluster attribute record, wherein the hierarchical cluster attribute node is embodied in a position in the hierarchy below the hierarchical cluster node; the relational cluster attribute record is related to the relational cluster record through a foreign key comprising the cluster identification field; the relational cluster attribute record is related to the relational clustering record through a foreign key comprising the clustering identification field; and the relational cluster attribute record includes a cluster attribute identification field. Embodiments typically include reading, from a hierarchical cluster attribute bin node embodied in the hierarchical representation of data mining results, cluster attribute bin data describing a cluster attribute bin, and recording the cluster attribute bin data in a relational cluster attribute bin record, wherein the hierarchical cluster attribute bin node is embodied in a position in the hierarchy below the hierarchical cluster attribute node; the relational cluster attribute bin record is related to the relational cluster attribute record through a foreign key comprising the cluster attribute identification field; the relational cluster attribute bin record is related to the relational cluster record through a foreign key comprising the cluster identification field; and the relational cluster attribute bin record is related to the relational clustering record through a foreign key comprising the clustering identification field.
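One way to picture the four record types and their foreign keys is the schema sketch below. The table and column names are hypothetical, and only the identification fields are shown. Each lower-level record carries a foreign key to every level above it, so a bin row references its attribute, its cluster, and its clustering.

```python
import sqlite3

# Hypothetical table and column names; only the identification fields are shown.
SCHEMA = """
CREATE TABLE clustering (
    clustering_id INTEGER PRIMARY KEY
);
CREATE TABLE cluster (
    cluster_id    INTEGER PRIMARY KEY,
    clustering_id INTEGER NOT NULL REFERENCES clustering(clustering_id)
);
CREATE TABLE cluster_attribute (
    attribute_id  INTEGER PRIMARY KEY,
    cluster_id    INTEGER NOT NULL REFERENCES cluster(cluster_id),
    clustering_id INTEGER NOT NULL REFERENCES clustering(clustering_id)
);
CREATE TABLE cluster_attribute_bin (
    bin_id        INTEGER PRIMARY KEY,
    attribute_id  INTEGER NOT NULL REFERENCES cluster_attribute(attribute_id),
    cluster_id    INTEGER NOT NULL REFERENCES cluster(cluster_id),
    clustering_id INTEGER NOT NULL REFERENCES clustering(clustering_id)
);
"""


def open_results_db():
    """Create an in-memory database holding the four related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = open_results_db()
conn.execute("INSERT INTO clustering VALUES (1)")
conn.execute("INSERT INTO cluster VALUES (10, 1)")
conn.execute("INSERT INTO cluster_attribute VALUES (100, 10, 1)")
conn.executemany(
    "INSERT INTO cluster_attribute_bin VALUES (?, 100, 10, 1)",
    [(1000,), (1001,)],
)
conn.commit()

# Because each bin row carries the clustering key, no join is needed here.
bin_count = conn.execute(
    "SELECT COUNT(*) FROM cluster_attribute_bin WHERE clustering_id = ?", (1,)
).fetchone()[0]

# A bin that points at a cluster that does not exist is rejected.
try:
    conn.execute("INSERT INTO cluster_attribute_bin VALUES (1002, 100, 99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Repeating the higher-level keys in the lower-level tables is redundant in a strictly normalized sense, but it lets a report select all bins of a clustering or a cluster with a single-table filter.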
In typical embodiments, the cluster data, the clustering data, the cluster attribute data, and the cluster attribute bin data comprise data mining results generated by at least one data mining operation performed upon operational data using a trained data mining model. In typical embodiments, the relational cluster attribute bin record includes a cluster attribute bin identification field. In typical embodiments, the clustering data as recorded in the relational clustering record includes a unique identifier for the relational clustering record; a text description of the purpose of a clustering represented by the relational clustering record; a clustering type; a number of clusters given by the clustering; a number of attributes considered in the clustering; an algorithm field identifying the clustering algorithm used in the clustering; and an items numeric field that stores the number of data items input to the clustering from the operational data. In some embodiments, the clustering type has the value “demographic.” In other embodiments, the clustering type has the value “neural.”
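The fields of the relational clustering record listed above might be modeled as in the sketch below. The class name, field names, and sample values are hypothetical. Restricting the clustering type to the two values named in the text, and rejecting negative counts, are assumptions made for the example, not requirements stated in the specification.

```python
from dataclasses import astuple, dataclass

# Assumption: only the two clustering types named in the text are accepted.
CLUSTERING_TYPES = {"demographic", "neural"}


@dataclass(frozen=True)
class ClusteringRecord:
    """One row of a hypothetical relational clustering table."""

    clustering_id: int      # unique identifier for the record
    description: str        # text description of the clustering's purpose
    clustering_type: str    # "demographic" or "neural"
    num_clusters: int       # number of clusters given by the clustering
    num_attributes: int     # number of attributes considered
    algorithm: str          # clustering algorithm used
    items: int              # number of data items input from operational data

    def __post_init__(self):
        if self.clustering_type not in CLUSTERING_TYPES:
            raise ValueError(f"unknown clustering type: {self.clustering_type!r}")
        if min(self.num_clusters, self.num_attributes, self.items) < 0:
            raise ValueError("counts must be non-negative")


# Illustrative values only; none of these come from the specification.
record = ClusteringRecord(
    clustering_id=1,
    description="customer segments",
    clustering_type="demographic",
    num_clusters=2,
    num_attributes=5,
    algorithm="example-algorithm",
    items=1000,
)

# astuple() yields the values in column order, ready for a parameterized INSERT.
row = astuple(record)
```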
In typical embodiments, the cluster data as recorded in the relational cluster record comprises a unique identifier for the cluster; a unique identifier of a relational clustering record to which the relational cluster record is related; an ordinal number of the relational cluster record; a text description of the purpose of the cluster represented by the relational cluster record; and a numeric field identifying the number of data items from the operational data that are represented in records related to the relational cluster record. In typical embodiments, the cluster attribute data as recorded in the relational cluster attribute record comprises a unique identifier of the relational cluster attribute record; a unique identifier of a relational clustering record to which the relational cluster attribute record is related; a unique identifier of a relational cluster record to which the relational cluster attribute record is related; an attribute type; a text name of a relational cluster attribute represented by the relational cluster attribute record; a text description of the relational cluster attribute; a use type field; a categories numeric field indicating a number of categories associated with the cluster attribute when the attribute type has the value “categorical”; a lowest value numeric field indicating a lowest value allowed when the attribute type has the value “continuous”; a highest value numeric field indicating a highest value allowed when the attribute type has the value “continuous”; and an items numeric field identifying the number of data items from the operational data that are represented in records related to the relational cluster attribute record. In some embodiments, the use type field has the value “active.” In some embodiments, the use type field has the value “supplementary.” In some embodiments, the attribute type has the value “categorical.” In some embodiments, the attribute
Al-hashemi Sana
Biggers John R.
Biggers & Ohanian PLLC
International Business Machines - Corporation
Metjahic Safet