Data mining and visualization techniques

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


C707S793000, C345S419000, C345S215000, C345S960000

active

06711577

BACKGROUND
The present invention relates to data processing techniques and more particularly, but not exclusively, relates to the discovery and visualization of association rules.
Association is a powerful data analysis technique that is frequently used in data mining tasks. Given a set of items $S = \{i_1, i_2, \ldots, i_j, \ldots, i_n\}$, where $n \geq 2$, an association rule is an implication of the form $X \rightarrow i_j$, where $X \subset S$ and $i_j \in S$ such that $i_j \notin X$. The set of items $X$ is the antecedent, and the item $i_j$ is the consequent of the association rule. The size of $X$ is between 1 and $n-1$ items. The “support” of the rule $X \rightarrow i_j$ is the percentage of records in the dataset that contain all of the items in $X \cup \{i_j\}$. The “confidence” of the rule is the percentage of records containing all of the items in $X$ that also contain $i_j$. The support and confidence levels of an association rule are among the metadata most frequently of interest to analysts.
For given elements A, B, C, and D of a common domain, A+B+C→D is an example of an association rule, where the occurrence of A, B, and C together implies D. Another example from a supermarket database is “80% of the people who buy diapers and baby powder also buy baby oil”; here 80% is the confidence of the rule. Applying the more general notation from the earlier example, this supermarket association can be represented in elemental form as A+B→C, where A=“buy diapers”, B=“buy baby powder”, and C=“buy baby oil.” For further background information concerning association rule data mining, reference is made to Pak Chung Wong, Paul Whitney and Jim Thomas, “Visualizing Association Rules for Text Mining,” Proceedings of IEEE Information Visualization (published by IEEE CS Press) (dated Oct. 26, 1999).
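The supermarket rule can be checked numerically. The sketch below computes support and confidence over a list of baskets; the ten-basket dataset is invented purely for illustration, with counts chosen so that diapers+baby powder→baby oil has 80% confidence. The function names are hypothetical, not taken from the original.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], antecedent: Iterable[str], consequent: str) -> float:
    """Percentage of baskets that contain every antecedent item and the consequent."""
    wanted = set(antecedent) | {consequent}
    hits = sum(1 for b in baskets if wanted <= b)
    return 100.0 * hits / len(baskets)


def confidence(baskets: List[Basket], antecedent: Iterable[str], consequent: str) -> float:
    """Percentage of baskets containing the antecedent that also contain the consequent."""
    ante = set(antecedent)
    with_ante = [b for b in baskets if ante <= b]
    if not with_ante:
        return 0.0  # rule never fires, so confidence is undefined; report 0
    hits = sum(1 for b in with_ante if consequent in b)
    return 100.0 * hits / len(with_ante)


# Hypothetical data: 5 baskets hold diapers + baby powder, 4 of them also baby oil.
baskets = (
    [frozenset({"diapers", "baby powder", "baby oil"})] * 4
    + [frozenset({"diapers", "baby powder"})]
    + [frozenset({"milk", "bread"})] * 5
)

print(support(baskets, ["diapers", "baby powder"], "baby oil"))
print(confidence(baskets, ["diapers", "baby powder"], "baby oil"))
```

Support is 4 of 10 baskets (40%), while confidence is 4 of the 5 baskets that contain the antecedent (80%), which is why the two values are reported separately.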
In contrast to association rules, another common knowledge discovery and data mining tool is sequential patterning. A four-element sequential pattern can be represented as A→B→C→D, where A, B, C, and D are elements of the same domain. An association rule is a study of the “togetherness” of elements, whereas a sequential pattern is a study of the “ordering” or “arrangement” of elements. Further background information about sequential patterns can be found in the above-cited U.S. Provisional Patent Application No. 60/239,334, filed Oct. 9, 2000.
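The difference between “togetherness” and “ordering” can be sketched as two membership tests on a single record. This is an illustrative sketch only; the helper names are hypothetical, and the ordered test allows gaps between pattern elements, which is one common convention among several.

```python
from typing import Sequence


def contains_together(record: Sequence[str], items: Sequence[str]) -> bool:
    """Association view: every item occurs in the record, in any order."""
    return set(items) <= set(record)


def contains_in_order(record: Sequence[str], pattern: Sequence[str]) -> bool:
    """Sequential view: pattern elements occur in the given order (gaps allowed)."""
    remaining = iter(record)
    # `p in remaining` advances the shared iterator, so each element must be
    # found after the position where the previous one matched.
    return all(p in remaining for p in pattern)


reversed_record = ["D", "C", "B", "A"]
print(contains_together(reversed_record, ["A", "B", "C", "D"]))   # all present
print(contains_in_order(reversed_record, ["A", "B", "C", "D"]))   # wrong order
print(contains_in_order(["A", "X", "B", "C", "Y", "D"], ["A", "B", "C", "D"]))
```

The reversed record supports the association A+B+C→D but not the sequential pattern A→B→C→D.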
To support the analysis of association rules, scientists and engineers have developed various visualization schemes, each with limitations. Among these are the two-dimensional item-to-item matrix and the directed graph. The basic design of a two-dimensional (2D) association matrix positions the antecedent and consequent items on separate axes of a square matrix, as illustrated in the examples of FIGS. 7 and 8. Customized icons are drawn on the matrix tiles that connect the antecedent and consequent items of the corresponding association rules. Different icons can be used to depict different metadata, such as the support and confidence values of the rules.
FIG. 7 depicts an association rule (B→C). Both the height and the color of the column icon can be used to present metadata values. The values of support and confidence are mapped to 3D columns that are built separately on and beneath the matrix tiles. Alternatively, other icons, such as disks and bars, can be used to visualize metadata.
This type of item-to-item matrix is often effective for showing one-to-one binary relationships; however, it is usually less effective for many-to-one relationships, as in the case of association rules with multiple antecedent items. For example, in FIG. 8 it is unclear whether there is only one association rule (A+B→C) or two (A→C and B→C). Because the matrix offers no practical way to show which antecedent items occur together, it is a weak candidate for visualizing rules with multiple antecedent items.
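The FIG. 8 ambiguity can be reproduced directly: if each rule marks one tile per (antecedent item, consequent item) pair, the single rule A+B→C and the pair of rules A→C and B→C mark exactly the same tiles. The function below is a hypothetical sketch of that tile-marking step, not code from the original.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent items, consequent item)


def item_to_item_tiles(rules: List[Rule]) -> Set[Tuple[str, str]]:
    """Tiles (antecedent item, consequent item) marked in a 2D item-to-item matrix."""
    return {(a, consequent) for antecedent, consequent in rules for a in antecedent}


one_rule = [(frozenset({"A", "B"}), "C")]                     # A+B -> C
two_rules = [(frozenset({"A"}), "C"), (frozenset({"B"}), "C")]  # A -> C, B -> C

print(sorted(item_to_item_tiles(one_rule)))
print(sorted(item_to_item_tiles(two_rules)))
```

Both calls produce the same tile set, so nothing drawn on the matrix can tell the two situations apart.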
One attempt to address this problem groups all the antecedent items of an association rule into one unit and plots the group against its consequent, resulting in an antecedent-to-consequent plot. For example, a dedicated item group (A+B) is created in FIG. 9 to describe the association rule (A+B→C). Unfortunately, as the number of antecedents per rule increases, the number of possible item groups, each needing its own row or column, quickly becomes unwieldy. Furthermore, the loss of item identity within an antecedent group also undermines the advantages of visualizing associations with a matrix. For example, the row (or column) of the matrix connected to an item can no longer be used to search for all the rules involving that item.
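To make the growth concrete, the count below assumes that every distinct antecedent group of 1 to k items drawn from n items would need its own row or column in the antecedent-to-consequent plot. That is an illustrative worst-case assumption, not a figure from the original; the function name is hypothetical.

```python
from math import comb


def antecedent_groups(n_items: int, max_size: int) -> int:
    """Number of distinct antecedent groups of 1..max_size items from n_items items."""
    return sum(comb(n_items, size) for size in range(1, max_size + 1))


# With only 10 items, allowing larger antecedents multiplies the rows needed.
for k in (1, 2, 3, 9):
    print(k, antecedent_groups(10, k))
```

With 10 items, single-item antecedents need 10 rows, but allowing up to three antecedent items already needs 175, and any proper subset (up to 9 items) needs 2^10 − 2 = 1022.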
Another problem with some item-to-item matrix displays is object occlusion, especially when multiple icons are used to depict different metadata values on the matrix tiles. The occlusion problem is illustrated in the example of FIG. 10.
As illustrated in FIG. 11, a directed graph is another possible scheme for depicting item associations. The nodes of a directed graph represent the items, and the edges represent the associations. FIG. 11 shows three association rules (A→C, B→C, A+B→C). Unfortunately, with as few as a dozen rules, a directed graph often becomes tangled and difficult to follow. One attempt to address this problem animates the edges to show the associations of certain items with 3D rainbow arcs. See Beth Hetzler, W. Michelle Harris, Susan Havre, and Paul Whitney, “Visualizing the Full Spectrum of Document Relationships,” Proceedings of the Fifth International Society for Knowledge Organization (ISKO) Conference (dated 1998). However, this animation technique typically requires significant human interaction to turn the item nodes on and off, and it is often difficult to show multiple metadata values, including support and confidence, alongside the association rules.
Indeed, with any of these existing schemes, it is often difficult to meaningfully visualize a large number of association rules, and effective management of association rules with multiple antecedents is generally lacking. Accordingly, new strategies are needed to identify and present association rule information. The present invention addresses such needs.
SUMMARY OF THE INVENTION
One embodiment of the present invention is a unique data processing technique. Other embodiments include unique apparatus, systems, and methods for processing association rules.
A further embodiment of the present invention includes processing a dataset to determine a number of items and establishing several rules with a computer system. Each of these rules corresponds to a different association between two or more of the items. A visualization is provided that displays a rule-to-item relationship for each one of the rules. This visualization can further display one or more types of metadata for the rules, and can be in two-dimensional or three-dimensional form.
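One way a rule-to-item relationship could be tabulated is a grid with one row per rule and one column per item. The sketch below is a hypothetical illustration, not the patented implementation: the `AssociationRule` class, the "ant"/"con" cell markers, and the metadata fields are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AssociationRule:
    antecedent: FrozenSet[str]  # items on the left-hand side of the rule
    consequent: str             # single item on the right-hand side
    support: float = 0.0        # metadata in percent (not used by the grid itself)
    confidence: float = 0.0     # metadata in percent (not used by the grid itself)


def rule_to_item_matrix(rules: List[AssociationRule], items: List[str]) -> List[List[str]]:
    """One row per rule, one column per item.

    Hypothetical cell markers: "ant" = antecedent item, "con" = consequent
    item, "" = item not involved in the rule.
    """
    def mark(rule: AssociationRule, item: str) -> str:
        if item in rule.antecedent:
            return "ant"
        if item == rule.consequent:
            return "con"
        return ""

    return [[mark(rule, item) for item in items] for rule in rules]


items = ["A", "B", "C", "D"]
rules = [
    AssociationRule(frozenset({"A", "B"}), "C"),  # A+B -> C
    AssociationRule(frozenset({"A"}), "C"),       # A -> C
    AssociationRule(frozenset({"B"}), "C"),       # B -> C
]
for row in rule_to_item_matrix(rules, items):
    print(row)
```

Because each rule gets its own row, A+B→C and the pair A→C, B→C occupy visibly different rows, and the column for any item can still be scanned to find every rule that involves it. The `support` and `confidence` fields are where per-rule metadata could be attached for display.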
Yet a further embodiment includes processing a dataset with a computer to determine several association rules and providing a visualization of the association rules, where each association rule corresponds to a different one of a number of portions of the visualization. A set of identifiers is included in the visualization for each of the association rules, and within each rule's portion the identifiers occupy different locations. One of the identifiers represents the consequent item, and one or more of the other identifiers represent the corresponding antecedent item or items.
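As a rough text-mode illustration of identifiers placed at different locations along each rule's portion, the sketch below treats each portion as one printed row and gives every item a fixed column. The "o"/"X"/"." symbols and the `render_rule_rows` name are assumptions for this example only; the original does not specify a rendering.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent items, consequent item)


def render_rule_rows(rules: List[Rule], items: List[str]) -> List[str]:
    """Render each rule as one text row with a fixed column per item.

    'o' marks an antecedent identifier, 'X' the consequent identifier,
    and '.' an item the rule does not involve.
    """
    lines = ["    " + " ".join(items)]  # header row with the item names
    for k, (antecedent, consequent) in enumerate(rules, start=1):
        cells = ["X" if it == consequent else "o" if it in antecedent else "." for it in items]
        lines.append(f"R{k}  " + " ".join(cells))
    return lines


rules = [
    (frozenset({"A", "B"}), "C"),  # A+B -> C
    (frozenset({"A"}), "C"),       # A -> C
    (frozenset({"B"}), "C"),       # B -> C
]
print("\n".join(render_rule_rows(rules, ["A", "B", "C", "D"])))
```

Single-character item names keep the columns aligned; a graphical version would replace the characters with icons whose size or color could carry support and confidence.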
In another embodiment, a computer system includes one or more processors operable to process a dataset to determine a number of items and establish a number of association rules for these items. The one or more processors generate a visualization output of the association rules that includes one or more signals representative of a rule-to-item relationship for each one of the association rules. The system further includes an output device responsive to the visualization output to display a visualization of the association rules.
In still another embodiment
