Method and system for visualization of clusters and...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000, C707S793000, C707S793000, C707S793000, C705S026640, C704S202000, C704S206000, C345S215000, C381S083000

Reexamination Certificate

active

06216134

ABSTRACT:

TECHNICAL FIELD
This invention relates generally to user interfaces and, more specifically, to user interfaces for visualization of categories of data.
BACKGROUND OF THE INVENTION
Computer systems have long been used for data analysis. For example, the data may include the demographics of users and the web pages accessed by those users. A web master (i.e., a manager of a web site) may desire to review the web page access patterns of the users in order to optimize the links between the various web pages or to customize advertisements to the demographics of the users. However, it may be very difficult for the web master to analyze the access patterns of thousands of users involving possibly hundreds of web pages. The difficulty of the analysis may be lessened if the users can be categorized by common demographics and common web page access patterns. Two techniques of data categorization, classification and clustering, can be useful when analyzing large amounts of such data. These categorization techniques are used to categorize data represented as a collection of records containing values for various attributes. For example, each record may represent a user, and the attributes describe various characteristics of the user. The characteristics may include the sex, income, and age of the user, or web pages accessed by the user.
FIG. 1A illustrates a collection of records as a table. Each record (1, 2, . . . , n) contains a value for each of the attributes (1, 2, . . . , m). For example, attribute 4 may represent the age of a user and attribute 3 may indicate whether the user has accessed a certain web page. Therefore, the user represented by record 2 accessed the web page as represented by attribute 3 and is age 36 as represented by attribute 4.
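The table of FIG. 1A can be pictured as a list of records, each mapping an attribute number to a value. The following is a minimal sketch only; the `Record` type, the `attribute_value` helper, and the values for record 1 are hypothetical and are not part of the source. Only the values for record 2 (attribute 3 true, attribute 4 equal to 36) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One row of the table: a record number plus attribute values."""
    record_id: int
    attributes: dict  # attribute number -> value


# Record 2 follows the text; record 1 holds made-up values for illustration.
records = [
    Record(1, {3: False, 4: 52}),
    Record(2, {3: True, 4: 36}),
]


def attribute_value(records, record_id, attribute):
    """Return the value of one attribute of one record."""
    for record in records:
        if record.record_id == record_id:
            return record.attributes[attribute]
    raise KeyError(record_id)


age_of_record_2 = attribute_value(records, 2, 4)
accessed_page = attribute_value(records, 2, 3)
```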
Classification techniques allow a data analyst (e.g., web master) to group the records of a collection into classes. That is, the data analyst reviews the attributes of each record, identifies classes, and then assigns each record to a class.
FIG. 1B illustrates the results of the classification of a collection. The data analyst has identified three classes: A, B, and C. In this example, records 1 and n have been assigned to class A; record 2 has been assigned to class B; and records 3 and n−1 have been assigned to class C. Thus, the data analyst determined that the attributes for records 1 and n are similar enough to be in the same class. In this example, a record can only be in one class. However, certain records may have attributes that are similar to more than one class. Therefore, some classification techniques, and more generally some categorization techniques, assign a probability that each record is in each class. For example, record 1 may have a probability of 0.75 of being in class A, a probability of 0.1 of being in class B, and a probability of 0.15 of being in class C. Once the data analyst has classified the records, standard classification techniques can be applied to create a classification rule that can be used to automatically classify new records as they are added to the collection (see, e.g., Duda, R., and Hart, P., Pattern Classification and Scene Analysis, Wiley, 1973).
FIG. 1C illustrates the automatic classification of record n+1 when it is added to the collection. In this example, the new record was automatically assigned to class B.
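The text does not say which classification rule is used, so the following is only an illustrative sketch under stated assumptions. It uses a nearest-centroid rule and turns distances into per-class probabilities with a softmax. The function names, the two-attribute records (age, page accessed), and all data values are hypothetical.

```python
import math


def train_centroids(labeled):
    """labeled: list of (vector, class_label). Returns class -> mean vector."""
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def class_probabilities(centroids, vec):
    """Softmax over negative Euclidean distance to each class centroid."""
    scores = {c: -math.dist(vec, mu) for c, mu in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def classify(centroids, vec):
    """Assign the record to its most probable class."""
    probs = class_probabilities(centroids, vec)
    return max(probs, key=probs.get)


# Records already classified by the analyst: [age, accessed page (1 or 0)].
labeled = [
    ([16.0, 1.0], "A"), ([18.0, 1.0], "A"),
    ([36.0, 1.0], "B"), ([40.0, 0.0], "B"),
    ([70.0, 0.0], "C"), ([66.0, 0.0], "C"),
]
centroids = train_centroids(labeled)

new_record = [35.0, 1.0]  # plays the role of record n+1
probs = class_probabilities(centroids, new_record)
assigned = classify(centroids, new_record)
```

With these made-up values the new record lies closest to the centroid of class B, so `assigned` is `"B"`, and `probs` gives a probability for each class that sums to 1.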
Clustering techniques provide an automated process for analyzing the records of the collection and identifying clusters of records that have similar attributes. For example, a data analyst may request a clustering system to cluster the records into five clusters. The clustering system would then identify which records are most similar and place each record into one of the five clusters (see, e.g., Duda and Hart). Some clustering systems also determine the number of clusters automatically.
FIG. 1D illustrates the results of the clustering of a collection. In this example, records 1, 2, and n have been assigned to cluster A, and records 3 and n−1 have been assigned to cluster B. Note that in this example the values stored in the column marked “cluster” in FIG. 1D have been determined by the clustering algorithm.
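The text does not name a particular clustering algorithm. As one common possibility, the sketch below uses k-means with a fixed number of clusters requested by the analyst. The `kmeans` function, its deterministic initialization from the first k records, and the sample data are assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignment, centers)."""
    # Assumption: seed the centers with the first k points so the run is repeatable.
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each record to its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of the records assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical records: [age, accessed page (1 or 0)].
points = [[16.0, 1.0], [70.0, 0.0], [18.0, 1.0], [66.0, 0.0], [17.0, 0.0]]
assignment, centers = kmeans(points, k=2)
```

The `assignment` list corresponds to the “cluster” column of FIG. 1D: it is produced by the algorithm rather than supplied by the analyst.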
Once the categories (e.g., classes and clusters) are established, the data analyst can use the attributes of the categories to guide decisions. For example, if one category represents users who are mostly teenagers, then a web master may decide to include advertisements directed to teenagers in the web pages that are accessed by users in this category. However, the web master may not want to include advertisements directed to teenagers on a certain web page if users in a different category who are senior citizens also happen to access that web page frequently. Even though the categorization of the collection may reduce the amount of data that a data analyst needs to review from thousands of records to possibly 10 or 20 categories, the data analyst still needs to understand the similarity and dissimilarity of the records in the categories so that appropriate decisions can be made.
SUMMARY OF THE INVENTION
An embodiment of the present invention provides a category visualization (“CV”) system that presents a graphic display of the categories of a collection of records, referred to as a “category graph.” The CV system may optionally display the category graph as a “similarity graph” or a “hierarchical map.” When displaying a category graph, the CV system displays a graphic representation of each category. The CV system displays the category graph as a similarity graph or a hierarchical map in a way that visually illustrates the similarity between categories. The display of a category graph allows a data analyst to better understand the similarity and dissimilarity between categories. A similarity graph includes a node for each category and an arc connecting nodes representing categories whose similarity is above a threshold. A hierarchical map is a tree structure that includes a node for each base category along with nodes representing combinations of similar categories.
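The two structures described above can be sketched as follows. This is an illustration only: this section does not define the similarity measure, so the pairwise similarity values are simply given as input, and the choice to combine the most similar pair first using average similarity between groups is an assumption. All function names are hypothetical, and category names are assumed to be strings.

```python
from itertools import combinations


def similarity_graph(similarity, threshold):
    """One node per category; an arc where similarity exceeds the threshold."""
    nodes = sorted({c for pair in similarity for c in pair})
    arcs = sorted(
        tuple(sorted(pair)) for pair, s in similarity.items() if s > threshold
    )
    return nodes, arcs


def lookup(similarity, a, b):
    """Similarity of two base categories, regardless of key order."""
    return similarity.get((a, b), similarity.get((b, a)))


def leaves(node):
    """Base categories under a node; a combined node is a (left, right) tuple."""
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])


def hierarchical_map(similarity):
    """Repeatedly combine the two most similar nodes until one tree remains."""
    nodes = sorted({c for pair in similarity for c in pair})

    def group_similarity(x, y):
        # Assumption: average similarity over all pairs of base categories.
        pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
        return sum(lookup(similarity, a, b) for a, b in pairs) / len(pairs)

    while len(nodes) > 1:
        x, y = max(combinations(nodes, 2), key=lambda p: group_similarity(*p))
        nodes = [n for n in nodes if n not in (x, y)] + [(x, y)]
    return nodes[0]


# Hypothetical pairwise similarities between three categories.
similarity = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.4}
nodes, arcs = similarity_graph(similarity, threshold=0.3)
tree = hierarchical_map(similarity)
```

With a threshold of 0.3, arcs connect A with B and B with C, but not A with C. In the tree, A and B are combined first because they are the most similar pair, and the combined node is then joined with C.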
In another aspect of the present invention, the CV system calculates and displays various characteristic and discriminating information about the categories. In particular, the CV system displays information describing the attributes of a category that best discriminate the records of that category from another category. The CV system also displays information describing the attributes that are most characteristic of a category.
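This section does not say how the CV system scores attributes, so the following sketch rests entirely on assumptions. For attributes that are either true or false, it treats the attributes that are true most often within a category as the most characteristic, and the attributes whose rates differ most between two categories as the most discriminating. The function names, attribute names, and data are hypothetical.

```python
def attribute_rates(records):
    """Fraction of records in which each true/false attribute is true."""
    names = records[0].keys()
    return {a: sum(r[a] for r in records) / len(records) for a in names}


def most_characteristic(records, top=2):
    """Attributes that are true most often within one category."""
    rates = attribute_rates(records)
    return sorted(rates, key=lambda a: (-rates[a], a))[:top]


def most_discriminating(cat_a, cat_b, top=2):
    """Attributes whose rates differ most between two categories."""
    ra, rb = attribute_rates(cat_a), attribute_rates(cat_b)
    return sorted(ra, key=lambda a: (-abs(ra[a] - rb[a]), a))[:top]


# Two hypothetical categories of users and the pages they accessed.
teens = [
    {"visited_music": True, "visited_news": False, "visited_home": True},
    {"visited_music": True, "visited_news": False, "visited_home": True},
]
seniors = [
    {"visited_music": False, "visited_news": True, "visited_home": True},
    {"visited_music": False, "visited_news": False, "visited_home": True},
]

characteristic = most_characteristic(teens)
discriminating = most_discriminating(teens, seniors)
```

In this example the home page is characteristic of the teenage category but does not discriminate it from the senior category, because both categories access it equally often.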


