Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-04-30
2004-05-25
Alam, Shahid (Department: 2172)
C707S793000, C704S202000, C704S206000, C705S026640, C345S215000
active
06742003
ABSTRACT:
BACKGROUND OF THE DISCLOSURE
1. Field of the Invention
The invention relates to a system that incorporates an interactive graphical user interface for graphically visualizing clusters (specifically segments) of data. In particular, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for those segments, and then forms and visually depicts hierarchical organizations of those segments. The system also compares two user-selected segments or segment groups and graphically displays normalized scored comparison results. Additionally, the system automatically and dynamically reduces, as necessary, the depth of the hierarchical organization (the total number of hierarchical levels) based on scored similarity measures of the selected clusters; and, based on normalized scores, provides and displays a relative ranking of the displayed segments, and displays summarized characteristics of any such segment.
2. Description of the Prior Art
Computer systems have long been used for data analysis. For example, data may include demographics of users and web pages accessed by those users. A web master (i.e., a manager of a web site) may desire to review web page access patterns of those users in order to optimize links between various web pages or to customize advertisements to the demographics of the users. However, it may be very difficult for the web master to analyze the access patterns of thousands of users involving possibly hundreds of web pages. This difficulty may be lessened if the users can be categorized by common demographics and common web page access patterns. Two techniques of data categorization, classification and clustering, can be useful when analyzing large amounts of such data. These categorization techniques are used to categorize data represented as a collection of records, each containing values for various attributes. For example, each record may represent a user, and the attributes describe various characteristics of that user. The characteristics may include the sex, income, and age of the user, or web pages accessed by the user.
FIG. 1A illustrates a collection of records organized as a table. Each record (1, 2, . . . , n) contains a value for each of the attributes (1, 2, . . . , m). For example, attribute 4 may represent the age of a user and attribute 3 may indicate whether that user has accessed a certain web page. Therefore, the user represented by record 2 accessed the web page as represented by attribute 3 and is age 36 as represented by attribute 4. Each record, together with all its attributes, is commonly referred to as a “case”.
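The table of cases described above can be sketched in Python as follows. This is only an illustration: the field names `accessed_page` and `age` are hypothetical stand-ins for attributes 3 and 4, and every value other than those for record 2 (accessed the page, age 36) is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One record together with its attributes (a "case")."""
    record_id: int
    accessed_page: bool  # hypothetical name standing in for attribute 3
    age: int             # hypothetical name standing in for attribute 4


# Illustrative values only; record 2 follows the example in the text.
cases = [
    Case(record_id=1, accessed_page=False, age=52),
    Case(record_id=2, accessed_page=True, age=36),
    Case(record_id=3, accessed_page=True, age=17),
]


def lookup(cases, record_id):
    """Return the case with the given record number, or None if absent."""
    return next((c for c in cases if c.record_id == record_id), None)


record_2 = lookup(cases, 2)
print(record_2.accessed_page, record_2.age)
```

Each row of the table becomes one `Case` object, so a clustering or classification routine can treat the collection as a plain list of cases.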
Classification techniques allow a data analyst (e.g., web master) to group the records of a collection (dataset or population) into classes. That is, the data analyst reviews the attributes of each record, identifies classes, and then assigns each record to a class.
FIG. 1B illustrates the results of classifying a collection. The data analyst has identified three classes: A, B, and C. In this example, records 1 and n have been assigned to class A; record 2 has been assigned to class B, and records 3 and n−1 have been assigned to class C. Thus, the data analyst determined that the attributes for rows 1 and n are similar enough to be in the same class. In this example, a record can only be in one class. However, certain records may have attributes that are similar to more than one class. Therefore, some classification techniques, and more generally some categorization techniques, assign a probability that each record is in each class. For example, record 1 may have a probability of 0.75 of being in class A, a probability of 0.1 of being in class B, and a probability of 0.15 of being in class C. Once the data analyst has classified the records, standard classification techniques can be applied to create a classification rule that can be used to automatically classify new records as they are added to the collection (see, e.g., R. Duda et al, Pattern Classification and Scene Analysis (© 1973, John Wiley and Sons) (hereinafter the “Duda et al” textbook), which is incorporated by reference herein).
FIG. 1C illustrates the automatic classification of record n+1 when it is added to the collection. In this example, the new record was automatically assigned to class B.
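As a rough illustration of a classification rule learned from analyst-labeled records, the sketch below uses a nearest-centroid rule. This is one simple technique chosen for the example, not necessarily the one the text or the Duda et al textbook has in mind. The attribute encoding (a 0/1 page-access flag and age in decades), the sample values, and the inverse-distance scores are all assumptions; the scores merely sum to 1 and are not calibrated probabilities.

```python
import math


def train_centroids(labeled_records):
    """Average the attribute vectors of each analyst-assigned class."""
    sums, counts = {}, {}
    for attributes, label in labeled_records:
        acc = sums.setdefault(label, [0.0] * len(attributes))
        for i, value in enumerate(attributes):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def class_scores(attributes, centroids):
    """Inverse-distance weights, normalized so the scores sum to 1."""
    weights = {
        label: 1.0 / (1.0 + math.dist(attributes, centroid))
        for label, centroid in centroids.items()
    }
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def classify(attributes, centroids):
    """Assign the class whose centroid scores highest for this record."""
    scores = class_scores(attributes, centroids)
    return max(scores, key=scores.get)


# Hypothetical labeled records: (page-access flag, age in decades), class.
labeled = [
    ((1, 1.5), "A"), ((1, 1.9), "A"),
    ((0, 3.6), "B"), ((0, 4.0), "B"),
    ((1, 6.8), "C"), ((0, 7.2), "C"),
]
centroids = train_centroids(labeled)
new_record = (0, 3.5)  # plays the role of record n+1
print(classify(new_record, centroids))
```

The per-class scores play the role of the soft memberships mentioned above, and taking the highest-scoring class yields a hard assignment for the new record.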
Clustering techniques provide an automated process for analyzing the records of the collection and identifying clusters of records that have similar attributes. For example, a data analyst may request a clustering system to cluster the records into five clusters. The clustering system would then identify which records are most similar and place each record into one of the five clusters. (See, e.g., the Duda et al textbook.) Also, some clustering systems automatically determine the number of clusters.
FIG. 1D illustrates the results of the clustering of a collection. In this example, records 1, 2, and n have been assigned to cluster A, and records 3 and n−1 have been assigned to cluster B. Note that in this example the values stored in the column marked “cluster” in FIG. 1D have been determined by the clustering algorithm.
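To make concrete how a clustering algorithm might fill in a “cluster” column, here is a minimal k-means sketch. The text does not name a specific algorithm, so k-means is an assumption chosen for illustration, as are the attribute encoding (page-access flag, age in decades), the sample records, and the naive choice of the first k records as initial centers.

```python
import math


def k_means(records, k, iterations=20):
    """Plain k-means: returns one cluster index per record."""
    centers = [list(r) for r in records[:k]]  # naive, deterministic initialization
    assignment = [0] * len(records)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(r, centers[c]))
            for r in records
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical records: (page-access flag, age in decades).
records = [(1, 1.5), (0, 6.8), (1, 1.9), (0, 7.2), (1, 1.7)]
cluster_column = k_means(records, k=2)
print(cluster_column)
```

Unlike classification, no analyst-supplied labels are needed here: the cluster indices are derived from the attribute values alone.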
Once the categories (e.g., classes and clusters) are established, the data analyst can use the attributes of the categories to guide decisions. For example, if one category represents users who are mostly teenagers, then a web master may decide to include advertisements directed to teenagers in the web pages that are accessed by users in this category. However, the web master may not want to include advertisements directed to teenagers on a certain web page if users in a different category, who are senior citizens, also happen to access that web page frequently. Even though the categorization of the collection may reduce the amount of data from thousands of records, a data analyst still needs to review possibly 10 or 20 categories. The data analyst still needs to understand the similarity and dissimilarity of the records in the categories so that appropriate decisions can be made.
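The kind of per-category review described above can be sketched as a small summary routine. The tuple layout, the sample values, and the 0.5 threshold for "frequent" access are hypothetical choices made only for this example.

```python
from collections import defaultdict


def summarize(cases):
    """Per-category mean age and fraction of members who accessed the page."""
    groups = defaultdict(list)
    for category, age, accessed_page in cases:
        groups[category].append((age, accessed_page))
    return {
        category: {
            "mean_age": sum(a for a, _ in rows) / len(rows),
            "access_rate": sum(1 for _, hit in rows if hit) / len(rows),
        }
        for category, rows in groups.items()
    }


# Hypothetical cases: (category, age, accessed the page in question).
cases = [
    ("A", 15, True), ("A", 17, True), ("A", 16, False),
    ("B", 70, True), ("B", 68, True),
]
summary = summarize(cases)

# Categories whose members access the page frequently (assumed threshold: 0.5).
frequent = [c for c, s in summary.items() if s["access_rate"] >= 0.5]
print(summary, frequent)
```

In this invented data both the teenage category and the senior category access the page frequently, which is the situation in which the web master might hesitate to place teenager-directed advertisements on it.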
Currently, the Internet is revolutionizing commerce by providing a relatively low cost platform for vendors and a very convenient platform for consumers through which consumers, in the form of Internet users, and vendors can engage in commerce. Not only are certain vendors merely appearing through a so-called web presence, but existing traditional, so-called “bricks and mortar”, retail establishments are augmenting their sales mechanisms through implementation of electronic commerce web sites. To facilitate this commerce, various computer software manufacturers have developed and now have commercially available software packages which can be used to quickly implement and deploy, and easily operate a fully-functional electronic commerce web site. One such package is a “Commerce Server” software system available from the Microsoft Corporation of Redmond, Wash. (which is also the present assignee hereof). In essence and to the extent relevant, the “Commerce Server” system provides a very comprehensive, scalable processing infrastructure through which customized business-to-consumer and business-to-business electronic commerce web sites can be quickly implemented. This infrastructure, implemented on typically a web server computer, provides user profiling, product cataloguing and content management, transaction processing, targeted marketing and merchandizing functionality, and analysis of consumer buying activities.
With the rapid and burgeoning deployment of electronic commerce web sites, web site owners have realized that voluminous consumer data gathered and provided through such a site, and particularly its electronic commerce server, provides a wealth of useful information. Through this information, on-line consumer buying patterns can be discerned and targeted advertising, even to the point of directed targeted advertising to a particular individual based on that person's particular buying habits and
Bradley Paul S.
Chickering David M.
Heckerman David E.
Meek Christopher A.
Alam Shahid
Amin & Turocy LLP
Microsoft Corporation