Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-03-02
2003-02-11
Mizrahi, Diane D. (Department: 2175)
C707S793000
active
06519599
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to high-dimensional data, and more particularly to the visualization of such data.
BACKGROUND OF THE INVENTION
With the advent of the Internet, and especially electronic commerce (“ecommerce”) over the Internet, the use of data analysis tools has increased. In ecommerce and other Internet and non-Internet applications, databases holding large amounts of information are generated and maintained so that they can be analyzed, or “mined,” to learn additional information regarding customers, users, products, etc. That is, data analysis tools leverage the data already contained in databases to yield new insights by uncovering patterns, relationships, or correlations.
It is usually desirable for a data analyst to visualize the relationships and patterns underlying the data. Existing exploratory data analysis techniques include plotting data for subsets of variables, and various clustering methods. However, inasmuch as the data analyst desires to have as many tools at his or her disposal as possible, new visualization techniques for displaying the relationships and patterns underlying data are always welcome. For this and other reasons, therefore, there is a motivation for the present invention.
SUMMARY OF THE INVENTION
The invention relates to the visualization of high-dimensional data sets. In one embodiment, a network is constructed for a data set having a number of variables, which can also be referred to as dimensions or columns. The network, such as a dependency or a Bayesian network, has a number of nodes that have dependencies thereamong. Each node corresponds to a variable in the data set, and has a local distribution. Each dependency has a first node and a second node, such that the first node depends on the second node.
In one embodiment, the network is displayed as a number of items and a number of connections. Each item represents a node of the network. Each connection, such as an arc, represents a dependency and connects a first item representing the first node of the dependency with a second item representing the second node of the dependency. In one embodiment, selection of a particular item displayed that represents a particular node results in the display of the local distribution associated with the particular node.
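The network described above can be pictured as a small data structure. The following is a minimal sketch only, not the patented implementation: the class names, the `strength` field, and the dictionary used for the local distribution are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One variable (dimension, column) of the data set."""
    name: str
    # Assumed representation of the local distribution: value -> probability.
    local_distribution: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Dependency:
    """A dependency in which the `first` node depends on the `second` node."""
    first: str
    second: str
    strength: float = 0.0  # hypothetical measure of the dependency


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)         # name -> Node
    dependencies: list = field(default_factory=list)  # list of Dependency

    def add_node(self, name, local_distribution=None):
        self.nodes[name] = Node(name, local_distribution or {})

    def add_dependency(self, first, second, strength=0.0):
        # Both endpoints must already be nodes of the network.
        if first not in self.nodes or second not in self.nodes:
            raise KeyError("both nodes must exist before adding a dependency")
        self.dependencies.append(Dependency(first, second, strength))

    def select(self, name):
        """Mimic selecting a displayed item: return its local distribution."""
        return self.nodes[name].local_distribution
```

In a display layer, each `Node` would be drawn as an item and each `Dependency` as a connection between two items; `select` stands in for the user clicking an item to see its local distribution.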
In another embodiment, only a predetermined number of the items are shown, such as only the items representing the most popular nodes of the data set. Furthermore, in one embodiment, in response to receiving a user input, such as in conjunction with a graphical slider, a sub-set of the connections is displayed, proportional to the user input, in accordance with a predetermined measure of the dependencies represented by the connections. Thus, anything from all of the connections down to only the single connection representing the dependency with the largest value of the predetermined measure can be displayed.
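One way to read the slider behavior is as a ranking followed by a cutoff. The sketch below assumes the slider yields a fraction between 0 and 1 and that each connection carries a numeric value for the predetermined measure; the function name and the tuple layout are hypothetical.

```python
import math


def connections_for_slider(connections, fraction):
    """Pick the connections to display for a slider position.

    connections: list of (first, second, measure) tuples (assumed layout).
    fraction: slider position in [0, 1]; 1 shows every connection and
    0 shows only the connection with the largest measure.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if not connections:
        return []
    # Strongest dependencies first, so a cutoff keeps the strongest ones.
    ranked = sorted(connections, key=lambda c: c[2], reverse=True)
    # Always keep at least the single strongest connection.
    count = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:count]
```

Rounding up and keeping at least one connection is a design choice of this sketch, chosen so that the lowest slider position still shows the strongest dependency.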
In another embodiment, a particular item is displayed in an emphasized manner, and the particular connections representing dependencies including the node represented by the particular item, as well as the items representing nodes also in these dependencies, are also displayed in the emphasized manner. The emphasized manner can be, for example, only displaying the particular item, the particular connections, and the items representing nodes also in the dependencies represented by the particular connections, and not showing any of the other items or connections. Furthermore, in one embodiment, only an indicated sub-set of the items is displayed, as well as the connections representing dependencies among the nodes represented by the indicated sub-set of items. For example, the user may be able to add items to the indicated sub-set by searching for desired items, or otherwise selecting items, in an item-by-item manner.
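The two views described above, emphasizing one item together with its dependencies and showing only an indicated sub-set of items, amount to simple filters over the connections. The sketch below is illustrative only; the function names and the use of (first, second) pairs for connections are assumptions.

```python
def emphasized_view(connections, item):
    """Return the items and connections to emphasize for `item`.

    connections: list of (first, second) pairs (assumed layout).
    Keeps every connection whose dependency includes `item`, plus the
    items at the other end of those connections.
    """
    kept = [c for c in connections if item in c]
    items = {item}
    for first, second in kept:
        items.update((first, second))
    return items, kept


def subset_view(connections, chosen):
    """Return the indicated items and the connections among them only."""
    chosen = set(chosen)
    kept = [c for c in connections if c[0] in chosen and c[1] in chosen]
    return chosen, kept
```

A search box or item-by-item selection would simply grow the `chosen` collection before `subset_view` is called again.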
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.
REFERENCES:
patent: 5999923 (1999-12-01), Kowalski et al.
patent: 6138123 (2000-10-01), Rathbun
patent: 6216134 (2001-04-01), Heckerman et al.
patent: 6301579 (2001-10-01), Becker
Chickering D. Maxwell
D'Hers Thierry
Heckerman David E.
Meek Christopher A.
Netz Amir
Amin & Turocy LLP
Microsoft Corporation
Mizrahi Diane D.