System and method of determining and searching for patterns...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06799175

ABSTRACT:

FIELD OF THE INVENTION
The present invention generally relates to techniques for querying large databases and, more particularly, to query techniques which employ a user/computer interactive process including visual feedback in order to find the best responses to a given query.
BACKGROUND OF THE INVENTION
In recent years, large databases have become very common because of the ability to store large amounts of information with the current advances in hardware technology. Such databases are often queried by users in order to find specific patterns of behavior. Finding the closest matching object is important for a number of applications. Examples include similarity searches in geometric databases, multimedia databases, and data mining applications such as fraud detection, information retrieval, among numerous other domains. Many of these domains contain applications in which the dimensionality of the representation is very high. For example, a typical feature extraction operation on an image will result in hundreds of dimensions. By way of another example, a data set which is inherently dimensionally high may be a demographic database in which the dimensions comprise information such as the name, age, salary, and other features which characterize a person.
Similarity retrieval problems are reasonably well solved for low dimensional applications for which efficient index structures have been proposed. A wide variety of multidimensional indexes have been proposed which work well for low dimensional data. For a comprehensive overview, see, e.g., V. Gaede et al., “Multidimensional Access Methods,” ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998, the disclosure of which is incorporated by reference herein. These structures can support a wide range of queries such as point queries, range queries, or similarity queries to a predefined target.
However, many empirical studies have shown that traditional indexing methods fail in high dimensional spaces, see, e.g., K. Beyer et al., “When is Nearest Neighbors Meaningful?,” Proceedings of the ICDT Conference, 1999; K. -I. Lin et al., “The TV-tree: An Index Structure for High Dimensional Data,” VLDB Journal, vol. 3, no. 4, pp. 517-542, 1992; N. Katayama et al., “The SR-Tree: An Index Structure for High Dimensional Nearest Neighbor Queries,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 369-380, 1997; R. Weber et al., “A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces,” VLDB, pp. 194-205, 1998; and S. Berchtold et al., “The X-Tree: An Index Structure for High-Dimensional Data,” Proceedings of the International Conference on Very Large Databases (VLDB'96), Bombay, India, pp. 28-39, 1996, the disclosures of which are incorporated by reference herein. In such cases, almost the entire index is accessed by a single query. In fact, the use of most indexes may be less efficient and/or effective than a sequential scan because of the simplicity of the latter. Furthermore, as recent results in the above-referenced K. Beyer et al. article show, it may often be difficult to determine the meaningfulness of the nearest neighbor for a query point. In fact, in many cases, it has been shown that the distance measurements to the target are not very meaningful with the use of traditional distance functions.
SUMMARY OF THE INVENTION
The present invention provides techniques for finding query responses from database queries using an interactive process between a user (e.g., a person entering a query to a database) and a computer system (e.g., a computing system upon which the database resides or which has access to the database). The interactive process comprises providing the user with one or more visual perspectives as feedback on the distribution of points in the database. These visual perspectives may be considered by the user in order for the user to provide feedback to the computer system. The computer system may then use the user-provided feedback to determine the best response to the query.
In one illustrative aspect of the invention, a method for resolving queries in large databases is provided which uses an interactive process. More specifically, in response to a user query to a database, the method comprises providing a user with different views of the data which the user can use in order to vote on the effectiveness of different sets of points which match the query the user entered. These responses may then be aggregated to determine the true sets of responses corresponding to the original user intent.
The conventional fully-automated systems discussed above are incomplete in their characterization of the data in terms of a single best distance function. However, the present invention realizes that different projections can provide different views of the data, all of which may be quite informative to a human in understanding the relationship between a query point and the rest of the data in a database. Furthermore, the present invention realizes that it is often easier for a human to make judgments on the viability of clusters of different shapes and sizes in a given view, or whether a view yields any useful information at all. That is, a computer cannot match the visual insight and intuition of a human in distinguishing useful patterns from the data. On the other hand, a human needs computational support in analyzing large volumes of high dimensional data in order to find the relevant characteristics of the data which relate the “query point” to the database. It is to be appreciated that, in one example, the query point may refer to a data record specified by a user that is used to locate patterns in a database that are closely related to the data record.
Accordingly, in accordance with an illustrative embodiment of the present invention, a computer provides an iterative visual feedback loop in which carefully chosen views of the data are presented to a user to illustrate the different clusters in the data. Once these views have been determined, then kernel density estimation may be used to provide a visual profile of the distribution of the data. These visual profiles are used to determine the clusters.
It is evident to those skilled in the art that the above method may be used in order to determine either the closest patterns to a given query point or the different clusters in the data. This can be done by finding the patterns close to multiple query points which are chosen randomly. These different patterns form the clusters in the data.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.


REFERENCES:
patent: 5832182 (1998-11-01), Zhang et al.
patent: 5912674 (1999-06-01), Magarshak
patent: 5974412 (1999-10-01), Hazlehurst et al.
patent: 6115708 (2000-09-01), Fayyad et al.
patent: 6121969 (2000-09-01), Jain et al.
patent: 6154213 (2000-11-01), Rennison et al.
patent: 6226409 (2001-05-01), Cham et al.
patent: 6523026 (2003-02-01), Gillis
patent: 6564202 (2003-05-01), Schuetze et al.
patent: 6567797 (2003-05-01), Schuetze et al.
patent: 2001/0049689 (2001-12-01), Mentzer
Sheikholeslami et al. “WaveCluster: A Wavelet-Based Clustering Approach for Spatial Data in Very Large Databases” The VLDB Journal—The International Journal on Very Large Databases. vol. 8, Issue 3-4. pp. 289-304. Feb. 2000. Springer-Verlag New York.*
K. Beyer et al., “When is Nearest Neighbor Meaningful?,” Proceedings of the ICDT Conference, 19 pages, 1999.
M. Livny et al., “Fast Density Estimation Using CF-Kernel for Very Large Databases,” Proceedings of the ACM SIGKDD Conference, pp. 1-22, 1996.
V. Gaede et al., “Multidimensional Access Methods,” ACM Computing Surveys, vol. 30, No. 2, pp. 1-86, 1998.
R. Weber et al., “A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces,” Proceedings of the 24th VLDB Conference, New York, USA, 12 page

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method of determining and searching for patterns... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method of determining and searching for patterns..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method of determining and searching for patterns... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3244340

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.