System and method for searching databases with applications...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C705S037000

Reexamination Certificate

active

06236985

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to searching high-dimensional databases. In particular, related information in a database is identified by performing a similarity search. Furthermore, the present invention relates to an automated system and method for detecting records in a database that are similar to user defined target records.
BACKGROUND OF THE INVENTION
The increased popularity of electronic commerce, and progress in technologies such as bar-codes have made it possible to automatically store sales data. For example, electronic transactions over the Internet allow retail organizations, banks, and governments, for example, to store and maintain information related to customer behavior. Retail organizations, for example, may wish to store information concerning customers buying behavior. In order to better serve customers, it may be desirable to target marketing information. In other words, it may be desirable to deduce, from stored information of past customer behavior, what type of information may interest which customers. Information regarding one customer may, for example, be deduced on the basis of the behavior of other customers identified as “similar” or “peers”.
In the case of retail organizations, for example, transaction records may take the form of market basket transactions. Each market basket transaction may include an item or set of items (data value(s)) which may be bought together by a customer. For example, transaction records of a supermarket may be
{Milk, Bread, Butter}, and
{Pepsi, Diet Coke, Sprite}.
It is logical to expect that market transactions of a particular customer be strongly correlated with each other. For example, it is likely that the items in an individual customers' weekly shopping list are correlated m from week to week. Similarly, it is possible that two individual customers exhibit correlation in their buying patterns. Suppose, for example, that customer A and customer B are each associated with the following transactions:
Customer A<={Bed_sheet, Pillow, Comforter, Pillowcase}
Customer B<={Pillow, Comforter, Pillowcase, Nightstand}. It seems likely that customer A and customer B have correlated behavior. Thus, it may be possible to make personalized recommendations to customer A based of the behavior of customer B, and vice versa. For example, buying a “Nightstand” may be recommended to customer A, or buying a “Bed_sheet” may be recommended to customer B. Deducing the preferences of a given user by examining information about the preferences of other similar users is often referred to as collaborative filtering.
Collaborative filtering is possible when customers, such as customers A and B above, for example, are determined to be “similar”. “Similarity” between customers may be judged based on records in a transaction database and on a similarity criteria or objective function. Specifically, the problem of finding similar, neighboring, or peer records (similarity searching) is that of finding the data record or set of k data records in a database that is/are closest, in the sense of an objective function, to a given target value.
Many methods have been previously described for searching a database. In particular, several similarity search methods have been proposed. For example, White D. A., and Jain R., ‘Similarity Indexing with the SS-Tree,’ Proceedings of the 12th International Conference on Data Engineering, New Orleans, U.S.A., pages 516-523, February, 1996; Roussopoulos N., Kelley S., and Vincent F., ‘Nearest Neighbor Queries,’ Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1995, pages 71-79; Samet H., Design and Analysis of Spatial Datastructures, pp. 135-141, Addison Wesley, 1993. These techniques are known to be efficient for data for which records include numerical data values, and for data for which the number of possible data values associated with each record is relatively small (e.g. 5-8 data values/record). In the case of non-quantitative data values, and a relatively large number of data values per data record (e.g. in the range of thousands data values/record), however, these techniques may not be applicable.
SUMMARY OF THE INVENTION
A method of analyzing information in the form of a plurality of data records. Each data record includes one or more data values. The data values are partitioned into a plurality of data signatures. Data values of data signatures are compared to data values of data records. Based on the result of the comparison an index is associated with each data record. A bound corresponding to the index is calculated based on a user defined target value and an objective function.


REFERENCES:
patent: 5845276 (1998-12-01), Emerson et al.
patent: 6012058 (2000-01-01), Fayyad et al.
patent: 6026398 (2000-02-01), Brown et al.
patent: 6049797 (2000-04-01), Guha et al.
patent: 6049861 (2000-04-01), Bird et al.
patent: 6061658 (2000-05-01), Chou et al.
patent: 6148295 (2000-11-01), Megiddo et al.
“Iterate: A conceptual Clustering Algorithm for Data Mining,” Biswas et al., IEEE Transactions on Systems, Man, and Cybernetics—Part C, Applications and Reviews, vol. 28, No. 2, pp. 219-230, May, 1998.*
R. Sibson, “SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method”,The Computer Journal, The British Computer Society, vol. 16, No. 1, Feb. 1973.
Tian Zhang et al., “BIRCH: An Efficient Data Clustering Method for Very Large Databases”, Proceedings of the ACM SIGMOND International Conference on Management of Data, Montreal, Canada, Jun. 1996, pp. 103-114.
Raymond T. Ng et al., “Efficient and Effective Clustering Methods for Spatial Data Mining”, Proceedings of the 20thInternational Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 144-155.
Nick Roussopoulos et al., “Nearest Neighbor Queries”, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1995, pp. 71-79.
Stefan Berchtold et al., “Fast Nearest Neighbor Search in High-dimensional Space”, Proceedings of the International Conference on Data Engineering, Feb. 1998, pp. 209-218.
David A. White et al., “Similarity Indexing with the SS-Tree”, Proceedings of the 12thInternational Conference on Data Engineering, New Orleans, U.S.A., Feb. 1996, pp. 516-523.
Stefan Berchtold et al., “The X-Tree: An Index Structure for High-Dimensional Data”, Proceedings of the 22ndInternational Conference in Very Large Databases, 1996, pp. 28-39.
Douglas H. Fisher, “Knowledge Acquisition Via Incremental Conceptual Clustering”, Machine Learning 2(2), 1987, pp. 139-172.
Mohamed Zait et al., “A Comparative Study of Clustering Methods”, FGCS Journal, Special Issue on Data Mining, 1997, pp. 149-159.
Samet H., Design and Analysis of Spatial Datastructures, Addison Wesley, 1993, pp. 135-141.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for searching databases with applications... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for searching databases with applications..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for searching databases with applications... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2568775

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.