Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2005-11-17
2010-11-16
Trujillo, James (Department: 2159)
active
07836090
ABSTRACT:
A system and method for performing and accelerating cluster analysis of large data sets is presented. The data set is formatted into binary bit Sequential (bSQ) format and then structured into a Peano Count tree (P-tree) format, which is a lossless tree representation of the original data. A P-tree algebra is defined and used to formulate a vertical set inner product (VSIP) technique that can efficiently and scalably measure the mean value and total variation of a set about a fixed point in the large dataset. The set can be any projected subspace of any vector space, including oblique subspaces. The VSIPs are used to determine the closeness of a point to a set of points in the large dataset, which makes them useful in classification, clustering and outlier detection. One advantage is that the number of centroids (k) need not be pre-specified but is effectively determined. The high quality of the centroids makes them useful in partitioning clustering methods such as k-means and k-medoids clustering. The present invention also identifies the outliers.
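The vertical computation described in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: it assumes uncompressed bit slices stored as plain Python integers in place of P-trees, and the names `bit_slices`, `set_mean` and `total_variation` are hypothetical. It relies only on the identity sum ||x - a||^2 = sum x.x - 2 a.(sum x) + |X| a.a, where each sum is obtained from counts of 1-bits in ANDed bit slices rather than by scanning rows.

```python
def bit_slices(column, nbits):
    """Split one integer column into nbits vertical bit slices.
    Bit r of slice j is bit j of column[r] (a stand-in for one P-tree)."""
    slices = [0] * nbits
    for row, value in enumerate(column):
        for j in range(nbits):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def popcount(bits):
    """Number of 1-bits, i.e. the root count of a bit slice."""
    return bin(bits).count("1")


def column_sums(slices, mask):
    """Return (sum of x, sum of x*x) over the rows selected by mask,
    using only AND operations and 1-bit counts on the slices."""
    s1 = sum((1 << j) * popcount(p & mask) for j, p in enumerate(slices))
    s2 = sum((1 << (j + k)) * popcount(pj & pk & mask)
             for j, pj in enumerate(slices)
             for k, pk in enumerate(slices))
    return s1, s2


def set_mean(slices_per_dim, mask):
    """Mean vector of the rows selected by mask."""
    n = popcount(mask)
    if n == 0:
        raise ValueError("mask selects no rows")
    return [column_sums(s, mask)[0] / n for s in slices_per_dim]


def total_variation(slices_per_dim, mask, a):
    """Sum of squared distances from the selected rows to the point a."""
    n = popcount(mask)
    total = 0
    for slices, a_i in zip(slices_per_dim, a):
        s1, s2 = column_sums(slices, mask)
        total += s2 - 2 * a_i * s1 + n * a_i * a_i
    return total


# Illustrative data only: four 2-D points with 3-bit attribute values.
data = [(3, 5), (1, 2), (7, 0), (4, 4)]
cols = [bit_slices([row[d] for row in data], 3) for d in range(2)]
all_rows = 0b1111
print(total_variation(cols, all_rows, (2, 3)))  # 46
```

In this sketch the mask plays the role of the set (for example, one cluster or class), so the same slices can be reused for any subset and any fixed point without revisiting the individual rows.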
REFERENCES:
patent: 5715455 (1998-02-01), Macon, Jr. et al.
patent: 5960437 (1999-09-01), Krawchuk et al.
patent: 5987468 (1999-11-01), Singh et al.
patent: 6185561 (2001-02-01), Balaban et al.
patent: 6941303 (2005-09-01), Perrizo
patent: 6941318 (2005-09-01), Tamayo et al.
patent: 6952499 (2005-10-01), Vititoe
patent: 2003/0208488 (2003-11-01), Perrizo
patent: 2005/0163384 (2005-07-01), Avni et al.
patent: 2005/0171700 (2005-08-01), Dean
patent: 2008/0281764 (2008-11-01), Baxter
patent: 2008/0312513 (2008-12-01), Simon et al.
Ren, Dongmei; Rahal, Imad; Perrizo, William; and Scott, Kirk. “A Vertical Distance-based Outlier Detection Method with Local Pruning”, Proceedings from CIKM '04, Nov. 8-13, 2004, Washington DC, USA, pp. 279-284, pp. 6.
Ding, Qiang; Ding, Qin; and Perrizo, William. “Decision Tree Classification of Spatial Data Streams Using Peano Count Trees”, Proceedings from SAC 2002, Madrid, Spain, pp. 413-417, pp. 5.
Ding, Qin; Khan, Maleq; Roy, Amalendu; and Perrizo, William. “The P-Tree Algebra”, Proceedings from SAC 2002, Madrid, Spain, pp. 426-431, pp. 6.
Wang, Baoying; Pan, Fei; Ren, Dongmei; Cui, Yue; Ding, Qiang; and Perrizo, William. “Efficient OLAP Operations for Spatial Data Using Peano Trees”, Proceedings from DMKD '03, Jun. 13, 2003, San Diego, CA, USA, pp. 28-34, pp. 7.
Abidin, Taufik; Perera, Amal; Serazi, Masum; and Perrizo, William. “Vertical Set Square Distance: A Fast and Scalable Technique to Compute Total Variation in Large Datasets”, Mar. 16-18, 2005, pp. 60-65, pp. 6.
“Fast Algorithms for Mining Association Rules,” R. Agrawal, R. Srikant, Proceedings of the International Conference on VLDB, Santiago, Chile, 13 pgs., Sep. 1994.
“Mining Quantitative Association Rules in Large Relational Tables,” R. Srikant, R. Agrawal, ACM-SIGMOD 96, Montreal, Canada, pp. 1-12, Jun. 1996.
“An Effective Hash-Based Algorithm for Mining Association Rules,” J.S. Park, M.S. Chen, P.S. Yu, ACM-SIGMOD 95, California, pp. 175-186, 1995.
“Multidimensional Access Methods,” V. Gaede, O. Gunther, ACM Computing Surveys, vol. 30, No. 2, pp. 171-231, Jun. 1998.
“The Quadtree and Related Hierarchical Data Structures,” H. Samet, ACM Computing Surveys, vol. 16, No. 2, pp. 188-260, Jun. 1984.
Web site print-out: “What are HH-codes and how can they be used to store hydrographic data?,” H. Iverson, Norwegian Hydrographic Service (NHS), http://www.statkart.no
lhdb/iveher/hhtext.htm, 7 pgs., Jan. 1998.
“Run-Length Encodings,” S.W. Golomb, IEEE Trans. On Information Theory, vol. 12, No. 3, pp. 399-401, Jul. 1966.
“Spatial Data Mining: A Database Approach,” M. Ester, H-P. Kriegel, J. Sander, Proceedings of the Fifth International Symposium on Large Spatial Databases (SSD), Berlin, Germany, 20 pgs., 1997.
“Spatial Data Mining: Progress and Challenges Survey Paper,” K. Koperski, J. Adhikary, J. Han, Data Mining and Knowledge Discovery, 16 pgs., 1996.
“Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support,” M. Ester, A. Frommelt, H-P. Kriegel, J. Sander, Data Mining and Knowledge Discovery, 28 pgs., 1999.
“Discovery of Spatial Association Rules in Geographic Information Databases,” K. Koperski, J. Han, SSD, 20 pgs., 1995.
Web site print-out: SMILEY (Spatial Miner & Interface Language for Earth Yield), Database Systems Users & Research Group at NDSU (DataSURG) http://www.midas.cs.ndsu.nodak.edu/~smiley, 5 pgs., undated.
“Growing Decision Trees on Support-Less Association Rules,” K. Wang, S. Zhou, Y. He, 6th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Boston, Massachusetts, 5 pgs., Aug. 2000.
“An Interval Classifier for Database Mining Applications,” R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, A. Swami, 18th International Conference on Very Large Data Bases, Vancouver, Canada, 14 pgs., Aug. 1992.
“SPRINT: A Scalable Parallel Classifier for Data Mining,” J. Shafer, R. Agrawal, M. Mehta, 22nd International Conference on Very Large Data Bases, Bombay, India, pp. 544-555, Sep. 1996.
“Fast Approach for Association Rule Mining for Remotely Sensed Imagery,” Q. Zhou, Q. Ding, W. Perrizo, Proceedings of the ISCA International Conference on Computers and Their Applications, New Orleans, Louisiana, 4 pgs., Mar. 2000.
“Efficient and Effective Clustering Method for Spatial Data Mining,” R. Ng, J. Han, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 12 pgs., Sep. 1994.
“Data Mining: An Overview from a Database Perspective,” M.S. Chen, J. Han, P.S. Yu, IEEE Transactions on Knowledge and Data Engineering, vol. 8, No. 6, pp. 1-40, Dec. 1996.
“Mining Association Rules Between Sets of Items in Large Database,” R. Agrawal, T. Imielinski, A. Swami, ACM-SIGMOD 93, Washington, D.C., pp. 207-216, May 1993.
“Quad Trees: A Data Structure for Retrieval of Composite Keys,” R.A. Finkel, J.L. Bentley, ACTA Informatica, vol. 4, pp. 1-9, 1974.
“Mining Frequent Patterns Without Candidate Generation,” J. Han, J. Pei, Y. Yin, ACM-SIGMOD 2000, Dallas, Texas, pp. 1-12, May 2000.
“The Application of Association Rule Mining on Remotely Sensed Data,” J. Dong, W. Perrizo, Q. Ding, J. Zhou, Proceedings of ACM Symposium on Applied Computers, Italy, 6 pgs., Mar. 2000.
“Finding Interesting Associations Without Support Pruning,” E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, C. Yang, Proceedings of 26th International Conference on Very Large Data Bases, Cairo, Egypt, 12 pgs., Sep. 2000.
“Integrating Classification and Association Rule Mining,” B. Liu, W. Hsu, Y. Ma, The Fourth International Conference on Knowledge Discovery and Data Mining, New York, New York, 7 pgs., Aug. 1998.
“Inferring Decision Trees Using the Minimum Description Length Principle,” J.R. Quinlan, R.L. Rivest, Information and Computation, Academic Press, Inc., vol. 80, pp. 227-248, 1989.
“Automatic Subspace Clustering of High Dimensional Data for Data Mining Application,” R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Proceedings of ACM SIGMOD International Conference on Management of Data, Seattle, Washington, 12 pgs., Jun. 1998.
“Constraint-Based Clustering in Large Databases,” A.K.H. Tung, J. Han, L.V.S. Lakshmanan, R.T. Ng, The 8th International Conference on Database Theory, London, United Kingdom, 15 pgs., Jan. 2001.
“Fast Vertical Mining Using Diffsets,” Mohammed J. Zaki, Karam Gouda, Special Interest Group in Knowledge discovery and Data Mining (SIGKDD), Washington DC, 21 pgs, Aug. 2003.
“Request Order Linked List(ROLL):A Concurrency Control Object for Centralized and Distributed Database Sys
Abidin Taufik Fuadi
Perera Amal Shehan
Perrizo William K.
Serazi Masum
NDSU - Research Foundation
Patterson Thuente Christensen Pedersen , P.A.
Somers Marc
Trujillo James