Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-07-23
2004-04-27
Coby, Frantz (Department: 2171)
C707S793000
active
06728728
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to methods for managing information in general, and to the binary representation of data and information mining in particular.
The idea of a binary database was first introduced by Spiegler and Maayan in a seminal paper of 1985 (Spiegler, I., and Maayan, R., “Storage and Retrieval Considerations of Binary Data Bases”, Information Processing & Management, Vol. 21(3), pp. 233-254, 1985), hereinafter “Spiegler and Maayan”.
The original binary database concept described in Spiegler and Maayan proposed a method for the storage and retrieval of alphanumeric data found in files and databases, as an alternative to inverted files as a storage and retrieval technique in database management.
The “binary idea” was ahead of its time. Today, the application of the binary idea in bit maps or bit vectors has come of age, with several vendors developing software to support access to and retrieval from databases and data warehouses. Those developments fall short of full realization of the original binary database concept, as they use bit vectors at the attribute level without linking attributes to one another or providing an overall binary database view.
U.S. Pat. No. 5,649,181 to French et al. describes a method for using bit vectors to index database columns (attributes) for the purposes of information access and retrieval. The patent was implemented in a software product called Sybase IQ, intended for use as an online analytical processing (OLAP) engine.
U.S. Pat. No. 5,706,495 to Chadha et al. describes the use of a vectorized index on which a series of bit-vector operations are performed for optimizing SQL queries.
Some firms today apply bit vectors in their products. For example, Sand Technologies, in a package called Nucleus, uses bit maps to support high-performance ad hoc interactive queries.
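To illustrate attribute-level bit vectors of the kind referred to above, the following minimal Python sketch builds one bit vector per distinct value of a column and answers a two-attribute query with a bitwise AND. The table contents, the function names `build_bitmap_index` and `rows_of`, and the use of Python integers as bit vectors are assumptions made for illustration only; the sketch does not reproduce the method of any cited patent or product.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Return {value: bit vector}; bit r is set when row r holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bits):
    """List the row numbers whose bits are set in a bit vector."""
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]

# Hypothetical two-column table with five rows.
city = ["Haifa", "Tel Aviv", "Haifa", "Eilat", "Tel Aviv"]
status = ["gold", "gold", "silver", "gold", "silver"]

city_index = build_bitmap_index(city)
status_index = build_bitmap_index(status)

# Rows where city == "Haifa" AND status == "gold": a single bitwise AND.
haifa_gold = rows_of(city_index["Haifa"] & status_index["gold"])
```

Each column is indexed on its own here, which mirrors the attribute-level use described above: the vectors are combined only at query time.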
The present invention carries the binary database concept into new territories and applications, which include the representation of graphs and keyword contexts, data and text mining, knowledge discovery in databases (KDD), and even a database on a chip. The binary/positive representation of data can be used to extract behavior patterns, characterize consumer segments, select symptoms identifying a certain disease, support target marketing campaigns, perform DNA analysis, and more.
A recent article by Gelbard and Spiegler (Gelbard, R., and Spiegler, I., “Hempel's Raven Paradox: A Positive Approach to Cluster Analysis”, Computers & Operations Research, Vol. 27(4), April 2000), hereinafter “Gelbard and Spiegler”, further advances the binary database approach and presents a model for similarity evaluation and a method for data clustering based on the positive attributes of data.
The present invention carries the similarity evaluation and the clustering method further by improving similarity indexing and clustering techniques.
The present invention provides an innovative approach to the use of the binary data representation in the following areas:
Marketing: Segmentation (customers, products, events) and direct marketing. Customer relationship management (CRM), lifetime value, retention. Market basket analysis and consumer behavior.
Internet: Search engines (keyword, names, natural language, categories, contexts).
User Profiling: Personalization of service in e-commerce and related applications, locating the users most likely to respond to a product or service.
Management: Support for decision making in data warehousing, data marts and OLAP.
Finance: Customization of investment packages, classification of customers, market trend detection/alert.
Banking: Fraud detection, credit policy, customer defaults, defection.
Insurance: Plan tailoring, risk identification, focusing.
Telecomm: Customer management, churn modeling, customer retention/defection in cellular, line and Internet communication.
Medicine: DNA segmentation, pharmacology, diagnosis.
Human Resources: Characterization, classification, prioritization.
Database on a chip: Implementation of databases in hardware. Relevant data may become part of palm, cellular, or network devices in the near future.
CBR: Case-Based Reasoning, a method for comparing and handling cases such as emergencies, social crises and more.
FIG. 1, to which reference is now made, shows an overview 10 of the new areas and applications in which the present invention is most useful.
SUMMARY OF THE INVENTION
In accordance with the present invention there is provided a knowledge tool for describing a relationship pattern between objects, comprising a binary representation of an interaction between the objects, the binary representation indicating an alleged influence of an object i on an object j by assigning a positive value to the element in the i-th row and the j-th column of a matrix in which the objects are set out in a row and column format.
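The relationship matrix described above can be sketched in a few lines of Python. The object names, the list of influences, and the function name `influence_matrix` are hypothetical and chosen only for illustration; the sketch assumes that "positive value" means 1 and that all other entries are 0.

```python
def influence_matrix(objects, influences):
    """Build a square 0/1 matrix whose entry [i][j] is 1 when
    object i is said to influence object j."""
    position = {name: k for k, name in enumerate(objects)}
    n = len(objects)
    matrix = [[0] * n for _ in range(n)]
    for source, target in influences:
        matrix[position[source]][position[target]] = 1
    return matrix

# Hypothetical objects and alleged influences (source, target).
objects = ["price", "demand", "revenue"]
influences = [("price", "demand"), ("price", "revenue"), ("demand", "revenue")]

matrix = influence_matrix(objects, influences)
# Rows and columns follow the order of `objects`:
# price   -> [0, 1, 1]
# demand  -> [0, 0, 1]
# revenue -> [0, 0, 0]
```

Because influence is directional, the matrix is generally not symmetric: row i records what object i influences, and column j records what influences object j.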
In accordance with the present invention there is provided a method for quantitatively evaluating a similarity or a distinction between at least two objects, comprising the stages of: (a) representing the objects by a binary representation in which the attributes of the objects are features relevant to the similarity; and (b) calculating a similarity index between the at least two objects, the similarity index being proportional to the number of positive attributes common to the at least two objects as represented by the binary representation.
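As a minimal sketch of stage (b), the following Python function counts the attributes that are positive (1) in both objects and multiplies the count by a constant. The text above states only that the index is proportional to this count; the `scale` parameter, the function name `similarity_index`, and the sample vectors are assumptions made for illustration.

```python
def similarity_index(a, b, scale=1.0):
    """Return scale * (number of attributes that are 1 in both a and b)."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    common_positive = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    return scale * common_positive

# Hypothetical objects described by five binary attributes.
customer_a = [1, 0, 1, 1, 0]
customer_b = [1, 1, 1, 0, 0]
customer_c = [0, 0, 0, 0, 1]

sim_ab = similarity_index(customer_a, customer_b)  # attributes 0 and 2 are shared
sim_ac = similarity_index(customer_a, customer_c)  # no shared positive attribute
```

Attributes that are 0 in both objects do not contribute, so two objects are never counted as similar merely because they both lack a feature.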
In accordance with the present invention there is provided a method for preserving a compression capability of a database, comprising the stages of: (a) representing the data in the database by a binary matrix; (b) interchanging the order of the rows and the order of the columns of the binary matrix so as to partition the binary matrix into approximately homogeneous sub-areas containing cells of “1” only or “0” only; (c) excluding the approximately homogeneous sub-areas from the binary matrix so as to obtain a reduced binary matrix, and loading the reduced binary matrix into a data storage space; (d) symbolizing the homogeneity pattern by a tree structure; and (e) changing the root of the tree structure in order to obtain a required feature of the tree structure.
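The effect of stage (b) can be illustrated with a simple heuristic: sorting rows and columns by their count of 1s tends to gather the 1s toward one corner of the matrix. This density sort is an assumption chosen for illustration and is not presented as the reordering procedure of the invention; the function name `reorder_by_density` and the sample matrix are hypothetical, and the tree structure of stages (d) and (e) is not covered.

```python
def reorder_by_density(matrix):
    """Sort rows, then columns, by descending count of 1s.
    Returns the reordered matrix and the row and column orders used,
    so the original layout can be restored."""
    row_order = sorted(range(len(matrix)), key=lambda r: -sum(matrix[r]))
    n_cols = len(matrix[0])
    col_order = sorted(range(n_cols), key=lambda c: -sum(row[c] for row in matrix))
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return reordered, row_order, col_order

# Hypothetical 4 x 4 binary matrix.
matrix = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]

reordered, row_order, col_order = reorder_by_density(matrix)
# reordered:
# [1, 1, 1, 0]
# [1, 1, 1, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 0]
```

In this example the upper-left 2 x 3 block holds only 1s, while the last row and last column hold only 0s. Such homogeneous areas are the kind a later stage could describe compactly instead of storing cell by cell.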
In accordance with the present invention there is provided a method for grouping a plurality of objects according to their similarity, the method comprising the stages of: (a) representing the objects by a binary representation matrix with positive attribute values, in which the rows are the objects and the columns are attributes relevant to the grouping; (b) calculating an index of similarity for each pair of objects among the plurality of objects; (c) building an object similarity matrix in which the element at the intersection of two objects is the index of similarity between the two objects; and (d) scanning the similarity matrix to choose pairs of objects having a similarity index of at least a pre-selected value, each chosen pair of objects constituting a different clustering candidate.
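Stages (b) through (d) can be sketched as follows. The sketch assumes that the index of similarity is the plain count of common positive attributes; the function names, the sample rows, and the threshold of 2 are hypothetical.

```python
from itertools import combinations

def common_positives(a, b):
    """Count the attributes that are 1 in both binary rows."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)

def similarity_matrix(rows):
    """Symmetric matrix of pairwise similarity; the diagonal is left at 0."""
    n = len(rows)
    sim = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = common_positives(rows[i], rows[j])
    return sim

def candidate_pairs(sim, threshold):
    """Scan the upper triangle for pairs at or above the threshold."""
    n = len(sim)
    return [(i, j) for i, j in combinations(range(n), 2) if sim[i][j] >= threshold]

# Hypothetical objects (rows) over four binary attributes (columns).
rows = [
    [1, 1, 0, 1],  # object 0
    [1, 1, 0, 0],  # object 1
    [0, 0, 1, 1],  # object 2
    [0, 0, 1, 1],  # object 3
]

sim = similarity_matrix(rows)
pairs = candidate_pairs(sim, threshold=2)  # objects 0 and 1, objects 2 and 3
```

Only the upper triangle is scanned because the matrix is symmetric, so each pair of objects is reported once.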
In accordance with the present invention there is provided a method for data mining, comprising the stages of: (a) defining attributes which are considered a priori, by expert opinion, to be meaningful to the score of a data mining process; (b) reading raw data from an operational database system and converting the data into objects of a binary representation in a binary matrix whose columns are the attributes; (c) performing positive clustering of the converted data according to a similarity based on the attributes, to obtain a number of groups; and (d) executing data mining within the groups.
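Stages (a) and (b) amount to turning raw records into rows of 0s and 1s. The following Python sketch shows one way this might be done, with each attribute expressed as a yes/no test on a record. The record fields, the attribute definitions, and the function name `to_binary_matrix` are hypothetical; clustering and mining, stages (c) and (d), are not shown.

```python
def to_binary_matrix(records, attributes):
    """Convert records into 0/1 rows, one column per (name, test) attribute."""
    return [[1 if test(record) else 0 for _, test in attributes]
            for record in records]

# Hypothetical raw records, as they might be read from an operational system.
records = [
    {"age": 34, "plan": "prepaid", "monthly_calls": 120},
    {"age": 61, "plan": "contract", "monthly_calls": 15},
    {"age": 27, "plan": "contract", "monthly_calls": 200},
]

# Hypothetical attributes chosen in advance as meaningful (stage (a)).
attributes = [
    ("age_under_40", lambda r: r["age"] < 40),
    ("plan_contract", lambda r: r["plan"] == "contract"),
    ("heavy_caller", lambda r: r["monthly_calls"] >= 100),
]

binary = to_binary_matrix(records, attributes)
# binary rows: [1, 0, 1], [0, 1, 0], [1, 1, 1]
```

Numeric and categorical fields are both reduced to positive or absent features, which is what allows a single similarity measure to be applied across all columns.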
In accordance with the present invention there is provided a method for text mining, comprising the stages of: (a) defining attributes which comprise words considered a priori to be included in a text as an N-chain phrase; (b) reading a free-form text and performing initial parsing of the text; (c) identifying and reconstructing the binary N-chain phrase; and (d) retrieving the N-chain phrases in relevant contexts.
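This section does not define "N-chain phrase". The Python sketch below therefore assumes it means N predefined words appearing consecutively in the text, and shows only how such sequences might be located and returned with a few surrounding words. The tokenizer, the function name `find_phrases`, the sample text, and the phrase list are all hypothetical, and the binary encoding of phrases is not modeled.

```python
import re

def find_phrases(text, phrases, context=2):
    """Locate each phrase (a tuple of consecutive words) in the text and
    return (phrase, surrounding words) for every occurrence."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # crude initial parsing
    hits = []
    for phrase in phrases:
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) == phrase:
                first = max(0, start - context)
                last = start + n + context
                hits.append((phrase, " ".join(tokens[first:last])))
    return hits

# Hypothetical text and predefined phrases (N = 3 and N = 2).
text = "The bank flagged a credit card fraud case. Later, card fraud alerts rose."
phrases = [("credit", "card", "fraud"), ("card", "fraud")]

hits = find_phrases(text, phrases, context=1)
```

Here the three-word phrase is found once and the two-word phrase twice, each returned with one word of context on either side.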
In accordance with the present invention there is provided a method for adaptive n
Gelbard Roy Moshe
Spiegler Israel
Coby Frantz
Friedman Mark M.