Reexamination Certificate
1999-08-26
2003-05-20
Robinson, Greta (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06567814
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to the field of computer databases, and more particularly relates to a method and apparatus for knowledge discovery from databases.
BACKGROUND OF THE INVENTION
Throughout the 1980's and early 1990's, many major corporations adopted so-called “business intelligence” tools such as spreadsheets, report writers, and on-line analytical processing (“OLAP”) servers to gain a competitive advantage through better business decision-making. However, the exponential increase in information resulting from the electronic capture of data and its storage in vast data warehouses has dramatically reduced the perceived benefits of such tools. Such tools are valuable for monitoring and planning, but are unable to cope with the large volumes of data or the sophisticated analysis that is required for strategic decision-making if organizations are to achieve or maintain a competitive status.
For many types of businesses, strategic value may be derived from understanding customer behavior and being able to model customers' responses to evaluate alternative actions. The knowledge required to anticipate behavior cannot be discovered by computer users running a large number of traditional queries against data warehouses. Moreover, answering complex questions through traditional database queries is impractical, since users may not have sufficient time to complete such analyses.
“Knowledge discovery from databases” (referred to herein as “KDD”) is perceived by some to be a powerful method of enabling an organization to better understand the dynamics at work in a particular context, for example, in the context of a consumer market for a particular product or service, by automatically searching through large amounts of data for otherwise hidden patterns and relationships of events, and presenting these to the user in a readily understandable format. (An early instance of the use of the term “knowledge discovery” may be found in “Advances in Knowledge Discovery & Data Mining,” Fayyad et al., eds., MIT Press, 1996.) KDD systems may be fully automated, freeing up skilled human resources and finding answers to important questions that users might otherwise not know to ask.
Because KDD involves searching for hidden information that is commercially valuable, it is often confused with “data mining.” However, data mining is only one aspect of the KDD process. A KDD process may be broken into several phases, and may be cyclical and iterative, with the results of one phase driving requirements for further phases. Each stage is essential to ensure that knowledge is successfully extracted from data. The identified knowledge can be used to achieve a wide range of objectives, such as making predictions about new data, identifying and explaining hidden patterns and trends in existing data, and summarizing the contents of large databases to facilitate understanding.
A simple example application of knowledge discovery is to predict whether a loan should or should not be granted to a particular applicant. Such a decision can be based on the history of previous applicants who subsequently did, or did not, repay the loan extended to them. Data from these previous loans would be used to determine any statistically significant characteristics of applicants who did or did not eventually repay the loan. An algorithm may then be trained to assess these characteristics in future applicants and give an indication of how likely each applicant is to repay the loan.
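The loan example can be made concrete with a minimal sketch. It assumes a naive Bayes estimate with Laplace smoothing over categorical applicant characteristics; the source does not name a specific algorithm, and the feature names (`employed`, `owns_home`) and the sample history are hypothetical.

```python
from collections import Counter, defaultdict

def train(history):
    """Count outcomes per characteristic from (features, repaid) records."""
    class_counts = Counter()               # repaid flag -> number of past loans
    feature_counts = defaultdict(Counter)  # (name, value) -> Counter of repaid flags
    values = defaultdict(set)              # name -> distinct values seen
    for features, repaid in history:
        class_counts[repaid] += 1
        for name, value in features.items():
            feature_counts[(name, value)][repaid] += 1
            values[name].add(value)
    return class_counts, feature_counts, values

def repay_probability(model, applicant):
    """Estimate the probability that a new applicant repays the loan."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Smoothed prior, then one smoothed likelihood term per characteristic.
        score = (class_counts[label] + 1) / (total + 2)
        for name, value in applicant.items():
            n_values = len(values[name] | {value})
            seen = feature_counts[(name, value)][label]
            score *= (seen + 1) / (class_counts[label] + n_values)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])

# Hypothetical history of previous applicants and whether they repaid.
HISTORY = [
    ({"employed": "yes", "owns_home": "yes"}, True),
    ({"employed": "yes", "owns_home": "no"}, True),
    ({"employed": "yes", "owns_home": "yes"}, True),
    ({"employed": "no", "owns_home": "no"}, False),
    ({"employed": "no", "owns_home": "yes"}, False),
    ({"employed": "yes", "owns_home": "no"}, False),
]

model = train(HISTORY)
print(repay_probability(model, {"employed": "yes", "owns_home": "yes"}))
```

The returned value is only an indication of likelihood, as in the text above; a real system would also test whether each characteristic is statistically significant before relying on it.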
An initial phase of a KDD process focuses on understanding the objectives of the process from a particular perspective. These objectives may be converted into a KDD problem definition so that a preliminary KDD plan can be designed to achieve them.
Starting with an initial plan, the user of a KDD system must identify what data is required, where it may be found, what format it is in, and what external sources of missing data are available. This stage of a KDD process provides the first insights into the data and must identify and find solutions for any data quality issues that may exist.
In a data preparation phase, data is “cleaned” and transformed to ready it for a data modeling phase. In the data modeling phase, various known techniques may be applied. Some techniques require data in certain forms, so that a reiteration of the data preparation phase may be required. The modeling phase is the process often referred to as data mining.
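As an illustration of the data preparation phase, the following sketch cleans raw records before modeling. The specific rules (trim whitespace, drop records with missing or non-numeric required values, drop duplicate identifiers) and the field names `customer_id` and `income` are assumptions chosen for the example, not steps prescribed by the source.

```python
def clean_records(raw_records, id_field="customer_id", numeric_fields=("income",)):
    """Return cleaned copies of the records that are usable for modeling."""
    cleaned = []
    seen_ids = set()
    for record in raw_records:
        row = {}
        for key, value in record.items():
            # Trim stray whitespace and treat empty strings as missing.
            if isinstance(value, str):
                value = value.strip()
                if value == "":
                    value = None
            row[key] = value
        if row.get(id_field) is None:
            continue  # cannot identify the record
        try:
            # Transform numeric fields into floats; skip unparsable records.
            for field in numeric_fields:
                row[field] = float(row[field])
        except (KeyError, TypeError, ValueError):
            continue
        if row[id_field] in seen_ids:
            continue  # keep only the first record per identifier
        seen_ids.add(row[id_field])
        cleaned.append(row)
    return cleaned

# Hypothetical raw extract with typical quality problems.
RAW = [
    {"customer_id": " c1 ", "income": "52000"},
    {"customer_id": "c2", "income": ""},
    {"customer_id": "c1", "income": "100"},
    {"customer_id": "c3", "income": "n/a"},
]
print(clean_records(RAW))
```

If a later modeling technique needs a different form of data, for example binned rather than continuous income, this step would be revisited, which is the reiteration described above.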
After the modeling phase has been completed, the results must be reviewed to confirm that the model used solves the original problem. If not, it must be determined what has been missed. At the end of this evaluation phase, a decision should be reached as to how to use the results to accomplish the identified objectives. To this end, the new knowledge gained must be deployed, i.e., organized and presented in such a way that it may be used effectively.
KDD and data warehousing are complementary concepts addressing a demand for better use of information. An existing data warehouse may provide a rich source of data for KDD, but may still need to be augmented by data sourced from operational systems and external sources.
The need for data warehousing was driven by the requirement in many organizations to better understand the data already existing in different processing systems, and to enable organizations to make better use of such existing data. The need to integrate the data held in different processing systems makes it desirable to centralize data. Data warehouse volumes for many commercial organizations now commonly exceed 100 gigabytes of information, and the number of systems over one terabyte (1,000 gigabytes) is growing rapidly. OLAP systems commonly handle ten to twenty gigabytes of information, with some handling up to 100 gigabytes.
As the volume of available data increases, the number of possible permutations of data relationships grows exponentially. The volume can become too great for users to explore and analyze, increasing the risk that important patterns and relationships may be overlooked. This is the reason that data mining techniques are being increasingly adopted for KDD.
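One way to see this growth, offered as an illustrative interpretation rather than the source's own calculation: if a relationship can involve any non-empty combination of the available attributes, then n attributes give 2^n - 1 candidate combinations. The function name below is hypothetical.

```python
def candidate_combinations(num_attributes):
    """Count the non-empty subsets of a set of attributes (2**n - 1)."""
    if num_attributes < 0:
        raise ValueError("num_attributes must be non-negative")
    return 2 ** num_attributes - 1

# Each added attribute roughly doubles the number of combinations to examine.
for n in (10, 20, 40):
    print(n, candidate_combinations(n))
```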
The following table contrasts the sorts of questions that a data warehouse or OLAP tool can answer against those that KDD systems are well-suited to solve:
TABLE 1
TYPICAL OLAP QUESTIONS | TYPICAL KDD QUESTIONS
Which customers spent the most last year? | Which customers should be targeted with a future promotion?
How many customers closed their accounts in the previous six months as compared to the same period last year? | Which customers are most likely to switch their accounts to a competitor in the next six months?
Which stores failed to meet target goals last month? | What is the optimum size and location of a new store?
What were the top selling five products by revenue? | Which additional products are most likely to be sold with a purchase from the delicatessen counter?
How much did the bank lose on failed loans in the last year? | Which customers are most likely to default on a loan?
As the examples in Table 1 illustrate, OLAP and KDD techniques may be advantageously applied in a variety of commercial contexts, including retail organizations and banks, and in many other areas such as marketing, insurance, sales, personnel, medicine, fraud detection, and customer care.
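The difference between the two kinds of question in Table 1 can be sketched in code. The first function answers an OLAP-style question by aggregating recorded sales; the second addresses a KDD-style question by estimating, from past baskets, how often each other product accompanies a given purchase. The co-purchase ratio used here is one simple assumed technique, and the product names and data are hypothetical.

```python
from collections import Counter

def top_products_by_revenue(sales, n=5):
    """OLAP-style: sum revenue per product and return the top n products."""
    revenue = Counter()
    for product, amount in sales:
        revenue[product] += amount
    return [product for product, _ in revenue.most_common(n)]

def co_purchase_rates(baskets, anchor):
    """KDD-style: share of baskets containing `anchor` that also hold each item."""
    with_anchor = [set(basket) for basket in baskets if anchor in basket]
    if not with_anchor:
        return {}
    counts = Counter(item for basket in with_anchor for item in basket if item != anchor)
    return {item: count / len(with_anchor) for item, count in counts.items()}

# Hypothetical sales lines (product, amount) and shopping baskets.
SALES = [("bread", 2.0), ("wine", 10.0), ("bread", 3.0)]
BASKETS = [
    ["delicatessen", "bread"],
    ["delicatessen", "bread"],
    ["delicatessen", "wine"],
    ["milk", "bread"],
]

print(top_products_by_revenue(SALES, n=1))
print(co_purchase_rates(BASKETS, "delicatessen"))
```

The aggregation reports what already happened, while the co-purchase rates suggest which products are likely to be sold together in future, which is the kind of pattern the KDD column asks about.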
KDD is perceived by many as the next step in the natural evolution of the reporting and OLAP systems deployed over the last ten or more years. KDD tools and techniques can analyze the same operational data or data warehouse data that populates an OLAP system, although KDD processes may require data preparation specific to the form of algorithm to be applied. Such data preparation may be needed on both operational and data warehouse sources.
On very large data warehouses, KDD techniques may be employed to select the information required for further OLAP analysis, as it may not be feasible to load all of the original data into an OLAP system, or even to know which information
Bankier John Duncan
Beck Charles Allan
Brind Andrew Craig
Brown David John
Brown Kristy Irene
Howrey Simon Arnold & White, LLP
Robinson Greta
thinkAnalytics Ltd