Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-10-13
2003-09-30
Jung, David (Department: 2175)
C707S793000, C707S793000, C707S793000, C706S045000
active
06629095
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to relational database management systems. More specifically, the invention relates to data mining applications for data stored in a relational database table.
Data mining is the process of discovering unexpected patterns or trends in existing data so that a database user can use the discovered knowledge to manage a business more effectively. For example, a typical application might study the demographic information known about a list of customers to create a profile of people most likely to buy a given product, respond to a direct mail campaign, or default on a loan.
Despite its name, “data mining” has little to do with “drill-down” queries; these types of queries are feasible in a data warehouse that has no data mining component. The difference between a data mining exercise and standard decision support queries lies in the analyst's approach to the data: standard queries are assumption-driven, whereas data mining is discovery-based.
For example, an analyst who writes a query that compares sales last year to sales this year is looking for an accurate result to a routine business question, while an analyst who mines sales data is looking for patterns or trends that might be interesting and useful to understand. If the sales comparison query reveals that sales this year are up 200% from last year in some stores but only 100% in others, for example, the analyst might want to mine the data to discover any unexpected reasons for this discrepancy.
In other words, the data warehouse application helps analysts understand what happened, while data mining applications attempt to tell analysts why it happened. It is useful for a retailer to know, for example, what factor had the greatest impact on sales of a given product, and whether the retailer can control that factor or take advantage of that knowledge. Any number of factors can influence sales—price, style, packaging, promotions, placement in the store, season, weather, day of week, time of day, and so on. Without requiring analysts to ask explicit questions about each possible factor, a data mining application can read very large volumes of data and discover both obvious and hidden trends.
A data mining exercise consists of two main phases: building a model and predicting future results. A model defines the influencing factors (input data) and the potential outcomes (output values). If the data must be obtained from a large pool of data that already exists (e.g., a relational database), the model also defines how these factors and outcomes are selected and mapped from the larger pool of data. The set of inputs, which might be very large (100 or more), is a list of factors that might influence the output. The result of the data mining exercise is a much smaller list of inputs that, individually or in combination with other inputs, do influence the output.
In turn, the model provides the ability to predict an output given a set of input characteristics. For example, if the model accurately reflects the tendency of customers with certain attributes (age, gender, income, and so on) to purchase a luxury automobile, the discovered results can be compared with a list of prospective buyers to determine those most likely to make a purchase in the near future. Because models can be used independently or in conjunction with query analysis, a warehouse query might be issued to generate a promotional mailing list from a Customer table.
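The two phases described above can be illustrated with a minimal Python sketch. It is not the method of any product or of the invention: the records, the factor names, the `min_spread` threshold, and the simple "difference in outcome rate" measure of influence are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical training records: candidate input factors plus a known outcome.
RECORDS = [
    {"age": "young", "income": "high", "region": "east", "bought": True},
    {"age": "young", "income": "low", "region": "west", "bought": False},
    {"age": "older", "income": "high", "region": "west", "bought": True},
    {"age": "older", "income": "low", "region": "east", "bought": False},
    {"age": "young", "income": "high", "region": "west", "bought": True},
    {"age": "older", "income": "low", "region": "west", "bought": False},
]


def outcome_rates(records, factor, output):
    """Return {value: fraction of records with that value whose output is true}."""
    totals, hits = Counter(), Counter()
    for record in records:
        totals[record[factor]] += 1
        hits[record[factor]] += 1 if record[output] else 0
    return {value: hits[value] / totals[value] for value in totals}


def build_model(records, factors, output, min_spread=0.5):
    """Phase 1: keep only the factors whose values noticeably shift the outcome rate."""
    model = {}
    for factor in factors:
        rates = outcome_rates(records, factor, output)
        if max(rates.values()) - min(rates.values()) >= min_spread:
            model[factor] = rates
    return model


def predict(model, candidate):
    """Phase 2: average the learned rates that apply to the candidate's values."""
    scores = [
        rates[candidate[factor]]
        for factor, rates in model.items()
        if candidate.get(factor) in rates
    ]
    return sum(scores) / len(scores) if scores else None


model = build_model(RECORDS, ["age", "income", "region"], "bought")
prospect = {"age": "young", "income": "high", "region": "east"}
likelihood = predict(model, prospect)
```

With this toy data, only `income` survives the first phase, which mirrors the idea of reducing a long list of candidate inputs to the few that matter. A real engine would use a far more robust measure of influence and would consider combinations of inputs.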
Categorization analysis (a subset of data mining) has useful applications in various industries—financial services, direct mail marketing, telecommunications, insurance, healthcare, retail sales—as a means of solving business problems by leveraging a deeper knowledge of the targeted consumers. Some of these applications are as follows:
Promotion analysis—In retail sales, understanding which products customers often purchase in combination with other products can lead to changes and improvements to in-store displays, seasonal promotions, advertising campaigns, and general advertising strategies.
Churn analysis—In telecommunications, discovering the profile of individuals likely to switch (or not to switch) long-distance carriers helps find ways to attract long-term customers and discourage short-term customers from switching carriers.
Claims analysis—In insurance, isolating the characteristics of a good risk can save both time and money. In healthcare, cost-cutting measures can be discovered by analyzing the history of a claim and the profile of the claimant.
Customer profiling—In almost any business, discovering the demographic profile of consumers most likely to buy or use a given product is critical to maintaining a competitive edge. The ability to compile effective target mailing lists is a common result of good profiling.
Rate and usage analysis—In telecommunications, studying the details of customer calls helps analysts find ways to better serve customers and keep their business, as well as improve the day-to-day service available to those customers.
Fraud detection—In insurance, healthcare, and credit-card businesses, discovering the typical characteristics of a fraudulent claim or application helps analysts predict and prevent future fraudulent activity.
Various data mining products are available including “DataCruncher” from DataMind Corporation of San Mateo, Calif., “IBM Intelligent Miner” from IBM Corporation of Armonk, N.Y., and “MineSet” available from Silicon Graphics, Inc., of Redwood City, Calif. To use large data sets, all of these products must first retrieve data from a relational database (via SQL), then convert the retrieved data to a flat file, and finally send that file to the data mining engine. Unfortunately, even if the data mining functionality can quickly and efficiently handle the large data sets it receives, the associated systems can convert and send the data only relatively slowly. First, the step of converting the database data to a flat file entails significant effort, including converting data types. Second, transporting the data to the data mining functionality typically involves sending the file over an ODBC connection, and transporting the large data sets necessary for many data mining projects over such connections is too slow for many applications. A related problem exists in providing the mined results back into a database. Specifically, other tools need to transfer the mined results back into the database, further increasing the overhead.
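The retrieve, convert, and send sequence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational database, CSV text stands in for the flat file, the `customer` table and both helper functions are hypothetical, and the ODBC transport step is not modeled at all.

```python
import csv
import io
import sqlite3


def export_to_flat_file(conn, query):
    """Run the SQL query, then serialize every row into CSV text (the flat file)."""
    cursor = conn.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def load_into_engine(flat_file_text):
    """Parse the flat file on the receiving side; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(flat_file_text)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, age INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, 34, 52000.0), (2, 58, 87500.5)],
)

flat_file = export_to_flat_file(conn, "SELECT id, age, income FROM customer ORDER BY id")
rows = load_into_engine(flat_file)
```

After the round trip, the typed column values are plain strings, so the receiving side has to re-derive the data types. Every row is also serialized once and parsed once, which is the kind of per-row overhead that grows with the size of the data set.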
Further, some data mining products employ spreadsheets to provide the data used in the data mining operation and to display results of the operation. Without a mechanism for rapidly porting large data sets (from a large database, for example), the size of the data sets employed with such products is limited. In addition to limiting application of data mining to relatively small data sets, this can limit the validity of models generated from the data.
In view of these limitations, a data mining system with improved performance would be desirable.
SUMMARY OF THE INVENTION
The present invention fills this need by providing an integrated data mining and relational database system. This is accomplished by eliminating the need to convert data to a flat file and export it from the relational database management system to a data mining engine. In addition, the invention makes patterns uncovered during data mining (e.g., “understand” and “predict” information) available in virtual relational database tables that can be queried.
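As a loose illustration of mined patterns being exposed in a queryable form, the sketch below uses an ordinary SQLite view joined to a customer table. It does not reproduce the mechanism of the invention: the table names, the `mined_pattern` contents, the likelihood values, and the `likely_buyers` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, income_band TEXT);
    INSERT INTO customer VALUES (1, 'high'), (2, 'low'), (3, 'high');

    -- Hypothetical output of a mining run: outcome likelihood per input value.
    CREATE TABLE mined_pattern (income_band TEXT, buy_likelihood REAL);
    INSERT INTO mined_pattern VALUES ('high', 0.8), ('low', 0.1);

    -- A view standing in for a queryable table of predictions.
    CREATE VIEW predicted_buyers AS
        SELECT c.id, p.buy_likelihood
        FROM customer AS c JOIN mined_pattern AS p USING (income_band);
    """
)


def likely_buyers(connection, threshold):
    """Return ids of customers whose predicted likelihood meets the threshold."""
    sql = "SELECT id FROM predicted_buyers WHERE buy_likelihood >= ? ORDER BY id"
    return [row[0] for row in connection.execute(sql, (threshold,))]


mailing_list = likely_buyers(conn, 0.5)
```

Because the predictions sit beside the source data, an ordinary SQL query can combine them with other tables without any export or re-import step.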
Preferably, the relational database management system and the data mining engine are integrated on a server. The data mining engine determines characteristics of relationships between input data values and an output data value that are obtained from a relational database (managed by the relational database management system). The integration allows direct conversion of data values from the relational database to data mining identifiers used for data mining operations by the data mining engine. It also allows identifiers output by t
Bunger Craig J.
Cole Richard L.
Koehler Ann M.
Schneider Donovan A.
Wagstaff William M.
Davda Janaki K.
Jung David
Konrad Raynes & Victor & Mann LLP