Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-07-26
2002-10-22
Corrielus, Jean M. (Department: 2172)
C707S793000
active
06470333
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to computer database systems and more specifically to distributed computer database systems.
BACKGROUND OF THE INVENTION
Organizations routinely collect large amounts of data on their customers, products, operations and business activities. Insights buried in this data can contribute to marketing, to reducing operating costs and to strategic decision-making. For example, if there is a strong correlation between the customers who buy one product and those who buy another product, then those customers who have bought just one of the two products might be good prospects for buying the other.
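By way of illustration only, the co-purchase correlation in the example above can be quantified with standard association-rule measures such as support and confidence. The following Python sketch uses invented purchase records and product names; it is generic data-mining arithmetic, not a technique claimed by the invention.

# Hypothetical purchase histories; each set lists the products one customer bought.
purchases = [
    {"printer", "ink"},
    {"printer", "ink", "paper"},
    {"printer", "paper"},
    {"ink", "paper"},
    {"printer", "ink"},
]

def rule_stats(antecedent, consequent, baskets):
    """Support and confidence for the rule: buyers of `antecedent` also buy `consequent`."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    return both / n, (both / ante if ante else 0.0)

support, confidence = rule_stats("printer", "ink", purchases)
print(f"support={support:.2f} confidence={confidence:.2f}")   # support=0.60 confidence=0.75
# Three of the four printer buyers also bought ink, so printer buyers who have
# not yet bought ink would be the prospects suggested by this rule.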
Analytical processing of data is primarily done using statistical methods to extract correlations and other patterns in the data. This kind of processing has been variously called data mining, knowledge discovery and knowledge extraction. A search for a specific pattern or kind of pattern in a large collection of data will be called a pattern query.
Large enterprises typically maintain many databases, many of which are transactional databases. The requirements of these databases are often in conflict with the requirements of data mining. Transactional databases are updated using small transactions operating in real time. Data mining, on the other hand, uses large pattern queries that do not have to take place in real time. To resolve this conflict, it is now common for data from a variety of sources to be downloaded to a centralized resource called a data warehouse.
The downloading and centralizing of data from diverse, often disparate sources requires a number of tasks. The data must be extracted from the sources, transformed to a common, integrated data model, cleansed to eliminate or correct erroneous or inaccurate data, and integrated into the central warehouse, which constitutes yet another database in which all the data is stored. In addition, one must ensure that every instance of every business entity, such as a customer, product or employee, has been correctly identified. This is known as the problem of referential integrity. All of these are difficult tasks, especially ensuring referential integrity when the data is being downloaded from databases that identify the business entities in slightly different ways. Current technology downloads data to the data warehouse as an activity independent from data mining. In contrast with data mining, for which there is a large research literature and many commercial products, data warehousing does not have a strong theoretical basis and has few good commercial products.
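As a simplified illustration of these tasks, the following Python sketch extracts records from two hypothetical sources, transforms them to a common shape, cleanses an obviously erroneous value, and integrates the results while using a normalized entity name as a crude referential-integrity key. All source schemas, field names and the matching rule are invented for illustration; this is not the method of the invention.

# Two hypothetical source extracts with different schemas.
source_a = [{"cust_name": "ACME Corp.", "revenue": "1200"},
            {"cust_name": "Globex", "revenue": "-50"}]   # negative revenue: bad data
source_b = [{"customer": "acme corp", "orders": 7}]

def entity_key(name: str) -> str:
    """Normalize a business-entity name so that variant spellings of the same entity collide."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

warehouse: dict[str, dict] = {}   # integrated store keyed by normalized entity name

# Extract, transform and cleanse records from source A.
for rec in source_a:
    revenue = float(rec["revenue"])
    if revenue < 0:               # cleanse: reject an obviously erroneous value
        continue
    key = entity_key(rec["cust_name"])
    warehouse.setdefault(key, {"name": rec["cust_name"]})["revenue"] = revenue

# Extract and transform records from source B; the shared key makes "acme corp"
# land in the same integrated record as "ACME Corp." from source A.
for rec in source_b:
    key = entity_key(rec["customer"])
    warehouse.setdefault(key, {"name": rec["customer"]})["orders"] = rec["orders"]

print(warehouse)   # {'acmecorp': {'name': 'ACME Corp.', 'revenue': 1200.0, 'orders': 7}}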
Because data warehouses integrate many diverse data sources, it is necessary to specify an integrated data model for the data warehouse as well as a data mapping that extracts, transforms and cleanses data from each data source. It is known in the art that richer data models, such as object-oriented data models, are better suited for defining such an integrated data model and for defining the data mappings, than more limited data models, such as the relational model. Yet most data warehouses still use a flat record structure such as the relational model. Relational databases have a very limited data structure, so that synthesizing more complex data structures is awkward and error-prone. Some of the kinds of data that are poorly suited to storage in a relational database include: textual data in general, hypertext documents in particular, images, sound, multimedia objects and multi-valued attributes. Relational databases are also poorly suited for representing records that have a very large number of potential attributes, only a few of which are used by any given record.
An object database consists typically of a collection of data or information objects. Each information object is identified uniquely by an object identifier (OID). Each information object can have features, and some features can have associated values. Information objects can also contain or refer to other information objects.
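A minimal sketch of such an information object, assuming only what is described above (an OID, features that may or may not carry values, and references to other objects); the class and field names below are invented for illustration.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class InformationObject:
    """An object identified by an OID, carrying features and references to other objects."""
    oid: str
    features: dict[str, Any] = field(default_factory=dict)   # feature -> value (the value may be absent, i.e. None)
    references: list[str] = field(default_factory=list)      # OIDs of contained or referenced objects

db: dict[str, InformationObject] = {}   # a tiny object database: OID -> object

author = InformationObject(oid="obj:1", features={"name": "A. Tversky", "affiliation": None})
paper = InformationObject(
    oid="obj:2",
    features={"title": "Features of similarity", "year": 1977},
    references=[author.oid],             # the paper object refers to the author object
)
db[author.oid] = author
db[paper.oid] = paper

# Following a reference: look up the referenced OID in the database.
for ref in db["obj:2"].references:
    print(db[ref].features["name"])      # -> A. Tversky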
To assist in finding information in a database, including the warehousing database, special search structures called indexes are employed. Large databases require correspondingly large index structures to maintain pointers to the stored data. Such an index structure can be larger than the database itself. Current technology requires a separate index for each attribute or feature. This technology can be extended to allow a small number of attributes or features to be indexed in a single index structure, but it does not function well when there are hundreds or thousands of attributes. Furthermore, there is considerable overhead associated with maintaining an index structure. This limits the number of attributes or features that can be indexed, so the ones that are supported must be chosen carefully. For transactional databases, the workload is usually well understood, so it is possible to choose the indexes so as to optimize the performance of the database. For a data warehouse, there is usually no well-defined workload, so it is much more difficult to choose which attributes to index.
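One way to picture a single index that covers arbitrarily many sparse features is an inverted index from (feature, value) pairs to the set of object identifiers carrying that pair. The sketch below is illustrative only and is not the index structure claimed by the invention.

from collections import defaultdict

# One combined index over arbitrarily many sparse features:
# (feature, value) -> set of OIDs of objects carrying that feature/value pair.
index: dict[tuple[str, str], set[str]] = defaultdict(set)

def index_object(oid: str, features: dict[str, str]) -> None:
    """Add every feature/value pair of an object to the shared index."""
    for feature, value in features.items():
        index[(feature, value)].add(oid)

index_object("obj:1", {"color": "red", "shape": "square"})
index_object("obj:2", {"color": "red", "texture": "rough"})   # a different feature set
index_object("obj:3", {"shape": "square"})

# Conjunctive lookup: objects that are both red and square.
hits = index[("color", "red")] & index[("shape", "square")]
print(sorted(hits))   # ['obj:1']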
Further information can be had regarding the foregoing concepts with reference to the following publications:
1. L. Aiello, J. Doyle, and S. Shapiro, editors. Proc. Fifth Intern. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Mateo, Calif., 1996.
2. K. Baclawski. Distributed computer database system and method, December 1997. U.S. Pat. No. 5,694,593. Assigned to Northeastern University, Boston, Mass.
3. A. Del Bimbo, editor. The Ninth International Conference on Image Analysis and Processing, volume 1311. Springer, September 1997.
4. N. Fridman Noy. Knowledge Representation for Intelligent Information Retrieval in Experimental Sciences. PhD thesis, College of Computer Science, Northeastern University, Boston, Mass., 1997.
5. M. Hurwicz. Take your data to the cleaners. Byte Magazine, January 1997.
6. Y. Ohta. Knowledge-Based Interpretation of Outdoor Natural Color Scenes. Pitman, Boston, Mass., 1985.
7. A. Tversky. Features of similarity. Psychological Review, 84(4):327-352, July 1977.
8. S. Weiss and N. Indurkhya. Predictive Data Mining: A Practical Guide. Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1998.
9. J.-L. Weldon and A. Joch. Data warehouse building blocks. Byte Magazine, January 1997.
The disclosures of the publications referenced in this “Background of the Invention” are incorporated herein by reference.
It would be desirable to provide improved systems for data warehousing and data mining that overcome many of the performance and other problems and limitations of current systems.
SUMMARY OF THE INVENTION
The present invention combines the two activities of data warehousing and data mining, thereby improving the basis and support for data warehousing. The term knowledge extraction will be used herein for the integration of the data warehousing and data mining activities.
The invention resides in an information retrieval apparatus and method for processing a query from a user for retrieval of information from the data warehouse. The apparatus includes a mechanism for locating a number of features and feature fragments in an index database; an evaluating mechanism for identifying a number of sub-queries of a number of levels contained in the query and recursively evaluating the sub-queries using each of the located features and feature fragments; and a mechanism for collecting and storing a number of results of the recursive evaluation of the query and sub-queries pursuant to computing an overall result of the query.
As used herein, “evaluation” is a process by which a response to a query is generated, characterized by retrieval of information, information location specifiers, or data regarding the information, which match criteria set forth in the query. Recursive evaluation is a type of query evaluation in which new queries, called sub-queries, are generated from the query and evaluated. The sub-queries so generated can be regarded as nodes in a query tree.
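The following Python sketch illustrates the general idea of recursively evaluating such a query tree against an index of feature/value pairs. It is a simplified illustration under invented names (Query, evaluate, the toy INDEX) and is not the claimed apparatus.

from dataclasses import dataclass, field

# Toy index in the shape sketched earlier: (feature, value) -> set of OIDs.
INDEX = {
    ("color", "red"): {"obj:1", "obj:2"},
    ("shape", "square"): {"obj:1", "obj:3"},
    ("texture", "rough"): {"obj:2"},
}

@dataclass
class Query:
    """A node in a query tree: either a leaf feature lookup or a combination of sub-queries."""
    feature: tuple[str, str] | None = None      # set on leaf nodes only
    combine: str = "and"                        # how an inner node merges its sub-query results
    subqueries: list["Query"] = field(default_factory=list)

def evaluate(query: Query) -> set[str]:
    """Recursively evaluate a query: leaves hit the index, inner nodes merge sub-results."""
    if query.feature is not None:
        return INDEX.get(query.feature, set())
    results = [evaluate(sq) for sq in query.subqueries]   # evaluate each sub-query in turn
    if not results:
        return set()
    merged = results[0].copy()
    for r in results[1:]:
        merged = merged & r if query.combine == "and" else merged | r
    return merged

# (red AND square) OR rough
q = Query(combine="or", subqueries=[
    Query(combine="and", subqueries=[Query(feature=("color", "red")),
                                     Query(feature=("shape", "square"))]),
    Query(feature=("texture", "rough")),
])
print(sorted(evaluate(q)))   # ['obj:1', 'obj:2']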
Jarg Corporation
Kudirka & Jobse LLP