Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-02-14
2002-12-03
Homere, Jean R. (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06490590
ABSTRACT:
FIELD OF INVENTION
The present invention relates generally to the generation of a logical data model (LDM) and a physical data model (PDM) of a data base, and more particularly, to a generalized LDM and corresponding tool for maintaining correspondence between the LDM and PDM. The present invention also relates to using a physical database design and data description language created from the PDM to construct a data warehouse.
BACKGROUND ART
The manual creation and population, or loading, of a data warehouse with data from varied data sources is known. Also known are automatic routines for loading data into a data warehouse, as described below. Prior approaches required duplication of effort and data to create new LDMs, new PDMs, new extraction routines, and new loading routines for every customer. Upwards of 80% of the time is spent creating and recreating LDMs, PDMs, and extraction and loading routines under prior approaches.
A prior approach for creating and populating a data warehouse using LDMs and PDMs is illustrated by the flow chart, generally designated by reference numeral
100
, in
FIG. 1. A
description of the prior approach as shown in
FIG. 1
follows. Beginning at step
102
, a subject matter expert (SME) provides business rules to a modeler.
The SME, or domain expert, is a person with intimate knowledge of the particular area or topic for which the data warehouse is being constructed. The modeler is a person knowledgeable about transforming or creating representations of business rules in a data model. Business rules express the constraints on business operations in the real world.
In step
104
, the modeler converts the business rules, received from the SME in step
102
, into a logical data model using a data modeling tool, such as ERwin available from Platinum Technology, Inc. Other similar data modeling tools include PowerDesigner available from Sybase, Inc. and System Architect available from Popkin Software and Systems. After the modeler has converted the business rules into data structures in the LDM in step
104
, the modeler produces a PDM using a data modeling tool, such as ERwin, in step
106
.
The PDM of step
106
is produced in accordance with the LDM of step
104
, regardless of intervening changes to the LDM occurring between the end of step
104
and the completion of step
106
. In other words, changes made to the LDM while step
106
is in progress are not necessarily reflected in the final PDM of step
106
.
In step
108
, a data information analyst (DIA) determines data sources for a necessary attribute list (NAL). The NAL is developed by the DIA based on the DIA's domain knowledge and the LDM. The NAL includes those fields required by the system in order to operate properly and answer the business-related questions of the user.
In response to the NAL determination of step
108
and the PDM production of step
106
, the DIA produces a cross referenced source-target data list in step
110
. The source-target data list produced in step
110
cross references data sources for the NAL, identified in step
108
, to fields or objects in the PDM.
After the DIA produces the source-target data list, an extract specialist determines a method of operating on a chosen source system to obtain formatted data in step
112
. The source system includes the data to be converted and moved into the data warehouse. The extract specialist causes the data from the source system to be converted to the data format needed by the loading effort.
The loading effort is performed by any one of several available utilities for populating the data warehouse. Some example utilities include FastLoad, MultiLoad, and Tpump available from the NCR Corporation.
The FastLoad utility loads empty tables in the data warehouse with data collected from channel and network attached clients. FastLoad can be executed from any client platform, mainframe, UNIX® or Windows NT® system. Data splitting, data conversion, data movement and data loading are all automatic and are all performed in parallel. FastLoad is fully restartable, enabling the data load process to automatically begin loading from the last transient journal checkpoint simply by re-executing the job.
The MultiLoad utility or MLOAD handles loading, updating and deleting data in existing tables as a single, integrated process with the MultiLoad utility. MultiLoad handles extremely large data volumes requiring updates and deletes. MultiLoad enables rapid acquisition of host data with rapid block-level application of data into tables. Large inserts, deletes, updates, and upserts against empty or populated tables and data maintenance on multiple tables are performed simultaneously. MultiLoad executes on a variety of client platforms, in a fail safe mode and is fully recoverable with an inherent checkpoint design.
The TPump utility handles loading near real-time data into the data warehouse. TPump inserts, updates and deletes data in the data warehouse in a continuous fashion. TPump uses row hash locks instead of table level locks thereby enabling users to run queries while TPump is working.
Once the method for obtaining formatted data is determined in step
112
, the formatted data is extracted from the source system and communicated to a database management system in step
114
, e.g., the Teradata Relational Database Management System (RDBMS) available from the NCR Corporation.
In step
116
, the cleansing, transforming, and householding specification is generated by the DIA. The cleansing routines selected by the DIA eliminate or fix bad records from the data, for example eliminating records without values in required fields. The transforming routines deal with changing the data type, math, and semantics of the data. Householding routines perform filtering of records to appropriately group together similar records. For example, householding routines are able to group together many services provided to a single user and identify them in relation to the user. In this manner, using a telephone example, even though a person may use differing telephone services (long distance, calling card, local and toll calls) all of the services are identified as related to the same individual.
Execution of the cleansing, transforming, and householding operations on the data in the database management system occurs in step
118
.
In step
120
, the data is converted into loadable format for eventual loading into the data warehouse using either the MLOAD, FastLoad, or TPump utilities of the Teradata RDBMS.
Construction of the data warehouse occurs in step
122
using SQL code generated by the ERwin model from step
106
. In this step, the tables and fields of the data warehouse corresponding to the PDM are instantiated within the database management system.
Once the data warehouse has been constructed in step
122
and the data converted into loadable format in step
120
, the data warehouse is loaded with the data in step
124
using either the MLOAD, FastLoad, or TPump utilities. Since much of the information required for creation and population of the data warehouse is not stored in the database management system, the manual and human intensive steps making up flowchart
100
must be repeated for each and every iteration or modification of the data warehouse. This results in a large amount of duplicated data and work effort for each iteration.
Currently, the information required for creation and population of the data warehouse is stored on paper or in the knowledge-base of the individual worker. This can be problematic if changes are necessitated and either the worker is no longer available or the papers are not available or are disorganized. In either situation, reproducing or modifying the data warehouse is difficult. Therefore, there is a need in the art to achieve reproducibility of the creation and population of a data warehouse.
SUMMARY OF THE INVENTION
Accordingly, an object of the present invention is to decrease the amount of effort and duplication of data and work effort during the creation and population of a data warehouse.
Ano
Homere Jean R.
Lewis Cheryl
Lowe Hauptman & Gilman & Berner LLP
NCR Corporation
LandOfFree
Method of generating a logical data model, physical data... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method of generating a logical data model, physical data..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method of generating a logical data model, physical data... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2919014