Systems, methods, and computer program products to...

Filing date: 2001-11-15
Issue date: 2003-11-25
Examiner: Mizrahi, Diane D. (Department: 2175)
Classification: Data processing: database and file management or data structures; Database design; Data structure types
Type: Reexamination Certificate
Status: active
Patent number: 06654764
ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to the field of computer-based multidimensional data modeling. It is more particularly directed to interpreting, explaining, and manipulating exceptions in multidimensional data on a computer system.
2. Description of the Background Art
On-Line Analytical Processing (OLAP) is a computing technique for summarizing, consolidating, viewing, analyzing, applying formulae to, and synthesizing data according to multiple dimensions. OLAP software enables users, such as analysts, managers and executives, to gain insight into the performance of an enterprise through rapid access to a wide variety of data “views” or “dimensions” that are organized to reflect the multidimensional nature of the enterprise performance data. An increasingly popular data model for OLAP applications is the multidimensional database (MDDB), which is also known as the “data cube.” Data analysts often use OLAP data cubes to explore performance data interactively and to find regions of anomalies in the data, which are also referred to as “exceptions” or “deviations.” Locating an anomaly in the enterprise data often reveals problem areas and new opportunities for the enterprise.
An exception represents the degree of surprise associated with data that is included in an OLAP data cube. An exception is most easily defined by example. Given a two-dimensional data cube having $p$ values along a first dimension $A$ and $q$ values along a second dimension $B$, the element or quantity corresponding to the ith value of dimension $A$ and the jth value of dimension $B$ is denoted $y_{ij}$. To estimate the exception $y_{ij}$ in this data cube, an expected value $\hat{y}_{ij}$ of $y_{ij}$ is calculated as a function $f()$ of three terms: (1) a term $\mu$ that denotes a trend common to all y values of the cube, (2) a term $\alpha_i$ that denotes special trends along the ith row with respect to the rest of the cube, and (3) a term $\beta_j$ that denotes special trends along the jth column with respect to the rest of the cube. The residual difference $r_{ij}$ between the expected value $\hat{y}_{ij} = f(\mu, \alpha_i, \beta_j)$ and the actual value $y_{ij}$ represents the relative importance of the exception $y_{ij}$, based on its position in the cube.
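As an illustration only, the sketch below assumes the additive form of f() and estimates $\mu$, $\alpha_i$ and $\beta_j$ from the grand, row and column means (the mean-based approach discussed later in this background). The function name `additive_2d_fit` and the data are hypothetical. The cell with the largest residual magnitude is the strongest exception candidate.

```python
def additive_2d_fit(y):
    """Fit y[i][j] ~ mu + alpha[i] + beta[j] using means; return coefficients and residuals.

    Assumes a complete p x q table given as a list of lists of numbers.
    """
    p, q = len(y), len(y[0])
    # mu: trend common to all cells (grand mean)
    mu = sum(sum(row) for row in y) / (p * q)
    # alpha[i]: special trend of row i relative to the rest of the cube
    alpha = [sum(row) / q - mu for row in y]
    # beta[j]: special trend of column j relative to the rest of the cube
    beta = [sum(y[i][j] for i in range(p)) / p - mu for j in range(q)]
    # r[i][j] = y[i][j] - y_hat[i][j], with the additive form y_hat = mu + alpha[i] + beta[j]
    r = [[y[i][j] - (mu + alpha[i] + beta[j]) for j in range(q)] for i in range(p)]
    return mu, alpha, beta, r


# Hypothetical data: rows and columns follow simple trends, plus one surprising cell.
sales = [[10.0, 20.0, 30.0],
         [11.0, 21.0, 31.0],
         [12.0, 22.0, 32.0]]
sales[1][2] += 9.0
mu, alpha, beta, r = additive_2d_fit(sales)
most_surprising = max(((i, j) for i in range(3) for j in range(3)),
                      key=lambda cell: abs(r[cell[0]][cell[1]]))
print(most_surprising)  # (1, 2)
```

The residual here is signed; its magnitude is what is compared when ranking exceptions.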
By means of further explanation, when a data cube has three dimensions, for example, with dimension $C$ being the third dimension, the expected value $\hat{y}_{ijk}$ is calculated by taking into account not only the kth value of the third dimension, but also the three values corresponding to the pairs $(i,j)$ in the AB plane, $(i,k)$ in the AC plane and $(j,k)$ in the BC plane. The expected value $\hat{y}_{ijk}$ is then expressed as a function of seven terms as:

$$\hat{y}_{ijk} = f\left(\mu, \alpha_i, \beta_j, \gamma_k, (\alpha\beta)_{ij}, (\alpha\gamma)_{ik}, (\gamma\beta)_{kj}\right) \qquad (1)$$
where $(\alpha\beta)_{ij}$ denotes the contribution of the ijth value in the AB plane, $(\alpha\gamma)_{ik}$ denotes the contribution of the ikth value in the AC plane, and $(\gamma\beta)_{kj}$ denotes the contribution of the kjth value in the BC plane. In general, for any k-dimensional cube, the y value can be expressed as the sum of the coefficients corresponding to each of the $2^k - 1$ levels of aggregation, or group-bys, of the cube, plus a residual. A “coefficient” is a component that provides information used to predict the expected value $\hat{y}$, and a “group-by” is one of the combinations of the dimensions of the multidimensional cube; in the present example, the group-bys include “AB” and “ABC.” A coefficient is therefore a group-by component that contributes to the predictability of a cell in a multidimensional cube. The coefficient model may be used to make predictions about the expected value of an exception.
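To make the $2^k - 1$ count concrete, the sketch below enumerates the group-bys that carry a coefficient. It assumes, as Equation (1) suggests for three dimensions, that the empty group-by corresponds to $\mu$ and that the full group-by (all k dimensions) is left to the residual. The function name `coefficient_groupbys` is hypothetical.

```python
from itertools import combinations


def coefficient_groupbys(dims):
    """List the 2**k - 1 group-bys of a k-dimensional cube that carry a coefficient.

    Assumption: the empty group-by () holds the overall trend mu, and the full
    group-by (all k dimensions) is left to the residual, so it is excluded.
    """
    k = len(dims)
    return [g for size in range(k) for g in combinations(dims, size)]


groupbys = coefficient_groupbys(("A", "B", "C"))
print(groupbys)  # [(), ('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(groupbys) == 2 ** 3 - 1)  # True
```

The seven entries line up with the seven arguments of Equation (1): $\mu$, the three single-dimension terms, and the three plane terms.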
By way of example, consider a three-dimensional cube. The function f() can take several forms, or models, such as an additive form, where f() is a simple sum of all its arguments, and a multiplicative form, where f() is the product of its arguments. It will be appreciated by those skilled in the art that the multiplicative form can be transformed to the additive form by taking the logarithm of the original data values. For a multiplicative model, the $y_{ijk}$ values denote the logs of the original y values of the cube. The log is used to reduce skew in the distribution of the values; that is, taking the log tends to normalize the distribution. The choice of the best form of the function depends on the particular class of data, and is preferably made by a user who has understanding of and experience with the data at hand. For example, the distribution of the data is one of the factors that may be used to determine the best form of the function.
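A small check of the log transformation, using hypothetical numbers: for data generated multiplicatively as `y[i][j] = row_scale[i] * col_scale[j]`, the second difference `y[i][j] - y[i][0] - y[0][j] + y[0][0]` is nonzero on the raw values but vanishes (up to rounding) on their logs, which is what it means for the logged data to fit the additive form. The helper name `interaction_contrast` is an illustrative assumption, not part of the patent.

```python
import math


def interaction_contrast(t, i, j):
    """Second difference t[i][j] - t[i][0] - t[0][j] + t[0][0]; zero when t is additive."""
    return t[i][j] - t[i][0] - t[0][j] + t[0][0]


# Hypothetical multiplicative data: y[i][j] = row_scale[i] * col_scale[j]
row_scale = [1.0, 2.0, 5.0]
col_scale = [10.0, 30.0]
y = [[rs * cs for cs in col_scale] for rs in row_scale]
log_y = [[math.log(v) for v in row] for row in y]

print(interaction_contrast(y, 2, 1))             # 80.0: raw data is not additive
print(abs(interaction_contrast(log_y, 2, 1)) < 1e-9)  # True: logged data is additive
```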
With the additive form of f() (or the multiplicative form after the log transformation), Equation One becomes Equation Two:

$$y_{ijk} = \hat{y}_{ijk} + r_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\gamma\beta)_{kj} + r_{ijk} \qquad (2)$$

where $r_{ijk}$ is the residual difference between the expected value $\hat{y}_{ijk}$ and the actual value $y_{ijk}$. The relative importance of an exception is based on the value of its residual: the higher the value of the residual, the higher the importance of the exception.
There are several ways of deriving values of the coefficients of Equation Two. One way is shown in U.S. Pat. No. 6,094,651: a mean-based solution in which the coefficients are estimated by taking the logs of all the relevant values and then averaging the results. Taking the log compresses the range of the values, so the effect of large differences between cell values is reduced, and averaging then exposes the trend in the data. In general, under the mean-based solution, the coefficient corresponding to any group-by $G$ is determined recursively by subtracting, from the average y value at $G$, the coefficients of all group-bys that are at a coarser level of detail than $G$.
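A minimal pure-Python sketch of the recursive mean-based rule described above, for a cube of any dimensionality. The function names `mean_based_coefficients` and `expected_value` and the dictionary-based cube representation are hypothetical. Values are assumed to be already log-transformed when the multiplicative form is used, and every cell is assumed present. Group-bys are processed from coarsest to finest, so each coefficient is the average y value at G minus the coefficients of all coarser group-bys contained in G.

```python
from itertools import combinations, product


def mean_based_coefficients(cube):
    """Estimate the coefficients of Equation Two with the recursive mean-based rule.

    cube maps full index tuples (i, j, k, ...) to y values. Returns (groupbys, coef),
    where coef maps (group_by, sub_index) to a coefficient.
    """
    d = len(next(iter(cube)))
    # Proper group-bys ordered by size, so every subset of g is handled before g.
    groupbys = [g for size in range(d) for g in combinations(range(d), size)]
    coef = {}
    for g in groupbys:
        sums, counts = {}, {}
        for idx, y in cube.items():
            key = tuple(idx[a] for a in g)
            sums[key] = sums.get(key, 0.0) + y
            counts[key] = counts.get(key, 0) + 1
        for key, total in sums.items():
            value = total / counts[key]  # average y value at this cell of g
            for h in groupbys:
                if set(h) < set(g):  # coarser group-by contained in g: subtract it
                    sub = tuple(key[g.index(a)] for a in h)
                    value -= coef[(h, sub)]
            coef[(g, key)] = value
    return groupbys, coef


def expected_value(idx, groupbys, coef):
    """y-hat for one cell: the sum of the coefficients that apply to it."""
    return sum(coef[(g, tuple(idx[a] for a in g))] for g in groupbys)


# Hypothetical 3 x 4 x 2 cube: additive trends plus one injected anomaly.
shape = (3, 4, 2)
cube = {idx: 1.0 * idx[0] + 10.0 * idx[1] + 100.0 * idx[2]
        for idx in product(*map(range, shape))}
cube[(1, 2, 0)] += 10.0
groupbys, coef = mean_based_coefficients(cube)
residuals = {idx: abs(y - expected_value(idx, groupbys, coef)) for idx, y in cube.items()}
print(max(residuals, key=residuals.get))  # (1, 2, 0)
```

Because all proper group-bys (including the AB, AC and BC planes) are fitted, a purely additive cube leaves zero residuals, and only the injected anomaly stands out. The sketch rescans the whole cube once per group-by, which keeps it simple but would be slow on large cubes.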
The mean-based approach for calculating the coefficients is not particularly robust in the presence of extremely large outliers. An “outlier” represents data related to a coefficient that deviates from the trend of the data by a significant amount. Statistical methods exist for deciding whether to keep or discard such suspected outlier data points. Several well-known alternative approaches for handling large outliers can be used, such as the Median Polish Method and the Square Combining Method, disclosed by D. Hoaglin et al., Exploring Data Tables, Trends and Shapes, Wiley Series in Probability, 1988, and incorporated by reference herein. These two alternative approaches use the median instead of the mean to calculate the coefficients. However, they have a high computational cost. Consequently, the mean-based approach is preferred for most OLAP data sets, because significantly large outliers are uncommon in most data sets.
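As a rough illustration of the median-based alternative, here is Tukey's median polish for a two-way table. This sketch is not taken from the patent or from Hoaglin et al.; the function name, the fixed iteration count, and the data are assumptions. It alternately sweeps row and column medians out of the table, so a single large outlier stays in its own residual instead of being spread across the row and column effects, as a mean would spread it.

```python
from statistics import median


def median_polish(table, iterations=10):
    """Tukey's median polish: table[i][j] = overall + row[i] + col[j] + resid[i][j].

    Each sweep removes row medians, then column medians, from the residuals and
    moves them into the effects; the decomposition identity holds after every step.
    """
    resid = [list(r) for r in table]
    p, q = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * p, [0.0] * q
    for _ in range(iterations):
        # Row sweep: take each row's median out of the residuals.
        for i in range(p):
            m = median(resid[i])
            row[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col)  # re-center column effects into the overall term
        col = [c - m for c in col]
        overall += m
        # Column sweep: take each column's median out of the residuals.
        for j in range(q):
            m = median(resid[i][j] for i in range(p))
            col[j] += m
            for i in range(p):
                resid[i][j] -= m
        m = median(row)  # re-center row effects into the overall term
        row = [e - m for e in row]
        overall += m
    return overall, row, col, resid


# Hypothetical additive table (row effect + column effect) with one large outlier.
table = [[0, 10, 20, 30],
         [1, 11, 121, 31],
         [2, 12, 22, 32]]
overall, row, col, resid = median_polish(table)
print(overall, row, col)  # 16.0 [-1.0, 0.0, 1.0] [-15.0, -5.0, 5.0, 15.0]
print(resid[1][2])        # 100.0
```

The extra sorting in every sweep, repeated until the effects settle, is where the additional computational cost mentioned above comes from.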
The residual $r_{ijk}$ may be determined from Equation Two; taking its magnitude gives Equation Three:

$$r_{ijk} = |y_{ijk} - \hat{y}_{ijk}| \qquad (3)$$
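A minimal sketch of ranking cells by the residual of Equation Three, assuming actual and expected values are already available in dictionaries keyed by cell coordinates. The function name `rank_by_residual` and the values are hypothetical.

```python
def rank_by_residual(actual, expected):
    """Return (cell, r) pairs sorted by r = |y - y_hat| (Equation Three), largest first."""
    residuals = {cell: abs(actual[cell] - expected[cell]) for cell in actual}
    return sorted(residuals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical actual and expected values keyed by (product, region, month).
actual = {("tv", "east", "jan"): 1050.0, ("tv", "west", "jan"): 58.0,
          ("radio", "east", "jan"): 19.0}
expected = {("tv", "east", "jan"): 1000.0, ("tv", "west", "jan"): 40.0,
            ("radio", "east", "jan"): 20.0}
for cell, r in rank_by_residual(actual, expected):
    print(cell, r)
# ('tv', 'east', 'jan') 50.0
# ('tv', 'west', 'jan') 18.0
# ('radio', 'east', 'jan') 1.0
```

In this made-up data the high-volume cell ranks first on the raw residual alone, even though its deviation is small relative to its size; that is the situation standardization addresses.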
The greater the value of $r_{ijk}$, the more likely it is that the cell in the multidimensional data for which an expected value is being calculated is an exception in the data model. However, the residual value may need to be standardized before residuals from different cells of the multidimensional data can be meaningfully compared. A “standardized residual value” is calculated as shown in Equation Four:

$$sr = \frac{|y_{ijk} - \hat{y}_{ijk}|}{\sigma_{ijk}} \qquad (4)$$
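A sketch of Equation Four, assuming a per-cell $\sigma_{ijk}$ is supplied by the caller (how $\sigma_{ijk}$ is estimated is not given in this excerpt). The function names, the hypothetical data, and the 2.5 cutoff used to flag exceptions are illustrative assumptions.

```python
def standardized_residuals(actual, expected, sigma):
    """sr = |y - y_hat| / sigma for every cell (Equation Four).

    sigma maps each cell to a caller-supplied positive scale estimate (assumption).
    """
    return {cell: abs(actual[cell] - expected[cell]) / sigma[cell] for cell in actual}


def flag_exceptions(sr, threshold=2.5):
    """Return cells whose standardized residual exceeds a hypothetical cutoff."""
    return sorted(cell for cell, value in sr.items() if value > threshold)


# Hypothetical cells: a large raw residual on a large, noisy cell versus a
# smaller raw residual on a small, stable cell.
actual = {("tv", "east", "jan"): 1050.0, ("tv", "west", "jan"): 58.0}
expected = {("tv", "east", "jan"): 1000.0, ("tv", "west", "jan"): 40.0}
sigma = {("tv", "east", "jan"): 50.0, ("tv", "west", "jan"): 4.0}

sr = standardized_residuals(actual, expected, sigma)
print(sr)                   # {('tv', 'east', 'jan'): 1.0, ('tv', 'west', 'jan'): 4.5}
print(flag_exceptions(sr))  # [('tv', 'west', 'jan')]
```

After standardization, the cell with the smaller raw residual (18 versus 50) is the one flagged, because its deviation is large relative to its own expected spread.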
The step of standardization is performed because the magnitude of the residual may appear to be significantly larger than the other values considered. Considering the magnitude of the residual alone can be misleading because the residual should be evaluated in relation to the data in the neighboring cells. Normalization of the data is ach

Inventors: Kelkar Bhooshan Prafulla; Malloy William Earl
Corporate assignee: International Business Machines Corporation
Attorney: Smith Christine H.
