Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-06-14
2004-02-03
Rones, Charles (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06687690
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to multidimensional data, and more particularly, to a method and system for providing a function for use in finding exceptions in the multidimensional data.
2. Description of the Prior Art
On-Line Analytical Processing (OLAP) is a computing technique for summarizing, consolidating, viewing, applying formulae to, and synthesizing data according to multiple dimensions. OLAP software enables users, such as analysts, managers and executives, to gain insight into performance of an enterprise through rapid access to a wide variety of data views that are organized to reflect the multidimensional nature of the enterprise performance data. An increasingly popular data model for OLAP applications is the multidimensional database (MDDB), which is also known as the data cube. OLAP data cubes are predominantly used for interactive exploration of performance data for finding regions of anomalies in the data, which are also referred to as exceptions or deviations. Problem areas and/or new opportunities are often identified when an anomaly is located.
An exception is defined by first considering a two-dimensional data cube having p values along a first dimension A and q values along a second dimension B. The element or quantity corresponding to the ith value of dimension A and jth value of dimension B is denoted as y
ij
. To estimate the degree of surprise y
ij
holds in this data cube, an expected value ŷ
ij
of y
ij
is calculated as a function f of three terms: (1) a term &mgr; that denotes a trend that is common to all y values of the cube, (2) a term &agr;
i
that denotes special trends along the ith row with respect to the rest of the cube, and (3) a term &bgr;
j
that denotes special trends along the jth column with respect to the rest of the cube. The residual difference r
ij
between the expected value ŷ
ij
=f(&mgr;, &agr;
i
, &bgr;
j
), and the actual value y
ij
represents the degree of surprise of element y
ij
based on its position in the cube.
When the data cube has three dimensions, for example, with dimension C being the third dimension, the expected value ŷ
ijk
is calculated by taking into account not only the kth value of the third dimension, but also the three values corresponding to the pairs (i,j) in the AB plane, (i,k) in the AC plane and (j,k) in the BC plane. The expected value ŷ
ijk
is then expressed as a function of seven terms as:
ŷ
ijk
=f
(&mgr;, &agr;
i
, &bgr;
j
, &ggr;
k
,(&agr;&bgr;)
ij
,(&bgr;&ggr;)
ik
),(&ggr;&bgr;)
kj
), (1)
where (&agr;&bgr;)
ij
denotes the contribution of the ijth value in the AB plane, (&agr;&ggr;)
ik
denotes the contribution of jkth value in the AC plane, and (&ggr;&bgr;)
kj
) denotes the contribution of the kjth value in the BC plane. In general, for any k dimensional cube, the y value can be expressed as the sum of the coefficients corresponding to each of the 2k−1 levels of aggregations or group-bys of the cube. To illustrate, a 3-dimensional cube will be considered.
The function f( ) can take several forms or models. Two particularly useful forms are an additive form, where function f( ) is a simple addition of all its arguments, and a multiplicative form, where function f( ) is a product of all its arguments. The multiplicative form can be transformed to the additive form by performing a logarithm on the original data values. Thus, the final form of Eq. (1) is,
y
ijk
=ŷ
ijk
+r
ijk
=&mgr;+&agr;
i
+&bgr;
j
+&ggr;
k
+(&agr;&bgr;)
ij
+(&agr;&ggr;)
ik
+(&ggr;&bgr;)
jk
, (2)
where r
ijk
is the residual difference between the expected value ŷ
ij
=f(&mgr;, &agr;
i
, &bgr;
j
) and the actual value y
ij
. The relative importance of an exception is based on the relative value of its residual, that is, the higher the value of the residual, the higher the importance of the exception.
For a multiplicative model, the y
ijk
values denote the log of the original y-values of the cube. The choice of the best form of the function depends on the particular class of data, and is preferably selected by a user having the understanding and experience with the data at hand.
There are several ways for deriving values of the coefficients of Eq. (2). One approach is by a mean-based method where the coefficients are estimated as follows:
&mgr;=
y
. . .
=overall mean or average (3)
&agr;
i
−y
i
. . .
−&mgr;, (4)
where y
i
. . .
is the mean over all numbers with the ith value of A. Thus, for a two-way table, &agr;
i
denotes the magnitude of the difference of the average of the numbers along the ith row from the overall average &mgr;.
&bgr;
j
=y
.j.
−&mgr;, (5)
where y
.j.
is the mean over all numbers with the jth value of B.
&ggr;
k
=y
. . . k
−&mgr;, (6)
where y
. . . k
is the mean over all numbers with the kth value of C. Lastly,
(&agr;&bgr;)
ij
=y
ij
−&agr;
i
−&bgr;
j
−&mgr; (7)
The remaining terms are defined analogously.
In general, the coefficient corresponding to any group-by G is recursively calculated by subtracting all coefficients from group-bys that are at a smaller level of detail than group G from the average y value at group-by G.
The mean-based approach for calculating the coefficients is not particularly robust in the presence of extremely large outliers. Consequently, a number of well-known alternative approaches for handling large outliers can be used, such as the median polish method and the square combining method, disclosed by D. Hoaglin et al., Exploring data tables, trends and shapes, Wiley series in probability, 1988, and incorporated by reference herein. These two alternative approaches are based on using a “median” instead of “mean” for calculating the coefficients. Nevertheless, these alternative approaches have an associated high computational cost. Consequently, the mean-based approach is preferred for most OLAP data sets because significantly large outliers are uncommon in most data sets.
The technique for determining residual and coefficients can be extended to handle hierarchies along one or more dimensions of a data cube. The basic idea is to define the expected value of a data cube element, not only based on its row and column position, but also on its hierarchical parents. For instance, consider values y
ij
in a data cube consisting of two dimensions A and B, where dimension A has two levels of hierarchies: A
1
→A
2
→ALL. To calculate an expected value ŷ
ij
at the A
1
B level, the row coefficient &agr;
i
at level A
1
, the column coefficient &bgr;
j
at level B and overall coefficient &mgr; at level ALL, two new terms corresponding to the two new aggregations A
2
and A
2
B along the hierarchy on A are used. Equation (2) thus becomes:
ŷ
ij
=&mgr;+&agr;
i
+&bgr;
j
+&agr;′
i′
+(&agr;′&bgr;)
i′j
(8)
where i′ denotes the parent of i at hierarchy level A
2
, &agr;′
i′
denotes the contribution of the ith value at level A
2
, and (&agr;′&bgr;)
i′j
denotes the contribution due to the ijth value at level A
2
B.
The general formula for handling hierarchies is to express a y value in a cube in terms of coefficients obtained from all higher level aggregations of the cube. For example, for the y-values at A
1
B in Eq. (8), coefficients from the five higher level aggregates are used, that is, A
1
, A
2
, B, A
2
B, and ALL. The same recursive rule of the previous subsection is followed for estimating the coefficients, where &mgr; is first estimated as the overall average, and then for terms corresponding to each group-by G. The average at group-by G is computed and then the coefficients from each child of G are subtracted from the computed average.
“ANalysis Of VAriance”(ANOVA) is a well-known statistical technique described by D. G. Montgomery, in ‘Desi
Ohlandt Greeley Ruggiero & Perle L.L.P.
Rones Charles
Smith Christine H.
Spiegel Michael
LandOfFree
Employing a combined function for exception exploration in... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Employing a combined function for exception exploration in..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Employing a combined function for exception exploration in... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3316890