Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Chemical analysis
Reexamination Certificate
1998-05-07
2002-09-17
Wachsman, Hal (Department: 2867)
Data processing: measuring, calibrating, or testing
Measurement system in a specific environment
Chemical analysis
C702S032000, C702S179000, C382S225000
Reexamination Certificate
active
06453246
ABSTRACT:
The above referenced applications are incorporated herein by reference in their entireties.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed to data analysis and, more particularly, to representation of proximity data in multi-dimensional space.
2. Related Art
Multidimensional scaling (MDS) and non-linear mapping (NLM) are techniques for generating display maps, including non-linear maps, of objects wherein the distances between the objects represent relationships between the objects.
MDS and NLM were introduced by Torgerson,
Phychometrika
, 17:401 (1952); Kruskal,
Psychometrika
, 29:115 (1964); and Sammon,
IEEE Trans. Comput
., C-18:401 (1969) as a means to generate low-dimensional representations of psychological data. Multidimensional scaling and non-linear mapping are reviewed in Schiffman, Reynolds and Young,
Introduction to Multidimensional Scaling
, Academic Press, New York (1981); Young and Hamer,
Multidimensional Scaling: History, Theory and Applications
, Erlbaum Associates, Inc., Hillsdale, N.J. (1987); and Cox and Cox, Multidimensional Scaling, Number 59 in
Monographs in Statistics and Applied Probability
, Chapman-Hall (1994). The contents of these publications are incorporated herein by reference in their entireties.
MDS and NLM (these are generally the same, and are hereafter collectively referred to as MDS) represent a collection of methods for visualizing proximity relations of objects by distances of points in a low-dimensional Euclidean space. Proximity measures are reviewed in Hartigan,
J. Am. Statist. Ass
., 62:1140 (1967), which is incorporated herein by reference in its entirety.
In particular, given a finite set of vectorial or other samples A={a
i
, i=1, . . . , k}, a relationship function r
ij
=r(a
i
, a
j
), with a
i
, a
j
&egr;A, which measures the similarity or dissimilarity between the i-th and j-th objects in A, and a set of images X={x
i
, . . . , x
k
; x
i
&egr;R
m
} of A on an m-dimensional display plane (R
m
being the space of all m-dimensional vectors of real numbers), the objective is to place x
i
onto the display plane in such a way that their Euclidean distances d
ij
=∥x
i
−x
j
∥ approximate as closely as possible the corresponding values r
ij
. This projection, which in many cases can only be made approximately, is carried out in an iterative fashion by minimizing an error function which measures the difference between the original, r
ij
, and projected, d
ij
, distance matrices of the original and projected vector sets.
Several such error functions have been proposed, most of which are of the least-squares type, including Kruskal's ‘stress’:
S
=
∑
i
<
j
k
⁢
⁢
(
r
ij
-
d
ij
)
2
∑
i
<
j
k
⁢
r
ij
2
EQ
.
⁢
1
Sammon's error criterion:
E
=
∑
i
<
j
k
⁢
(
r
ij
-
d
ij
)
2
r
ij
∑
i
<
j
k
⁢
r
ij
EQ
.
⁢
2
and Lingoes' alienation coefficient:
K
=
∑
i
<
j
k
⁢
⁢
(
r
ij
⁢
d
ij
)
2
∑
i
<
j
k
⁢
d
ij
EQ
.
⁢
3
where d
ij
=∥x
i
−x
j
∥ is the Euclidean distance between the images x
i
and x
j
on the display plane.
Generally, the solution is found in an iterative fashion by:
(1) computing or retrieving from a database the relationships r
ij
;
(2) initializing the images x
i
;
(3) computing the distances of the images d
ij
and the value of the error function (e.g. S, E or K in EQ. 1-3 above);
(4) computing a new configuration of the images x
i
using a gradient descent procedure, such as Kruskal's linear regression or Guttman's rank-image permutation; and
(5) repeating steps 3 and 4 until the error is minimized within some prescribed tolerance.
For example, the Sammon algorithm minimizes EQ. 2 by iteratively updating the coordinates x
i
using Eq 4:
x
pq
(
m+
1)=
x
pq
(
m
)−&lgr;&Dgr;
pq
(
m
) EQ. 4
where m is the iteration number, x
pq
is the q-th coordinate of the p-th image x
p
, &lgr; is the learning rate, and
Δ
pq
⁡
(
m
)
=
∂
E
⁡
(
m
)
∂
x
pq
⁡
(
m
)
&LeftBracketingBar;
∂
2
⁢
E
⁡
(
m
)
∂
x
pq
⁡
(
m
)
2
&RightBracketingBar;
EQ
.
⁢
5
The partial derivatives in EQ. 5 are given by:
∂
E
⁡
(
m
)
∂
x
pq
⁡
(
m
)
=
-
2
⁢
⁢
∑
j
=
1
,
j
≠
p
k
⁢
r
pj
-
ⅆ
pj
r
pj
⁢
ⅆ
pj
⁢
(
x
pq
-
x
jq
)
∑
i
<
j
k
⁢
r
ij
EQ
.
⁢
6
∂
2
⁢
E
⁡
(
m
)
∂
x
pq
⁡
(
m
)
2
=
-
2
⁢
∑
i
<
j
k
⁢
1
r
pj
⁢
ⅆ
pj
⁢
⌊
(
r
pj
-
ⅆ
pj
)
-
(
x
pq
-
x
jq
)
2
ⅆ
pj
⁢
(
1
+
(
r
pj
-
ⅆ
pj
)
ⅆ
pj
)
⌋
∑
i
<
j
k
⁢
r
ij
EQ
.
⁢
7
The mapping is obtained by repeated evaluation of EQ. 2, followed by modification of the coordinates using EQ. 4 and 5, until the error is minimized within a prescribed tolerance.
The general refinement paradigm above is suitable for relatively small data sets but has one important limitation that renders it impractical for large data sets. This limitation stems from the fact that the computational effort required to compute the gradients (i.e., step (4) above), scales to the square of the size of the data set. For relatively large data sets, this quadratic time complexity makes even a partial refinement intractable.
What is needed is a system, method and computer program product for representing proximity data in a multi-dimensional space, that scales favorably with the number of objects and that can be applied to both small and large data sets. Moreover, what is needed is a system, method and computer program product that can be effective with missing data and/or data containing bounded or unbounded uncertainties, noise or errors.
SUMMARY OF THE INVENTION
The present invention is a system, method and computer program product for representing precise or imprecise measurements of similarity/dissimilarity (relationships) between objects preferably as distances between points in a multi-dimensional space that represent the objects. The algorithm uses self-organizing principles to iteratively refine an initial (random or partially ordered) configuration of points using stochastic relationship/distance errors. The data can be complete or incomplete (i.e. some relationships between objects may not be known), exact or inexact (i.e. some or all relationships may be given in terms of allowed ranges or limits), symmetric or asymmetric (i.e. the relationship of object A to object B may not be the same as the relationship of B to A) and may contain systematic or stochastic errors.
The relationships between objects may be derived directly from observation, measurement, a priori knowledge, or intuition, or may be determined directly or indirectly using any suitable technique for deriving proximity (relationship) data.
The present invention iteratively analyzes sub-sets of objects in order to represent them in a multi-dimensional space that represents relationships between the objects.
In an exemplary embodiment, the present invention iteratively analyzes sub-sets of objects using conventional multi-dimensional scaling or non-linear mapping algorithms.
In another exemplary embodiment, relationships are defined as pair-wise relationships or pair-wise similarities/dissimilarities between pairs of objects and the present invention iteratively analyzes a pair of objects at a time. Preferably, sub-sets are evaluated pair-wise, as a double-nested loop.
In the following discussion, the terms relationship, similarity or dissimilarity is used to denote a relationship between a pair of objects. The term display map is used to denote a collection of images on an n-dimensional space that represents the original objects. The term distance is used to denote a distance between images on a display map that correspond to the objects.
Examples of the present invention are provided herein, including examples of the present invention implemented with chem
Agrafiotis Dimitris K.
Labanov Victor S.
Salemme Francis R.
3-Dimensional Pharmaceuticals Inc.
Sterne Kessler Goldstein & Fox PLLC
Wachsman Hal
LandOfFree
System, method, and computer program product for... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System, method, and computer program product for..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System, method, and computer program product for... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2883317