Statistical outlier detection for gene expression microarray...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

Reexamination Certificate

active

06763308

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention is generally directed to the field of processing genomic data. More specifically, the invention relates to a system and method for performing statistical outlier detection for gene expression microarray data.
2. Description of the Related Art
In genomics research, gene expression arrays are a breakthrough technology that enables the simultaneous measurement of the transcription of tens of thousands of genes. Because the numerical data associated with expression arrays usually arise from image processing, data quality is an important issue.
Two recent scientific articles, Schadt et al. (2000) and Li and Wong (2001), discuss this data quality issue for one of the most popular expression array platforms, the Affymetrix GeneChip™. For example, they point out that outlier problems may arise due to particle contamination (see FIG. 1 in Schadt et al. (2000)) or scratch contamination (see FIG. 5 in Li and Wong (2001)). They indicate that improper statistical handling of aberrant or outlying data points can produce misleading analysis results.
Li and Wong propose an outlier detection method based on a multiplicative statistical model. While this approach is useful, it is limited to Affymetrix data and lacks the flexibility to accommodate more complex experimental designs. The multiplicative model used by Li and Wong is as follows:
$$Y_{ij} = \theta_i \Phi_j + \epsilon_{ij}, \quad \sum_j \Phi_j^2 = J, \quad \epsilon_{ij} \sim N(0, \sigma^2). \qquad (1)$$
$Y_{ij}$ is the intensity measurement of the $j$th probe in the $i$th array. $\theta_i$ is the $i$th fixed array effect, $\Phi_j$ is the $j$th fixed probe effect, and $J$ is the number of probes. The $\epsilon_{ij}$'s are assumed to be independent, identically distributed normal random variables with mean 0 and variance $\sigma^2$. With the assumption of knowing the $\Phi$s or the $\theta$s, the following conditional means and standard errors can be derived and used in the Li and Wong method.
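$$\tilde{\theta}_i = \frac{\sum_j Y_{ij}\,\Phi_j}{\sum_j \Phi_j^2}, \qquad \tilde{\Phi}_j = \frac{\sum_i Y_{ij}\,\theta_i}{\sum_i \theta_i^2},$$

$$\mathrm{StdErr}(\tilde{\theta}_i) = \sqrt{\frac{\sum_j (Y_{ij} - \hat{Y}_{ij})^2}{J(J-1)}}, \qquad \mathrm{StdErr}(\tilde{\Phi}_j) = \sqrt{\frac{\sum_i (Y_{ij} - \hat{Y}_{ij})^2}{K(K-1)}}, \qquad K = \sum_i \tilde{\theta}_i^2.$$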
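As a rough illustration, the conditional means and standard errors above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the Li and Wong implementation: the function names `fit_multiplicative` and `standard_errors`, the alternating update scheme, the fixed iteration count, and the use of the final fitted values as $\hat{Y}_{ij}$ are all choices made for this example. The standard error denominators follow the formulas as stated in the text.

```python
import numpy as np

def fit_multiplicative(Y, n_iter=50):
    """Alternating least squares fit of Y[i, j] ~ theta[i] * phi[j].

    Y has one row per array and one column per probe.
    """
    n_probes = Y.shape[1]
    phi = np.ones(n_probes)                       # start with sum(phi**2) == J
    for _ in range(n_iter):
        theta = Y @ phi / np.sum(phi**2)          # conditional mean of theta given phi
        phi = Y.T @ theta / np.sum(theta**2)      # conditional mean of phi given theta
        phi *= np.sqrt(n_probes / np.sum(phi**2)) # re-impose the constraint sum(phi**2) == J
    theta = Y @ phi / np.sum(phi**2)              # final update so theta matches the rescaled phi
    return theta, phi

def standard_errors(Y, theta, phi):
    """Standard errors of the array and probe effects, following the text's formulas."""
    J = Y.shape[1]
    K = np.sum(theta**2)                          # K as defined in the text
    resid = Y - np.outer(theta, phi)              # Y_ij minus fitted value
    se_theta = np.sqrt(np.sum(resid**2, axis=1) / (J * (J - 1)))
    se_phi = np.sqrt(np.sum(resid**2, axis=0) / (K * (K - 1)))
    return se_theta, se_phi
```

Calling `fit_multiplicative(Y)` on an arrays-by-probes intensity matrix returns the estimated array and probe effects, which can then be passed to `standard_errors` together with `Y`.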
The following is a description of the Li and Wong outlier detection approach:
1. Check array outliers: Fit model (1) and calculate the conditional standard errors for all $\theta_i$'s. Designate an array as an array outlier if either of the following criteria is met:
i. The associated $\theta_i$ has a standard error larger than three times the median standard error of all $\theta_i$'s.
ii. The associated $\theta_i$ has a dominating magnitude, with a squared value larger than 0.8 times the sum of squares of all $\theta_i$'s.
Remove those array outliers and go to step 2.
2. Check probe outliers: Fit model (1) and calculate the conditional standard errors for all $\Phi_j$'s. Designate a probe as a probe outlier if either of the following criteria is met:
i. The associated $\Phi_j$ has a standard error larger than three times the median standard error of all $\Phi_j$'s.
ii. The associated $\Phi_j$ has a dominating magnitude, with a squared value larger than 0.8 times the sum of squares of all $\Phi_j$'s.
Remove those probe outliers and go to step 3.
3. Iterate steps 1 and 2 until no further array or probe outliers are selected.
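The three steps above can be sketched as a loop in Python with NumPy. This is an illustrative sketch only: the names `fit`, `flag`, and `detect_outliers`, the alternating least squares fit, the cap on the number of rounds, and the guard that stops when fewer than three arrays or probes remain are assumptions of this example rather than details given in the text.

```python
import numpy as np

def fit(Y, n_iter=50):
    """Alternating least squares fit of Y[i, j] ~ theta[i] * phi[j]."""
    J = Y.shape[1]
    phi = np.ones(J)
    for _ in range(n_iter):
        theta = Y @ phi / np.sum(phi**2)
        phi = Y.T @ theta / np.sum(theta**2)
        phi *= np.sqrt(J / np.sum(phi**2))        # enforce sum(phi**2) == J
    theta = Y @ phi / np.sum(phi**2)
    return theta, phi

def flag(effect, std_err, se_factor=3.0, dominance=0.8):
    """Apply criteria i and ii to one set of effects."""
    large_se = std_err > se_factor * np.median(std_err)      # criterion i
    dominating = effect**2 > dominance * np.sum(effect**2)   # criterion ii
    return large_se | dominating

def detect_outliers(Y, max_rounds=20):
    """Return boolean masks (array_outliers, probe_outliers) for an arrays x probes matrix."""
    keep_a = np.ones(Y.shape[0], dtype=bool)
    keep_p = np.ones(Y.shape[1], dtype=bool)
    for _ in range(max_rounds):
        if keep_a.sum() < 3 or keep_p.sum() < 3:  # assumed guard: too little data to refit
            break
        # Step 1: array outliers, using only arrays and probes still kept.
        sub = Y[np.ix_(keep_a, keep_p)]
        theta, phi = fit(sub)
        resid = sub - np.outer(theta, phi)
        J = sub.shape[1]
        se_theta = np.sqrt(np.sum(resid**2, axis=1) / (J * (J - 1)))
        bad_a = flag(theta, se_theta)
        keep_a[np.flatnonzero(keep_a)[bad_a]] = False
        if keep_a.sum() < 3:
            break
        # Step 2: probe outliers, refitting without the flagged arrays.
        sub = Y[np.ix_(keep_a, keep_p)]
        theta, phi = fit(sub)
        resid = sub - np.outer(theta, phi)
        K = np.sum(theta**2)
        se_phi = np.sqrt(np.sum(resid**2, axis=0) / (K * (K - 1)))
        bad_p = flag(phi, se_phi)
        keep_p[np.flatnonzero(keep_p)[bad_p]] = False
        # Step 3: stop once a full pass selects nothing new.
        if not (bad_a.any() or bad_p.any()):
            break
    return ~keep_a, ~keep_p
```

The returned masks are indexed against the original rows and columns of `Y`, so flagged arrays and probes can be reported by their original positions.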
SUMMARY OF THE INVENTION
In accordance with the disclosure below, a computer-implemented method and system are provided for detecting outliers in microarray data. A mixed linear statistical model is used to generate predictions based upon the received microarray data. Residuals are generated by subtracting model-based predictions from the original microarray sample data. Statistical tests are performed for residuals by adding covariates to the mixed model and testing their significance. Data from the microarrays are designated as outliers based upon the tested significance.
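The idea of testing a residual by adding a covariate to the model can be illustrated with a simplified Python sketch. It makes several assumptions and is not the disclosed method: an ordinary least squares fit with fixed effects only stands in for the mixed linear model, only the single observation with the largest absolute residual is tested, and the function name `mean_shift_test` and the critical value `t_crit` are hypothetical. The added covariate is an indicator that equals 1 for the candidate observation and 0 elsewhere, and its t statistic measures how far that observation departs from the model.

```python
import numpy as np

def mean_shift_test(X, y, t_crit=4.0):
    """Test the observation with the largest residual via an added indicator covariate.

    X is the design matrix (n observations x p covariates), y the response.
    Ordinary least squares is used here as a stand-in for a mixed model.
    Returns (index, t statistic, flagged).
    """
    # Fit the base model and form residuals: data minus model-based predictions.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    k = int(np.argmax(np.abs(resid)))             # candidate outlier

    # Add an indicator covariate for the candidate and refit.
    indicator = np.zeros(len(y))
    indicator[k] = 1.0
    X_aug = np.column_stack([X, indicator])
    beta_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    resid_aug = y - X_aug @ beta_aug

    # t statistic of the indicator coefficient.
    dof = len(y) - X_aug.shape[1]
    sigma2 = resid_aug @ resid_aug / dof
    cov = sigma2 * np.linalg.inv(X_aug.T @ X_aug)
    t_stat = beta_aug[-1] / np.sqrt(cov[-1, -1])
    return k, t_stat, bool(abs(t_stat) > t_crit)
```

In practice the fixed threshold would be replaced by a p-value from the appropriate reference distribution, and the fit by a mixed model that includes the random effects of the experimental design.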


REFERENCES:
patent: 5143854 (1992-09-01), Pirrung et al.
patent: 5571639 (1996-11-01), Hubbell et al.
patent: 6132969 (2000-10-01), Stoughton et al.
patent: 6229911 (2001-05-01), Balaban et al.
patent: 6341257 (2002-01-01), Haaland
patent: 2002/0039740 (2002-04-01), Ramm et al.
patent: 2003/0023148 (2003-01-01), Lorenz et al.
patent: 2003/0023403 (2003-01-01), Nadon et al.
patent: 2003/0144746 (2003-07-01), Hsiung et al.
patent: 2003/0171876 (2003-09-01), Markowtiz et al.
patent: 2003/0216870 (2003-11-01), Wolber et al.
Schadt, Eric E., et al., “Analyzing High-Density Oligonucleotide Gene Expression Array Data,” J. Cell. Biochem., 80 (2), 2000, pp. 192-202.
Li, Cheng, et al., “Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection,” Proc. Natl. Acad. Sci. USA 98 (1), 2001, pp. 31-36.
Wolfinger, Russell D., et al., “Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models,” J. Comput. Biol., vol. 8, No. 6, 2001, pp. 625-637.
Chu, Tzu-Ming, et al., “A systematic statistical linear modeling approach to oligonucleotide array experiments,” Mathematical Biosciences 176, 2002, pp. 35-51.
Kerr, M. Kathleen, et al., “Analysis of Variance of Gene Expression Microarray Data,” Journal of Computational Biology, vol. 7, No. 6, 2000, pp. 819-837.
