Reexamination Certificate
1999-04-07
2001-06-26
Shah, Kamini (Department: 2857)
Data processing: measuring, calibrating, or testing
Measurement system
Statistical measurement
C702S069000, C250S282000, C340S541000
active
06253162
ABSTRACT:
FIELD OF THE INVENTION
The present invention is a method of identifying and/or characterizing features in indexed data. The present invention is especially useful for identifying and extracting features in spectral data, for example distinguishing signal from noise.
As used herein, the term “indexed data” refers to a set of measured values called responses. Each response is related to one or more neighboring responses. The relationship may be, for example, categorical, spatial, or temporal, and may be explicitly stated or implicitly understood from knowing the type of response data and/or how the response data were obtained. When a unique index is assigned to each response, the data are considered indexed. The unique index may be one-dimensional or multi-dimensional. One-dimensional indexed data may be defined as data in ordered pairs (index value, response), where the index values represent values of a physical parameter such as time, distance, frequency, or category, and the response may include, but is not limited to, a signal intensity, a particle or item count, or a concentration measurement. An example of a multi-dimensional indexed dataset is a matrix having a unique row and column address for each response.
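As a minimal illustration of this definition (not part of the original disclosure), one-dimensional indexed data can be held as a list of (index value, response) pairs, and multi-dimensional indexed data as a mapping from unique (row, column) addresses to responses. The type aliases `OneDim` and `TwoDim` and the helpers `is_indexed` and `matrix_to_indexed` are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Tuple

# One-dimensional indexed data: ordered pairs (index value, response).
OneDim = List[Tuple[float, float]]

# Multi-dimensional indexed data: each response keyed by a unique (row, column) address.
TwoDim = Dict[Tuple[int, int], float]


def is_indexed(pairs: OneDim) -> bool:
    """True if every response in the pairs carries a unique index value."""
    index_values = [index for index, _ in pairs]
    return len(index_values) == len(set(index_values))


def matrix_to_indexed(matrix: List[List[float]]) -> TwoDim:
    """Give each matrix entry its unique (row, column) address."""
    return {
        (row, col): value
        for row, values in enumerate(matrix)
        for col, value in enumerate(values)
    }
```

For example, `[(1000.0, 3.0), (1000.5, 7.0), (1001.0, 2.0)]` is indexed, while two responses sharing the index 1.0 would not be; `matrix_to_indexed([[1.0, 2.0], [3.0, 4.0]])` addresses each entry by its row and column.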
BACKGROUND OF THE INVENTION
The identification and/or characterization of significant or useful features is a classic problem in the analysis of indexed data. Often this problem is reduced to separating the desired signal from undesired noise. Transient features, specifically peaks, are frequently of interest. In indexed data, a peak appears as a deviation in the responses over consecutive indices, for example a rise and fall. However, background noise can also produce such deviations.
Traditionally, peaks have been detected by rejecting responses below a threshold value. Whether manual or automated, selecting a threshold is still an art that requires arbitrary, subjective, operator/analyst-dependent decisions. The effectiveness of traditional peak detection is affected by signal-to-noise ratio, signal drift, and varying baseline signal. Consequently, an operator or analyst may have to apply several thresholds over different regions of indices to capture as much signal as possible, a practice that is difficult to reproduce, suffers from substantial signal loss, and is subject to operator/analyst uncertainty.
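The conventional threshold approach can be sketched as follows, assuming that each run of consecutive responses at or above the threshold is reported as one peak. The function `threshold_peaks` is a hypothetical illustration, not a specific prior-art implementation; its purpose is to show how the reported peak list depends on the operator's choice of threshold.

```python
from typing import List, Sequence, Tuple


def threshold_peaks(
    indices: Sequence[float], responses: Sequence[float], threshold: float
) -> List[Tuple[float, float]]:
    """Conventional threshold peak detection (illustrative sketch).

    Responses below ``threshold`` are rejected; each run of consecutive
    responses at or above it is reported as one peak, returned as the
    (first index, last index) of the run.
    """
    peaks = []
    start = None
    for i, y in enumerate(responses):
        if y >= threshold and start is None:
            start = i  # entering a run above the threshold
        elif y < threshold and start is not None:
            peaks.append((indices[start], indices[i - 1]))
            start = None  # leaving the run
    if start is not None:  # a run that extends to the last index
        peaks.append((indices[start], indices[-1]))
    return peaks
```

With `indices = list(range(10))` and `responses = [1, 2, 1, 6, 9, 5, 1, 3, 4, 1]`, a threshold of 2.5 reports the runs (3, 5) and (7, 8), a threshold of 5.5 reports only (3, 4), and a threshold of 0.5 merges everything into (0, 9).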
For example, current peak detection and characterization algorithms are inadequate for developing statistical analysis methods for MALDI-MS (matrix-assisted laser desorption/ionization mass spectrometry). The MALDI-MS process begins with an analyte of interest placed on a sample plate and mixed with a matrix. The matrix is a compound chosen to absorb light at the wavelengths emitted by a given laser. Laser light is then directed at the sample, and the matrix absorbs the light energy and becomes ionized. The ionization of the matrix in turn ionizes the analyte, producing analyte ions 100 (FIG. 1). A charge applied at the detector 104 attracts the analyte ions 100 through a flight tube 102 to the detector 104. The detector 104 measures the abundance of ions that arrive in short time intervals, and the abundance of ions over time is converted to the abundance of ions as a function of mass/charge (m/z) ratio. The ions 100 arrive at the detector 104 in a dispersed packet that spans multiple sampling intervals of the detector 104. As a result, the ions 100 are binned so that they are counted over several m/z units, as illustrated in FIG. 2.
Current algorithms require the user to specify a detection threshold 200; only peaks 202 exceeding this threshold are detected and characterized. The detection threshold procedure is conceptually appealing: it suggests that m/z values at which no ions are present will read zero relative abundance, while m/z values at which ions are present will produce a peak. However, the list of MALDI-MS peaks produced by the instrument depends on how a given user sets the detection threshold 200 on any given day. This required human intervention makes complete automation impossible and introduces variability that makes accurate statistical characterization of MALDI-MS spectra difficult.
Operator, instrumental, and experimental uncertainty add noise to MALDI-MS spectra, further decreasing the effectiveness of current peak detection algorithms. If the user-defined threshold 200 is set too low, noise can be erroneously characterized as a peak; if the user-defined threshold 200 is set too high, small peaks might be erroneously identified as noise.
Related to the problem of distinguishing signal from noise is the problem of bounding the uncertainty of the signal. It is well known that replicate analyses of a sample often produce slightly different indexed data.
Thus, there is a need in the art of indexed data collection and analysis for a robust method of processing indexed data that provides greater confidence in identifying and/or characterizing spectral features and in separating signal from noise, with less signal loss, while minimizing the adverse effects of low signal-to-noise ratio, signal drift, varying baseline signal, and combinations thereof. In addition, there is a need for a method of characterizing the multi-dimensional uncertainty of the signal.
SUMMARY OF THE INVENTION
The present invention is a method of identifying features in indexed data that is fundamentally distinct from prior methods. Whereas prior methods compare the responses to a response threshold, the present invention uses the responses in combination with the indices. More specifically, the present invention treats the responses as a histogram over the indices and uses this histogram concept to construct a measure of dispersion of the indices: the response associated with each index serves as that index's histogram frequency. Comparing the index dispersion to a dispersion critical value determines whether significant or useful features are present. Thus, the method of the present invention comprises the steps of:
(a) selecting a subset of indices having a beginning index and an ending index;
(b) computing a measure of dispersion of the subset of indices using a subset of responses corresponding to the subset of indices as histogram frequencies; and
(c) comparing the measure of dispersion to a dispersion critical value.
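Steps (a) through (c) might be sketched as below. This sketch assumes that the measure of dispersion is the population variance of the indices weighted by their responses, that the subset is a window of consecutive list positions `begin` through `end` (inclusive), and that a feature is flagged when the dispersion falls below the critical value, on the reasoning that a peak concentrates responses near its center. The function names and any particular `critical_value` are hypothetical; the text above does not say how the critical value is chosen.

```python
from typing import Sequence


def index_dispersion(indices: Sequence[float], responses: Sequence[float]) -> float:
    """Step (b): variance of the indices, with responses as histogram frequencies."""
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses in the subset must have a positive sum")
    mean = sum(w * x for x, w in zip(indices, responses)) / total
    return sum(w * (x - mean) ** 2 for x, w in zip(indices, responses)) / total


def is_feature(
    indices: Sequence[float],
    responses: Sequence[float],
    begin: int,
    end: int,
    critical_value: float,
) -> bool:
    """Apply steps (a)-(c) to the window of positions begin..end (inclusive)."""
    # (a) select a subset of indices and the responses that go with them
    subset_x = indices[begin : end + 1]
    subset_w = responses[begin : end + 1]
    # (b) measure the dispersion of the subset of indices
    dispersion = index_dispersion(subset_x, subset_w)
    # (c) compare with the critical value; assumed direction: a peak
    #     concentrates the responses, so its dispersion is smaller than noise
    return dispersion < critical_value
```

With indices 0 through 6, a flat window of responses `[2] * 7` has dispersion 4.0, while a peaked window `[0, 1, 4, 10, 4, 1, 0]` has dispersion 0.8; with a hypothetical critical value of 2.0, only the peaked window is flagged.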
In the example of MALDI mass spectrometry, the index values are mass/charge ratios and the responses are the corresponding intensities. Each index value represents a physical measurement of mass/charge ratio, and its corresponding intensity represents the relative abundance of ions observed at that mass/charge ratio. A MALDI-MS spectrum can then be thought of as a histogram of mass/charge ratios (i.e., the relative abundance of ions as a function of mass/charge ratio).
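The histogram reading of a spectrum can be checked directly. Assuming whole-number ion counts (a simplification; the weighted formulas also accept non-integer intensities), repeating each m/z value once per counted ion yields a raw sample whose ordinary mean and variance match the intensity-weighted mean and variance. The helper names below are hypothetical, and the m/z values in the usage are made up for illustration.

```python
import statistics
from typing import List, Sequence, Tuple


def expand_histogram(mz_values: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Rebuild the raw observations a histogram summarizes: each m/z value
    repeated once per ion counted at it (counts must be whole numbers)."""
    return [mz for mz, n in zip(mz_values, counts) for _ in range(int(n))]


def weighted_mean_and_variance(
    mz_values: Sequence[float], counts: Sequence[float]
) -> Tuple[float, float]:
    """Mean and population variance of m/z, using intensities as frequencies."""
    total = sum(counts)
    mean = sum(n * mz for mz, n in zip(mz_values, counts)) / total
    var = sum(n * (mz - mean) ** 2 for mz, n in zip(mz_values, counts)) / total
    return mean, var
```

For example, m/z values `[1000.0, 1000.5, 1001.0, 1001.5]` with counts `[1, 3, 3, 1]` expand to eight observations; the weighted formulas and `statistics.fmean` / `statistics.pvariance` on the expanded sample both give a mean of 1000.75 and a variance of 0.1875.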
From this histogram concept, features can be identified and characterized by comparing properties of the histogram to the corresponding properties of a hypothesized noise-only distribution. In a preferred embodiment of this invention, dispersion is used as the criterion for distinguishing spectral features due to signal from spectral features due to noise. In particular, when only noise is present, the dispersion of index values in a small region of consecutive indices should reflect the dispersion of a Uniform distribution, in which the relative abundance of each index value is expected to be constant over the region of interest. On the other hand, when a feature due to signal is present, the dispersion of index values in a small region of the data will differ significantly from the dispersion of a Uniform distribution. Once a feature has been identified, it is characterized in a similar manner using statistical moments of the index values.
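A sketch of the noise-only comparison and the subsequent characterization follows. It assumes equally spaced indices within the window, for which the variance of a discrete Uniform distribution over n indices with spacing h is h^2 (n^2 - 1) / 12, and it expresses the comparison as the ratio of observed to Uniform dispersion: near 1 for a flat window, well below 1 for a concentrated peak. Characterization uses the weighted mean (location), standard deviation (width), and skewness (asymmetry) of the indices. All names are hypothetical, and the ratio is one possible way to express the comparison, not necessarily the one claimed.

```python
import math
from typing import Dict, Sequence


def uniform_dispersion(n: int, spacing: float = 1.0) -> float:
    """Variance of a discrete Uniform distribution on n equally spaced indices.

    This is the index dispersion expected from a noise-only window, where
    the relative abundance is constant across the window.
    """
    return spacing ** 2 * (n ** 2 - 1) / 12.0


def weighted_moments(
    indices: Sequence[float], responses: Sequence[float]
) -> Dict[str, float]:
    """Moments of the indices, using the responses as histogram frequencies."""
    total = sum(responses)
    mean = sum(w * x for x, w in zip(indices, responses)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(indices, responses)) / total
    third = sum(w * (x - mean) ** 3 for x, w in zip(indices, responses)) / total
    sd = math.sqrt(variance)
    skewness = third / sd ** 3 if sd > 0 else 0.0
    return {"mean": mean, "variance": variance, "sd": sd, "skewness": skewness}


def dispersion_ratio(
    indices: Sequence[float], responses: Sequence[float], spacing: float = 1.0
) -> float:
    """Observed index dispersion relative to the noise-only (Uniform) value.

    Near 1 for a flat, noise-like window; well below 1 for a concentrated peak.
    """
    observed = weighted_moments(indices, responses)["variance"]
    return observed / uniform_dispersion(len(indices), spacing)
```

On indices 0 through 6, flat responses `[2] * 7` give a ratio of 1.0, while `[0, 1, 4, 10, 4, 1, 0]` gives a ratio of 0.2, with a weighted mean of 3.0 and a skewness of 0.0 for this symmetric peak.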
It is, therefore, an object of the present invention to provide a method of identifying features in indexed data using a measure of dispersi
Anderson Kevin K.
Daly Don Simone
Jarman Kristin H.
Wahl Karen L.
Battelle (Memorial Institute)
Woodard Emhardt Naughton Moriarty & McNett