Method for reducing cross-talk within DNA data

Data processing: measuring – calibrating – or testing – Measurement system – Measured signal processing

Reexamination Certificate

Details

C702S019000, C702S020000, C702S190000, C204S456000

active

06598013

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates in general to DNA data processing and in particular to an algorithm for reducing cross-talk between DNA data streams.
The structural analysis of DNA has an increasingly important role in modern molecular biology and is needed to support many research programs, including searching for clues to certain diseases. Accordingly, extensive research into DNA structure is ongoing. One of the most complex programs is the Human Genome Project, which has the goal of determining the content of human DNA.
DNA is a nucleic acid consisting of chains of nucleotide monomers, or oligomers, that occur in a specific sequence. The structural analysis of DNA involves determining the sequence of the oligomers. Currently, DNA sequencing begins with the separation of a DNA segment into DNA fragments comprising a stochastic array of the oligomers. The separation involves electrophoresis in DNA sequencing gels, such as denaturing polyacrylamide gels. One of two methods is typically used for the electrophoresis: either a chemical method that randomly cleaves the DNA segment, or dideoxy terminators that halt the biosynthesis process of replication.
Each of the oligomers in the resulting stochastic array terminates in one of four identifying nitrogenous bases that are typically referred to by a letter. The bases are adenine (A), cytosine (C), guanine (G) and thymine (T). Thus, the sequencing of the DNA can be accomplished by identifying the order of the bases A, C, G and T. This process is often referred to as “base calling”. However, DNA is extremely complex. For example, there are 3.1 billion biochemical letters in human DNA that spell out some 50,000 genes. Automated base calling is therefore highly desirable.
One method of automated base calling involves fluorescence detection of the DNA fragments. A schematic drawing of an apparatus for fluorescence detection is shown generally at 10 in FIG. 1. The apparatus 10 includes an upper buffer reservoir 12 connected to a lower buffer reservoir 14 by a gel tube 16. The gel tube 16 is formed from glass or quartz and has an inside diameter within the range of one to two mm. A detector 18 is mounted near the bottom of the tube 16. The detector 18 monitors the gel passing through the tube 16 and transmits the data to a computer 20.
The chemical method described above is used to separate a DNA segment into its base oligomers. A different colored fluorophore dye is used for each of the chemical reactions for the bases A, C, G and T. One of the fluorophore dyes attaches to each of the oligomers as a marker. The reaction mixtures are recombined in the upper reservoir 12 and co-electrophoresed down the gel tube 16. As the fluorophore dye labeled DNA fragments pass by the detector 18, they are excited by an argon ion laser that causes the dye to fluoresce. The dye emits a spectrum of light energy that falls within a range of wavelengths. A photo-multiplier tube in the detector 18 scans the gel and records data for the spectrum for each of the dyes. The resulting fluorescent bands of DNA are separated into one of four channels, each of which corresponds to one of the bases. The real time detection of the bases in their associated channels is transferred to the computer 20, which assembles the data into the sequence of the DNA fragment.
FIG. 2 illustrates an ideal data stream generated by the apparatus 10. As shown in FIG. 2, a color is associated with each of the four bases, with green identifying A; blue, C; black, G; and red, T. The data in each of the channels is shown as a horizontal line with the detection of a base appearing in real time as a pulse. The resulting time sequence of pulses received, and hence the DNA sequence, is shown as the top line in FIG. 2. However, the actual data stream differs from the ideal data stream because of several factors. First of all, the emission spectra of the different dyes overlap substantially. Because of the overlap, peaks corresponding to the presence of a single fluorophore dye can be detected in more than one channel. Additionally, the different dye molecules impart non-identical electrophoretic mobilities to the DNA fragments. Furthermore, as the photo-multiplier tube in the detector 18 scans the gel, data detection does not occur at the same time for the four signals. Finally, imperfections of the chemical separation method can result in substantial variations in the intensity of bands in a given reaction. Thus, a set of typical actual raw data streams is shown in FIG. 3. The notations along the vertical axis in FIG. 3 refer to wavelengths for the detected colors. As in FIG. 2, four data streams are shown with each data stream corresponding to one of the base identifiers, as indicated by the letters in parentheses.
As illustrated by the flow chart shown in FIG. 4, it is known to enhance the raw data streams by a series of operations following the sampling of the DNA data in functional block 32. First, in functional block 34, high frequency noise is removed with a low-pass Fourier filter. Typically, each of the four data streams has a different base line level that varies slowly over time. These variations are corrected by passing the data through a high-pass Fourier filter in functional block 35.
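The two Fourier filtering steps can be illustrated with a minimal sketch. It assumes a simple "brick-wall" filter that zeroes FFT bins outside a pass band; the source does not specify the filter shape or cut-off frequencies, so the function name `fourier_filter` and the cut-off values below are hypothetical.

```python
import numpy as np

def fourier_filter(signal, low_cut=None, high_cut=None):
    """Zero FFT bins outside [low_cut, high_cut], in cycles per sample.

    Assumed brick-wall filter for illustration only.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal))  # 0 .. 0.5 cycles per sample
    keep = np.ones(freqs.shape, dtype=bool)
    if low_cut is not None:
        keep &= freqs >= low_cut   # high-pass: removes slow base line drift
    if high_cut is not None:
        keep &= freqs <= high_cut  # low-pass: removes high frequency noise
    return np.fft.irfft(spectrum * keep, n=len(signal))

# Synthetic channel: slow drift + band signal + high frequency noise.
n = 1000
t = np.arange(n)
drift = np.sin(2 * np.pi * 2 * t / n)     # 0.002 cycles per sample
band = np.sin(2 * np.pi * 50 * t / n)     # 0.05 cycles per sample
noise = np.sin(2 * np.pi * 400 * t / n)   # 0.4 cycles per sample

filtered = fourier_filter(drift + band + noise, low_cut=0.01, high_cut=0.1)
```

In this sketch both filters are applied in one pass; applying a low-pass and then a high-pass call in sequence, as in blocks 34 and 35, gives the same result for a brick-wall filter.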
The data streams are corrected with respect to signal strength, or magnitude, in functional block 36. This process is commonly referred to as baseline adjustment. The data signal in each of the four channels is divided into a number of windows, with each of the windows including approximately 30 signal peaks. The minimum signal strength is determined within each of the windows. A succession of segments is constructed connecting the consecutive minimum signal strengths. The absolute minimum is determined for the consecutive segments. The minimum in each segment is then set to zero, and the non-minimum points in the segment are adjusted by subtracting the difference between the absolute minimum and the minimum value for the segment.
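One simplified reading of this baseline adjustment can be sketched as follows. The sketch makes two assumptions that are not in the source: windows have a fixed width in samples (rather than about 30 peaks each), and the segments connecting consecutive window minima are treated as a piecewise-linear baseline that is subtracted from the signal. The name `baseline_adjust` is hypothetical.

```python
import numpy as np

def baseline_adjust(signal, window):
    """Subtract a piecewise-linear baseline through the per-window minima.

    Simplified interpretation for illustration; `window` is a width in samples.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # Index of the minimum inside each consecutive window.
    min_idx = np.array(
        [start + np.argmin(signal[start:start + window])
         for start in range(0, n, window)]
    )
    # Segments connecting consecutive minima form the baseline estimate.
    baseline = np.interp(np.arange(n), min_idx, signal[min_idx])
    return signal - baseline

# Synthetic channel: linear drift plus non-negative peaks.
t = np.arange(300)
drift = 5.0 + 0.02 * t
peaks = np.abs(np.sin(np.pi * t / 10))
adjusted = baseline_adjust(drift + peaks, window=100)
```

After the adjustment each window minimum sits at zero. Beyond the last minimum the baseline is held constant, so drift in the tail of the trace is not removed by this sketch.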
Next, a multicomponent analysis, or data filtering, is performed on each set of four data points, as shown in functional block 38. The filtering determines the amount of each of the four dyes present in the detector as a function of time. After filtering, the mobility shift introduced by the dyes is corrected in functional block 40 with empirically determined correction factors. Following this, the peaks present in the data are located in functional block 42. The application of the above series of operations to the raw data streams shown in FIG. 3 results in processed data streams in functional block 44 where the DNA sequence is read. The processed data streams are shown in FIG. 5. The corresponding DNA sequence is shown below the processed data streams in FIG. 5 and consists of the sequential combination of the four processed data streams A, T, G and C.
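The last two steps, locating peaks and reading the sequence from the four processed streams, can be sketched as below. The source does not say how peaks are located, so the local-maximum test, the threshold, and the function names `find_peaks_simple` and `call_bases` are assumptions made for illustration.

```python
import numpy as np

def find_peaks_simple(trace, threshold):
    """Return indices of local maxima above `threshold` (assumed criterion)."""
    trace = np.asarray(trace, dtype=float)
    mid = trace[1:-1]
    is_peak = (mid > trace[:-2]) & (mid >= trace[2:]) & (mid > threshold)
    return np.nonzero(is_peak)[0] + 1

def call_bases(channels, threshold):
    """Merge the peaks of all channels in time order into a base string."""
    calls = []
    for base, trace in channels.items():
        for idx in find_peaks_simple(trace, threshold):
            calls.append((int(idx), base))
    return "".join(base for _, base in sorted(calls))

def pulses(centers, n=50):
    """Synthetic processed stream: unit Gaussian pulses at the given samples."""
    t = np.arange(n)
    return sum(np.exp(-0.5 * (t - c) ** 2) for c in centers)

# Four idealized processed data streams, one per base.
channels = {
    "A": pulses([5, 25]),
    "C": pulses([10]),
    "G": pulses([15]),
    "T": pulses([20]),
}
sequence = call_bases(channels, threshold=0.5)
```

The sketch presumes that cross-talk and mobility shifts have already been corrected, so each pulse appears in exactly one channel.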
For the data processing described above, it is assumed that the transformation from raw data to filtered data is linear in order to develop the filter for removing the cross-talk. Assuming a linear transformation, the filtering step, shown in functional block 38 in FIG. 4, utilizes a transformation matrix, M, and involves a multi-component analysis that is embodied in the matrix M. With a multi-component analysis, the relationship between the measured signals $s_i$ and the actual fluorescence intensities $f_j$, with i, j = 1, 2, 3 and 4, is given by the relationship:
$$s_i = \sum_{j=1}^{4} m_{i,j} \cdot f_j,$$
where $m_{i,j}$ is a constant coefficient indicating the cross talk between intensity signals i and j. Writing the above relationship in matrix form results in:
$$\mathbf{s} = M \cdot \mathbf{f},$$
where $\mathbf{s}$ and $\mathbf{f}$ are vectors with four elements and M is a 4×4 matrix.
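Under the linear model s = M·f, the dye intensities can be recovered from the measured signals by solving the linear system, provided M is known and invertible. The following is a minimal sketch of that step; the matrix values are hypothetical and chosen only for illustration, and the function name `remove_cross_talk` is not from the source.

```python
import numpy as np

# Hypothetical cross-talk matrix (illustrative values, not from the source).
# Row i gives the contribution of each dye intensity f_j to measured signal s_i.
M = np.array([
    [1.00, 0.30, 0.05, 0.00],
    [0.20, 1.00, 0.25, 0.05],
    [0.05, 0.20, 1.00, 0.30],
    [0.00, 0.05, 0.20, 1.00],
])

def remove_cross_talk(M, s):
    """Solve s = M f for f. `s` has shape (4,) or (4, n_samples)."""
    return np.linalg.solve(M, s)

# Each column is one time sample with a single dye present.
f_true = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
s = M @ f_true                   # measured signals with cross-talk
f_est = remove_cross_talk(M, s)  # recovered dye intensities
```

Solving the system with `np.linalg.solve` avoids forming the inverse of M explicitly, which is generally the more numerically stable choice.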
Typically, the transformation matrix M is determined by a conventional method that includes an iterative process in which known raw data streams are processed through the matrix M and the matrix
