Expert system for analysis of DNA sequencing electropherograms

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C435S006120, C435S091100, C435S287100, C702S020000

active

06442491

ABSTRACT:

BACKGROUND OF THE INVENTION
The volume of data now produced by automated DNA sequencing instruments has made fully automatic data processing necessary. The raw data from these instruments is a signal produced by a sequence of electrophoretically separated DNA fragments labeled with reporter groups, typically but not always with various fluorescent dyes. Data processing entails detecting the fluorescence peaks for each fragment, determining which dyes they correspond to, and constructing a DNA base sequence corresponding to the determined fragments. This overall procedure is known as base-calling. Base-calling software must produce very accurate sequences and supply numerical confidence estimates on the bases, to preclude expensive and time-consuming editing of the resulting sequence by technicians.
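The three steps named above (detecting peaks, deciding which dye each peak belongs to, and assembling a base sequence with confidences) can be illustrated with a deliberately naive sketch. It is not the method of this patent or of any cited work. The four-channel trace layout, the function name call_bases, the min_height threshold, and the confidence formula (the winning channel's share of the total signal) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trace layout: one intensity list per base label, equal lengths.
Trace = Dict[str, List[float]]


def call_bases(trace: Trace, min_height: float = 0.2) -> List[Tuple[int, str, float]]:
    """Naive base-caller returning (sample_index, base, confidence) tuples.

    Confidence is the winning channel's share of the total signal at the
    peak, an illustrative stand-in for a real confidence estimate.
    """
    n = len(next(iter(trace.values())))
    calls = []
    for i in range(1, n - 1):
        # Step 1: decide which dye channel dominates at this sample.
        base = max(trace, key=lambda b: trace[b][i])
        y = trace[base]
        # Step 2: accept the sample only if it is a local maximum that is
        # tall enough (strictly above the left neighbour, so a flat top
        # is reported once).
        if y[i] >= min_height and y[i] > y[i - 1] and y[i] >= y[i + 1]:
            total = sum(trace[b][i] for b in trace)
            # Step 3: record the call with its confidence.
            calls.append((i, base, y[i] / total))
    return calls


if __name__ == "__main__":
    demo = {
        "A": [0.0, 0.1, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "C": [0.0, 0.0, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
        "G": [0.0] * 10,
        "T": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.7, 0.1],
    }
    for index, base, confidence in call_bases(demo):
        print(index, base, round(confidence, 2))
```

A caller this simple fails on overlapping or poorly resolved peaks, which is the situation the approaches surveyed below try to handle.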
Approaches to the base-calling problem include neural networks [Tibbetts et al., 1994; U.S. Pat. No. 5,365,455 & U.S. Pat. No. 5,502,773], graph theory [Berno, 1996], homomorphic deconvolution [Ives et al., 1994; U.S. Pat. No. 5,273,632], modular (“object oriented”) feature detection and evaluation [Giddings et al., 1993 & 1998], classification schemes [Li and Yeung, 1995; WO 96/36872 & others], correlation analysis [Daly, 1996], and Fourier analysis followed by dynamic programming [Ewing et al., 1998]. Additional related patents describe base-calling by blind deconvolution combined with fuzzy logic [Marks, WO 98/11258], by comparison to a calibration set of two-base prototypes in high dimensional “configuration space” [CuraGen, WO 96/35810], and by comparison to singleton peak models [Visible Genetics, WO 98/00708]. There are also several reports specifically related to confidence estimates [Lipshutz et al., 1994; Lawrence and Solovyev, 1994; Ewing and Green, 1998].
The neural network approach (Tibbetts) functions well only when the input data are very similar to the training set, so the network must be retrained for each type of instrument, dye chemistry, and set of separation conditions. The output of a particular training session is difficult or impossible to modify in small ways or to extend to other types of datasets. Furthermore, the neural networks whose internal operations can be readily explained are the least capable class of neural network.
The graph-theoretic approach (Berno) relies on effective deconvolution by a crude peak-sharpening filter. This produces many noise peaks, which the method attempts to winnow out based on poor height and spacing. The filter is fast but does not result in a high-quality deconvolution, and the winnowing procedure is inflexible.
The homomorphic deconvolution approach (Ives) uses blind deconvolution to enhance information on peak location. However, the subsequent peak detection and base assignments are overly simplistic.
An object-oriented method (Giddings) tries to adopt a flexible, modular program design, in which each piece is as independent as possible from the rest of the program. Preprocessing is done in many independent steps by different user-configurable tools. Subsequent base-calling is done by combining independent confidences on quality of peak spacing, peak height, and peak width. Considerable time must be spent by the user to configure the modules for a particular type of data. Moreover, the base-calling module is relatively unsophisticated. More abstractly, some tasks may be intrinsically dependent on each other, creating problems when the tasks are separated into independent modules. The most recent implementation uses deconvolution to increase accuracy, but this greatly increases execution time and can create artifact peaks, and it also requires finely tuned digital filtering.
The classification of channel amplitude ratios at peak positions (Li & Yeung) is restricted to relatively high peak resolution and high signal-to-noise ratios.
The method of Fourier analysis followed by dynamic programming (Ewing) exploits the regularity of peak spacing in properly preprocessed data. Base-calling matches observed peaks to predicted base positions. The method relies heavily on optimized preprocessing (color separation, noise removal, background subtraction, amplitude normalization, and peak repositioning), and it predicts base positions poorly at low peak resolution. It is relatively inflexible and difficult to extend or adapt to changes in data characteristics, e.g., data resulting from a new protocol that gives more variable peak spacing.
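The idea of exploiting regular peak spacing can be sketched as follows. This is only an illustration of the Fourier step under simplifying assumptions, not the implementation of Ewing et al., and it omits the dynamic programming stage entirely. The function names, the spacing bounds, and the synthetic trace of evenly spaced Gaussian peaks are hypothetical.

```python
import cmath
import math
from typing import List


def estimate_peak_spacing(signal: List[float], min_spacing: float,
                          max_spacing: float) -> float:
    """Estimate the dominant peak-to-peak spacing (in samples).

    Evaluates the discrete Fourier transform only at the frequency bins
    whose periods lie between min_spacing and max_spacing, then returns
    the period of the strongest bin.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component
    k_lo = max(1, math.ceil(n / max_spacing))
    k_hi = min(n // 2, math.floor(n / min_spacing))
    best_k, best_mag = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


def synthetic_trace(n: int = 240, spacing: int = 12,
                    sigma: float = 2.0) -> List[float]:
    """Hypothetical test signal: equal Gaussian peaks every `spacing` samples."""
    centers = range(spacing // 2, n, spacing)
    return [sum(math.exp(-((t - c) ** 2) / (2 * sigma ** 2)) for c in centers)
            for t in range(n)]


if __name__ == "__main__":
    print(estimate_peak_spacing(synthetic_trace(), min_spacing=6, max_spacing=30))
```

The estimate is only as fine as the frequency bins allow, and a single global spacing cannot follow the drift in spacing along a real run. This is consistent with the sensitivity to variable peak spacing noted above.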
The fuzzy logic approach (Marks) as described requires prior deconvolution. Furthermore, the inference system is limited in the complexity of the rules that can be incorporated, especially if they must be optimized.
The use of two-base prototypes (CuraGen) suffers from problems similar to the neural network method.
The use of singleton peak models (Visible Genetics) does not provide for complex relations between peaks and base-calls.
An expert system simulates the reasoning of human experts in a particular problem domain. Expert systems are most often useful for applications in which human experts perform well and can describe their reasoning in detail. The expert system consists primarily of a set of if-then rules, sometimes called productions, and a mechanism to reason with them, usually called an inference engine [Stefik, 1995; Durkin, 1994; Jackson, 1990]. The firing of a rule causes an action to be taken; e.g., adding to working memory the knowledge that a particular peak in the fluorescence signal has a certain width or contains a particular number of bases.
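A minimal sketch of such a production system is given here. It is a generic forward-chaining loop, not the inference engine of the disclosed invention. The two rules, the fact names (width, expected_width, n_bases, multiplet), and the policy of firing each rule at most once are assumptions chosen to mirror the example of a rule that records how many bases a peak contains.

```python
from typing import Callable, Dict, List, NamedTuple

# Working memory: facts known about one peak (all fact names are hypothetical).
Facts = Dict[str, float]


class Rule(NamedTuple):
    name: str
    condition: Callable[[Facts], bool]  # the "if" part
    action: Callable[[Facts], None]     # the "then" part


def estimate_base_count(facts: Facts) -> None:
    # Adds a new fact: how many bases the peak appears to contain.
    facts["n_bases"] = max(1, round(facts["width"] / facts["expected_width"]))


def flag_multiplet(facts: Facts) -> None:
    facts["multiplet"] = 1.0


RULES = [
    # Listed first on purpose: it can fire only after the second rule has
    # added "n_bases" to working memory, which shows the chaining.
    Rule("flag-multiplet", lambda f: f.get("n_bases", 0) >= 2, flag_multiplet),
    Rule("estimate-base-count",
         lambda f: "width" in f and "expected_width" in f,
         estimate_base_count),
]


def run(rules: List[Rule], facts: Facts, max_cycles: int = 100) -> List[str]:
    """Forward chaining: fire each rule at most once until nothing fires."""
    fired: List[str] = []
    for _ in range(max_cycles):
        progress = False
        for rule in rules:
            if rule.name not in fired and rule.condition(facts):
                rule.action(facts)  # firing a rule updates working memory
                fired.append(rule.name)
                progress = True
        if not progress:
            break
    return fired


if __name__ == "__main__":
    memory = {"width": 19.0, "expected_width": 10.0}
    print(run(RULES, memory), memory)
```

Adding a rule means appending one entry to RULES; the loop itself does not change.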
The pervasive limitations of the prior art for base-calling are the lack of integration among subtasks and the relative absence of flexibility and sophistication in the methods that assign bases to peaks. The principal benefits of a production system over the prior art lie in its ability to produce very high integration and complex, sophisticated program logic in a form that is easy for people to understand and extend. This is because the rules can be stated in natural language (e.g., English), and because greater generality, flexibility, and accuracy can be obtained simply by adding new rules or modifying existing ones. The inference engine can then combine the rules to produce a degree of integration, sophistication, and thoroughness that is hard to reproduce with an orthodox procedural software approach.
BRIEF SUMMARY OF THE INVENTION
A method of analyzing DNA fragments separated electrophoretically is presented. The method includes the use of an expert system that interprets raw or preprocessed signal from the separation. The expert system can be used for real-time base-calling, or applied offline after data acquisition is complete. The expert system is directly applicable to all types of electrophoretic separation used for DNA sequencing, i.e., slab gel, capillary, and microchip. Each lane of a multiplex system can contain 1 to 4 (or even more) different fragment labels. The expert system may also be used with other base-coding schemes, such as those in which more than one base is labeled with a given dye, but the amount of label is different for each base [Kheterpal et al., 1998]. When the presently disclosed method is applied to DNA sequencing, the resulting interpretation comprises a DNA base sequence with numerical confidences assigned to each base. By use of the presently disclosed method, the degree of automation of data processing in high-throughput DNA sequencing is improved, as is the quality of the results.


REFERENCES:
patent: 5374527 (1994-12-01), Grossman
patent: 5993634 (1999-11-01), Simpson et al.
Simultaneous Monitoring of DNA Fragments Separated by Electrophoresis in a Multiplexed Array of 100 Capillaries, Kyoji Ueno et al., Anal. Chem. 1994, 66, 1424-1431.
DNA Sequence Confidence Estimation, Robert J. Lipshutz et al., Genomics 19, 417-424 (1994).
PRIMO: A Primer Design Program That Applies Base Quality Statistics for Automated Large-Scale DNA Sequencing, Ping L
