Chemistry: molecular biology and microbiology – Measuring or testing process involving enzymes or... – Involving nucleic acid
Reexamination Certificate
2000-12-13
2004-02-10
Martinell, James (Department: 1631)
active
06689563
ABSTRACT:
FIELD
The invention pertains to methods for determining the order of a set of subsequences, and more particularly, a method for determining the sequence of a series of nucleic acids by ordering a collection of probes.
BACKGROUND OF THE INVENTION
The ability to determine nucleic acid sequences is critical for understanding the function and control of genes and for applying many of the basic techniques of molecular biology. Sequencing of the human genome and the genomes of model organisms was first made possible by the inventions of Sanger et al., PNAS 74: 5463-5467 (1977) and Maxam et al., PNAS 74: 560-564 (1977). The Sanger method has seen great advances including automation, but still only 300 to 500 bases can be sequenced under optimum conditions.
Sequencing by hybridization (SBH) is a new and promising approach to DNA sequencing which offers the potential of reduced cost and higher throughput over traditional gel-based approaches. Strezoska et al., PNAS USA 88: 10089-10093 (1991) first accurately sequenced 100 base pairs of a known sequence using hybridization techniques, although the approach was proposed independently by several groups, including Bains and Smith, Journal of Theoretical Biology 135: 303-307 (1988); Drmanac and Crkvenjakov, U.S. Pat. No. 5,202,231; Fodor et al., U.S. Pat. No. 5,424,186; Lysov et al., Dokl. Acad. Sci. USSR 303: 1508- (1988); Macevicz, U.S. Pat. No. 5,002,867; and Southern, European Patent EP 0 373 203 B1 and IPN WO 93/22480. More recently, Crkvenjakov's and Drmanac's laboratories reported sequencing a 340 base-pair fragment in a blind experiment (Pevzner and Lipshutz, 19th Int. Conf. Mathematical Foundations of Computer Science, Springer-Verlag LNCS 841 143-158 (1994)). All of the above articles and patents are incorporated herein in their entirety.
The classical sequencing by hybridization (SBH) procedure attaches a large set of single-stranded fragments or probes to a substrate, forming a sequencing chip. A solution of labeled single-stranded target DNA fragments is exposed to the chip. These fragments hybridize with complementary fragments on the chip, and the hybridized fragments can be identified using a nuclear detector or a fluorescent/phosphorescent dye, depending on the selected label. Each hybridization or the lack thereof determines whether the string represented by the fragment is or is not a substring of the target. The target DNA can now be sequenced based on the constraints of which strings are and are not substrings of the target. The surveys by Pevzner and Lipshutz, 19th Int. Conf. Mathematical Foundations of Computer Science, Springer-Verlag LNCS 841 143-158 (1994) and by Chetverin and Kramer, Bio/Technology 12: 1093-1099 (1994) give an excellent overview of the current state of the art in sequencing by hybridization, biologically, technologically, and algorithmically.
Sequencing by hybridization is a useful technique for general sequencing, and for rapidly sequencing variants of previously sequenced molecules. Furthermore, hybridization can provide an inexpensive procedure to confirm sequences derived using other methods.
The most widely used sequencing chip design, the classical sequencing chip C(k), contains all 4^k single-stranded oligonucleotides of length k. In C(8) all 4^8 = 65,536 octamers are used. The classical chip C(8) suffices to reconstruct 200 nucleotide-long sequences in only 94 of 100 cases (Pevzner et al., J. Biomolecular Structure and Dynamics 9: 399-410 (1991)), even in error-free experiments. Unfortunately, the length of unambiguously reconstructible sequences grows slower than the area of the chip. Thus, such exponential growth of the area inherently limits the length of the longest reconstructible sequence by classical SBH, and the chip area required by any single, fixed sequencing array on moderate length sequences will overwhelm the economies of scale and parallelism implicit in performing thousands of hybridization experiments simultaneously when using classical SBH methods.
Other variants of SBH (including nested-strand SBH (Rubinov and Gelfand, J. Computational Biology (1995)) and positional SBH (Broude, Sano, Smith and Cantor, PNAS (1994))) have been proposed to increase the resolving power of classical SBH, but these methods still require large arrays to sequence relatively few nucleotides.
The algorithmic aspect of sequencing by hybridization arises in the reconstruction of the test sequence from the hybridization data. The outcome of an experiment with a classical sequencing chip C(k) assigns to each of the 4^k strings a probability that it is a substring of the test sequence. In an experiment without error, these probabilities will all be 0 or 1, so each k-nucleotide fragment of the test sequence is unambiguously identified.
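The error-free outcome described above can be sketched in a few lines of Python. This is an illustrative model only: the function names `classical_chip` and `error_free_outcome` are hypothetical, and hybridization is reduced to exact substring matching on a single strand, ignoring complementarity and experimental noise.

```python
from itertools import product

def classical_chip(k):
    """List all 4^k strings of length k over A, C, G, T (the probes of C(k))."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def error_free_outcome(target, k):
    """Map every probe of C(k) to 1 if it is a substring of target, else 0."""
    # Collect each length-k window of the target once.
    present = {target[i:i + k] for i in range(len(target) - k + 1)}
    return {probe: int(probe in present) for probe in classical_chip(k)}

outcome = error_free_outcome("ATGCGTAC", 3)
hits = sorted(probe for probe, value in outcome.items() if value == 1)
print(len(outcome), hits)
```

In this model the outcome records only whether a probe occurs in the target, not how many times or where.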
Although efficient algorithms do exist for finding the shortest string consistent with the results of a classical sequencing chip experiment, these algorithms have not proven useful in practice because previous SBH methods do not return sufficient information to sequence long fragments. One particular obstacle inherent in this method is the inability to accurately position repetitive sequences in DNA fragments. Furthermore, this method cannot determine the length of tandem short repeats, which are associated with several human genetic diseases (Warren S T, Science 1996; 271: 1374-1375). These limitations have prevented its use as a primary sequencing method.
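The tandem-repeat limitation can be illustrated with a small Python sketch. It assumes the simplified model in which an experiment reports only whether each k-nucleotide string occurs in the target; the helper `spectrum` and the two example sequences are hypothetical and chosen for illustration.

```python
def spectrum(target, k):
    """Return the set of all length-k substrings of target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}

# Two targets that differ only in the number of AT repeat units.
three_repeats = "GC" + "AT" * 3 + "GC"
four_repeats = "GC" + "AT" * 4 + "GC"

# Presence/absence data for k = 3 cannot tell the two apart.
same = spectrum(three_repeats, 3) == spectrum(four_repeats, 3)
print(same)
```

Because the extra repeat unit contributes no new 3-nucleotide string, both targets yield the same set of hybridizing probes in this model.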
Additionally, sequencing by hybridization has so far failed to perform near the theoretical maximum efficiency. For example, the classical probing scheme uses a complete set of all 4^k k-nucleotide probes, wherein k is the length of each probe sequence. The set of hybridized probes is then used to construct a directed graph, in which the target sequence corresponds to either a Hamiltonian path or its equivalent Eulerian path. Probabilistic analysis and empirical evidence confirmed that using this method, k-nucleotide probes were adequate to reliably reconstruct sequences of length proportional only to the square root of 4^k, rather than to 4^k, as information theory predicts. Improvements to this algorithm (e.g., Skiena, U.S. Pat. No. 5,683,881, incorporated herein by reference) have been reported, but the maximum efficiency has been elusive.
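The Eulerian-path formulation of the classical scheme can be sketched as follows. This is a minimal illustration, not the method claimed in this patent: it assumes an error-free set of hybridized probes in which each k-nucleotide string occurs at most once in the target, and it uses Hierholzer's algorithm, which the text above does not specify. The function name `reconstruct` is hypothetical.

```python
from collections import defaultdict

def reconstruct(spectrum):
    """Return one sequence whose k-mers are exactly `spectrum`, or None."""
    if not spectrum:
        return ""
    graph = defaultdict(list)    # (k-1)-mer -> list of successor (k-1)-mers
    balance = defaultdict(int)   # out-degree minus in-degree per node
    for kmer in sorted(spectrum):
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:                 # Hierholzer's algorithm, iterative form
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(spectrum) + 1:
        return None              # not every probe could be used in one path
    return path[0] + "".join(node[-1] for node in path[1:])

print(reconstruct({"ATG", "TGC", "GCG", "CGT", "GTA", "TAC"}))
```

When the graph admits more than one Eulerian path, the function returns only one of them, which is the ambiguity that limits the classical scheme.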
A more efficient strategy for sequencing genes by hybridization would be a tremendous boon to the biotechnology industry. For example, the tremendous potential utility of genomic sequencing projects is directly restrained by the speed of the sequencing process itself. Methods which increase the speed and efficiency of DNA sequencing proportionally increase the speed at which such projects can unlock the secrets of evolution and molecular biology.
SUMMARY OF THE INVENTION
The systems and methods described herein relate to the sequencing of nucleotide sequences using probes comprising a pattern of universal and designate nucleotides. Such probes are referred to herein as ‘gapped probes’ to reflect the sequence gaps created by the universal nucleotides. A universal nucleotide, as the term is used herein, describes a chemical entity which, when present in the probe, will engage in a base-pairing relationship with any natural nucleotide. Exemplary universal nucleotides include 5-nitroindole and 3-nitropyrrole, although other universal nucleotides useful for the systems and methods described herein will be known to those of skill in the art. A universal nucleotide is represented herein as U, and a designate nucleotide, e.g., A, C, G, or T, is represented as X.
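The base-pairing behavior of a universal nucleotide can be modeled as a wildcard position in the probe. The sketch below is an assumption-laden illustration: `probe_matches` is a hypothetical name, the probe is written with concrete bases at designate positions and the letter U at universal positions, and hybridization is reduced to matching against the target strand itself rather than its complement.

```python
def probe_matches(probe, target):
    """Return True if probe matches some window of target.

    A 'U' in the probe stands for a universal nucleotide and matches any
    base; every other probe character must equal the target base.
    """
    width = len(probe)
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        if all(p == "U" or p == t for p, t in zip(probe, window)):
            return True
    return False

print(probe_matches("AUG", "TTACGTT"), probe_matches("AUUC", "AGGT"))
```

Each universal position widens the set of target windows a single probe responds to, which is the sequence gap that the term 'gapped probe' refers to.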
Although the pattern may comprise any sequence of designate and universal nucleotides, in certain systems, the pattern is an iterative pattern, i.e., a pattern which alternates a predetermined number of universal nucleotides with a predetermined number of designate nucleotides. Exemplary gapped probes may be defined in terms of the two variables s and r, wherein s represents the number of nucleotides in a designate nucleotide sequence of the probe, and r represents the number of iterations in the pattern, each iteration of length s and comprising a string of
Preparata Franco P.
Upfal Eliezer
Brown University Research Foundation
Martinell James
Quisel John D.
Ropes & Gray LLP
Vincent Matthew P.