Computer-aided probability base calling for arrays of...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C435S006120

active

06546340

ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
SOFTWARE APPENDIX
A Software Appendix comprising twenty-one (21) sheets is included herewith.
BACKGROUND OF THE INVENTION
The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for evaluating and comparing biological sequences.
Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to, for example, the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. Pat. No. 5,571,639, both incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (also called a cell file) indicating the locations where the labeled nucleic acids bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.
Innovative computer-aided techniques for base calling are disclosed in U.S. Pat. No. 5,795,716, which is incorporated by reference for all purposes. However, improved computer systems and methods are still needed to evaluate, analyze, and process the vast amount of information now used and made available by these pioneering technologies.
SUMMARY OF THE INVENTION
An improved computer-aided system for calling unknown bases in sample nucleic acid sequences from multiple nucleic acid probe intensities is disclosed. The present invention is able to call bases with extremely high accuracy (up to 98.5%). At the same time, confidence information may be provided that indicates the likelihood that the base has been called correctly. The methods of the present invention are robust and uniformly optimal regardless of the experimental conditions.
According to one aspect of the invention, a computer system is used to identify an unknown base in a sample nucleic acid sequence by the steps of: inputting a plurality of hybridization probe intensities, each of the probe intensities corresponding to a nucleic acid probe; for each of the plurality of probe intensities, determining a probability that the corresponding nucleic acid probe best hybridizes with the sample nucleic acid sequence; and calling the unknown base according to the nucleic acid probe with the highest associated probability.
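The steps in this aspect can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: this section does not state how a probability is derived from an intensity, so the sketch substitutes simple normalization (each intensity divided by the sum) as a placeholder model. The function name `call_base_from_intensities` and the input layout (a dict keyed by the base at the probe's interrogation position) are assumptions made for the example.

```python
def call_base_from_intensities(intensities):
    """Call a base from probe intensities.

    intensities: dict mapping a candidate base ("A", "C", "G", "T") to the
    hybridization intensity of the probe that interrogates for that base.
    ASSUMPTION: probability is modeled as intensity / total intensity.
    This is a stand-in; the source does not give the model here.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")

    # Step 2: one probability per probe.
    probabilities = {base: value / total for base, value in intensities.items()}

    # Step 3: call the base whose probe has the highest probability.
    called = max(probabilities, key=probabilities.get)
    return called, probabilities


if __name__ == "__main__":
    base, probs = call_base_from_intensities(
        {"A": 10.0, "C": 70.0, "G": 15.0, "T": 5.0}
    )
    print(base, probs[base])
```

Returning the full probability table alongside the call keeps the confidence information available to later steps.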
According to another aspect of the invention, an unknown base in a sample nucleic acid sequence is called by a base call with the highest probability of correctly calling the unknown base. The unknown base in the sample nucleic acid sequence is identified by the steps of: inputting multiple base calls for the unknown base, each of the base calls having an associated probability which represents a confidence that the unknown base is called correctly; selecting a base call that has a highest associated probability; and calling the unknown base according to the selected base call. The multiple base calls are typically produced from multiple experiments. The multiple experiments may be performed on the same chip utilizing different parameters (e.g., nucleic acid probe length).
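As a rough sketch of this selection step, assume each experiment has already produced a base call paired with its confidence. The function name `select_most_confident_call` and the tuple layout are hypothetical, chosen only for illustration.

```python
def select_most_confident_call(base_calls):
    """Pick the base call with the highest associated probability.

    base_calls: list of (base, probability) tuples, one per experiment.
    The tuple layout is an assumption made for this example.
    """
    if not base_calls:
        raise ValueError("at least one base call is required")
    # max() compares on the probability; on a tie the earliest call wins.
    return max(base_calls, key=lambda call: call[1])


if __name__ == "__main__":
    # Hypothetical calls from three experiments on the same position.
    calls = [("A", 0.60), ("G", 0.90), ("A", 0.80)]
    print(select_most_confident_call(calls))
```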
According to yet another aspect of the invention, an unknown base in a sample nucleic acid sequence is called according to multiple base calls that collectively have the highest probability of correctly calling the unknown base. The unknown base in the sample nucleic acid sequence is identified by the steps of: inputting multiple probabilities for each possible base for the unknown base, each of the probabilities representing a probability that the unknown base is an associated base; producing a product of probabilities for each possible base, each product being associated with a possible base; and calling the unknown base according to a base associated with a highest product. The multiple base calls are typically produced from multiple experiments. The multiple experiments may be performed on the same chip utilizing different parameters (e.g., nucleic acid probe length).
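The product rule described here can be sketched as follows, assuming each experiment supplies a probability for every possible base. The function name `call_base_by_product` and the list-of-dicts input are assumptions for the example, and the numbers are made up purely to exercise the code.

```python
import math


def call_base_by_product(experiments):
    """Combine per-experiment probabilities by multiplication.

    experiments: list of dicts, each mapping every candidate base to the
    probability (from one experiment) that the unknown base is that base.
    Returns the base with the highest product and the table of products.
    """
    if not experiments:
        raise ValueError("at least one experiment is required")
    bases = list(experiments[0])
    products = {
        base: math.prod(exp[base] for exp in experiments) for base in bases
    }
    called = max(products, key=products.get)
    return called, products


if __name__ == "__main__":
    # Hypothetical probabilities from two experiments.
    exps = [
        {"A": 0.50, "C": 0.40, "G": 0.05, "T": 0.05},
        {"A": 0.10, "C": 0.45, "G": 0.40, "T": 0.05},
    ]
    print(call_base_by_product(exps))
```

In this made-up input the single most confident call is A (0.50 in the first experiment), yet the product favors C, because C is reasonably well supported in both experiments while A is not.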
According to another aspect of the invention, both strands of a DNA molecule are analyzed to increase the accuracy of identifying an unknown base in a sample nucleic acid sequence by the steps of: inputting a first base call for the unknown base, the first base call determined from a first nucleic acid probe that is equivalent to a portion of the sample nucleic acid sequence including the unknown base; inputting a second base call for the unknown base, the second base call determined from a second nucleic acid probe that is complementary to a portion of the sample nucleic acid sequence including the unknown base; selecting one of the first or second nucleic acid probes that has a base at an interrogation position which has a high probability of producing correct base calls; and calling the unknown base according to the selected one of the first or second nucleic acid probes.
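One way to picture the strand-selection step is sketched below. It assumes that each strand's result is available as a pair (base call for the unknown base, base at the probe's interrogation position) and that a reliability score per interrogation base is known. Both the data layout and the reliability values are hypothetical placeholders; the section above does not specify them.

```python
def call_from_both_strands(first, second, reliability):
    """Choose between the calls from the two strands.

    first, second: (base_call, interrogation_base) tuples for the probe
        equivalent to the sample and the probe complementary to it.
    reliability: dict mapping an interrogation base to a score for how
        likely a probe with that base is to produce a correct call.
        ASSUMPTION: these scores are supplied by the caller.
    """
    # Keep the probe whose interrogation base is the more reliable one;
    # on a tie the first probe is kept.
    chosen = max((first, second), key=lambda probe: reliability[probe[1]])
    return chosen[0]


if __name__ == "__main__":
    # Hypothetical reliability scores, for illustration only.
    scores = {"A": 0.9, "C": 0.8, "G": 0.7, "T": 0.6}
    print(call_from_both_strands(("A", "A"), ("G", "T"), scores))
```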


REFERENCES:
patent: 5002867 (1991-03-01), Macevicz
patent: 5143854 (1992-09-01), Pirrung et al.
patent: 5202231 (1993-04-01), Drmanac et al.
patent: 5235626 (1993-08-01), Flamholz et al.
patent: 5288514 (1994-02-01), Ellman
patent: 5365455 (1994-11-01), Tibbetts et al.
patent: 5384261 (1995-01-01), Winkler et al.
patent: 5445934 (1995-08-01), Fodor et al.
patent: 5470710 (1995-11-01), Weiss et al.
patent: 5502773 (1996-03-01), Tibbetts et al.
patent: 5503980 (1996-04-01), Cantor
patent: 5733729 (1998-03-01), Lipshutz et al.
patent: 6066454 (2000-05-01), Lipshutz et al.
patent: WO 89/10977 (1989-11-01), None
patent: WO 92/10092 (1992-06-01), None
patent: WO 92/10588 (1992-06-01), None
patent: WO 95/11995 (1995-05-01), None
Fodor et al., “Light-directed, spatially addressable parallel chemical synthesis,” 1991, Science, vol. 251, pp. 767-773.
Brown et al., “An inexpensive MSI/LSI mask making system,” 1981, Proceedings of 1981 Univ. Govt. Indus. Microelec. Symposium, pp. III-31 through III-38.
Dear et al., “A sequence assembly and editing program for efficient management of large projects,” 1991, Nucleic Acids Research, vol. 19, No. 14, pp. 3907-3911.
Drmanac et al., 1991, Journal of Biomolecular Structure and Dynamics, vol. 8, No. 5, pp. 1085-1102.
