Polymorphic repeats in human genes

Chemistry: molecular biology and microbiology – Measuring or testing process involving enzymes or... – Involving nucleic acid

Reexamination Certificate

Details

C536S023100

active

06472154

ABSTRACT:

INTRODUCTION
FIELD OF THE INVENTION
The field of the invention is computational genetic analysis as applied to identifying and detecting polymorphic repeats in human genes.
BACKGROUND OF THE INVENTION
The exponential rate of accumulation of genomic sequence data in public databases is making it possible to eventually understand, treat and eliminate potentially thousands of genetic diseases, predispositions and adverse drug/treatment reactions. Causes and contributing factors for hundreds of afflictions have been mapped to thousands of specific mutations, some being simple sequence repeat polymorphisms and some single nucleotide polymorphisms (SNPs). Although the now-routine process for identifying these mutations is long and expensive, the medical and commercial value of these discoveries has prompted numerous public and private institutions to apply massive resources to the identification of new potentially causative and correlative genetic variations. These large-scale SNP-centric projects currently underway generally involve the random and therefore low-efficiency re-sequencing of DNAs from several individuals (sometimes panels of patients with a particular affliction of interest) followed by attempts to relate a particular genotype and phenotype. In addition, the inherent sequencing error rate is generally higher than the incidence of most sequence variations, resulting in a very high false-positive rate. This effect is compounded by the well-established population genetic principle that the higher the impact of an allele, the more deleterious it is, the more rare it will be, and hence the less likely it is to be found by these methods. Despite these shortcomings, the value of knowing which variations are present in the population (whatever their frequency) and are linked to phenotypes is such that many companies and institutions are racing in an effort to discover these valuable alleles.
The positional cloning of many disease loci has been facilitated by high-resolution genetic maps. The precise localization of the DNA sequence responsible for a disease usually requires the development of very high density physical and genetic maps. The availability of multiple polymorphic genetic markers is crucial to this effort. Current widely used methods for the identification of new simple sequence repeat polymorphisms involve PCR-based and subcloning strategies, but the large quantities of human genomic sequence being generated by the human genome project are rapidly making these approaches obsolete. Knowledge of the frequency and level of polymorphism of the various types and sizes of microsatellites and variable number tandem repeats (VNTRs) allows one to predict, a priori and from a single genomic sequence, which tandem repeats are likely to be highly polymorphic. For microsatellites, the level of heterozygosity has been observed to be directly proportional to the number of repeated units and inversely proportional to the size of the repeated unit.
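The proportionality described above can be turned into a simple ranking heuristic. The following is a minimal sketch, not the scoring used by the software described here: the names `Repeat`, `polymorphism_proxy` and `rank_repeats` are hypothetical, and the ratio of copy number to unit size is only an assumed stand-in that preserves the stated direction of both trends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    unit: str    # repeated motif, e.g. "CA"
    copies: int  # number of tandem copies observed in the reference sequence


def polymorphism_proxy(repeat: Repeat) -> float:
    """Assumed ranking value: rises with copy number, falls with unit size.

    This is an illustrative ratio only; the text states the direction of the
    trends, not the constants of proportionality.
    """
    return repeat.copies / len(repeat.unit)


def rank_repeats(repeats):
    """Return repeats ordered from most to least likely to be polymorphic."""
    return sorted(repeats, key=polymorphism_proxy, reverse=True)


candidates = [Repeat("GATA", 8), Repeat("CA", 20), Repeat("CAG", 12)]
ranked = rank_repeats(candidates)
```

Under this assumed proxy, a long dinucleotide run ranks above a shorter tetranucleotide run, which matches the qualitative observation in the text.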
While there currently exist several software applications for locating some microsatellites or larger tandem repeats, a comprehensive tool for the identification of, and generation of primer sequences for, those repeats correlated with a high probability of polymorphism has been lacking. Because of this need, we wrote software that takes as input human genomic sequence data and will output a list of oligonucleotide sequences that may be used as primers for PCR amplification of those tandem repeat sequences that are predicted to be highly polymorphic based on observations from the literature. A computational system for the prediction of polymorphic loci directly and efficiently from human genomic sequence was developed and verified. A suite of programs, collectively called POMPOUS (POlymorphic Marker Prediction Of Ubiquitous Simple sequences) detects tandem repeats ranging from dinucleotides up to 250-mers, scores them according to predicted level of polymorphism, and designs appropriate flanking primers for PCR amplification [1].
Scheme 1. POMPOUS is a series of programs whose execution is directed by a Perl script. Together, TandMin and TandMax identify dinucleotide to 250-mer repeats, which the Perl script then consolidates. Oligonucleotide primers are then designed using our code PRIMO for those repeats whose characteristics exceed thresholds set to select only repeats that are highly likely to vary.
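The detection stage of such a pipeline can be illustrated with a short Python sketch. This is not the TandMin, TandMax or PRIMO code, whose internals are not given here; it is an assumed, simplified stand-in that finds only perfect tandem repeats with a regular expression and returns the flanking windows in which primers would later be chosen. The function names and the `min_copies` and `flank` parameters are hypothetical.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=250, min_copies=3):
    """Locate perfect tandem repeats with unit sizes from min_unit to max_unit."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, min(max_unit, len(seq) // min_copies) + 1):
        # A k-base unit followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter motif
            # (e.g. "CACA" is reported once, as "CA").
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append({
                "start": m.start(),
                "end": m.end(),
                "unit": unit,
                "copies": (m.end() - m.start()) // k,
            })
    return hits


def flanking_windows(seq, hit, flank=100):
    """Return the sequence on each side of a repeat, where primers would be placed."""
    left = seq[max(0, hit["start"] - flank):hit["start"]]
    right = seq[hit["end"]:hit["end"] + flank]
    return left, right


example = "GG" + "CA" * 5 + "TT"
example_hits = find_tandem_repeats(example)
example_flanks = flanking_windows(example, example_hits[0])
```

A regular expression with a backreference is easy to read but scales poorly on long genomic sequence, and it misses interrupted repeats; a production tool would need a dedicated algorithm for both.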
An association between certain repeating microsatellite elements and polymorphism has been reported [17], caused by the expansion and contraction of the core repetitive unit by what is believed to be the mechanism of either slipped-strand mispairing [25], uneven recombination or some combination of both [15]. In fact, several inherited neurological disorders have been linked to changes in the copy numbers of certain tri-nucleotide repeats. The repetitive element in some diseases lies directly in the coding sequence, such as Machado-Joseph disease [8] (CAG repeat), Haw River Syndrome [7] (CAG), Huntington's Disease [3,19] (CAG) and Fragile-X Syndrome [4] (CGG). However, the location of other polymorphic repetitive elements varies with respect to the coding sequence, as in Friedreich's Ataxia [5] (GAA, intron), Myotonic Dystrophy [6] (CAG, 3′ UTR), and a gene suspected to be linked to Hyperandrogenaemia, for which the repeat occurs in the 5′ UTR [12]. Triplet repeat expansion diseases (TREDs) also include spinal and bulbar muscular atrophy (SBMA) and fragile X syndrome (FRAXA). Short cytosine-adenine-guanine (CAG) expansions are characteristic of spinal and bulbar muscular atrophy (SBMA), dentatorubral-pallidoluysian atrophy (DRPLA) and spinocerebellar ataxia (SCA) types 1, 2, 3, 6 and 7 (Nilssen O., Tidsskrift for Den Norske Laegeforening. 119(20):3021-7, Aug. 30, 1999). Besides studies of tri-nucleotide repeats associated with neurological disorders [3-8,16,24,36], work on repeats has focused on specific genes of interest containing predominant repeat units having a known association with polymorphism, such as CAG or CCG [13].
SUMMARY OF THE INVENTION
The invention provides methods and compositions for identifying polymorphic repeats in genes. In one embodiment, the invention provides methods for identifying a candidate polymorphic repeat within a coding sequence. This embodiment involves scanning a coding sequence for multiple, different candidate polymorphic repeats and generally involves (a) detecting tandem repeats in a target coding sequence; (b) scoring the repeats for polymorphic probability; and (c) generating a dataset correlating the repeats with polymorphic probability. The coding sequence is a transcribed sequence, includes a CDS region and 3′ and 5′ untranslated regions (UTRs) and may be derived from a single coding sequence, a concatamerized coding sequence, a coding sequence library (e.g. cDNA library), etc. The coding sequence may be derived physically from transcribed polynucleotide or in silico from genomic sequence or other sequence comprising coding sequence. The detecting, scoring and generating steps are preferably implemented by a computer program, preferably wherein the scoring step comprises determining at least the type, number and purity of the repeats. In a particular embodiment the program is the Rep-X algorithm.
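Steps (b) and (c) above can be sketched in Python for a repeat region that has already been detected. This is an illustration only and is not the Rep-X algorithm, whose details are not given here. The functions `purity`, `score_repeat` and `build_dataset` are hypothetical names, the combined `score` formula is an assumption, and purity is measured against a fixed-frame perfect tiling, so it handles substitutions but not insertions or deletions.

```python
def purity(region, unit):
    """Fraction of bases in region that match a perfect tiling of unit."""
    if not region or not unit:
        return 0.0
    matches = sum(1 for i, base in enumerate(region) if base == unit[i % len(unit)])
    return matches / len(region)


def score_repeat(region, unit):
    """Describe a repeat by type, number and purity, plus an assumed combined score."""
    number = len(region) / len(unit)  # copies of the unit spanned by the region
    p = purity(region, unit)
    return {
        "type": "%d-mer" % len(unit),
        "unit": unit,
        "number": number,
        "purity": p,
        # Assumed weighting: more copies and higher purity raise the score,
        # a longer unit lowers it.
        "score": number * p / len(unit),
    }


def build_dataset(candidates):
    """Score (region, unit) pairs and return rows ordered by descending score."""
    rows = [score_repeat(region, unit) for region, unit in candidates]
    return sorted(rows, key=lambda row: row["score"], reverse=True)


rows = build_dataset([
    ("CAGCAGCATCAG", "CAG"),  # four copies with one substitution
    ("CACACACACACA", "CA"),   # six perfect copies
])
```

Each row pairs a repeat with its predicted tendency to vary, which is the kind of dataset step (c) describes; the ordering makes the strongest candidates easy to pick for validation.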
In another embodiment, the invention provides methods for identifying an actual polymorphic repeat by validating a computationally identified candidate polymorphic repeat in populations of natural genes. This embodiment generally involves (a) computationally identifying a candidate polymorphic repeat; (b) detecting the candidate polymorphic repeat in each of a population of different coding sequences, each from different individuals; and (c) determining whether the candidate polymorphic repeat is polymorphic in the population.
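Step (c) of this validation can be sketched as a count of the distinct repeat lengths seen across a sample. The code below is a minimal illustration under stated assumptions: each input string is taken to represent one sampled allele already read from an individual, the copy number is taken as the longest uninterrupted run of the unit, and the names `copy_number` and `allele_summary` are hypothetical. Expected heterozygosity is computed with the standard formula of one minus the sum of squared allele frequencies.

```python
import re
from collections import Counter


def copy_number(seq, unit):
    """Longest uninterrupted run of unit in seq, expressed in copies."""
    runs = re.findall(r"(?:%s)+" % re.escape(unit.upper()), seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def allele_summary(sequences, unit):
    """Summarize repeat-length alleles across a sample of sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    counts = Counter(copy_number(s, unit) for s in sequences)
    total = sum(counts.values())
    # Expected heterozygosity: 1 - sum of squared allele frequencies.
    heterozygosity = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return {
        "alleles": dict(counts),
        "polymorphic": len(counts) > 1,
        "expected_heterozygosity": heterozygosity,
    }


sample = [
    "TT" + "CAG" * 10 + "AA",
    "TT" + "CAG" * 10 + "AA",
    "TT" + "CAG" * 12 + "AA",
    "TT" + "CAG" * 15 + "AA",
]
summary = allele_summary(sample, "CAG")
```

In practice the repeat lengths would come from PCR product sizing or sequencing of each individual rather than from literal strings, and sequencing error would have to be distinguished from true length variation.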
In yet another embodiment, the invention correlates validated, computationally derived polymorphic repeats with phenotypic variations, which may be manifested or prospective, i.e. present as a predisposition. These correlates a
