Chemistry: molecular biology and microbiology – Measuring or testing process involving enzymes or... – Involving nucleic acid
Reexamination Certificate
1999-11-10
2001-09-18
Brusca, John S. (Department: 1631)
C702S027000
active
06291182
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to methods and apparati using nucleic acid markers having a statistical association with a detectable trait to identify one or more genes responsible for the trait or for a predisposition for expressing the trait.
BACKGROUND OF THE INVENTION
Recent advances in genetic engineering and bioinformatics have enabled the manipulation and characterization of large portions of the human genome. While efforts to obtain the full sequence of the human genome are rapidly progressing, there are many practical uses for genetic information which can be implemented with partial knowledge of the sequence of the human genome.
As the full sequence of the human genome is assembled, the partial sequence information available can be used to identify genes responsible for detectable human traits, such as genes associated with human diseases, and to develop diagnostic tests capable of identifying individuals who express a detectable trait as the result of a specific genotype or individuals whose genotype places them at risk of developing a detectable trait at a subsequent time. Each of these applications for partial genomic sequence information is based upon the assembly of genetic and physical maps which order the known genomic sequences along the human chromosomes.
SUMMARY OF THE INVENTION
The present invention relates to methods and apparati for identifying one or more genes associated with a detectable phenotype. As described in more detail below, the present invention involves the use of biallelic markers, which are polymorphic nucleic acid sequences which differ from one another at a single nucleotide. The allelic frequencies of the biallelic markers are compared in nucleic acid samples derived from individuals expressing the detectable trait and individuals who do not express the detectable trait. In this manner, candidate genomic regions suspected of harboring a gene associated with the detectable trait under investigation are identified.
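The comparison of allelic frequencies between the two groups can be illustrated with a minimal Python sketch. The genotype counts and the function name `allele_frequency` are hypothetical and are not taken from the patent; the sketch only shows how the frequency of one allele of a biallelic marker follows from genotype counts.

```python
def allele_frequency(genotype_counts):
    """Frequency of allele A given counts of genotypes (AA, AB, BB)."""
    n_aa, n_ab, n_bb = genotype_counts
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # two alleles per individual
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_aa + n_ab) / total_alleles


# Hypothetical genotype counts (AA, AB, BB) for a single biallelic marker.
cases = (30, 50, 20)      # individuals expressing the trait
controls = (20, 50, 30)   # individuals not expressing the trait

freq_cases = allele_frequency(cases)
freq_controls = allele_frequency(controls)
difference = freq_cases - freq_controls
print(f"cases={freq_cases:.2f} controls={freq_controls:.2f} diff={difference:.2f}")
```

A marker whose allele frequency differs markedly between the two groups would point to a candidate region; whether a given difference is meaningful is the subject of the statistical comparison described in the following paragraphs.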
The existence of one or more genes associated with the detectable trait within the candidate region is confirmed by identifying more biallelic markers lying in the candidate region. A first haplotype analysis is performed for each possible combination of groups of biallelic markers within the genomic region suspected of harboring a trait-associated gene. For example, each group may comprise three biallelic markers. For each of the groups of markers, the frequency of each possible haplotype (for groups of three markers there are 8 possible haplotypes) in individuals expressing the trait and individuals who do not express the trait is estimated. For example, the haplotype frequencies may be estimated using the Expectation-Maximization method of Excoffier L and Slatkin M, Mol. Biol. Evol. 12:921-927 (1995), the disclosure of which is incorporated herein by reference and which is described in more detail below. In some embodiments, the Expectation-Maximization method may be performed using the EM-HAPLO program (Hawley M E, Pakstis A J & Kidd K K, Am. J. Phys. Anthropol. 18:104 (1994), the disclosure of which is incorporated herein by reference). Alternatively, the frequency of each allele of individual biallelic markers may be determined in nucleic acid samples from individuals who express the trait under investigation and control individuals who do not express the trait.
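The haplotype frequency estimation step can be sketched as follows. This is a generic Expectation-Maximization loop for unphased genotypes, written for illustration only; it is not the EM-HAPLO program and may differ in detail from the cited method. The genotype coding (0, 1 or 2 copies of one allele per marker), the function names, and the sample data are all assumptions.

```python
from itertools import product


def compatible_pairs(genotype):
    """Ordered haplotype pairs (h1, h2) whose per-marker allele sums match the genotype."""
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        else:  # heterozygous marker: phase is unknown
            options.append([(0, 1), (1, 0)])
    pairs = []
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    n_markers = len(genotypes[0])
    haplotypes = list(product((0, 1), repeat=n_markers))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    pair_lists = [compatible_pairs(g) for g in genotypes]
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        # E-step: weight each compatible pair by its probability under
        # the current frequencies, assuming random pairing of haplotypes.
        for pairs in pair_lists:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected counts divided by the number of chromosomes.
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Hypothetical unphased genotypes at three biallelic markers.
genotypes = [(0, 0, 0), (0, 0, 0), (2, 2, 2), (1, 1, 1), (1, 0, 0), (0, 1, 1)]
freqs = em_haplotype_frequencies(genotypes)
for hap, f in sorted(freqs.items()):
    print(hap, round(f, 4))
```

In practice the estimation would be run separately on the trait-expressing and control samples, giving one set of eight haplotype frequencies per group of three markers and per sample.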
The frequencies of each of the possible haplotypes of the grouped markers (or each allele of individual markers) in individuals expressing the trait and individuals who do not express the trait are compared. For example, the frequencies may be compared by performing a chi-squared analysis. Within each group, the haplotype (or the allele of each individual marker) having the greatest association with the trait is selected. This process is repeated for each group of biallelic markers (or each allele of the individual markers) to generate a distribution of association values, which will be referred to herein as the “candidate region” distribution.
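As an illustration of the comparison and selection step, the sketch below computes a Pearson chi-squared statistic for each haplotype of a group (that haplotype versus all others, trait-expressing versus control) and keeps the largest value per group. The haplotype counts, the group names, and the helper functions are hypothetical; the patent does not prescribe this particular table layout.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_association(case_counts, control_counts):
    """Return (haplotype, chi2) for the haplotype with the largest chi2 in one group."""
    n_cases = sum(case_counts.values())
    n_controls = sum(control_counts.values())
    best = None
    for hap in case_counts:
        a = case_counts[hap]
        c = control_counts.get(hap, 0)
        stat = chi_squared_2x2(a, n_cases - a, c, n_controls - c)
        if best is None or stat > best[1]:
            best = (hap, stat)
    return best


# Hypothetical haplotype counts (chromosomes) per marker group.
groups = {
    "group_1": ({"000": 60, "111": 40}, {"000": 80, "111": 20}),
    "group_2": ({"000": 50, "010": 50}, {"000": 50, "010": 50}),
}

# One association value per group forms the distribution for the region.
distribution = [best_association(cases, controls)[1]
                for cases, controls in groups.values()]
print([round(v, 3) for v in distribution])
```

Applying the same procedure to marker groups from the candidate region and, separately, to groups from random regions would yield the two distributions that are compared later.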
A second haplotype analysis is performed for each possible combination of groups of biallelic markers within random genomic regions. For example, each group may comprise three biallelic markers. For each of the groups of markers, the frequency of each possible haplotype (for groups of three markers there are 8 possible haplotypes) in individuals expressing the trait and individuals who do not express the trait is estimated. For example, the haplotype frequencies may be estimated using the Expectation-Maximization method of Excoffier L and Slatkin M, as described above. In some embodiments, the Expectation-Maximization method may be performed using the EM-HAPLO program as described above. Alternatively, the frequency of each allele of individual biallelic markers may be determined in nucleic acid samples from individuals who express the trait under investigation and control individuals who do not express the trait.
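The grouping of markers and the count of eight haplotypes per group of three can be made concrete with a short sketch. It assumes that "each possible combination" means every subset of three markers within a region; the marker names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical biallelic markers lying in one genomic region.
markers = ["M1", "M2", "M3", "M4", "M5"]

# Every group of three markers drawn from the region.
marker_groups = list(combinations(markers, 3))

# Each marker has two alleles (coded 0 and 1), so a group of three
# markers has 2**3 = 8 possible haplotypes.
haplotypes = list(product((0, 1), repeat=3))

print(len(marker_groups), "groups;", len(haplotypes), "haplotypes per group")
```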
The frequencies of each of the possible haplotypes of the grouped markers (or each allele of individual markers) in individuals expressing the trait and individuals who do not express the trait are compared. For example, the frequencies may be compared by performing a chi-squared analysis. Within each group, the haplotype (or the allele of each individual marker) having the greatest association with the trait is selected. This process is repeated for each group of biallelic markers (or each allele of the individual markers) to generate a distribution of association values, which will be referred to herein as the “random region” distribution.
The “candidate region” distribution and the “random region” distribution are then compared to one another to determine whether there are significant differences between them. For example, the candidate region distribution and the random region distribution can be compared using either the Wilcoxon rank test (Noether, G. E. (1991) “Introduction to Statistics: The Nonparametric Way”, Springer-Verlag, New York, Berlin, the disclosure of which is incorporated herein by reference) or the Kolmogorov-Smirnov test (Saporta, G. (1990) “Probabilités, analyse des données et statistique”, Editions Technip, Paris, the disclosure of which is incorporated herein by reference) or both the Wilcoxon rank test and the Kolmogorov-Smirnov test.
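The comparison of the two distributions can be sketched in plain Python. The code below computes the two-sample Kolmogorov-Smirnov statistic and a Wilcoxon rank-sum z-score with a normal approximation; the association values are hypothetical, the function names are illustrative, and the normal approximation is only rough for samples this small. A statistics package would normally be used in place of these hand-written helpers.

```python
import math


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    i = j = 0
    for v in sorted(set(xs) | set(ys)):
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def rank_sum_z(x, y):
    """Wilcoxon rank-sum of x as a z-score (midranks for ties, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical association values (e.g. maximum chi-squared per marker group).
candidate_region = [9.5, 7.2, 6.8, 8.1, 5.9]
random_region = [1.2, 0.4, 2.5, 3.1, 0.9]

d_stat = ks_statistic(candidate_region, random_region)
z = rank_sum_z(candidate_region, random_region)
p_value = two_sided_p(z)
print(f"KS D={d_stat:.2f}, rank-sum z={z:.3f}, p={p_value:.4f}")
```

With these made-up values the two samples do not overlap at all, so the Kolmogorov-Smirnov statistic reaches its maximum of 1 and the rank-sum p-value is small, which is the kind of outcome that would mark the two distributions as significantly different.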
If the candidate region distribution and the random region distribution are found to be significantly different, the candidate genomic region is highly likely to contain a gene associated with the detectable trait. Accordingly, the candidate genomic region is evaluated more fully to isolate the trait-associated gene. Alternatively, if the candidate region distribution and the random region distribution are not found to be significantly different using the above analyses, the candidate genomic region is unlikely to contain a gene associated with the detectable trait. Accordingly, no further analysis of the candidate genomic region is performed.
The present invention solves the need for empirical assessments of the statistical significance of the association of biallelic markers with detectable traits. The present invention considers the trait being investigated as well as the populations of individuals utilized to determine the significance of the association. In particular, the present invention allows the reference points (i.e. the controls) for evaluating significance to be derived from the same populations as those used to detect the association between the biallelic markers and the trait. In addition, in some embodiments, the present invention allows all the data available for candidate genomic regions suspected of harboring a gene associated with a detectable trait to be utilized in the determination of whether the candidate region does in fact harbor such a gene. Accordingly, the present invention avoids the risk of failin
Blumenfeld Marta
Cohen Daniel
Cohen-Akenine Annick
Essioux Laurent
Schork Nicholas J.
Brusca John S.
GENSET
Kim Young
Knobbe Martens Olson & Bear, L.L.P.