Methods for using functional site descriptors and predicting...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate


C702S027000, C435S004000, C436S086000


active

06631332

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention concerns methods and systems for predicting the function of proteins. In particular, the invention relates to materials, software, automated systems, and methods for implementing the same in order to predict the function(s) of a protein. Protein function prediction includes the use of functional site descriptors for a particular protein function.
2. Background of the Invention
The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art to the presently claimed invention, nor that any of the publications specifically or implicitly referenced are prior art to that invention.
A central tenet of modern biology is that heritable genetic information resides in a nucleic acid genome, and that the information embodied in such nucleic acids directs cell function. This occurs through the expression of various genes in the genome of an organism and the regulation of the expression of such genes. The pattern of which subset of genes in an organism is expressed at a particular time in a particular cell defines the phenotype, and ultimately cell and tissue types. While the least genetically complex organisms, i.e., viruses, contain on the order of 10-50 genes and require components supplied by a cell of another organism in order to reproduce, the least genetically complex independent, living organisms (i.e., those having a genome that encodes all the information required for the organism to survive and reproduce) have more than 400 genes (for example, Mycoplasma genitalium). More complex, multicellular organisms (e.g., mice or humans) contain genomes believed to comprise tens of thousands or more genes, each of which codes for one or more different expression products.
Most organismal genomes are composed of double-stranded DNA. Each strand of the genomic DNA is a long polymer of deoxyribonucleotides carrying four bases: A (adenine), T (thymine), G (guanine), and C (cytosine). Double-stranded DNA is formed by the antiparallel, non-covalent association between two DNA strands. This association is mediated by hydrogen bonding between nucleotide bases, with specific, complementary pairing of A with T and G with C. Each gene in the genomic DNA is expressed by transcription, wherein a single-stranded RNA copy of the gene is transcribed from the double-stranded DNA. RNA is composed of ribonucleotide (rather than deoxyribonucleotide) bases, three of which (A, G, and C) are the same as those found in DNA. The fourth RNA base, uracil (U), substitutes for the T found in DNA and is complementary to A. The transcribed RNA strand is complementary to the template (non-coding) strand of the DNA and therefore has the same sequence as the coding strand, with U in place of T. Following transcription, the RNAs transcribed from many genes are translated into polypeptides. The particular sequence of the nucleotide bases normally determines what protein, and hence what function(s), a particular gene encodes.
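The base-pairing and transcription rules described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the described invention: the function names `reverse_complement` and `transcribe` are hypothetical, and sequences are assumed to be plain uppercase strings written 5' to 3' (including the template strand). Because an RNA transcript is complementary to the template strand, it reads like the coding strand with U in place of T.

```python
# Illustrative sketch only; function names are hypothetical.
# Complementary pairing rules for DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the antiparallel complementary DNA strand, written 5' to 3'.

    Raises KeyError for any character that is not A, T, G, or C.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5' to 3') transcribed from a DNA template strand given 5' to 3'.

    The RNA is complementary to the template strand, so its sequence equals
    the coding strand with uracil (U) substituted for thymine (T).
    """
    coding_strand = reverse_complement(template_strand)
    return coding_strand.replace("T", "U")
```

For example, `transcribe(reverse_complement("ATGGCCTAA"))` returns `"AUGGCCUAA"`: the coding strand is first turned into its template strand, and transcribing that template recovers the coding sequence in RNA form.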
Some genes are transcribed but not translated; the final gene products of these genes are RNA molecules (for example, ribosomal RNAs, small nuclear RNAs, transfer RNAs, and ribozymes, i.e., RNA molecules having endoribonuclease catalytic activity). However, most RNAs serve as messengers (mRNAs), and these are translated into polypeptides. The particular sequence of the ribonucleotides incorporated into an RNA as it is synthesized is dictated by the gene in the genomic DNA from which it was transcribed. In the translation of an mRNA, the particular nucleotide sequence determines the particular amino acid sequence of the polypeptide translated therefrom. Briefly, in a coding region of an mRNA (and in its corresponding gene), each nucleotide triplet, or “codon” (of which there are 4³, or 64, possibilities), codes for one amino acid, except that three codons code for no amino acid (each being a “stop” translation codon). Thus, the sequence of codons (dictated by the nucleotide sequence of the corresponding gene) specifies the amino acid sequence of a particular protein, and it is the amino acid sequence that ultimately determines the three-dimensional structure of the protein. Significantly, three-dimensional structure dictates the particular biological function(s) of any biomolecule, including proteins.
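The codon arithmetic and the role of stop codons can be illustrated with a small translation sketch. It assumes the standard genetic code (some organisms and organelles, such as mitochondria, use variant codes), reads from the first nucleotide of the given coding region rather than searching for a start codon, and halts at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical and illustrative only.

```python
from itertools import product

# Illustrative sketch only; names are hypothetical.
BASES = "UCAG"
# Standard genetic code: one amino acid letter per codon, listed in
# UCAG x UCAG x UCAG order (first base varies slowest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 4 bases at each of 3 positions gives 4**3 = 64 codons.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA coding region codon by codon, stopping at a stop codon.

    Any trailing bases that do not form a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: no amino acid is added
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGGCCUAA")` returns `"MA"`: AUG codes for methionine, GCC for alanine, and UAA is one of the three stop codons. Building the table with `itertools.product` makes the 4³ = 64 count explicit rather than hand-listing every codon.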
The elegant simplicity of the foregoing schema is obscured by the complexity and size of the genomes found in living systems. For example, the haploid human genome comprises about 3×10⁹ (three billion) nucleotides spread across 23 chromosomes. However, it is currently estimated that less than 5% of this encodes the approximately 80,000-100,000 different protein-coding genes believed to be encoded by the human genome. Because of its tremendous size, to date only a portion of the human genome has been sequenced and deposited in genome sequence databases, and the positions of many genes and their exact nucleotide sequences remain unknown. Moreover, the biological function(s) of the gene products encoded by many of the genes sequenced so far remain unknown. Similar situations exist with respect to the genomes of many other organisms.
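Taken together, the estimates quoted above imply a rough upper bound on the average amount of protein-coding sequence per gene. The arithmetic below only combines the figures stated in this paragraph (about 3×10⁹ nucleotides, less than 5% coding, 80,000-100,000 genes); it is a back-of-the-envelope illustration, not a figure taken from the underlying estimates.

```latex
\[
\text{coding nucleotides} < 0.05 \times \left(3 \times 10^{9}\right) = 1.5 \times 10^{8}
\]
\[
\frac{1.5 \times 10^{8}}{1.0 \times 10^{5}} = 1500,
\qquad
\frac{1.5 \times 10^{8}}{8.0 \times 10^{4}} = 1875
\]
```

On these estimates, the average protein-coding gene would contain fewer than roughly 1,500 to 1,875 coding nucleotides, i.e., about 500 to 625 codons.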
Notwithstanding such complexities, numerous genome sequencing efforts designed to determine the exact sequence of the nucleotides found in the genomic DNA of various organisms are underway, and significant progress has been made. For example, the Human Genome Project began with the specific goal of obtaining the complete sequence of the human genome and determining the biochemical function(s) of each gene. To date, the project has resulted in sequencing a substantial portion of the human genome, and is on track for its scheduled completion in the near future. At least twenty-one other genomes have already been sequenced, including, for example, M. genitalium, M. jannaschii, H. influenzae, E. coli, and yeast (S. cerevisiae). Significant progress has also been made in sequencing the genomes of model organisms, such as mouse, C. elegans, and D. melanogaster.
Several databases containing genomic information annotated with some functional information are maintained by different organizations and are accessible via the Internet.
Such sequencing projects result in vast amounts of nucleotide sequence information, which is typically deposited in genome sequence databases. However, these raw data (much of which is known only at the cDNA level) lack corresponding information about genes and protein structure or function, and are in and of themselves of extremely limited use (Koonin, et al. (1998), Curr. Opin. Struct. Biol., vol. 8:355-363). Thus, the practical exploitation of the vast numbers of sequences in such genome sequence databases depends crucially on the ability to identify genes and, for example, the function(s) of gene-encoded proteins.
To maximize the utility of such nucleotide sequence information, it must be interpreted. For example, it is important to understand where each sequence is located in the genome and what biological function(s), if any, the sequence encodes; that is, what is the purpose, in a biological system, of the sequence or, if it is transcribed (or transcribed and translated), of the resulting product? Is the sequence a regulatory region, for instance, or, if it is transcribed (or transcribed and translated), does the gene product bind to another molecule, regulate a cellular process, or catalyze a chemical reaction?
To answer these questions, significant effort has been directed towards understanding or describing the biological function(s) coded for in each nucleotide sequence. Predicting the function(s) of biomolecules encoded by genes, particularly proteins, is most often done by sequence comparison to known structures. The basis of this approach is the commonly accepted notion that similar sequences must have a common ancestor, and would therefore have similar structures and related functions. Accordingly, algorithms have been developed to analyze what a par
