Determining protein function and interaction from genome...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C435S006120, C435S007100, C530S350000, C702S020000

active

06772069

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to methods and systems for predicting the function of proteins. In particular, the invention relates to materials, software, automated systems, and methods for implementing the same in order to predict the function(s) of a protein.
BACKGROUND OF THE INVENTION
A central tenet of modern biology is that genetic information resides in a nucleic acid genome, and that the information embodied in such a genome (i.e., the genotype) directs cell function. This occurs through the expression of various genes in the genome of an organism and regulation of the expression of such genes. The expression of genes in a cell or organism defines the cell or organism's physical characteristics (i.e., its phenotype). This is accomplished through the translation of genes into proteins.
Proteins (or polypeptides) are linear polymers of amino acids. The polymerization reaction, which produces a protein, results in the loss of one molecule of water from each amino acid, and hence proteins are often said to be composed of amino acid “residues.” Natural protein molecules may contain as many as 20 different types of amino acid residues, each of which contains a distinctive side chain. The particular linear sequence of amino acid residues in a protein defines the primary sequence, or primary structure, of the protein. The primary structure of a protein can be determined with relative ease using known methods.
In order to more fully understand and determine potential therapeutics, antibiotics and biologics for various organisms, efforts have been undertaken to sequence the genomes of a number of organisms. For example, the Human Genome Project began with the specific goal of obtaining the complete sequence of the human genome and determining the biochemical function(s) of each gene. To date, the project has resulted in sequencing a substantial portion of the human genome (J. Roach, on the website of the University of Washington (Gibbs, 1995)). At least twenty-one other genomes have already been sequenced, including, for example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), and yeast (S. cerevisiae) (Mewes et al., 1997). Significant progress has also been made in sequencing the genomes of model organisms, such as mouse, C. elegans, Arabidopsis sp. and D. melanogaster. Several databases containing genomic information annotated with some functional information are maintained by different organizations, and are accessible via the internet, for example, the websites of the Institute for Genomic Research; the University of Wisconsin Laboratory for Genetics; Stanford University's Dept. of Genetics; the Los Alamos National Laboratories HIV databases; the National Center for Biotechnology Information; the European Bioinformatics Institute; the Institut Pasteur Bio Netbook; and the Whitehead Institute/MIT Center for Genome Research. The raw nucleic acid sequences in a genome can be converted by one of a number of available algorithms to the amino acid sequences of proteins, which carry out the vast array of processes in a cell. Unfortunately, these raw protein sequence data do not immediately describe how the proteins function in the cell. Understanding the details of various cellular processes (e.g., metabolic pathways, signaling between molecules, cell division, etc.) and which proteins carry out which processes is a central goal in modern cell biology.
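As an illustration of the conversion from raw nucleic acid sequence to amino acid sequence mentioned above, the following is a minimal sketch, not the algorithm used by any particular database. It assumes the standard genetic code, an upper-case DNA coding sequence read in a single forward frame, and that translation stops at the first stop codon; the function name `translate` is chosen here for illustration only.

```python
from itertools import product

# Standard genetic code, built with the bases in the conventional T, C, A, G order.
# Each character of AMINO_ACIDS is the one-letter residue for the matching codon;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 0 (assumption: one forward frame).

    Translation stops at the first stop codon; a trailing incomplete codon is
    ignored, and codons with unrecognized characters are emitted as "X".
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


example_protein = translate("ATGGCCTAA")  # ATG -> M, GCC -> A, TAA -> stop
```

A real gene-finding pipeline would also have to locate open reading frames on both strands and, for eukaryotes, handle introns; none of that is attempted here.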
Throughout evolution, the protein sequences in different organisms have been conserved to varying degrees. As a result, any given organism contains many proteins that are recognizably similar to proteins in other organisms. Such similar proteins, having arisen from the same ancestral protein, are called homologs.
To a degree, homology between proteins is useful in assigning biological functions to new protein sequences. The most direct approach for assigning functions to proteins is by laborious laboratory experimentation. However, if a particular uncharacterized protein sequence is homologous to one that has already been studied experimentally, often the function of the former can be equated to the function of the latter.
Unfortunately, the ability to assign functions to proteins by homology is limited. Many protein sequences do not have experimentally characterized homologs in other organisms. Depending on the organism, between one-third and one-half of the proteins in a genome cannot be assigned functions by homology or other available computational methods. Accordingly, new methods for predicting the functions of proteins from genome sequences are needed.
SUMMARY OF THE INVENTION
Determining protein functions from genomic sequences is a central goal of bioinformatics. Genomic sequences do not contain explicit information on the function of the proteins that they encode, yet this information is critical in medical and agricultural biotechnology. The invention provides materials, software, automated systems, and methods that are useful for predicting protein function. Such information is useful, for example, for identifying new genes and identifying potential targets for pharmaceutical compounds.
In one embodiment, the invention provides a method to predict functional links (e.g., associations between proteins) based on the concept that proteins that function together in a pathway or structural complex can often be found in another organism fused together into a single protein. By identifying these patterns of relationship or gene fusion, one can predict the interactions between unknown proteins based on the similar sequence information found in other related proteins (i.e., either functionally related or physically related). Through sequence comparison, one can identify a fused protein, termed herein the “Rosetta Stone” protein, which is similar over different regions to two distinct proteins that are not similar to each other. This establishes a functional link between two otherwise unrelated proteins. The inventors have discovered that proteins that can be associated together via the Rosetta Stone protein tend strongly to be functionally linked.
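The Rosetta Stone idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that sequence alignments have already been computed by some external tool and are supplied as `Hit` records, that two query proteins count as aligning to "different regions" when their aligned spans on the fused protein overlap by no more than `max_overlap` residues, and that known homologous query pairs are passed in explicitly. The names `Hit`, `rosetta_stone_links`, and all protein identifiers are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One precomputed alignment of a query protein onto a candidate fused protein."""
    query: str    # protein from the genome of interest
    subject: str  # candidate "Rosetta Stone" protein from another genome
    start: int    # first aligned residue on the subject (1-based, inclusive)
    end: int      # last aligned residue on the subject (inclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of subject residues covered by both alignments."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def rosetta_stone_links(hits, homologous_pairs, max_overlap=0):
    """Map each linked query pair to the set of fused proteins supporting the link.

    A pair is linked when both queries align to the same subject over regions
    that overlap by at most max_overlap residues, and the two queries are not
    themselves homologous (homologous_pairs holds frozensets of query names).
    """
    by_subject = {}
    for hit in hits:
        by_subject.setdefault(hit.subject, []).append(hit)

    links = {}
    for subject, group in by_subject.items():
        for a, b in combinations(group, 2):
            pair = frozenset((a.query, b.query))
            if len(pair) < 2 or pair in homologous_pairs:
                continue  # same query twice, or queries similar to each other
            if overlap(a, b) <= max_overlap:
                links.setdefault(pair, set()).add(subject)
    return links


# Toy data: protA and protB align to separate halves of fused1 (a link);
# protC and protD align to overlapping regions of fused2 (no link).
hits = [
    Hit("protA", "fused1", 1, 300),
    Hit("protB", "fused1", 320, 600),
    Hit("protC", "fused2", 1, 200),
    Hit("protD", "fused2", 50, 250),
]
links = rosetta_stone_links(hits, homologous_pairs=set())
```

In practice the overlap tolerance and the test for "not similar to each other" would need tuning against real alignment statistics; the zero-overlap default here is simply the strictest choice.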
In another embodiment, the invention provides a computational method that detects proteins that participate in a common structural complex or metabolic pathway. Proteins within these groups are defined as “functionally-linked.” Functionally-linked proteins evolve in a correlated fashion, and therefore they have homologs in the same subset of organisms. For instance, it is expected that flagellar proteins will be found in bacteria that possess flagella but not in other organisms. Simply put, if two proteins have homologs in the same subset of fully (or nearly fully) sequenced organisms but are absent in other organisms they are likely to be functionally-linked. The present invention provides a method wherein this property is used to systematically map functional interactions between all the proteins coded by a genome. This method overcomes the problems wherein pairs of functionally linked proteins in general have no amino acid sequence similarity with each other and therefore cannot be linked by conventional sequence alignment techniques.
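The correlated-evolution idea described above can be sketched as a comparison of presence/absence patterns across sequenced genomes. The following is a minimal illustration under stated assumptions, not the patented method: it assumes homolog detection has already been done and is supplied as a mapping from each protein to the set of genomes containing a homolog, it links only proteins whose patterns match exactly, and it discards patterns that are present in every genome or in none, since those cannot distinguish one subset of organisms from another. The function names, genome labels, and protein names are hypothetical.

```python
from collections import defaultdict


def phylogenetic_profiles(homolog_presence, genomes):
    """Encode each protein as a tuple of 1/0 flags, one per genome, in a fixed order."""
    return {
        protein: tuple(1 if genome in present_in else 0 for genome in genomes)
        for protein, present_in in homolog_presence.items()
    }


def linked_by_profile(profiles):
    """Group proteins that share an identical, informative presence/absence pattern."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        # Assumption: skip patterns found in all genomes or in none.
        if 0 < sum(profile) < len(profile):
            groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Toy data: two flagellar-like proteins occur only in genome1 and genome3.
genomes = ["genome1", "genome2", "genome3", "genome4"]
homolog_presence = {
    "flgA": {"genome1", "genome3"},
    "flgB": {"genome1", "genome3"},
    "coreX": {"genome1", "genome2", "genome3", "genome4"},
    "hypo1": {"genome2"},
}
profiles = phylogenetic_profiles(homolog_presence, genomes)
linked = linked_by_profile(profiles)
```

Exact matching is the simplest possible criterion; a more tolerant variant could link proteins whose patterns differ in only a few genomes, at the cost of more false positives.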
One embodiment provides a method of identifying multiple polypeptides as functionally-linked, the method including aligning a primary amino acid sequence of multiple distinct non-homologous polypeptides to the primary amino acid sequences of a plurality of proteins; and for any alignment found between the primary amino acid sequences of all of such multiple distinct non-homologous polypeptides and the primary amino acid sequence of at least one such protein, outputting an indication identifying the at least one such protein as an indication of a functional link between the multiple polypeptides.
In another embodiment, a computer program is provided for identifying a protein as functionally linked, the computer program comprising instructions for causing a computer system to align a primary amino acid sequence of multiple
