Method for assessing significance of protein identification

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

Reexamination Certificate

active

06446010

ABSTRACT:

BACKGROUND
An unknown biological molecule can be identified by comparing the mass data of the unknown biological molecule with mass data of known biological molecules.
For example, the rapid growth of available high quality DNA sequence data has made mass spectrometry (MS) combined with genome database searching a popular and potentially accurate method to identify proteins. Protein identification by mass spectrometry has proven to be a powerful tool to elucidate biological function and to find the composition of protein complexes and entire organelles.
In protein identification experiments, proteins are typically separated by gel electrophoresis and digested with a protease having high digestion specificity (e.g. trypsin); the resulting mixture of peptides is extracted from the gel and subjected to MS analysis (1998). The distribution of proteolytic peptide masses (peptide map) is compared with the theoretical proteolytic peptide masses calculated for each protein stored in a protein/DNA sequence database.
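The theoretical peptide map described above can be sketched as an in-silico digest. This is a minimal illustration, not the patent's procedure: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and standard monoisotopic residue masses. The function names are hypothetical.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(sequence):
    """Cleave after K or R, except before P (simplified rule, assumption)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptide_map(sequence):
    """Sorted list of theoretical proteolytic peptide masses for one protein."""
    return sorted(peptide_mass(p) for p in tryptic_peptides(sequence))
```

Applying `peptide_map` to every sequence in a database would give the theoretical maps against which an experimental map is compared.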
There are various algorithms that attempt to identify an unknown protein by determining the database protein which has a peptide map with the highest degree of similarity to the experimentally obtained peptide map of the unknown protein. These algorithms yield the protein identified and an identification score. Due to imperfections in the protein separation and to incomplete extraction of the proteolytic peptides from the gel, the peptide map is typically incomplete with respect to the protein identified, and also contains a background of proteolytic peptide masses from one or several other proteins. Even if separation and extraction were perfect, posttranslational modifications of proteins would cause a proteolytic peptide mass distribution to be different from that predicted by the genome. Mass spectrometry determines a peptide mass m_i to an accuracy ±Δm_i, with Δm_i/m_i typically >30 ppm. Within the mass range m_i ± Δm_i, proteolytic peptide masses of several proteins in the genome can match. For these reasons, a database search using the information in a peptide map will not always identify a protein unambiguously.
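The matching window m_i ± Δm_i can be made concrete with a small sketch. The 30 ppm default is only an illustrative value taken from the figure quoted above, and both function names are hypothetical.

```python
def matches(observed, theoretical, ppm=30.0):
    """True if the theoretical mass lies within observed * ppm * 1e-6 of the observed mass."""
    return abs(observed - theoretical) <= observed * ppm * 1e-6


def count_matches(observed_masses, theoretical_masses, ppm=30.0):
    """Number of observed masses that have at least one theoretical mass in their window."""
    return sum(
        any(matches(o, t, ppm) for t in theoretical_masses)
        for o in observed_masses
    )
```

At 1000 Da a 30 ppm window is ±0.03 Da, so peptides from several different proteins can fall inside the same window, which is why a match count alone does not settle an identification.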
Despite the momentum mass spectrometric protein identification has given to protein research, the problem of objectively assessing the significance of a protein identification result has been overlooked. As increasingly complex biological problems are explored, knowledge of the significance of each protein identification result is likely to become critical.
The object of the present invention is to provide a method for assessing the significance of a biological molecule identification.
SUMMARY OF THE INVENTION
This and other objects, as will be apparent to those having ordinary skill in the art, have been met by providing a method of determining the statistical significance of a biological molecule identification score. The method comprises a) selecting a significance level that represents a level of confidence in a biological molecule identification; b) calculating a score associated with an unknown biological molecule, wherein the score is a function of similarity between mass data of the unknown biological molecule and mass data generated for known biological molecules of a biological molecule database; c) comparing the score with a score frequency distribution, wherein the distribution is generated by comparing mass data of a hypothetical biological molecule with mass data generated for known biological molecules of a biological molecule database, and wherein the frequency distribution has associated therewith the significance level; and d) determining whether the score associated with the unknown biological molecule identification is within the significance level.
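Steps (c) and (d) can be read as an empirical tail-probability test. The sketch below is one possible interpretation, assuming the frequency distribution is available as a mapping from score to relative frequency and that higher scores mean greater similarity. The names are hypothetical.

```python
def tail_probability(score, distribution):
    """Fraction of random identifications scoring at least as high as `score`.

    `distribution` maps each score to its relative frequency (frequencies sum to 1).
    """
    return sum(freq for s, freq in distribution.items() if s >= score)


def is_significant(score, distribution, significance_level):
    """True if a random match would reach `score` with probability <= significance_level."""
    return tail_probability(score, distribution) <= significance_level
```

With a distribution such as `{0: 0.5, 1: 0.3, 2: 0.15, 3: 0.04, 4: 0.01}`, a score of 4 would pass at a significance level of 0.02, while a score of 2 would not pass at 0.05.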
The invention further provides a method of generating a frequency distribution of scores for a particular experimental condition, wherein the scores relate to random identifications of biological molecules. The method comprises a) generating mass data for the particular experimental condition for known biological molecules in a biological molecule database; b) generating mass data of a hypothetical biological molecule for the experimental condition; c) comparing the data generated in step (b) with the data generated for each known biological molecule in step (a); d) calculating a score for each comparison in step (c), wherein the score is a function of similarity between the data generated in step (a) which corresponds to a particular known biological molecule and the data generated in step (b); e) selecting a score from the scores calculated in step (d), wherein the selected score corresponds to the comparison which denotes a high degree of similarity between the data generated in step (a) and the data generated in step (b); f) repeating steps (b) through (e) with different hypothetical biological molecules until a sufficient quantity of scores are selected; and g) determining the frequency of selecting each score and generating therefrom a frequency distribution of scores.
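Steps (a) through (g) above can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the score is a simple count of masses matched within a ppm tolerance, the hypothetical molecule is modeled as uniformly random masses, and the database is a dictionary from protein name to a list of peptide masses. The source does not specify any of these choices.

```python
import random
from collections import Counter


def score(query, candidate, ppm=30.0):
    """Illustrative score: number of query masses matched within the tolerance."""
    return sum(
        any(abs(q - c) <= q * ppm * 1e-6 for c in candidate)
        for q in query
    )


def random_peptide_map(rng, n_masses=10, low=800.0, high=3000.0):
    """Hypothetical molecule modeled as uniformly random peptide masses (assumption)."""
    return [rng.uniform(low, high) for _ in range(n_masses)]


def null_score_distribution(database, n_trials=1000, seed=0, ppm=30.0):
    """Relative frequency of the best score obtained by random queries."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):                      # step (f): repeat
        query = random_peptide_map(rng)            # step (b)
        best = max(                                # steps (c) to (e)
            score(query, masses, ppm) for masses in database.values()
        )
        counts[best] += 1
    # step (g): convert counts to a frequency distribution
    return {s: n / n_trials for s, n in sorted(counts.items())}
```

A more realistic generator would draw random sequences and digest them in silico, so that the random maps share the mass statistics of real peptides.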
The invention provides another method of generating a frequency distribution of scores for a particular experimental condition, wherein the scores relate to random identifications of biological molecules. The method comprises a) generating mass data for the particular experimental condition for known biological molecules in a biological molecule database; b) randomly selecting a biological molecule from the database; c) comparing the mass data of the randomly selected biological molecule with the mass data of each known biological molecule; d) calculating a score for each comparison in step (c), wherein the score is a function of similarity between the data; e) selecting a score from the scores calculated in step (d), wherein the selected score corresponds to the comparison which denotes a degree of similarity between the data which is lower than the highest degree of similarity; f) repeating steps (b) through (d) with different randomly selected biological molecules until a sufficient quantity of scores are selected; and g) determining the frequency of selecting each score and generating therefrom a frequency distribution of scores.
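This second variant can be sketched in the same style. The source only says the selected score is lower than the highest degree of similarity. Taking the second-ranked score, on the reasoning that the top rank is the molecule matched against itself, is an assumption of this sketch, as are the match-count score and the dictionary database layout.

```python
import random
from collections import Counter


def score(query, candidate, ppm=30.0):
    """Illustrative score: number of query masses matched within the tolerance."""
    return sum(
        any(abs(q - c) <= q * ppm * 1e-6 for c in candidate)
        for q in query
    )


def second_best_distribution(database, n_trials=1000, seed=0, ppm=30.0):
    """Relative frequency of the second-ranked score for randomly chosen entries.

    Requires at least two database entries. Ties with the top score are
    possible and are kept as they are.
    """
    rng = random.Random(seed)
    names = list(database)
    counts = Counter()
    for _ in range(n_trials):
        query = database[rng.choice(names)]                       # step (b)
        ranked = sorted(
            (score(query, masses, ppm) for masses in database.values()),
            reverse=True,
        )                                                         # steps (c), (d)
        counts[ranked[1]] += 1                                    # step (e)
    return {s: n / n_trials for s, n in sorted(counts.items())}   # step (g)
```

Because the queries come from the database itself, this variant needs no model of a hypothetical molecule.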
The invention also provides a method of identifying an unknown biological molecule for a particular experimental condition and a particular significance level. The method comprises a) selecting a significance level that represents a level of confidence in a biological molecule identification; b) cleaving the unknown biological molecule into constituent parts by a method that produces constituent parts; c) generating mass data for these constituent parts; d) comparing the mass data generated in step (c) with mass data generated for the experimental condition from known biological molecules of a biological molecule database; e) calculating scores for each comparison in step (d), wherein the scores are a function of similarity between mass data of the unknown biological molecule and mass data generated from the biological molecule database; f) selecting a score generated in step (e) wherein the score corresponds to a comparison which denotes a high degree of similarity and wherein the score corresponds to a particular known biological molecule in the biological molecule database; and g) determining whether the score selected in step (f) is equal to or larger than the critical score.
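Steps (d) through (g) can be sketched as below. The passage refers to "the critical score" without defining it. This sketch assumes it is the smallest score whose tail probability under the random-score distribution does not exceed the significance level. The match-count score and all names are likewise hypothetical.

```python
def score(query, candidate, ppm=30.0):
    """Illustrative score: number of query masses matched within the tolerance."""
    return sum(
        any(abs(q - c) <= q * ppm * 1e-6 for c in candidate)
        for q in query
    )


def critical_score(distribution, significance_level):
    """Smallest score s with P(random score >= s) <= significance_level.

    Returns None if even the highest sampled score is too frequent.
    """
    tail = 0.0
    critical = None
    for s in sorted(distribution, reverse=True):
        tail += distribution[s]
        if tail > significance_level:
            break
        critical = s
    return critical


def identify(observed_masses, database, critical, ppm=30.0):
    """Return (best name, best score, whether the score reaches the critical score)."""
    best_name, best_score = max(
        ((name, score(observed_masses, masses, ppm))
         for name, masses in database.items()),
        key=lambda item: item[1],
    )
    return best_name, best_score, best_score >= critical
```

The critical score depends on the experimental condition, so in this reading it would be recomputed from a distribution generated for the same database, tolerance, and number of masses as the experiment.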
In another embodiment the invention comprises a computer program product which comprises a computer usable medium having computer readable program code means embodied in said medium for generating a frequency distribution of scores, wherein the scores relate to random identifications of biological molecules. The computer program product includes: a computer readable program code means for causing a computer to generate mass data for each known biological molecule in a biological molecule database for a particular experimental condition; computer readable program code means for causing the computer to generate mass data of a hypothetical biological molecule for the experimental conditio
