Reexamination Certificate
2003-01-22
2004-12-28
Sun, Xiuqin (Department: 2863)
Data processing: measuring, calibrating, or testing
Measurement system in a specific environment
Chemical analysis
C702S020000
active
06836733
ABSTRACT:
TECHNICAL FIELD
The present invention relates to the analysis of life science data. More particularly, the present invention relates to computer-based interpretation of biological sequences. Even more particularly, the present invention relates to a client/server or Internet-based computer tool and method for identifying potential protein, DNA or RNA sites of interest based upon the characteristics of the underlying amino acid or nucleic acid (DNA or RNA) sequence.
BACKGROUND ART
Currently, life scientists and molecular biologists are working with a wide variety of manual and automated tools to determine particular characteristics regarding molecular biology data. While automated protein sequencing tools presently generate large volumes of protein amino acid sequence data, tools for easily handling and interpreting the new data have yet to become commonplace. Scientists have been attempting to manage their molecular biology data in a wide variety of ways, from expensive, dedicated and proprietary computer systems to manually reviewing data placed into common word processors or text editors not optimized for handling large amounts of life science information.
The challenge of the first approach lies primarily in its limited accessibility to the average life scientist. Dedicated proprietary computer systems for sequencing and interpreting molecular biology information often cost far more than the budgets of small research operations permit. Drawbacks exist beyond acquisition price, such as closed, user-unfriendly proprietary system architectures that do not facilitate cross-platform sharing of molecular biology sequence information. Researchers using state-of-the-art proprietary systems purchased at great expense often encounter difficulty sharing molecular biology data with other researchers on different computer systems in the same lab, let alone with colleagues in another institution or country.
The difficulty encountered at the other end of the spectrum is just as common, if not more so. Researchers unable to gain access to high-cost dedicated molecular biology computer systems may resort to the most rudimentary of toolsets to interpret their genetic sequence or corresponding protein data. Manually screening volumes of protein sequence data using basic text editors and word processors is not unheard of, despite the fact that these tools were never designed for, and are not optimized for, handling genetic data in any form. In addition, despite the high degree of sophistication the average life scientist may have in his or her particular field, a commensurate level of computer ability is often not present. User interfaces currently designed for state-of-the-art molecular biology computer systems can be so user-unfriendly that a life scientist may actually prefer to work with a simple, easy-to-use text editor instead of an inflexible proprietary system. As for the problem of collaborative work on related sequence data, neither approach facilitates remote access to lab-generated or public domain sequence library information.
In the end, a technologically robust and user-friendly system for remotely interpreting and managing life science data is needed. An improved and broadly accessible tool for interpreting life science data would aid not only the research process itself but would also bring about the end products of that research sooner. Data brought closer to understanding by the life scientist means accelerated medical breakthroughs, improved drug therapies, and better understood systematic models of disease and regulatory processes.
DISCLOSURE OF INVENTION
By combining a powerful biological sequence site scoring tool with remote computer access functionality, a web-based tool for the identification of molecular biology sequence sites is hereby disclosed. The functionality of the present system and method is illustrated using the identification of Caspase cleavage sites as a working example. Scoring as applied to potential protein modification sites is based on amino acid sequence characterization, and is easily modified for use with nucleic acid sequences.
Disclosed is an objective, quantitative method and apparatus for searching and evaluating biological sequence data relative to a selected functional characteristic, such as enzyme cleavage site, binding site, secondary structure, or potential modification site. Software is used to scan known target sequences of amino acids, DNA or RNA base pairs, searching for sequence regions exhibiting composition characteristics derived from scoring matrices provided by user input. Characteristics may include number of residues, presence of specific residues, or specific sequences of residues. Sequence regions exhibiting characteristics similar to the predetermined characteristics are identified, flagged and quantitatively scored for closeness of fit to the group of all predetermined characteristics, including quantitative scoring for mandatory characteristics and exclusionary characteristics. Scoring takes place based upon one or more scoring matrices which detail the individual predetermined characteristics and their respective quantitative scores. The scoring matrices can be used to predict the relative functional effect of individual biological sequences within a potential sequence site and help interpret combinations of sequences relative to the specific functional characteristic of interest. The invention further provides the user with the ability to select threshold cutoff values to be used by the software for evaluation of scoring matrix results, thereby assisting the user in the site location identification process, and providing the user the ability to evaluate the effect of substitutions of characteristics.
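The scanning and thresholding just described can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the function name scan_sequence, the four-position example matrix, its score values, and the test sequence are all hypothetical. A score of 1.0 stands in for a mandatory residue and a large negative score for an exclusionary one.

```python
from typing import Dict, List, Tuple

# One dict per site position, mapping residue -> score.
# Residues absent from a position's dict score 0 (an assumption of this sketch).
ScoringMatrix = List[Dict[str, float]]


def scan_sequence(
    sequence: str, matrix: ScoringMatrix, threshold: float
) -> List[Tuple[int, str, float]]:
    """Slide a window the width of the matrix along the sequence and
    return (start, window, score) for windows scoring >= threshold,
    sorted from highest to lowest score."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Cumulative score: sum of the per-position scores of each residue.
        score = sum(matrix[i].get(res, 0.0) for i, res in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Hypothetical matrix, for illustration only.
example_matrix: ScoringMatrix = [
    {"D": 0.5, "L": 0.5},
    {"E": 0.6, "V": 0.4},
    {"V": 0.7, "H": 0.3},
    {"D": 1.0, "E": -5.0},  # D treated as mandatory, E as exclusionary
]

hits = scan_sequence("MKDEVDGALEHDK", example_matrix, threshold=2.0)
for start, window, score in hits:
    print(f"position {start}: {window} score={score:.2f}")
```

With these illustrative values, only the windows DEVD (position 2) and LEHD (position 8) clear the threshold of 2.0. Raising or lowering the threshold changes how many candidate sites are reported, which is the user-selectable cutoff the text refers to.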
To practice the claimed invention on a particular protein amino acid sequence cleavage site, one or more scoring matrices are developed. These scoring matrices are derived by comparing the cleavage sites from known protein targets and determining the frequency of amino acid content at each position. A score for each possible amino acid is then set for each position based on this frequency. For example, a particular cleavage site in a protein may contain 5 amino acids. If it were found that Aspartic acid occurred 50% of the time at the first position and Leucine 50% of the time at this position, then each of these amino acids would have a score of 0.5 and the remaining amino acids would have a score of 0 at this position. To ensure the return of particular results, such as when particular amino acids must be present (weighted score greater than or equal to 1) or must not be present (negative weighted score), scores outside of the anticipated frequency range can also be inserted. Thus, for each of the five positions in a protein cleavage site, a score for each of the 20 amino acids is created, and this information is stored in the scoring matrix. Each possible cleavage site in a target protein is assigned a cumulative score based on this matrix. All possible cleavage sites can be listed, sorted by this score. A threshold can be set such that only sites scoring above a chosen level are returned when queried. This search can be performed on a single protein or on large public protein databases. The data searched can reside on the initial client computer, on the central server computer undertaking the analysis, or in public or private sequence databases retrieved remotely via the Internet. In addition, the results returned can be sent to a single remote client computer, or to a plurality of remote systems.
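The derivation of a scoring matrix from known cleavage sites can be sketched in Python as follows. This is a hedged illustration, not the patented code: the function names build_matrix and score_site and the two five-residue "known sites" are hypothetical, chosen only to reproduce the 50% Aspartic acid / 50% Leucine example at the first position.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def build_matrix(known_sites: List[str]) -> List[Dict[str, float]]:
    """Score each amino acid at each position by its observed frequency
    across the aligned known sites."""
    if not known_sites:
        raise ValueError("at least one known site is required")
    width = len(known_sites[0])
    if any(len(site) != width for site in known_sites):
        raise ValueError("all known sites must have the same length")
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in known_sites)
        matrix.append(
            {aa: counts.get(aa, 0) / len(known_sites) for aa in AMINO_ACIDS}
        )
    return matrix


def score_site(site: str, matrix: List[Dict[str, float]]) -> float:
    """Cumulative score of one candidate site; unknown residues score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(matrix, site))


# Hypothetical known sites: D and L each occur 50% of the time at position 1.
known_sites = ["DAGTS", "LAGTS"]
matrix = build_matrix(known_sites)
print(matrix[0]["D"], matrix[0]["L"], matrix[0]["A"])  # 0.5 0.5 0.0

# Scores outside the frequency range can be inserted by hand, e.g. a
# negative weight for a residue that must not appear at the last position.
matrix[4]["P"] = -1.0

print(score_site("DAGTS", matrix))  # 4.5
print(score_site("DAGTP", matrix))  # 2.5
```

Each position holds a score for all 20 amino acids, so the matrix can be inspected or edited directly, and a hand-inserted weight of 1 or more, or a negative weight, plays the role of the mandatory and exclusionary scores described in the text.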
Due to the pervasive nature of the Internet, it is an intended consequence of the claimed invention that multi-user collaboration is made possible under the client/server computer model, with data sets and stored queries are ea
Kozlowski Jeff
Olson N. Eric
LaRiviere Grubman & Payne, LLP
Sun Xiuqin
Vizx Labs, LLC