Systems for the analysis of gene expression data

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C435S004000, C435S006120, C536S023100, C536S024300, C702S019000

active

06263287

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to the field of computer systems. Specifically, the present invention relates to computer systems for the analysis and manipulation of gene expression data. Advances in the genomics area, specifically in the development of the microarray (Schena et al., Science 270: 467-470 (1995)) and GeneChip® (Lockhart et al., Nature Biotech. 14: 1675-1680 (1996)) technologies, require new bioinformatics tools for the manipulation, analysis and processing of gene expression data. Many disease states and related conditions are characterized by differences in the expression levels of various genes. These differences may occur through changes in the copy number of DNA or through changes in levels of transcription of the genes. Indeed, the control of the cell cycle and cell development, as well as diseases, may be characterized by variation in the transcription levels of genes.
Of particular interest to those in the bioinformatics area are systems for identifying the biological functions of genes based on their temporal pattern of expression. One system, known as clustering analysis, clusters genes according to the shape similarity of their temporal pattern of expression, with clusters related to specific biological functions. This approach has been applied to identify genes involved in a metabolic shift from the yeast genome (DeRisi et al., Science 278: 680-686 (1997)), and in the central nervous system development in rats (Wen et al., Proc. Natl. Acad. Sci. USA 95: 334-339 (1998)). A second approach is reverse engineering, which assumes that the genes dynamically interact with one another as a genetic network (Liang et al., Proceedings of the Pacific Symposium on Biocomputing, Maui, Hi., 1998). The reverse engineering approach can potentially systematically decipher the complex circuitry of the genetic network from the temporal gene expression pattern.
While such clustering analysis and reverse engineering systems are useful, it is desirable to have available a general and flexible system for the visualization, manipulation, and analysis of gene expression data. Such a system preferably includes a graphical user interface for browsing and navigating through the expression data, allowing a user to selectively view and highlight the genes of interest. The system also preferably includes sort and search functions and is preferably available for general users with PC, Mac or Unix workstations. Also preferably included in the system are clustering algorithms that are qualitatively more efficient than existing ones. The accuracy of such algorithms is preferably hierarchically adjustable so that the level of detail of clustering can be systematically refined as desired.
A preferred algorithm for such a system is a clustering algorithm for, e.g., identifying functionally related genes with different time curves. In particular, the clustering algorithm may be used for clustering genes whose functional correlation involves a scale change, a time delay, a vertical flip or any combination of the three. The system preferably also includes a time-curve representation that is both literal and numerical. Literal representations assist in making SQL (Structured Query Language) type database queries. Numerical representations assist in allowing for the arithmetical transformation of curves. Such transformations are useful in differentiating tissue and disease specificity of gene expression. In addition, clustering algorithms and mathematical calculations preferably are tightly integrated with a graphical user presentation interface. Finally, graphics preferably are included to assist in navigation and analysis of the expression data in an intuitive, interactive, and iterative fashion.
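To make the idea of curves related by a scale change, a time delay, or a vertical flip concrete, the following Python sketch compares two time curves while allowing for all three. It is not the clustering algorithm of the invention, which this passage does not specify. The function names, the peak-magnitude normalization, the mean squared difference score, and the `max_lag` parameter are all assumptions chosen for illustration.

```python
def normalize(curve):
    """Divide by the largest magnitude so that a pure scale change disappears."""
    peak = max(abs(x) for x in curve)
    return [x / peak if peak else 0.0 for x in curve]


def best_alignment(a, b, max_lag=2):
    """Return (score, flip, lag) minimizing the mean squared difference.

    flip is +1 or -1 (vertical flip of b); lag is the number of time
    points by which b is delayed relative to a. Hypothetical helper.
    """
    a, b = normalize(a), normalize(b)
    best = None
    for flip in (1, -1):
        for lag in range(-max_lag, max_lag + 1):
            # Pair a[i] with b[i + lag] wherever both indices are valid.
            pairs = [(a[i], flip * b[i + lag])
                     for i in range(len(a)) if 0 <= i + lag < len(b)]
            if len(pairs) < 2:
                continue
            score = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
            if best is None or score < best[0]:
                best = (score, flip, lag)
    return best


# b is a scaled by 3, flipped, and delayed by one time point.
a = [0, 1, 2, 1, 0, 0]
b = [0, 0, -3, -6, -3, 0]
score, flip, lag = best_alignment(a, b)
```

In this example the search recovers a perfect match with `flip == -1` and `lag == 1`, so the two curves would be treated as related even though their raw values look very different.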
Indeed, there is a need for improved computer-aided techniques for the analysis and manipulation of gene expression data. The present invention reflects the preceding attributes and relates to systems and computer programs used for the analysis and manipulation of gene expression data. In a specific embodiment, the systems of the present invention comprise two new clustering algorithms, a presentation interface, and a set of graphical display tools. The system is preferably written in the Java™ programming language (e.g., 100% JDK 1.1, Sun Microsystems, Inc., Palo Alto, Calif.), and thus platform independent.
SUMMARY OF THE INVENTION
The present invention relates to systems for manipulating and analyzing gene expression data. In one embodiment, the system comprises a means for receiving gene expression data for a plurality of genes; a means for comparing the gene expression data from each of said plurality of genes to a common reference frame; a means for assigning a grid representation to each of said gene expression data from said plurality of genes; and a means for presenting said assigned grid representation. More specifically, this system further comprises means for clustering said grid representations. Still further, the grid representation may be normalized to within [−1,1]. The gene expression data for each of said plurality of genes comprises a plurality of expression levels and a plurality of associated time points.
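As a rough illustration of the normalization and grid assignment described above, the following Python sketch scales one gene's expression levels into [−1, 1] and bins them onto a grid. The function names, the choice of dividing by the largest magnitude, and the round-half-up binning are assumptions made for illustration; this summary does not state how the common reference frame or the grid bins are computed.

```python
import math


def normalize(levels):
    """Scale expression levels into [-1, 1] by the largest magnitude (assumed)."""
    peak = max(abs(x) for x in levels)
    if peak == 0:
        return [0.0 for _ in levels]
    return [x / peak for x in levels]


def to_grid(levels, n_levels=3):
    """Map each time point to an integer relative expression level.

    n_levels is assumed odd (3, 5, 7, ...), giving bins centred on 0,
    e.g. -1, 0, +1 for n_levels=3.
    """
    half = n_levels // 2
    return [math.floor(x * half + 0.5) for x in normalize(levels)]


# One gene measured at four time points (values are made up).
coarse = to_grid([0.0, 1.2, 4.0, -3.2])              # -> [0, 0, 1, -1]
fine = to_grid([0.0, 1.2, 4.0, -3.2], n_levels=5)    # -> [0, 1, 2, -2]
```

Each entry of the result pairs a time point (its position in the list) with a relative expression level, which is one plausible reading of a grid representation.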
Clustering preferably may be grid clustering or σ-τ clustering. The presentation step of the methods and systems of the invention preferably comprises one or more of the following for each grid representation or cluster thereof: temporal pattern of expression; file designation; gene identification number; major class; sub class; gene description; grid representation; and time curve. This data may then be hyperlinked within said display. Further, clustered grid representations may be compared, for example, based on tissue origin or gene. The clusters themselves may be created based on, for example, gene or tissue origin.
Another embodiment of the present invention relates to a method, in a computer system, of manipulating expression data associated with a gene, comprising the steps of: inputting expression data for a plurality of genes; comparing the expression data from said plurality of genes to a common reference frame; and assigning a grid representation to said expression data based on said comparing step. Based on its assigned grid representation, the expression data may be clustered and presented by relative expression levels. The clustering may also be presented by time stage, or by both relative expression level and time stage. The grid representation preferably comprises a relative expression level component and a time stage component. The relative expression level may preferably comprise three, five, seven, nine, eleven, thirteen, or fifteen relative expression levels. The time stage may preferably comprise two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, or fifteen time stages. Clustered expression data may be sorted by relative expression level, time stage, or by both relative expression level and time stage.
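One simple way to picture clustering and sorting by grid representation is to group genes whose grids are identical and then order the groups by the level at a chosen time stage. The Python sketch below does this for three relative expression levels. The gene names, the `-`/`0`/`+` symbols used for the literal form, and the exact-match grouping rule are assumptions for illustration, not details taken from this embodiment.

```python
from collections import defaultdict

# Assumed literal symbols for three relative expression levels.
SYMBOLS = {-1: "-", 0: "0", 1: "+"}


def literal(grid):
    """Render a grid as a short string, e.g. (1, 0, -1) -> '+0-'."""
    return "".join(SYMBOLS[level] for level in grid)


def cluster_by_grid(grids):
    """Group gene names that share an identical grid representation."""
    clusters = defaultdict(list)
    for gene, grid in grids.items():
        clusters[tuple(grid)].append(gene)
    return dict(clusters)


def sort_by_stage(clusters, stage):
    """Order clusters by relative expression level at one time stage, highest first."""
    return sorted(clusters.items(), key=lambda item: item[0][stage], reverse=True)


# Hypothetical genes, each with three time stages.
grids = {"geneA": [0, 1, 1], "geneB": [0, 1, 1], "geneC": [1, 0, -1]}
clusters = cluster_by_grid(grids)
for grid, genes in sort_by_stage(clusters, stage=0):
    print(literal(grid), genes)
```

A string form such as `+0-` is the kind of literal representation that could be stored in a database column and matched with an ordinary text query.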
In a further embodiment of the present invention, the resolution of the cluster may be adjusted. A finer grid or a coarser grid may be used for displaying the expression data clusters. Still further, the grid representation may be normalized to within [−1,1].
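The effect of adjusting the resolution can be sketched by counting how many distinct grid representations a set of curves produces at different grid sizes. In the Python example below, the binning rule, the function names, and the sample curves are all assumptions for illustration; only the idea of a finer or coarser grid comes from the text.

```python
import math


def to_grid(curve, n_levels):
    """Bin a curve onto n_levels relative expression levels (assumed rule)."""
    peak = max(abs(x) for x in curve) or 1.0
    half = n_levels // 2
    return tuple(math.floor(x / peak * half + 0.5) for x in curve)


def count_clusters(curves, n_levels):
    """Number of distinct grid representations at the given resolution."""
    return len({to_grid(curve, n_levels) for curve in curves.values()})


# Made-up curves: g1 and g2 have a similar shape, g3 is different.
curves = {
    "g1": [0.1, 0.6, 1.0],
    "g2": [0.1, 0.9, 1.0],
    "g3": [-1.0, 0.0, 0.2],
}
coarse = count_clusters(curves, n_levels=3)  # g1 and g2 fall into one cluster
fine = count_clusters(curves, n_levels=5)    # g1 and g2 are separated
```

With three levels the first two curves share a grid and form one cluster; with five levels they split, so refining the grid refines the clustering.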
Another aspect of the present invention relates to the determination of quantitative differences between said grid representations and the measurement of a variance between grid representations. The quantitative differences between said grid representations may exhibit a time shift, a vertical flip, or a time curve.
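A minimal sketch of one way to quantify the difference between two grid representations follows. It takes the point-by-point difference and its population variance; this particular measure, and the function names, are assumptions, since the text does not define how the quantitative difference or the variance is computed.

```python
from statistics import pvariance


def grid_difference(a, b):
    """Point-by-point difference between two grids of equal length."""
    if len(a) != len(b):
        raise ValueError("grids must cover the same time stages")
    return [x - y for x, y in zip(a, b)]


def difference_variance(a, b):
    """Population variance of the differences (assumed measure)."""
    return pvariance(grid_difference(a, b))


diff = grid_difference((0, 1, 2, 1), (0, 1, 1, 0))    # -> [0, 0, 1, 1]
spread = difference_variance((0, 1, 2, 1), (0, 1, 1, 0))
```

Under this measure a variance of zero means the two grids differ only by a constant offset, while a larger variance means their shapes diverge.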
In another aspect of the present invention, the method of analyzing differential gene expression data comprises the steps of providing a template time curve; associating said time curve with a grid representation; and clustering said grid representations of said expression data based on said grid representation of said time curve.
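The template-based steps above can be sketched as follows: convert the template time curve to a grid, convert each gene's curve the same way, and collect the genes whose grids equal the template's. The binning rule, the exact-match criterion, and the sample data in this Python example are assumptions for illustration.

```python
import math


def to_grid(curve, n_levels=3):
    """Bin a curve onto n_levels relative expression levels (assumed rule)."""
    peak = max(abs(x) for x in curve) or 1.0
    half = n_levels // 2
    return tuple(math.floor(x / peak * half + 0.5) for x in curve)


def match_template(template, curves, n_levels=3):
    """Return names of genes whose grid equals the template's grid."""
    target = to_grid(template, n_levels)
    return sorted(name for name, curve in curves.items()
                  if to_grid(curve, n_levels) == target)


# Hypothetical template: low early, high late.
template = [0.0, 0.2, 1.0, 0.9]
curves = {
    "geneA": [0.1, 0.3, 2.0, 1.9],   # same shape, larger scale
    "geneB": [5.0, 4.0, 0.5, 0.0],   # high early, low late
    "geneC": [0.0, 0.1, 0.8, 0.7],   # same shape, smaller scale
}
matches = match_template(template, curves)  # -> ['geneA', 'geneC']
```

Because each curve is normalized before binning, genes that follow the template's shape at a different scale still land in the template's cluster.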
In yet another aspect, the present invention relates to computer programs for analyzing gene expression data comprising: computer code that receives as i
