Method and system for the creation, application and...

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate

Details

C706S047000, C706S061000, C706S924000

active

06741976

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to the field of sorting and analysis of vast amounts of scientific data, and more particularly to methods and systems for the discretizing of biological, medical or biochemical data and generation of logical rules from that data, followed by processing of the generated rules.
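As a rough illustration of what "discretizing" data and generating "logical rules" can look like in practice, the following minimal sketch maps real-valued measurements to three labels and then scores one candidate rule by its support and confidence. The thresholds, the labels, the function names `discretize` and `rule_stats`, and the sample values are all assumptions made for illustration; the source does not specify them, and this is not presented as the patented method.

```python
from typing import List, Sequence, Tuple


def discretize(value: float, low: float = -1.0, high: float = 1.0) -> str:
    """Map a real-valued measurement to one of three labels (thresholds are assumed)."""
    if value <= low:
        return "down"
    if value >= high:
        return "up"
    return "unchanged"


def rule_stats(
    rows: Sequence[Sequence[str]],
    antecedent: Tuple[int, str],
    consequent: Tuple[int, str],
) -> Tuple[float, float]:
    """Support and confidence of the rule: IF antecedent THEN consequent."""
    (a_idx, a_label), (c_idx, c_label) = antecedent, consequent
    matches_a = [r for r in rows if r[a_idx] == a_label]
    matches_both = [r for r in matches_a if r[c_idx] == c_label]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Each row is one test; the two columns are hypothetical measurements.
raw: List[List[float]] = [[1.5, -1.2], [2.0, -1.8], [0.1, 0.3], [1.1, 0.2]]
rows = [[discretize(v) for v in test] for test in raw]

# Candidate rule: IF column 0 is "up" THEN column 1 is "down".
support, confidence = rule_stats(rows, antecedent=(0, "up"), consequent=(1, "down"))
```

Here the rule holds in two of the four tests (support 0.5) and in two of the three tests where the antecedent is true, which is the kind of statistical reliability measure a rule-processing step could filter on.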
BACKGROUND OF THE INVENTION
Scientists create vast amounts of raw data. The sheer volume of such data makes it difficult, if not impossible, to draw complete conclusions from that data. Accordingly, mathematicians are asked to develop processes to analyze such data and, in particular, to study, organize, and determine rules (also called logical statements) for the presentation and analysis of such raw data in ways that can become scientifically important by exposing rules (and their bases) in a manner that permits conclusions to be drawn.
Traditional approaches to the creation of scientific data (and, more importantly, biological, medical and biochemical data) followed by human analysis fall short because literally thousands if not millions of data points are created. The intuitive ability of the human mind to analyze such data and draw logical, rational and appropriate conclusions has been emulated by computer-assisted analytical techniques including, e.g., the creation and analysis of logical rules for such data.
With respect to biological data in particular, vast amounts are created virtually daily. Within the class of biological data lies a subclass of genetic data. With respect to genetic data, the Human Genome Project and its progeny have created a simply unmanageable quantity of potentially relevant information relating to gene sequencing and expression. One of the major goals of molecular biology is to study such data and determine how different genes regulate one another. Thus, a major research effort has been targeted towards understanding and discovering gene regulation patterns. Likewise, huge amounts of medical data are created by laboratory and other analyses of medically significant biochemical moieties and their variations. The coding of protein interactions (and their DNA/RNA interfaces for synthesis) in the field of proteomics also results in significant data creation. Not all the data is relevant, yet some of the data that might appear at first blush to be marginal, when combined with other data points, can reveal logical rules with appropriate statistical reliability, thereby enhancing the ability to modulate the experimental protocols employed or the conclusions determined.
One of the main techniques used by biologists for the creation of data concerning genetic expression is the oligonucleotide microarray method, which has gained popularity in the last few years. This technique permits biologists to produce large quantities of gene microarray data points that profile gene expressions under different conditions, at different times during development or in the presence of different factors that include, without limitation, drugs, environmental conditions, biochemical compounds, and the like. Typically, biologists generate a set of tests applying this method to a biological sample, where a single test would contain information on the expression levels of genes in the sample, and the number of tests would result in a range of measurements from a few dozen to a few hundred.
Gene regulation may be understood by measuring the amounts of different gene products produced by a cell. This production process, called gene expression, creates as a product a form of RNA. The oligonucleotide microarray method is a standard method employed to measure amounts of this form of RNA, in which this form of RNA is hybridized to an oligonucleotide microarray that allows the measurement of expression levels of up to tens of thousands of genes in a single experiment. From the computational point of view, the expression level is represented as an arbitrary real number. Therefore, the result of a single experiment is an array of “N” real numbers, where “N” remains the same across different experiments and depends upon the genes sought to be measured by the experimenter.
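The computational representation described above can be sketched as follows: each experiment becomes an array of N real numbers, with a fixed gene order so that N is identical across experiments. The gene identifiers, the helper `make_experiment`, and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical gene identifiers; a real microarray may cover tens of thousands.
GENES: List[str] = ["geneA", "geneB", "geneC", "geneD"]


def make_experiment(levels: Dict[str, float]) -> List[float]:
    """Return one experiment as an array of N real numbers, ordered like GENES."""
    missing = [g for g in GENES if g not in levels]
    if missing:
        raise ValueError(f"missing expression levels for: {missing}")
    return [float(levels[g]) for g in GENES]


experiments = [
    make_experiment({"geneA": 120.5, "geneB": 33.0, "geneC": 870.2, "geneD": 15.1}),
    make_experiment({"geneA": 98.7, "geneB": 41.3, "geneC": 640.0, "geneD": 14.8}),
]

# N stays the same across experiments because the measured gene set is fixed.
n_values = {len(e) for e in experiments}
```

Fixing the gene order once, rather than storing a separate mapping per experiment, is what makes position i comparable across all experiments.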
In order to discover how different genes regulate one another, biologists typically conduct multiple experiments to determine the manner in which different gene expressions change depending upon the type of tissue, age of the organism, therapeutic agents, and environmental conditions. Moreover, biologists are more interested in the method by which gene expressions vary in these experiments relative to normal expression levels in an organism, rather than absolute values of gene expression.
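One common way to express variation relative to a normal level, rather than as an absolute value, is a per-gene log ratio against a reference profile. The sketch below assumes strictly positive expression levels and uses a base-2 logarithm; both choices, the function name `log2_ratios`, and the sample numbers are assumptions for illustration and are not taken from the source.

```python
import math
from typing import List


def log2_ratios(sample: List[float], reference: List[float]) -> List[float]:
    """Per-gene log2(sample / reference); 0 means no change from the reference."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must cover the same N genes")
    if any(v <= 0 for v in sample + reference):
        raise ValueError("expression levels must be positive to take a ratio")
    return [math.log2(s / r) for s, r in zip(sample, reference)]


reference = [100.0, 40.0, 800.0, 16.0]  # illustrative "normal" levels
treated = [200.0, 40.0, 200.0, 16.0]    # illustrative levels under some condition
relative = log2_ratios(treated, reference)
# relative == [1.0, 0.0, -2.0, 0.0]: doubled, unchanged, quartered, unchanged
```

A log scale treats a doubling and a halving symmetrically (+1 and -1), which makes up-regulation and down-regulation directly comparable.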
Accordingly, the manner in which patterns of genetic output change across different samples reflects underlying biological processes in the organism whose genes are being studied. It is of crucial importance for biologists to understand these biological processes, and a major research effort has been launched towards the discovery and biological interpretation of gene regulation patterns. As a result, millions of data points have been generated.
Typical data analysis techniques for handling vast amounts of oligonucleotide microarray data are based mainly on manual selection, querying and clustering techniques. Manual selection of patterns is usually performed by a direct “eyeballing” of the data by a person with some amount of experience or specialized expertise. This traditional approach is virtually impossible when the size of the database gets too large.
Database-querying techniques include SQL querying methods, and permit the data analyst to apply pertinent queries to the data and receive responsive information. While such techniques are effective in instances when the analyst is cognizant of the attributes of the data and thus can determine the queries, when the data is vast in size and the queries are less obvious, these techniques prove to be ineffective.
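The querying approach can be illustrated with a small in-memory SQLite database. The table name `expression`, its columns, the sample rows, and the threshold of 500 are all hypothetical. The point of the sketch is that the analyst has to know in advance which attribute and cutoff to ask about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (experiment TEXT, gene TEXT, level REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("exp1", "geneA", 120.5),
        ("exp1", "geneB", 33.0),
        ("exp1", "geneC", 870.2),
        ("exp2", "geneA", 98.7),
        ("exp2", "geneB", 41.3),
        ("exp2", "geneC", 640.0),
    ],
)

# The query encodes a question the analyst already knew to ask.
rows = conn.execute(
    "SELECT gene, experiment, level FROM expression "
    "WHERE level > ? ORDER BY level DESC",
    (500.0,),
).fetchall()
conn.close()
```

The query returns exactly what was asked for (here, the two `geneC` measurements above the cutoff) and nothing about relationships the analyst did not think to query.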
Clustering methods are shown in, for example, Eisen, et al., “Cluster Analysis and Display of Genome-wide Expression Patterns,” Proc. Nat'l. Acad. Sci. USA, 95(25):14863-8, 1998, and also include self-organizing maps as shown in Tamayo, et al., “Interpreting Patterns of Gene Expression with Self-Organizing Maps: Methods and Application to Hematopoietic Differentiation,” Proc. Nat'l. Acad. Sci. USA, Vol. 96, pp. 2907-2912, March 1999. Such methods group genes into clusters that exhibit “similar” types of behavior in the experiments. These clustering methods allow biologists to design experiments helping them to understand further the relationships among the underlying data points, and hence the genetic expressions shown by those data points. However, such traditional clustering methods fail to provide deep insights into specific relationships among genes and biological processes in the cell because the clusters are, by definition, broad categories.
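The idea of grouping genes with “similar” behavior can be sketched with a deliberately simple stand-in: a greedy grouping by Pearson correlation against each cluster's first member. This is not the hierarchical clustering of Eisen, et al. or the self-organizing maps of Tamayo, et al.; the threshold of 0.9, the function names, and the expression profiles are assumptions made for illustration.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def group_by_correlation(
    profiles: Dict[str, List[float]], threshold: float
) -> List[List[str]]:
    """Greedy grouping: a gene joins the first cluster whose seed it correlates with."""
    clusters: List[List[str]] = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar cluster found; start a new one
    return clusters


# Hypothetical expression profiles across four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}
clusters = group_by_correlation(profiles, threshold=0.9)
```

The output separates the rising genes from the falling ones, which also shows the limitation noted above: the clusters say that `geneA` and `geneB` behave alike, but nothing about how one might regulate the other.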
Support vector machines (“SVMs”) have been employed to overcome the problems associated with the querying, clustering and self-organizing map approaches, as shown in Brown, et al., “Knowledge-Based Analysis of Microarray Gene Expression Data by Using Support Vector Machines,” in PNAS, vol. 97, Issue 1, pages 262-267, Jan. 4, 2000. In particular, the SVM method described in Brown, et al. builds a gene classifier based on some training data by using SVM methods that draw hyperplanes that separate different classes of data (e.g., positives from negatives). Then these classifiers are used to identify unknown functions of genes. SVM methods seek to solve an important but very specific problem of identifying functions of genes based upon predetermined classifications of functions using supervised machine learning methods. There is, however, much more to the analysis of biological and genomic data than just the identification of gene functions.
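To make the notion of a separating hyperplane concrete, the following minimal sketch trains a linear classifier by subgradient descent on the regularized hinge loss, which is the objective behind a linear soft-margin SVM. It is not the kernel-based setup of Brown, et al.; the learning rate, regularization strength, epoch count, function names, and the two-dimensional toy data are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def train_linear_svm(
    xs: List[Vector], ys: List[int], lr: float = 0.1, lam: float = 0.01, epochs: int = 50
) -> Tuple[Vector, float]:
    """Subgradient descent on the regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the hyperplane.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical training data: positives (+1) versus negatives (-1).
xs = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
predictions = [predict(w, b, x) for x in xs]
```

Once trained, the same `predict` call can be applied to an unlabeled point, which mirrors how a classifier built from training data is then used on genes whose function is unknown.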
It is thus an object of the present invention to overcome the shortcomings of the prior art and provide a mathematical system and method that employs a computer for processing
