Data processing: measuring, calibrating, or testing – Measurement system in a specific environment – Biological or biochemical
Reexamination Certificate
1998-06-23
2001-01-23
Hoff, Marc S. (Department: 2857)
C702S019000, C345S215000, C435S007100, C435S007200, C435S069300
Reexamination Certificate
active
06178382
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to methods for the application of multiple discrete analyses to one or more data sets, and to methods for representing such multiple discrete analyses. More particularly, the invention relates to the use of such methods in fields such as flow cytometry, wherein multiparameter data is recorded for cells analyzed by the instrument, and the evaluation of demographic data, or the analysis and evaluation of any other complex data sets that require multiple discrete operations to be performed in succession.
BACKGROUND OF THE INVENTION
Analysis of complex data often follows a reductionist approach. In other words, discrete analysis steps are performed on the data that, in general, simplify or reduce the number of data values or group the data values into similar clusters. Further analysis steps are then carried out independently on the results of these initial algorithms until the data is finally reduced to one or more outputs that the user desires. These outputs can be as simple as a single number (for example, a mean of values), or as complex as a series of graphs representing different aspects of the data.
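The reductionist pipeline described above can be sketched in a few lines of Python. The specific steps here (smoothing, thresholding, averaging) and their parameters are illustrative choices, not steps prescribed by the patent; the point is that each discrete algorithm operates on the output of the previous one until the data is reduced to a single summary value.

```python
# Illustrative sketch of successive discrete analysis steps, each reducing
# or simplifying the data produced by the previous step.

def smooth(values, window=3):
    """Reduce noise by replacing each value with a local moving average."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window // 2)
        hi = min(len(values), i + window // 2 + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def threshold(values, cutoff):
    """Reduce the number of data values: keep only values above a cutoff."""
    return [v for v in values if v > cutoff]

def mean(values):
    """Final reduction of the remaining values to a single number."""
    return sum(values) / len(values)

raw = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0]
result = mean(threshold(smooth(raw), 4.0))
```

Each function is independent, so a dataset that needs no smoothing can skip that step while the downstream steps remain unchanged.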
Visualizations of the outputs of algorithms can be as simple as a display of a single number, or as complex as a dynamic multidimensional series of graphs. The generation of a visualization is itself an algorithmic process, and is as important to the analysis of data as the functional manipulation of the data. Visualizations can be associated with any given analysis step; thus, a user can completely analyze a data sample by associating successive algorithms and viewing the associated visualization in order to monitor the analysis process.
The advantage of a reductionist approach involving discrete analysis steps is that parts of the analysis can be applied to different datasets that may require different pre-analysis steps. For example, some datasets require smoothing or elimination of spurious data before proceeding with further analysis.
This mode of data analysis is particularly useful in the field of flow cytometry. For example, scientists studying the very heterogeneous composition of white blood cells will typically employ measurements that discriminate these cells by revealing the presence or absence of particular proteins on the cells. Some of these proteins can discriminate major classes of white blood cells (i.e., B cells vs. T cells); others can discriminate subsets of these major classes. However, most of the proteins are expressed by many of the subsets; thus, it is necessary to use a combination of many different measurements to identify unique kinds of blood cells.
Typically, a scientist will first separate flow cytometric data values into sets corresponding to the major white blood cell types. To further differentiate between subsets, the researcher will view graphs that are derived only from data corresponding to these sets. As more and more restrictions are placed on the data, finer and finer subsets of cells are identified. Once the subsets have been identified, the scientist will typically desire a variety of different statistics to be determined for the cells contained in that subset.
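The hierarchical gating described above can be sketched as successive filters, each applied only to the cells that passed the previous gate. CD3, CD4, and CD8 are standard T-cell markers, but the cell records, cutoffs, and statistics here are invented for illustration.

```python
# Hypothetical sketch of hierarchical gating: each gate restricts the data
# values to a subset, and statistics are computed on the final subset.

cells = [
    {"CD3": 850, "CD4": 700, "CD8": 30},   # helper T cell
    {"CD3": 900, "CD4": 20,  "CD8": 650},  # cytotoxic T cell
    {"CD3": 15,  "CD4": 10,  "CD8": 5},    # non-T cell (e.g., a B cell)
    {"CD3": 870, "CD4": 720, "CD8": 25},   # helper T cell
]

# Gate 1: major class -- T cells (CD3-positive).
t_cells = [c for c in cells if c["CD3"] > 500]

# Gate 2: subset of the gated population -- helper T cells (CD4+, CD8-).
helpers = [c for c in t_cells if c["CD4"] > 500 and c["CD8"] < 100]

# Statistics determined for the final subset.
helper_fraction = len(helpers) / len(cells)
mean_cd4 = sum(c["CD4"] for c in helpers) / len(helpers)
```

Adding a finer gate means filtering `helpers` again; no earlier gate needs to change.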
Often, the steps taken to analyze flow cytometric data can be repeatedly applied to multiple data samples. The specific gating (i.e., the restriction of the data values to particular sets) can be applied to, for example, different samples obtained from different individuals. A particular gating can also be used within the same sample to differentiate subsets of different major classes (for example, the same gating may identify subsets of B cells or subsets of T cells, depending on which data values are inputted to the algorithm). This is an underlying principle of batch analysis: the repetitive application of a series of algorithms in order to achieve similar analysis results on multiple samples.
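Batch analysis, as described above, amounts to applying the same sequence of gating algorithms to every sample in a set. The gates and sample data below are illustrative only.

```python
# Sketch of batch analysis: the same pipeline of gating functions is applied
# repetitively to multiple data samples.

def gate_bright(values):
    """Keep only events above a fixed intensity."""
    return [v for v in values if v > 100]

def gate_dim(values):
    """Keep only events below a fixed intensity."""
    return [v for v in values if v < 500]

pipeline = [gate_bright, gate_dim]

samples = {
    "donor_A": [50, 150, 300, 900],
    "donor_B": [80, 120, 450, 700, 950],
}

results = {}
for name, data in samples.items():
    for step in pipeline:        # identical gating applied to every sample
        data = step(data)
    results[name] = len(data)    # e.g., the cell count in the final subset
```

The same `pipeline` could equally be applied to the data values for B cells or for T cells within one sample, identifying analogous subsets of each.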
A significant drawback of this approach is that different samples may require slightly (or significantly) different algorithms to achieve the same principal goal. In other words, in one sample, the major cell divisions may require a different type of gating than that required in another sample. However, subsequent analyses such as further gating or statistics may be identical between the two samples. Current analysis techniques do not provide the flexibility to allow for specific modification of certain algorithms within an analysis scheme while still allowing for easy batch analysis.
It will be apparent to one knowledgeable in the field of data analysis that the analysis processes and inherent limitations described above for flow cytometric data can be equally found in other types of data analysis. These include, but are not limited to, the analysis of demographic data and the analysis of clinical data. These data types are examples of highly multiparametric datasets (wherein many measurements are made for each member of the dataset) that can require complex analysis that may take many steps.
Current implementations of data analysis programs are extremely poor in the area of batch-mode analysis (i.e., repetitive analysis of multiple sample datasets). In general, batch-mode analyses are accomplished by the identical and repeated application of an algorithm, without allowing for sample-specific modifications to such algorithms. Therefore, after application of the batch process, the user must go back and re-analyze those samples requiring different steps. This process becomes especially tedious and error-prone when the batch analysis must be repeated (for example, to change one step in the batch analysis). This puts an enormous demand on the user to remember which samples require modifications of the algorithms, and what those modifications are.

Current implementations also have no “automatic” mechanisms for scheduling batch analysis. Typically, users must select a set of sample data files and issue the command to apply a given algorithm to that entire set. When a new set of data samples has been collected, the user must re-issue the batch command for every algorithm to the new data samples.

Finally, most implementations of data analyses do not allow the user to associate a descriptive name with the algorithms employed. The algorithms are often cryptic and difficult to immediately understand; thus, the user often will make mistakes by not recognizing subtle modifications to algorithms. Even when implementations allow users to annotate algorithms, the annotation itself has no functionality to the implementation, which tends to dissuade users from performing the annotation.
In the end, current data analysis programs place too much of a burden on users to keep track of the precise algorithms used to analyze samples. In addition, they provide few tools to employ these algorithms repetitively, and when they do provide such tools, these tools do not allow for any flexibility in the application to datasets requiring specific modifications of those algorithms.
SUMMARY OF THE INVENTION
This invention encompasses features derived from the application of three interrelated concepts to address the needs of analysis of multiple complex multiparameter datasets. In general, analysis steps are considered to be discrete algorithms that can be applied in succession or in parallel in order to carry out operations that manipulate the data. Such operations can modify the data values, reduce or increase the number of data values, or generate summary statistics; in most cases, operations can be applied to the data that results from other operations.
The first concept, “Functional Equivalence by Algorithmic Polymorphism” (FEAP), allows for the referencing of mathematical (or other) algorithms by abstract names that may or may not be unique. Thus, the user or the program can assign a specific name to an analysis step to indicate what the purpose of the step is (algorithmic polymorphism). Future dependent analyses are assigned to this step by its name rather than by virtue of the precise algorithm (functional equivalence). This allows the use of distinct mathematical algorithms for steps that serve the same function in the analysis.
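The FEAP idea can be sketched by keying analysis steps on abstract names: downstream steps depend on the name "Lymphocytes", not on which concrete algorithm produced that subset, so one sample can substitute a different algorithm under the same name without disturbing the rest of the analysis. The step names, gating functions, and data here are invented for illustration.

```python
# Sketch of functional equivalence by algorithmic polymorphism: steps are
# referenced by abstract name, so a sample-specific algorithm can replace
# the default without changing any dependent step.

def gate_fixed(values):
    """Default gating: a fixed cutoff."""
    return [v for v in values if v > 100]

def gate_adaptive(values):
    """Alternative gating for atypical samples: cutoff at the sample mean."""
    cutoff = sum(values) / len(values)
    return [v for v in values if v > cutoff]

def count(values):
    return len(values)

# The analysis references steps by name, not by concrete algorithm.
default_steps = {"Lymphocytes": gate_fixed, "Count": count}

def analyze(values, overrides=None):
    steps = dict(default_steps, **(overrides or {}))
    gated = steps["Lymphocytes"](values)   # whichever algorithm bears the name
    return steps["Count"](gated)

n_default = analyze([50, 150, 300])
n_special = analyze([50, 150, 300], {"Lymphocytes": gate_adaptive})
```

Because the dependent `Count` step is bound to the name rather than the function, batch analysis can proceed over all samples while individual samples carry their own functionally equivalent substitutions.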
Bigos Martin
Herzenberg Leonore A.
Moore Wayne A.
Parks David R.
Roederer Mario
Bui Bryan
Hoff Marc S.
Lumen Intellectual Property Services
The Board of Trustees of the Leland Stanford Junior University
Methods for analysis of large sets of multiparameter data