Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Chemical analysis
Reexamination Certificate
2001-11-20
2004-02-03
Delcotto, Gregory (Department: 1751)
C702S029000, C702S030000, C702S031000, C702S032000, C702S033000, C703S002000, C703S023000, C703S027000
active
06687621
ABSTRACT:
TECHNICAL FIELD
The present invention relates to an improved computational method for predicting a property and/or performance of polymers, and/or for identifying and designing polymers that provide a desired property and/or performance, wherein the desired property can be provided by the neat, undiluted polymer or by the polymer diluted in a composition.
BACKGROUND OF THE INVENTION
An experienced chemist can tell much about the chemical reactivity or physical properties of a molecule just by looking at its structure. As the pool of chemical experience and knowledge accumulates, and the speed of computers increases, there is a growing desire to design methods to correlate the chemical and physical properties as well as other useful properties (such as biological activities) of the chemicals to their chemical structure.
The general method is known as a quantitative structure-activity relationship (QSAR) or quantitative structure-property relationship (QSPR), and is described in, e.g., H. Kubinyi, QSAR: Hansch Analysis and Related Approaches, VCH, Weinheim, Germany, 1993, and D. J. Livingstone, Structure Property Correlations in Molecular Design, in Structure-Property Correlations in Drug Research, Han van de Waterbeemd, ed., Academic Press, 1996; said publications are incorporated herein by reference. In this method the structures of a representative set of materials are characterized using physical properties such as log P (the base-10 logarithm of the octanol-water partition coefficient P), fragment constants like Hammett's sigma, or any of a large number of computed molecular descriptors (for example, see P. C. Jurs, S. L. Dixon, and L. M. Egolf, Representations of Molecules, in Chemometric Methods in Molecular Design, Han van de Waterbeemd, ed., VCH, Weinheim, Germany, 1995).
In the general case, a “representative set”, sometimes also called a “training set”, of materials is a collection of materials that represent the expected range of change in both the property of interest (the property to be predicted using the model) and also the range of molecular structure types to which the model is designed to apply. The size of the set of materials necessary to constitute a “representative set” is dependent on the diversity of the target structures and the range of property values for which the model needs to be valid. Typically, one needs to have about 20 to about 25 materials to begin to generate statistically valid models. However, it is possible to obtain valid models with smaller sets of materials if there is a large degree of similarity between the molecular structures. A general rule of thumb suggests that the final model should include at least about five unique materials in a training set for each parameter (molecular descriptor or physical property) in the model in order to achieve a statistically stable equation and to avoid “overfitting”, the inclusion of statistical noise in the model. The range of the experimental property being modeled must also be broad enough to be able to detect statistically significant differences between members of the representative set given the magnitude of the uncertainty associated with the experimental measurement. For biological properties, a typical minimum range is about two orders of magnitude (100 fold difference between the lowest and highest values) because of the relatively large uncertainty associated with biological experiments. The minimum range requirement for physical properties (e.g. boiling points, surface tension, aqueous solubility) is usually smaller because of the greater accuracy and precision achieved in measuring such properties.
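The rules of thumb above (at least about five materials per model parameter, and a property range of about two orders of magnitude for biological data) can be turned into a quick pre-modeling check. The following is a minimal sketch under that assumption; the function names and the `min_orders` parameter are hypothetical, and the two-orders threshold should be lowered for precisely measured physical properties, as the text notes.

```python
import math

# Rule-of-thumb constant taken from the text; not a standard or library value.
MATERIALS_PER_PARAMETER = 5


def max_parameters(n_materials: int) -> int:
    """Largest number of model parameters the 5-per-parameter rule allows."""
    return n_materials // MATERIALS_PER_PARAMETER


def orders_of_magnitude(values: list[float]) -> float:
    """Range of strictly positive property values in base-10 orders of magnitude."""
    if min(values) <= 0:
        raise ValueError("a log range requires strictly positive values")
    return math.log10(max(values) / min(values))


def check_training_set(values: list[float], n_parameters: int,
                       min_orders: float = 2.0) -> list[str]:
    """Return warnings; an empty list means both rule-of-thumb checks passed.
    min_orders=2.0 mirrors the biological-data guideline in the text."""
    warnings = []
    if n_parameters > max_parameters(len(values)):
        warnings.append(
            f"{n_parameters} parameters need at least "
            f"{n_parameters * MATERIALS_PER_PARAMETER} materials, have {len(values)}"
        )
    span = orders_of_magnitude(values)
    if span < min_orders:
        warnings.append(f"property range spans {span:.2f} orders of magnitude, "
                        f"below {min_orders}")
    return warnings


# Hypothetical activities for 20 materials, evenly spaced from 0.5 to 80.
activities = [0.5 + i * (80 - 0.5) / 19 for i in range(20)]
print(max_parameters(len(activities)))
print(check_training_set(activities, 4))
print(check_training_set(activities, 5))
```

With 20 materials the sketch allows at most four parameters; asking for a fifth produces a warning, while the 0.5 to 80 range (about 2.2 orders of magnitude) passes the range check.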
There are practical limits to the size of the molecules that can be studied using known QSAR techniques. Typically, these methods are applied to small organic molecules. The term "small" usually refers to non-polymeric materials with fewer than about 200 atoms including hydrogens. The practical reason for this limitation is that the vast majority of calculated molecular descriptors begin to lose the ability to distinguish one structure from another as the size of the molecules gets larger. For example, the addition of one methyl group (a carbon and three hydrogens, replacing one hydrogen of the parent molecule) to benzene increases the molecular weight (an example of a molecular descriptor) by about 17.9%, whereas the addition of the same methyl group to a C100 linear alkane changes the molecular weight by less than 1%.
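The molecular-weight comparison above can be checked with a few lines of arithmetic. This sketch assumes the methyl group substitutes for an existing hydrogen (benzene C6H6 to toluene C7H8, and C100H202 to C101H204), i.e. a net gain of CH2, and uses conventional atomic weights for carbon and hydrogen; the helper names are illustrative only.

```python
# Conventional standard atomic weights (g/mol) for the two elements involved.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}


def molecular_weight(n_c: int, n_h: int) -> float:
    """Molecular weight of a hydrocarbon with n_c carbons and n_h hydrogens."""
    return n_c * ATOMIC_WEIGHT["C"] + n_h * ATOMIC_WEIGHT["H"]


def percent_increase(before: float, after: float) -> float:
    return 100.0 * (after - before) / before


# Methylation swaps one H for CH3: a net gain of one C and two H.
benzene, toluene = molecular_weight(6, 6), molecular_weight(7, 8)      # C6H6 -> C7H8
c100, c101 = molecular_weight(100, 202), molecular_weight(101, 204)    # C100H202 -> C101H204

print(f"benzene -> toluene:   {percent_increase(benzene, toluene):.3f}%")
print(f"C100H202 -> C101H204: {percent_increase(c100, c101):.3f}%")
```

This gives roughly 17.96% for benzene and about 0.999% for the C100 alkane, matching the figures quoted in the text: the same structural change is almost invisible to this descriptor in the large molecule.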
The model developed is often a multivariate (involving many parameters) linear regression equation that is computed by regressing a selected set of molecular descriptors or physical properties against measured values of the property of interest, e.g., $Y = m_0 + m_1 x_1 + m_2 x_2 + \dots + m_n x_n$, wherein $Y$ is the measured property of interest, $x_1, x_2, \dots, x_n$ are the molecular descriptors or physical properties, $m_0, m_1, \dots, m_n$ are the regression coefficients, and $n$ is the number of descriptors or physical properties in the model. A number of different methods have been employed for the selection of the parameters to be included in the regression equation, such as stepwise regression, stepwise regression with progressive deletion, best-subsets regression, etc. More recently, evolutionary methods such as genetic algorithms, or learning machines such as neural networks, have been used for parameter selection.
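As an illustration of fitting $Y = m_0 + m_1 x_1 + \dots + m_n x_n$ and choosing which descriptors to include, the sketch below uses ordinary least squares via NumPy's `np.linalg.lstsq` and a plain greedy forward selection. This is a simplified stand-in for the stepwise methods named above (it has no F-to-enter threshold and no deletion step), not the method of the invention; the data and function names are hypothetical.

```python
import numpy as np


def fit_linear_model(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients [m0, m1, ..., mn] for
    y = m0 + m1*x1 + ... + mn*xn, where the columns of X are x1..xn."""
    design = np.column_stack([np.ones(len(y)), X])  # column of 1s carries m0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of multiple determination of the least-squares fit."""
    coef = fit_linear_model(X, y)
    fitted = np.column_stack([np.ones(len(y)), X]) @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def forward_selection(X: np.ndarray, y: np.ndarray, max_terms: int) -> list[int]:
    """Greedy forward selection: at each step add the descriptor column that
    gives the largest R^2 together with the columns already chosen.
    (Simplified: no F-to-enter test and no deletion step.)"""
    selected: list[int] = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_terms:
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: 40 materials, 6 candidate descriptors; only columns 1 and 4
# actually drive the property.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 1.5 + 2.0 * X[:, 1] - 3.0 * X[:, 4] + rng.normal(scale=0.1, size=40)

chosen = forward_selection(X, y, max_terms=2)
coef = fit_linear_model(X[:, sorted(chosen)], y)
print("selected columns:", sorted(chosen))
print("m0, m1, m4 =", np.round(coef, 2))
```

Capping `max_terms` is where the five-materials-per-parameter guideline would be applied in practice; here 40 materials would permit up to eight terms.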
The first indicator used to judge the quality of a regression model is the coefficient of multiple determination, or $R^2$. This measures the proportion of the variation of the observed property (the property being modeled, the dependent variable) that is accounted for by the set of descriptors (independent variables) in the model. The correlation coefficient between the fitted property values (calculated using the model) and the experimentally observed property values is termed the coefficient of multiple correlation, commonly called the correlation coefficient, or $R$, which is the positive square root of $R^2$. Commercial statistical packages routinely report $R^2$ as a standard part of the results of a regression analysis. A high $R^2$ value is a necessary but not a sufficient condition for a good model: it is important that a model account for as much of the variation in the dependent variable as possible, but the validity of the model must be determined using a variety of other criteria.
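The relationship between $R^2$ and $R$, and the reason a high $R^2$ is not sufficient on its own, can be shown numerically. This sketch assumes an ordinary least-squares fit with an intercept (for such a fit, $R$ equals the ordinary correlation between fitted and observed values) and uses synthetic, hypothetical data; it computes $R^2 = 1 - SS_{res}/SS_{tot}$ and then shows that appending a pure-noise descriptor does not lower $R^2$ on the training data.

```python
import numpy as np


def r2_and_r(y_obs: np.ndarray, y_fit: np.ndarray) -> tuple[float, float]:
    """R^2 = 1 - SS_residual / SS_total, and R = sqrt(R^2)."""
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    r2 = float(1.0 - ss_res / ss_tot)
    return r2, float(np.sqrt(r2))


def ols_fitted(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fitted values of a least-squares model with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


# Hypothetical data: 15 materials, one informative descriptor.
rng = np.random.default_rng(1)
x = rng.normal(size=(15, 1))
y = 3.0 + 1.2 * x[:, 0] + rng.normal(scale=0.5, size=15)

fitted = ols_fitted(x, y)
r2, r = r2_and_r(y, fitted)
# With an intercept in the least-squares fit, R equals the ordinary (Pearson)
# correlation between fitted and observed values.
pearson = float(np.corrcoef(fitted, y)[0, 1])

# Appending a pure-noise descriptor cannot lower R^2 on the training data,
# one reason a high R^2 alone does not establish a valid model.
noise = rng.normal(size=(15, 1))
r2_noise, _ = r2_and_r(y, ols_fitted(np.hstack([x, noise]), y))
print(f"R^2 = {r2:.3f}, R = {r:.3f}, Pearson r = {pearson:.3f}, "
      f"R^2 with noise term = {r2_noise:.3f}")
```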
Once a model has been developed, it must be validated. This process includes the consideration of statistical validation of the model as a whole (e.g., overall-F value from analysis of variance, AOV) and of the individual coefficients of the equation (e.g., partial-F values), analysis of collinearity between the independent variables (e.g. variance inflation factors, or VIF), and the statistical analysis of stability (e.g., cross-validation). Most commercial statistics software can compute and report these diagnostic values. If possible, one employs an “external prediction set”, a set of materials for which the property of interest has been measured, but which were not included in the development of the model, to evaluate and demonstrate the predictive accuracy of the model.
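Three of the validation diagnostics listed above (the overall F value from the regression analysis of variance, variance inflation factors, and cross-validation) can be sketched directly with NumPy. The version below is a minimal illustration, assuming ordinary least squares with an intercept, $N$ materials and $n$ descriptors, and leave-one-out cross-validation summarized as $q^2 = 1 - \text{PRESS}/SS_{tot}$; commercial packages may define or report these slightly differently, and the data and function names are hypothetical.

```python
import numpy as np


def design_matrix(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the fit includes the intercept m0."""
    return np.column_stack([np.ones(X.shape[0]), X])


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    coef, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
    return coef


def overall_f(X: np.ndarray, y: np.ndarray) -> float:
    """Overall F from the regression analysis of variance:
    F = (SS_regression / n) / (SS_residual / (N - n - 1))."""
    N, n = X.shape
    fitted = design_matrix(X) @ ols(X, y)
    ss_res = np.sum((y - fitted) ** 2)
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    return float((ss_reg / n) / (ss_res / (N - n - 1)))


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    descriptor j on all the other descriptors."""
    out = []
    for j in range(X.shape[1]):
        xj, others = X[:, j], np.delete(X, j, axis=1)
        fitted = design_matrix(others) @ ols(others, xj)
        r2_j = 1.0 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)


def loo_q2(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out cross-validation: refit without material i, predict it,
    and summarize with q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred = design_matrix(X[i:i + 1]) @ ols(X[keep], y[keep])
        press += float((y[i] - pred[0]) ** 2)
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


# Hypothetical descriptors: x2 is nearly a copy of x1 (strong collinearity).
rng = np.random.default_rng(2)
N = 25
x1, x3 = rng.normal(size=N), rng.normal(size=N)
x2 = x1 + rng.normal(scale=0.1, size=N)
y = 1.0 + 2.0 * x1 - 1.0 * x3 + rng.normal(scale=0.3, size=N)

X_good = np.column_stack([x1, x3])
X_collinear = np.column_stack([x1, x2, x3])
print("F =", round(overall_f(X_good, y), 1))
print("VIF (x1, x3):", np.round(vif(X_good), 2))
print("VIF (x1, x2, x3):", np.round(vif(X_collinear), 1))
print("LOO q^2 =", round(loo_q2(X_good, y), 3))
```

The near-duplicate descriptor `x2` drives the VIF of both `x1` and `x2` far above 10, flagging the collinearity, while the two-descriptor model keeps VIF values near 1 and a high leave-one-out $q^2$.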
A wide variety of software is available to perform various parts of the model development process. Descriptors can be pulled from databases (e.g., in the case of fragmental constants), or computed directly from the molecular structure of the materials. Non-limiting examples of programs which can be used to compute descriptors are SYBYL (Tripos, Inc., St. Louis, Mo.), Cerius2 (Accelrys, Princeton, N.J.), and ADAPT (P. C. Jurs, Pennsylvania State University, University Park, Pa.). These same programs can also be used to perform the statistical model development, which includes determining the correlation coefficient between the computed estimates and the experimentally derived property of interest, plus subsequent model validation. Alternatively, commercial statistical programs like Minitab for Windows (Minitab, Inc., State College, Pa.) can be used to generate and validate model equations.
One approach for describing
Gosselink Eugene Paul
Kramer Michael Lee
Laidig William David
Schneiderman Eva
Stanton David Thomas
Camp Jason J.
Delcotto Gregory
Miller Steven W.
The Procter & Gamble Company
Zerby Kim W.