Reexamination Certificate
1999-09-20
2002-07-23
Hoff, Marc S. (Department: 2857)
Data processing: measuring, calibrating, or testing
Measurement system
Statistical measurement
C370S254000, C342S450000
active
06424929
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to the field of measurement of activity of various kinds, such as (but not restricted to) traffic in a data communication network, and more particularly to a method of measuring the activity with improved accuracy.
BACKGROUND TO THE INVENTION
In measuring activity of various kinds, such as electrical, fluid, information, object, etc. flow, and conditions, performance, and other activities, measurements are taken at particular times. These measurements often result in the determination of outlier points, particularly in conditions where the distribution of points is unpredictable. This problem is very severe in data communication networks.
Measurements are usually made with error; a set of measurements of the same value is often normally distributed about an exact point. However, some of these measurements will have enormous errors associated with them, due to some gross experimental mistake such as recording millimeters rather than meters. The detection and rejection of these outliers can be readily made where the expected distribution of the measurements is known and has a finite variance. However, in self similar distributions the variance is infinite (or is only limited by the maximum capacity for activity of the object). Outlier points in self similar distributions cannot therefore be recognized and rejected by using the expected distribution.
Leland et al, as described in the publication “On the Self Similar Nature of Ethernet Traffic” by W. E. Leland, W. Willinger, M. S. Taqqu, D. V. Wilson: ACM SIGCOMM, Computer Communication Review: pp 204-213, January 1995, discovered that the distribution of data network traffic is self similar, so that traffic measurements made on data networks suffer from the problem of outlier rejection.
Moreover, the frequency of outliers in data networks is extremely high. When measured over a wide variety of data networks the average outlier rate was 1%, practically all of these outliers being high. These outliers make the detection of alarm levels of activity in data networks very prone to error and also make the forecasting of activity in data networks very unreliable.
The only previously known general outlier detection method is one which rejects outlier measurements if the measurement is greater than or less than a possible range of values. All other specific methods rely on knowledge of the distribution of possible values.
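The range check described above can be sketched as follows. This is a minimal illustration only: the function names, the sample rates and the 100 Mbit/s limit are assumptions chosen for the example and do not come from the specification.

```python
def within_possible_range(value, lowest, highest):
    """Return True when value lies inside the physically possible range."""
    return lowest <= value <= highest


def reject_out_of_range(samples, lowest, highest):
    """Keep only samples inside [lowest, highest]; treat the rest as outliers."""
    return [s for s in samples if within_possible_range(s, lowest, highest)]


# Hypothetical per-minute traffic rates in Mbit/s on a 100 Mbit/s interface.
rates = [12.5, 40.0, 250.0, 99.0, -3.0, 61.2]
kept = reject_out_of_range(rates, lowest=0.0, highest=100.0)
print(kept)  # [12.5, 40.0, 99.0, 61.2]
```

A check of this kind can only catch values that are physically impossible. A wrong reading that happens to fall inside the possible range, such as a spurious 99.0 in the data above, passes unchanged.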
A complex model allows the use of more than one variable in forecasting the distribution of a single variable, where feasible. In the absence of any model, the previous distribution of a variable is the best forecaster of the future distribution of that variable. Data communications network managers very much want to know when their networks or parts of their networks will run out of capacity. It was believed for many years that models of network behaviour could predict the future; since this used to be true for voice networks, it was assumed it would be true for data. Therefore models were developed which tried to predict future peaks based on previous means and peaks.
However, the work of Leland et al referred to above, and of many others subsequently, has now shown that data network traffic is self similar (fractal). This implies that the variance of data network traffic is not only infinite but also is not even related to the mean. Therefore attempting to predict future peaks using any model that includes the mean is clearly wrong.
Moreover, if the variance is truly infinite, future peaks cannot even be predicted from previous peaks. In other words, self similar distributions may have a lower limit but do not have an upper limit.
However, it had not previously been observed that, since communications lines do have upper limits in capacity, the distributions in them cannot be truly self similar and their variances are definitely finite. Under these conditions the previous peak values can be used to predict future peak values, but the relationship between the mean and the peak remains indeterminate.
The idea of using linear fits to prior peaks to forecast future peaks in data communications networks had been previously invented by N. W. Dawes. Once tested, however, the problem of the peaks being heavily contaminated by invalid data points was noticed. The problem rate was found by experiment to be very high, with most forecasts being significantly faulty. Moreover, attempting to report the peak activity of any port in a network (the top talker) over even the last 24 hours was found to be routinely wrong, as the following will illustrate.
Activity values recorded by data communications devices about their own activity have been found in practice to be astonishingly error prone, with an average outlier rate of 1%. For example, attempting to determine the daily peak traffic rate on a single interface by measuring the rate every minute requires measuring 1,440 points per day, but on average 14 of these would be outliers, almost all being high. The daily peak point under these conditions would be 14 times more likely to be an outlier than a genuine value. The outliers were observed to be randomly distributed, so a simple filter that rejected activity levels outside the physical capacity of the interface was added. This rejected 10,000 outliers for every 1 accepted.
However, in monitoring even moderate sized networks of 1,000 devices and 10,000 communications interfaces, about 10 outliers still passed through this filter every day (scattered over these 10,000 interfaces). This left about 4% of all forecasts seriously in error. Moreover, analyses such as finding the busiest interface even just over the last day were routinely wrong. Analysis of the immediately previous year on such a network (a not uncommon requirement) would require 3.65×10^9 points to be cleared of outliers. A far better outlier rejection method is clearly required to enable both accurate historical analysis and accurate forecasting in data communications networks.
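The outlier counts quoted in the background can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text, and assumes (this is not stated for the network-wide case) that every one of the 10,000 interfaces is sampled once per minute, as in the single-interface example.

```python
MINUTES_PER_DAY = 24 * 60        # one sample per minute gives 1,440 points per day
OUTLIER_RATE = 0.01              # average outlier rate quoted in the text (1%)
INTERFACES = 10_000              # interfaces in the moderate sized network
REJECTED_PER_ACCEPTED = 10_000   # capacity filter: 10,000 rejected for every 1 accepted

# Expected outliers on a single interface in one day.
outliers_per_interface_per_day = MINUTES_PER_DAY * OUTLIER_RATE

# Expected outliers across the whole network in one day (assumed sampling rate).
outliers_per_network_per_day = outliers_per_interface_per_day * INTERFACES

# Fraction of outliers that slip through the capacity filter.
pass_fraction = 1 / (REJECTED_PER_ACCEPTED + 1)
surviving_per_day = outliers_per_network_per_day * pass_fraction

print(f"per interface per day: {outliers_per_interface_per_day:.1f}")
print(f"per network per day:   {outliers_per_network_per_day:.0f}")
print(f"surviving the filter:  {surviving_per_day:.1f}")
```

Under these assumptions roughly 14 outliers per day still pass the capacity filter across the network, the same order of magnitude as the "about 10" reported above.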
SUMMARY OF THE INVENTION
The present invention provides a method that rejected in a successful prototype approximately 10^15 outliers for every 1 accepted (in Ethernet networks), while rejecting effectively no genuine points. The invention provides similar performance on ATM, Frame Relay and other protocol based data communications networks. The present invention therefore renders practical and effective the linear forecasting method mentioned above, surprisingly only requiring use of peak data. The method can be used as a filter for the measured points.
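The linear forecasting idea referred to above, fitting a straight line to prior peaks, might look like the following. This is a hedged sketch rather than the forecasting method itself: the ordinary least-squares fit, the function names, the daily peak values and the 100 Mbit/s capacity are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def day_capacity_reached(slope, intercept, capacity):
    """Day on which the fitted peak line reaches capacity, or None if peaks are not rising."""
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical daily peak rates (Mbit/s), assumed already cleared of outliers.
days = [0, 1, 2, 3, 4]
peaks = [50.0, 52.0, 54.0, 56.0, 58.0]
slope, intercept = fit_line(days, peaks)
print(day_capacity_reached(slope, intercept, capacity=100.0))  # 25.0
```

Because a fit of this kind uses every peak it is given, a single invalid high point in `peaks` shifts both the slope and the forecast day, which is why the peaks need to be filtered first.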
It is an important aspect of the present invention that it does not rely on knowledge of the distribution of possible values. It provides very reliable detection and rejection of outliers and so enables very significant improvements in the accuracy of both alarm detection and activity forecasting.
The present invention has application to all fields that involve the measurement of self similar activity and all fields in which measurable activity flows from one object to another. The set of fields with self similar distributions to which the present invention has application is enormous. Therefore the small fraction of those given as examples in this specification are only some of those in which the present invention has applicability. Further, the set of fields which include measurable flows is similarly vast. The embodiments described herein should only be taken as representative of those applications, and the present invention is applicable to all such fields.
In accordance with an embodiment of the present invention, a method of detecting outliers measured during progression of an activity of an entity from one point to another, comprises measuring activity at a point in a first dimension, measuring the same activity at the same point in a second dimension at the same time as measuring the activity in the first dimension, and rejecting outliers which have values outside a maximum expected difference between the activity measured in the first and second dimensions.
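The comparison step of this embodiment can be sketched as below. The summary does not say here which two dimensions are measured or how the maximum expected difference is derived, so the sketch assumes that both readings have already been converted to a common unit and that a single fixed threshold applies. The function name, the sample pairs and the threshold are hypothetical.

```python
def reject_by_dimension_difference(pairs, max_expected_difference):
    """Split paired measurements into accepted and rejected lists.

    Each pair holds the same activity measured at the same point and time in
    two dimensions, both expressed in one comparable unit (an assumption of
    this sketch, not something the text specifies).
    """
    accepted, rejected = [], []
    for first, second in pairs:
        if abs(first - second) <= max_expected_difference:
            accepted.append((first, second))
        else:
            rejected.append((first, second))
    return accepted, rejected


# Hypothetical readings, already converted to a common scale.
pairs = [(10.0, 10.4), (55.0, 54.1), (900.0, 12.0), (30.0, 30.0)]
accepted, rejected = reject_by_dimension_difference(pairs, max_expected_difference=2.0)
print(accepted)  # [(10.0, 10.4), (55.0, 54.1), (30.0, 30.0)]
print(rejected)  # [(900.0, 12.0)]
```

Unlike a range check, this test needs no knowledge of the distribution of either measurement: a pair is rejected only because its two readings disagree by more than is expected.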
DETAILED DESCRIP
Baker Harold C.
Charioui Mohamed
Hendry Robert G.
Hoff Marc S.
Loran Network Management Ltd.
Method for detecting outlier measures of activity