Feature classification for time series data

Data processing: measuring, calibrating, or testing – Measurement system – Performance or efficiency evaluation

Details

C707S793000

Reexamination Certificate

active

06735550

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention is related to the field of data processing, and in particular, to classifying features in time series data.
2. Statement of the Problem
The analysis of time series data plays a fundamental role in science and engineering. An important analysis step is the identification and classification of various features in the data. Quality control can be viewed as a subclass of general feature identification and classification, for example, differentiating between a true signal and a contaminating signal. Many algorithms exist for the quality control of time series data, such as Fourier or wavelet analysis, as well as robust and standard statistics. However, for other classification problems, image processing techniques have been used to great advantage. Human analysts are adept at feature identification and classification; nevertheless, in many applications it is desirable to have an automated algorithm that performs this role.
In time series data, the image that the analyst considers is simply a plot of the time series. Subconsciously, the analyst identifies clusters of points and correlation structures, and also uses a priori knowledge about the structure of features in the data. Further transformations and subsequent images of the data are often useful in performing these tasks, such as plotting on different scales and creating histograms and correlation scatter plots. Additionally, the analyst tends to think of data quality in terms of a probability, i.e., the degree to which a datum is good or bad. Another important technique the analyst uses is a combination of local and global analyses. For instance, an isolated outlier in the data is easily detected by the analyst looking on a local scale. However, for numerous consecutive outliers, the analyst must consider the data over a larger scale to identify the sequence as outliers.
Typical outlier detection and quality control algorithms are Boolean in nature. That is, they indicate that a data point is either good or bad: data points that are very bad are grouped with data points that fall just below the "good" threshold. Furthermore, typical outlier detection and quality control algorithms tend to use strong a priori assumptions and usually rely on a single test or method.
Most time series analysis methods operate on either a local or a global scale. For instance, the running median is an example of a local algorithm over the scale of the median window, whereas typical histogram methods use the data over a longer time scale.
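To make the local/global distinction concrete, the following is a minimal Python sketch of a running-median filter (the function name and window default are illustrative, not taken from the patent); its output at each point depends only on the data inside the window:

```python
import numpy as np

def running_median(x, window=30):
    """Running median of a 1-D series: a purely *local* method whose
    behavior is governed entirely by the window length."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    # Edge-pad so the output has the same length as the input.
    padded = np.pad(x, half, mode="edge")
    # Note: if more than half of a window consists of outliers, the
    # median itself tracks them (the "saturation" failure discussed below).
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(x))])
```

By contrast, a global histogram method pools all of the data at once, so its verdict on any single point is insensitive to where in the series that point occurs.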
FIGS. 1 and 2 illustrate how an algorithm can work well on one time scale but fail on another. FIG. 1 shows actual time series data where the instrument was failing. The top plot shows the data coded by a confidence index (high confidence to low confidence corresponds respectively to circle, square, triangle, and cross). The confidence in this case was calculated using statistics from a global histogram. Notice that the data in the primary mode are given a high confidence value (circles), while the excursions from the main mode are assigned low confidence values (cross). This algorithm does a good job of flagging the most egregious outliers, but at the same time, valid peaks in the data are given low confidence values. Of course, these peaks can be given higher confidence values by changing parameters in the algorithm; however, this change would also raise the confidence of some of the outliers.
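The text does not spell out how this histogram-based confidence index is computed; one plausible sketch (all names here are assumptions for illustration) is to score each datum by the relative occupancy of its histogram bin, so that points in sparsely populated bins receive low confidence:

```python
import numpy as np

def histogram_confidence(x, bins=50):
    """Score each datum in [0, 1] by the relative occupancy of its
    global-histogram bin; sparse bins imply low confidence."""
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=bins)
    # np.digitize maps each value to a bin; clip keeps indices in range.
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    return counts[idx] / counts.max()
```

A scheme like this reproduces the failure mode described above: a brief but valid peak occupies sparsely populated bins and is penalized exactly as an outlier would be.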
The lower plot in FIG. 1 shows the same data overlaid with a 30-point running median line. The running median does an excellent job of eliminating the outliers in the center right of the plot; however, it fails for the "dropouts" on the left-hand side. This results from "saturation" of the filter, i.e., when more than half of the data in the window are outliers.
FIG. 2 illustrates two sequences of data that have identical distributions. The upper left-hand plot is simply a sigmoid function with small uniform fluctuations. The upper right-hand plot is a histogram of these data. The lower left-hand plot shows the data from the upper left-hand plot re-ordered in a random manner. Suppose a global histogram method were used on these two examples. The algorithm would correctly identify many of the points in the lower left-hand plot as outliers; however, for the data in the upper left-hand plot, many of the points would incorrectly be identified as outliers.
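This situation is easy to reproduce: the two series below are permutations of each other, so any purely global (histogram-based) test must treat them identically, even though only the shuffled version contains points that are outliers with respect to their local context. (A self-contained sketch; the patent's actual figure data are not available.)

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-6, 6, 500)
# Sigmoid with small uniform fluctuations (cf. the upper-left plot of FIG. 2).
smooth = 1.0 / (1.0 + np.exp(-t)) + rng.uniform(-0.02, 0.02, t.size)
# The same values, randomly re-ordered (cf. the lower-left plot of FIG. 2).
shuffled = rng.permutation(smooth)

# Identical global distributions ...
assert np.allclose(np.sort(smooth), np.sort(shuffled))
# ... but completely different local structure:
print(np.abs(np.diff(smooth)).mean())    # small step-to-step changes
print(np.abs(np.diff(shuffled)).mean())  # large step-to-step changes
```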
The National Center for Atmospheric Research (NCAR) is developing a terrain-induced wind turbulence and wind shear warning system for the aviation community in Juneau, Alaska. As part of this system, pairs of anemometers that measure the wind every second are located on nearby peaks and around the runways. For operational purposes, a requirement is to produce reliable one-minute averaged wind speeds, wind speed variances, wind speed peak values, and average wind directions. Since these values are updated every minute, it is possible to perform extensive calculations on the data. In general, the anemometers are highly reliable; however, there are cases where the sensors make erroneous measurements. Since the mountain-top sensors are sometimes inaccessible, it is important to differentiate between good and bad data even when an instrument is failing. For example, the strong winds encountered in Juneau have been known to vibrate and then loosen the nuts holding the anemometers in place. An example data set from an anemometer exhibiting this problem is shown in FIG. 3. The actual wind speed measured by the anemometer varies around roughly 17 m/s. The horizontal axis is time in seconds. Data "dropouts" caused by the mechanical failure can be seen intermittently in the data, centered near 1 m/s.
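The one-minute products named above are straightforward block statistics over 60 one-second samples, with the caveat that wind direction is a circular quantity and must be averaged through its unit-vector components. A minimal sketch with illustrative names (the patent does not give an implementation):

```python
import numpy as np

def one_minute_stats(speed, direction_deg):
    """Mean, variance, and peak of wind speed plus the circular mean
    wind direction, computed over one minute of 1 Hz samples."""
    speed = np.asarray(speed, dtype=float)
    theta = np.radians(np.asarray(direction_deg, dtype=float))
    # Average direction via unit vectors so that, e.g., 350 and 10
    # degrees average to 0 degrees rather than 180.
    mean_dir = np.degrees(np.arctan2(np.sin(theta).mean(),
                                     np.cos(theta).mean())) % 360.0
    return {
        "mean_speed": speed.mean(),
        "speed_variance": speed.var(),
        "peak_speed": speed.max(),
        "mean_direction": mean_dir,
    }
```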
FIG. 4 is data for the same time interval from a second anemometer in close proximity (3 meters) to the first. As can be seen from the plots, the data dropouts are not present in FIG. 4; hence, the dropouts are an artifact of a mechanical failure and not caused by turbulent structures in the wind.
Other failure modes can be caused by icing of the anemometer or shielding from certain wind directions by ice build-up. Furthermore, it is known from video footage that certain wind frequencies excite normal modes of the wind direction head and can cause the device to spin uncontrollably. Data from such a case can be seen in FIG. 5, where the vertical axis is wind direction measured in a clockwise direction from North. The horizontal axis is again time measured in seconds. Between about 500 seconds and 1000 seconds, the wind direction measuring device is spinning and the data become essentially a random sample of a uniform distribution between about 50 degrees and 360 degrees. The true wind direction is seen as intermittent data at about 225 degrees, which is in general agreement with the value from the nearby anemometer.
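One way to flag such a spinning episode (not necessarily the patent's method) is to test a sliding window of directions for consistency with a uniform distribution, for example with a Kolmogorov-Smirnov test. SciPy is assumed here, and the default bounds simply mirror the FIG. 5 example:

```python
import numpy as np
from scipy import stats

def looks_like_spinning(direction_deg, lo=50.0, hi=360.0, alpha=0.05):
    """Return True when a window of directions is statistically
    consistent with a uniform distribution on [lo, hi] degrees,
    which is the signature of an uncontrollably spinning head."""
    d = np.asarray(direction_deg, dtype=float)
    # One-sample KS test against Uniform(lo, hi).
    _, p_value = stats.kstest(d, "uniform", args=(lo, hi - lo))
    return p_value > alpha
```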
FIG. 6 shows the wind direction at another time distinct from that in FIG. 5, where in this example the true wind direction is around 40 degrees. Notice the suspicious streaks in the time series data near 200 degrees.
In the context of these anemometer examples, the crux of the quality control problem is to determine which data points are "bad" (not part of the atmospheric data) and which data points are "good" (part of the atmospheric data). Separating the good data from the bad can be especially difficult when some bad data points have characteristics of good points. For example, during an episode of highly changing, gusty winds, there may be sensor problems that manifest in ways that are similar to valid wind gusts, such as some of the dropout data in FIG. 3. Consequently, the problem is to identify the suspect data without mislabeling similar-looking good data.
Time series algorithms such as the Auto-Regressive Moving Average (ARMA) may be used to remove isolated outliers in stationary data. Data are used to compute model coefficients and variance estimates; if the point in question is a large distance from the model prediction in terms of the estimated variance, such a point may be called an outlier.
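As a concrete, hedged illustration of that residual test, the sketch below fits a pure autoregressive model by least squares (a simplification of full ARMA) and flags points whose one-step prediction error is large relative to the estimated residual standard deviation. All names are illustrative:

```python
import numpy as np

def ar_outliers(x, p=3, threshold=4.0):
    """Fit an AR(p) model by least squares and flag points whose
    one-step prediction error exceeds `threshold` residual sigmas."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Lag matrix: the row for time t holds [x[t-1], x[t-2], ..., x[t-p]].
    X = np.column_stack([x[p - 1 - k:n - 1 - k] for k in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # A robust variant would estimate sigma from the median absolute
    # deviation so the outliers themselves do not inflate it.
    sigma = resid.std()
    flags = np.zeros(n, dtype=bool)
    flags[p:] = np.abs(resid) > threshold * sigma
    return flags
```

Because each prediction uses only the immediately preceding p points, this is again a local method: it catches isolated outliers well but, like the running median, degrades when outliers occur in long runs.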
