Anomaly detection in multidimensional time series—a graph-based approach

As digital transformation progresses, more and more data is being generated and collected. To derive meaningful information and knowledge from it, researchers use various data mining techniques. Alongside classification, clustering, and forecasting, outlier or anomaly detection is one of the most important research areas in time series analysis. In this paper we present a method for detecting anomalies in multidimensional time series using a graph-based algorithm. We transform the time series data into graphs before computing outlier scores, since the graph representation opens up a wide range of graph-based methods for anomaly detection. Furthermore, the dynamics of the data are taken into consideration by using a window of a certain size, which yields multiple graphs over different time frames. We use feature extraction and aggregation to compare distance measures between two time-dependent graphs. The effectiveness of our algorithm is demonstrated on the Numenta Anomaly Benchmark with various anomaly types as well as on the KPI-anomaly-detection data set of the 2018 AIOps competition.


Introduction
As a result of digitalization, data is increasingly being generated and collected digitally. Obtaining immense amounts of data was once very costly and difficult, but nowadays the cost of storage options has rapidly decreased [1]. Thus, a massive amount of data can be collected over time and used with various data mining techniques to generate meaningful information and knowledge [2].
In addition to classification, clustering, and forecasting, outlier or anomaly detection is one of the most important research areas in time series analysis, see e.g. [3][4][5]. Outlier detection in time series characteristically involves labeling unusual changes, subsequences, and temporal patterns in the data as outliers [6]. Typically, outlier detection is used for credit card fraud detection, attack vectors in computer networks, fault diagnosis in industry, or predictive health management [6].
Representing time series data as graphs can help reveal outliers or anomalies in the data. Transforming a time series into a graph enables the comparison of one time series segment with another, allowing the study of data objects that are now interdependent. The underlying assumption in research on graph-based algorithms for outlier detection is that these algorithms can detect outliers or anomalies in time series and that they are competitive with the use of neural networks [3].
In this paper we explore existing graph-based outlier detection algorithms applicable to static and dynamic graphs. We optimize the NetSimile algorithm [7], which extracts and statistically compares structural properties of data over time, and apply it to time series that are structured as graphs. NetSimile, a scalable approach to size-independent network similarity, is designed solely for network structures and does not work on traditional time series data. Since we want to take advantage of graph-based algorithms for time series, in this work we contribute a way to represent time series data as graphs and, likewise, to detect outliers in multidimensional time series. Our results can be summarized as follows. We have applied a novel dynamic graph-based algorithm to multidimensional time series data and successfully detected different types of outliers. We have presented a method that transforms time series data into graphs. For the NetSimile-based algorithm, we have developed additional features to apply it to fully connected graphs. This combination of transforming time series into graphs and then applying the proposed modification of the NetSimile algorithm has not been evaluated in the literature before. We have tested the efficiency of our algorithm on synthetic as well as real data sets.

Related work
The following taxonomy subdivides the algorithms evaluated for this work and gives a general overview of the research on graph-based methods used to analyse time series.
Akoglu et al [3] provide a survey on outlier detection in static graphs. In static graphs, outlier detection methods can be structured in the following way.
Structure-based approach: with the help of the representation of nodes and edges in an ego network [8], structure-based approaches extract associated properties of each ego network, such as the number of nodes and edges. Subsequently, these algorithms identify those nodes and edges whose structure differs markedly from the rest of the ego networks. Kang et al [9] use a structure-based approach to show nonzero patterns in the adjacency matrix. Their algorithm provides a solution to find structural anomalies visually in plots. In [10] small pattern changes are declared as structural anomalies.
Clustering-based approach: in [11], two nodes and the intersection of their neighbouring nodes are contrasted. Thus, the assumption is made that nodes with very few neighbours compared to the rest of the nodes in their neighbourhood are outliers.
IsoMap-based approach: the IsoMap-based algorithm in [12] performs dimensionality reduction, during which information about outliers is lost. Reconstructing the whole graph from the reduced dimensions cannot recover this outlier information. Through a subsequent comparison of the extracted information, outliers become visible.
Percolation-based approach: the percolation-based algorithm in [12] makes the assumption that outlier nodes have higher edge weights than their neighbours. In this approach, the edges with the highest weights are removed from a graph step by step. This removal separates outliers from the rest of the network.
Edge-based approach: in [13], the algorithm iterates randomly over the network. It starts at any node of the network. The algorithm then switches to a neighbouring node on a random basis. The higher the edge weights to a neighbouring node are, the higher the probability that this node will be visited. The algorithm records how often nodes are visited, since nodes with high edge weights to their neighbours are visited more often than nodes with lower edge weights to their neighbours. Nodes that are visited less frequently are declared as outliers.
Ranshous et al [4] provide a survey on outlier detection in dynamic graphs. The approach in [7], which is similar to ours, considers snapshots of the graph at temporal intervals. However, we extend the NetSimile approach so that it can be applied to weighted graphs and use it to identify outliers or anomalies in multivariate time series. Different structural features of two snapshots of a graph are compared to evaluate the similarity between them. An outlier is declared in case of a significant change in the graph.
Bhatia et al [14] compare each incoming edge with occurrences of the same edge at previous time instances and thereby detect anomalous microclusters.
To use graph-based algorithms for detecting outliers in time series data, the data must first be converted to graphs [13]. Here, distance measures are used to improve the quality of outlier detection. Hence, the use of suitable distance measures is essential for detecting anomalies and outliers via graph-based algorithms.
A comparison of different distance measures can be found in [15]. For transforming our time series data into graphs, we decided to use the Minkowski distance with either p = 2 or p = 3 for two time series of equal length, but other distance measures are also possible and can be evaluated in future work.

Graph-based time series anomaly detection
The following section presents the principles of our graph-based anomaly detection algorithm. Applied to time series, both multivariate and univariate, it is able to identify outliers or anomalies that occur in specific time periods. Figure 1 presents an overview of our proposed model. First, the transformation from the time series to a graph is illustrated, as described in subsection 3.1. Then the comparison of two or more graphs, as described in subsections 3.2 and 3.3, is laid out.

Transformation
To apply the graph-based algorithm to time series data, the sequential data of the multivariate time series x = x_1, x_2, ..., x_T of length T has to be transformed into graphs. Here and in the following we always use small letters in italic font when referring to a vector. The only exception is the signature vectors in subsection 3.3, where we place an arrow on top of the vector to indicate that it is a vector of a different dimension. The transformation of the time series takes place in non-overlapping windows w = x_{i-n+1}, x_{i-n+2}, ..., x_i of size n. To avoid conflicting indices, we omit the index i of the window w whenever we deal with the elements within a single window. The size n of the window w depends on the use case and can be seen as a hyperparameter. When dealing with real-time data, the first step is to wait until n data points have been recorded. For the transformation into a graph structure, we use the method presented in [13]. For this purpose, a distance measure between the respective elements of a window w is calculated. Assume x_i and x_j are two elements of a window w, which may even be d-dimensional. The distance between the two elements x_i and x_j is then calculated with the Minkowski distance

D_{ij} = \left( \sum_{k=1}^{d} |x_{ik} - x_{jk}|^p \right)^{1/p},    (1)

where k runs over the dimensions of x_i and x_j, with either p = 2 or p = 3 (see also figure 1).
The n × n matrix D with elements D_{ij} is the distance matrix of window w. At the same time, the matrix D is the adjacency matrix of a graph G, with the distances representing the edge weights of G. In our algorithm each window has the same length n, since otherwise the features calculated on each graph would not be comparable. Windows of different lengths might be possible, but it remains unclear how to determine the different lengths within one algorithm. Depending on the use case, applying another distance measure can be reasonable, and our usage of the Minkowski distance should be seen as exemplary.
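As a minimal sketch of this transformation (assuming the window is given as a NumPy array of shape (n, d); the function name is ours), the distance matrix D of equation (1) can be computed as:

```python
import numpy as np

def window_to_graph(window, p=2):
    """Transform one window of a (possibly d-dimensional) time series
    into the adjacency matrix of a fully connected weighted graph.

    window : array of shape (n, d) -- n time points, d dimensions.
    Returns the n x n distance matrix D; D[i, j] is the edge weight
    between nodes i and j, i.e. the Minkowski distance of equation (1).
    """
    n = window.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sum(np.abs(window[i] - window[j]) ** p) ** (1.0 / p)
            D[i, j] = D[j, i] = d  # the graph is undirected
    return D
```

Since D is symmetric with a zero diagonal, it is the adjacency matrix of an undirected, fully connected graph whose edge weights are the pairwise distances.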

Feature extraction on graphs
Our algorithm is closely related to the NetSimile algorithm in [7], which compares the similarity between two given graphs. The similarity of two graphs in the NetSimile algorithm is based on a set of structural features, e.g. the number of neighbors of each node in its ego network or the number of outgoing edges of each ego network. We use here the definition of [8], where an ego network is defined as a subgraph consisting of a node together with its neighboring nodes and their corresponding edges. However, the original NetSimile algorithm works on unweighted graphs and can therefore not be used on our transformed data set. If we leave out the weights in the graphs G developed in subsection 3.1, the graphs of each window w are identical, as can be seen in figure 1. If we then use the original NetSimile features for each ego network, this results in identical features for each respective graph. In order to run the algorithm on weighted graphs as well, we extend the NetSimile algorithm [7] with features that are calculated from the edge weights v_i of a graph, presented in table 1. In our case they are identical with the distances D_{ij} calculated in equation (1). Also in contrast to the original NetSimile algorithm, we do not calculate these features on an ego network, but directly construct for each graph G a signature vector s_G, which consists of the five aggregations median, mean, standard deviation, skewness and kurtosis of the features in table 1 for each matrix D or graph G. A detailed description of the method of signature vectors is provided in subsection 3.3.
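The construction of a signature vector can be sketched as follows. Since table 1 is not reproduced here, the three edge-weight-based features below are illustrative stand-ins for the paper's five; with five features the vector would have the 25 components used in subsection 3.3:

```python
import numpy as np

def _aggregate(f):
    """Median, mean, standard deviation, skewness and kurtosis of f."""
    m, s = np.mean(f), np.std(f)
    z = (f - m) / s if s > 0 else np.zeros_like(f)
    return [np.median(f), m, s,
            np.mean(z ** 3),        # skewness
            np.mean(z ** 4) - 3.0]  # excess kurtosis

def signature_vector(D):
    """Signature vector of the graph with symmetric adjacency matrix D."""
    n = D.shape[0]
    edge_weights = D[np.triu_indices(n, k=1)]  # the edge weights v_i
    features = [
        edge_weights,       # raw edge weights
        D.sum(axis=1),      # weighted degree of each node
        D.max(axis=1),      # largest incident edge weight of each node
    ]
    # concatenate the five aggregations of every feature
    return np.array([a for f in features for a in _aggregate(f)])
```

Each graph is thus reduced to one fixed-length vector, which makes graphs of different windows directly comparable.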

Outlier score
As in [7], we use as the distance measure for outlier detection the Canberra distance of two signature vectors \vec{s}_{G_i} and \vec{s}_{G_j}:

d(\vec{s}_{G_i}, \vec{s}_{G_j}) = \sum_{k=1}^{25} \frac{|s_{G_i,k} - s_{G_j,k}|}{|s_{G_i,k}| + |s_{G_j,k}|}.    (2)

Equation (2) at the same time represents the distance between the two graphs G_i and G_j, as indicated in figure 1. Here k indexes the components of the aggregated feature vector of a graph, of which there are 25: in our implementation we use the 5 features presented in table 1 and, for each of these features, the 5 aggregations explained in subsection 3.2. Thus a signature vector with 25 components is calculated for each graph G.
In order to calculate an outlier score, the Canberra distance between the signature vectors of the current graph G_i and the previous m graphs G_{i-1}, G_{i-2}, ..., G_{i-m} is calculated, i.e. we obtain the m distances

d_k = d(\vec{s}_{G_i}, \vec{s}_{G_{i-k}}),  k = 1, ..., m.    (3)

The outlier score o_i is then calculated as the average of these m distances:

o_i = \frac{1}{m} \sum_{k=1}^{m} d_k.    (4)

In order to calculate a threshold t for the outlier score o_i, we calculate the mean value μ and the standard deviation σ of the previous m outlier scores and set

t = μ + z σ.    (5)

The variable z is a hyperparameter, which is usually taken to be z ≥ 1. It controls how much a time series window must deviate from another window to be classified as an anomaly. By including only the m preceding graphs in the determination of the threshold t, the algorithm can forget information. The hyperparameter m is therefore called the forgetting factor. This allows our algorithm to handle structural changes in the data.
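The scoring and thresholding steps above can be sketched as follows (a minimal version, assuming a list of precomputed signature vectors; the function names are ours):

```python
import numpy as np

def canberra(s1, s2):
    """Canberra distance between two signature vectors, equation (2).
    Components where both entries are zero contribute nothing."""
    num = np.abs(s1 - s2)
    den = np.abs(s1) + np.abs(s2)
    safe_den = np.where(den > 0, den, 1.0)
    return float(np.sum(np.where(den > 0, num / safe_den, 0.0)))

def outlier_score(signatures, i, m=5):
    """o_i: average Canberra distance of graph i to its m predecessors,
    equations (3) and (4)."""
    return float(np.mean([canberra(signatures[i], signatures[i - k])
                          for k in range(1, m + 1)]))

def threshold(prev_scores, z=3.0):
    """t = mu + z * sigma over the previous m outlier scores, equation (5)."""
    return float(np.mean(prev_scores) + z * np.std(prev_scores))
```

A window i is then flagged as anomalous whenever its score o_i exceeds the threshold t computed from the m preceding scores.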

Data
To investigate our algorithm on time series, we use two data sets. One is the Numenta Anomaly Benchmark (NAB), and the other is the KPI-anomaly-detection data set of the 2018 AIOps competition.

Numenta anomaly benchmark
The NAB consists of real and synthetic time series data. The benchmark was developed with real-time streaming applications in mind, and all time series in the NAB are labeled. The data sets in the NAB are univariate time series [16]. These data are well suited to studying the behavior of our algorithm on one-dimensional data.
We examine a total of 10 different anomaly types from the synthetic time series part of the NAB data set: signal drift, individual peaks, rising amplitude, signal jumpsdown, signal jumpsup small noise, increasing noise, sequence change, signal flatmiddle, signal jumpsup, and signal nojump. The anomaly types signal drift, individual peaks, rising amplitude, signal jumpsup small noise, increasing noise, and sequence change are data sets, similar to the NAB data sets, that were constructed for anomaly detection. Since the synthetic time series in the NAB data set are univariate, we also expanded them to multivariate time series by adding an additional dimension with the same time series type but without outliers (cf appendix A), in order to evaluate our algorithm on two-dimensional time series as well.

2018 AIOps's KPI-anomaly-detection
The 2018 AIOps's KPI data set comprises time series data collected from various existing internet companies. It consists of key performance indicators (KPIs) and ground truth labels. These KPIs are web service KPIs, more precisely performance metrics that represent the quality and scale of a web service. The metrics are, e.g. the page response time, the number of page views and the occurrence of connection errors.

Experimental evaluation
To investigate the performance of our novel algorithm, we applied it to both data sets described in section 4.

Numenta anomaly benchmark
We performed tests on one-dimensional and two-dimensional time series. In both cases, a window size n = |w| = 288 was chosen for the respective non-overlapping windows w. The value for n corresponds to the periodicity of the different time series. For the threshold in equation (5), z = 3 is selected.
Thus, values that deviate from the mean by three times the standard deviation are categorized as anomalies. The calculation of an outlier score is based on the previous m = 5 graphs. For this reason, it is not possible to calculate an outlier score for the first five windows of a time series, since not enough values are available to calculate the distances in equation (4). Likewise, the calculation for the next five graphs is not reliable, as the value 0 is included in the computation of μ and σ. In the original time series of the NAB data set, individual time points are labeled as anomalies. Our algorithm, however, classifies whole windows as outliers. Because of this, the outlier indicator in figure 2 spans a whole window from beginning to end, although in an online application this indicator would be calculated at the end of each window. Figure 2 illustrates the results for the one-dimensional data. It can be seen that the starting points of the anomalous behavior are detected in each of the 10 examples. However, whenever the anomalies span multiple windows, the end of the anomalies is often not detected accurately, because the algorithm already adapts to the new behavior.
For all of the respective time series in the NAB data set we also calculated, in table 2, an F1-score for outlier detection, in order to compare our results with existing models in the literature, e.g. [17,18]. However, only the NAB time series signal nojump, signal jumpsup, signal jumpsdown and signal flatmiddle are contained in the original NAB data set and evaluated in [17,18].
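Since the algorithm flags whole windows while the NAB labels individual points, the point labels have to be mapped to window labels before an F1-score can be computed. The exact mapping used for table 2 is not spelled out here, so the following is an assumption: a window counts as anomalous if it contains at least one labeled point.

```python
def point_to_window_labels(point_labels, n):
    """A non-overlapping window of size n counts as anomalous if it
    contains at least one point labeled as an anomaly (assumption)."""
    return [int(any(point_labels[i:i + n]))
            for i in range(0, len(point_labels) - n + 1, n)]

def f1_score(y_true, y_pred):
    """F1-score over binary window labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```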
Thus, measured by the F1-score, our graph-based model performs on the original NAB data set better than 0.8 most of the time, which is the F1-score obtained in [17] and [18] for the best deep learning model. ARIMA models achieve an F1-score below 0.36 in both papers. Additionally, we tested our algorithm on the two-dimensional NAB data as explained in subsection 4.1. As can be seen in figure 3(a), the same outliers as in the one-dimensional case are detected; only the outlier score is lower. For example, the outlier score for the 'signal jumpsup' anomaly is 80 in the one-dimensional case, whereas in the two-dimensional case it is still 15. Nevertheless, the outliers are detected by the algorithm. The complete results can be viewed in appendix A.

2018 AIOps's KPI-anomaly-detection
The 2018 AIOps's KPI-anomaly-detection data set is a web service KPI data set that consists of performance metrics of web services (cf subsection 4.2). This data set comprises 15 different time series, each of which is one-dimensional. As described in subsection 5.1, our algorithm always detects a whole window as an anomaly rather than a point anomaly; it thus detects whether a window contains abnormal data. The results for the web service KPIs are at an acceptable level. The requirement of the competition was an F1-score of 0.55, which our algorithm satisfies. The time series with indices 5, 11 and 14 do not contain any labeled outliers, and our algorithm does not detect any outliers in them. Therefore, our algorithm can be assumed to work correctly on these time series, too. We use the same hyperparameters for each of the 15 time series, leading to acceptable results for all of them. Better results might be possible by using different hyperparameters for each of the 15 time series.

Conclusion
In this paper, we present a novel algorithm for outlier detection in data streams. This algorithm is characterized by dynamically transforming the time series into graphs before analyzing the respective signature vectors. The transformation leads to a new application area for the original NetSimile algorithm [7]. In addition, we developed an extension of NetSimile to apply it to weighted fully connected graphs. Our algorithm uses only a few hyperparameters and is therefore easy to calibrate. Moreover, it is able to learn structural changes in the time series after a certain number of windows, as explained in subsection 3.3. Furthermore, the algorithm can be extended or reduced by any number of features, as described in table 1, depending on the use case. This makes the algorithm very flexible.
When applying the algorithm to the Numenta benchmark data set, excellent results were achieved. On the one hand, all different outlier types are recognized. On the other hand, they are recognized not only in the one-dimensional but also in the two-dimensional data sets. Our algorithm also achieves acceptable results on the AIOps's KPI-anomaly-detection data set. In future work we plan to extend the algorithm to provide an outlier score for individual data points in a time series.

Data availability statement
The data that support the findings of this study are openly available at the 2018 AIOps Challenge at Tsinghua University under http://iops.ai as well as from the Numenta Anomaly Benchmark datasets under https://github.com/numenta/NAB.