

6. Predicting Performance Anomaly


Figure 6.2.: Workflow of Analytics Framework

6.4. Prediction Approaches

In this work, we treat a breach of an SLO as an indicator of a performance anomaly. We explore different time series estimation and machine learning (ML) methods to predict SLO violations in large-scale cloud computing scenarios. In our evaluations, time series estimation methods are used in two distinct cases. In the first case, they are used directly to predict the individual SLO. In the second case (the estimation-classification approach), they are combined with ML algorithms: the time series methods estimate the input features for previously trained ML algorithms, which in turn determine the compliance or violation of a particular SLO. The following subsections briefly describe the algorithms used for SLO prediction.

6.4.1. Time Series Analysis Methods

For time series analysis, we employ three modeling methods: the autoregressive (AR) model, the autoregressive integrated moving average (ARIMA) model, and innovative state space models for exponential smoothing (ETS) [45]. We selected the AR and ARIMA models based on the assumption that past QoS values are serially dependent over time and that linear models can fit these values. Since cloud platforms are characterized by highly dynamic and random workloads, the use of non-linear forecasting models cannot be ruled out; realizing this, ETS models have been considered as well. All these methods select the best model based on Akaike's Information Criterion (AIC), where the best model is the one with the minimum AIC value. AIC is defined as:

AIC = 2k − 2 ln(L)

where k is the number of parameters in the model and L is the likelihood function. AIC is a goodness-of-fit measure for models, taking into consideration both their accuracy and their complexity as determined by the number of parameters. Each of the modeling methods is described briefly below.
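As a minimal illustration of the trade-off AIC encodes, the following Python sketch compares two hypothetical fits (the parameter counts and log-likelihoods are invented for the example):

```python
def aic(k, log_likelihood):
    """Akaike's Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fits: a 3-parameter model with log-likelihood -120.0
# versus a 5-parameter model with log-likelihood -118.5.
aic_simple = aic(3, -120.0)    # 2*3 - 2*(-120.0) = 246.0
aic_complex = aic(5, -118.5)   # 2*5 - 2*(-118.5) = 247.0

# The simpler model wins: its slightly worse likelihood is outweighed
# by the complexity penalty of the larger model.
best = "simple" if aic_simple < aic_complex else "complex"
print(best, aic_simple, aic_complex)
```

This shows why AIC-based selection does not simply pick the model with the highest likelihood: each extra parameter must "pay for itself" with a likelihood gain of at least one unit.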

AR model

The AR model is a linear autoregressive model, in which we forecast the variable of interest using a linear combination of its past values. Thus an AR model of order p can be written as:

y_t = c + Σ_{i=1}^{p} φ_i y_{t−i} + ε_t

where y_t is the time series sample at time t, p is the model order, φ_1, ..., φ_p are the model parameters, c is a constant, and ε_t is white noise. There are many ways to estimate the parameters φ_i; among them, we selected the Yule-Walker method. Fitting cloud platform traces with AR is a fairly straightforward operation, but it does not always produce precise results. We used the approach implemented in R's ar() function [102].
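A sketch of this procedure in Python, using statsmodels' `yule_walker` as a stand-in for R's ar(); the simulated AR(2) trace and its coefficients are illustrative, not data from this work:

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

rng = np.random.default_rng(0)

# Simulate a toy AR(2) trace: y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + eps_t
n = 2000
y = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

# Yule-Walker estimation of the AR coefficients phi_i
phi, sigma = yule_walker(y, order=2, method="mle")

# One-step-ahead forecast from the last p observations
forecast = phi[0] * y[-1] + phi[1] * y[-2]
print(phi, forecast)
```

The recovered coefficients should be close to the true values (0.6, −0.3), illustrating why AR fitting is smooth when the linearity assumption holds.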


ARIMA model

ARIMA is an extension of the AR model with both autoregressive and moving average terms. It predicts future movements of a time series using differences between values in the series instead of the actual data values. Lags of the differenced series form the "autoregressive" part, and lagged forecast errors form the "moving average" part. An ARIMA model of order (p, d, q) can be written as:

(1 − Σ_{i=1}^{p} φ_i L^i)(1 − L)^d X_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

where L is the lag operator, p is the autoregressive order, d is the integration order, q is the moving average order, and θ_i is the i-th moving average parameter. We used the auto.arima() function from R's forecast package [124], which implements a unified approach to specifying the model parameters.

This approach also considers the seasonality of the trace.

Innovative state space models for exponential smoothing (ETS)

Forecasts produced using exponential smoothing methods are weighted averages of past observations, where the weights decrease exponentially as the observations get older. The simplest form of exponential smoothing is given by the formulas:

s_0 = x_0

s_t = α x_t + (1 − α) s_{t−1},   t > 0
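The recursion above is short enough to implement directly; the response-time trace below is hypothetical:

```python
def simple_exponential_smoothing(x, alpha):
    """s_0 = x_0; s_t = alpha*x_t + (1 - alpha)*s_{t-1} for t > 0."""
    s = [x[0]]
    for xt in x[1:]:
        s.append(alpha * xt + (1 - alpha) * s[-1])
    return s

# Hypothetical response-time trace (ms); the smoothed series lags the
# jump at t = 3 but converges toward the new level
trace = [100.0, 102.0, 98.0, 130.0, 128.0, 131.0]
smoothed = simple_exponential_smoothing(trace, alpha=0.5)
print(smoothed)  # last value: 126.1875
```

The smoothing parameter α controls how quickly old observations are forgotten: α close to 1 tracks recent values, α close to 0 averages over a long history.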

In the literature, there exist many different models for which the various versions of exponential smoothing are optimal. One such class is that of innovative state space models for exponential smoothing. This class of models is very general and includes both linear and non-linear models. Here we used the ets() function from the forecast package [124]. In this framework, every exponential smoothing method has two corresponding state-space models, each with a single source of error (SSOE): one model has an additive error and the other has a multiplicative error. Therefore, in total there exist 30 such state-space models: 15 with additive errors and 15 with multiplicative errors. The framework estimates each model's parameters by maximizing the likelihood.

6.4.2. Classification Algorithms

The second class of methods we considered comprises estimation-classification prediction models over the feature space. These methods are based on the assumption that performance anomalies manifest in system-level metrics, and they consider a large volume of system metrics to predict the future QoS measures. System metrics are periodically sampled at fixed intervals (e.g., each minute). The result is a time series X containing a sequence of the last w observations:

X = x_t, x_{t−1}, x_{t−2}, ..., x_{t−w+1}
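Extracting such a window from a stream of samples is a one-liner; the CPU trace below is hypothetical:

```python
def latest_window(samples, w):
    """Return X = [x_t, x_{t-1}, ..., x_{t-w+1}], newest first."""
    return list(reversed(samples[-w:]))

# CPU utilisation sampled once per minute; keep the w = 4 newest values
cpu = [31.0, 35.0, 33.0, 60.0, 72.0, 88.0]
window = latest_window(cpu, 4)
print(window)  # [88.0, 72.0, 60.0, 33.0]
```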

The estimation-classification approach can be divided into two parts. The first part estimates the future values of the system metrics using time-series prediction techniques, while the second part classifies these predicted values. Several classification approaches can be used; among them, we have chosen the Naive Bayes classifier, random forest, decision tree, and support vector machine (SVM). Having already discussed the time series analysis methods in the previous section, we briefly describe the chosen machine learning algorithms below.
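The two-stage idea can be sketched in Python with scikit-learn; everything here is illustrative (simulated metrics, a naive last-value forecast standing in for the time-series models above, and a synthetic SLO rule for labeling):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# --- Hypothetical training data: system metrics -> SLO label ---------
# Each row holds two system metrics (e.g. CPU %, memory %); the label
# is 1 when the SLO was breached. The breach is simulated as a simple
# function of high CPU so the classifier has something to learn.
X_train = rng.uniform(0, 100, size=(500, 2))
y_train = (X_train[:, 0] > 80).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# --- Stage 1: estimate future metric values --------------------------
# Stand-in for the time-series models: a naive "last value" forecast
# of each metric from its recent window.
cpu_window = np.array([70.0, 78.0, 85.0, 91.0])
mem_window = np.array([40.0, 41.0, 39.0, 42.0])
predicted_metrics = np.array([[cpu_window[-1], mem_window[-1]]])

# --- Stage 2: classify the predicted feature vector ------------------
slo_violation = clf.predict(predicted_metrics)[0]
print("SLO violation predicted" if slo_violation else "SLO compliant")
```

In the actual framework, stage 1 would use AR, ARIMA, or ETS forecasts per metric, and stage 2 any of the four classifiers listed above.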

Naive Bayes

The Naive Bayes classifier is an elementary probabilistic classifier based on Bayes' rule together with strong (naive) independence assumptions. The classifier assumes that the presence of a certain feature of a class is unrelated to the presence of any other feature. Despite their simple design and assumptions, Naive Bayes classifiers have been shown to perform considerably well in numerous real-world domains. An advantage of the method is that it requires only a small amount of training data to estimate the parameters essential for classification [75].
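A short scikit-learn sketch with simulated metrics (the class means and scales are invented) shows the independence assumption in action: GaussianNB fits one Gaussian per feature per class, never modeling feature correlations:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(4)

# Hypothetical metrics (CPU %, memory %): class 0 = normal operation,
# class 1 = anomalous regime, drawn from well-separated Gaussians
normal = rng.normal(loc=[30.0, 40.0], scale=5.0, size=(100, 2))
anomalous = rng.normal(loc=[90.0, 85.0], scale=5.0, size=(100, 2))

X = np.vstack([normal, anomalous])
y = np.array([0] * 100 + [1] * 100)

# Each feature is modeled independently given the class (the "naive"
# assumption), so only per-feature means and variances are estimated
nb = GaussianNB().fit(X, y)
predictions = nb.predict([[32.0, 41.0], [88.0, 83.0]])
print(predictions)  # expect [0 1]
```

Because only a mean and variance per feature per class are estimated, relatively little training data suffices, which matches the advantage noted above.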