6.5 OFM Evaluation

6.5.1 Methodology

Resource pricing is of strategic importance for any service provider. Generally, internal pricing mechanisms are kept private for fear of losing a competitive edge over other service providers. To the best of our knowledge, cloud pricing data of service providers are not publicly available.

Balseiro et al. [128] generate a stochastic dataset based on real Google AdX data to evaluate their AdX placement algorithms. Bateni et al. [49] extend this dataset with volatility information to model sensitivity to shocks caused by social and news trends, such as negative publicity about the resources in the news.

Each dataset contains a different number of advertisers (6 to 101) and impression types (7 to 406). Also, the utilities of the advertisers are sparse in one dataset and dense in another. Arrivals are assumed to follow an Ornstein-Uhlenbeck process, a diffusion process that models the velocity of a particle in Brownian motion and is widely used in mathematical finance to model market prices and volatility. The parameters of the Ornstein-Uhlenbeck process are estimated on the dataset presented in [128], and emulated arrival data are generated by preserving the statistical properties of the real dataset. The mean value for impression type $j$ and advertiser $i$ is taken as the utility $u_{ij}$. The estimation methodology can be found in [49].
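To illustrate how emulated arrival data can be generated from an Ornstein-Uhlenbeck process, the following is a minimal sketch using an Euler-Maruyama discretization. The function name `simulate_ou` and all parameter values are hypothetical and chosen for illustration only; the estimated parameters are those described in [49] and are not reproduced here.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0, seed=0):
    """Simulate dX = theta * (mu - X) dt + sigma dW with Euler-Maruyama.

    theta: speed of mean reversion, mu: long-run mean, sigma: volatility.
    All values passed below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Brownian increment over a step of length dt
        dw = rng.normal(0.0, np.sqrt(dt))
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x

# Hypothetical arrival-rate path for one impression type
path = simulate_ou(theta=0.5, mu=10.0, sigma=1.0, x0=10.0, n_steps=200)
```

With `sigma=0` the path decays deterministically towards `mu`, which makes the mean-reverting behavior of the process easy to check.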

We divide our evaluation into two scenarios: a fixed set of resources and a varying set of resources. The latter scenario is more general and more challenging than the former. We measure the regret and the competitive ratio of OFM. Regret is the distance between the objective of our online OFM algorithm, which has no hindsight, and that of an optimal algorithm with hindsight. The competitive ratio, in contrast, is the ratio of the OFM online solution to the optimal offline solution.
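The two metrics can be sketched as follows. This sketch assumes a maximization objective and reads "distance" as a plain difference; both are assumptions, and the function names are hypothetical.

```python
def regret(online_value, offline_optimum):
    """Gap between the offline optimum (with hindsight) and the online objective."""
    return offline_optimum - online_value

def competitive_ratio(online_value, offline_optimum):
    """Ratio of the online objective to the offline optimum."""
    if offline_optimum == 0:
        raise ValueError("offline optimum must be non-zero")
    return online_value / offline_optimum

# Illustrative numbers only, not results from the evaluation
example_regret = regret(online_value=90.0, offline_optimum=100.0)
example_ratio = competitive_ratio(online_value=90.0, offline_optimum=100.0)
```

Under these assumptions, a smaller regret and a competitive ratio closer to 1 both indicate that the online solution is close to the offline optimum.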

6.5.1.1 Fixed resource set

It is evident from Section 6.2 that resource prices are affected by buyer utility but not by the number of buyers. The modified AdX dataset [49] cannot be applied directly to evaluate OFM since its probability distribution is limited to an Ornstein-Uhlenbeck process. We perform the following steps to modify the dataset for evaluating OFM.

• Each impression type is treated as a resource, and advertisers are treated as buyers. Let $\Lambda_j^t$ be the Ornstein-Uhlenbeck arrival rate of resource $j$ in time interval $t$. In our case, we treat the mean value for impression type $j$ and advertiser $i$ as a base utility $u_{ij}$ and generate a new utility $u_{ij}^t$ for a resource and a buyer in every period as the product of the base utility and the Ornstein-Uhlenbeck arrival rate of the resource, i.e., $u_{ij}^t = u_{ij}\Lambda_j^t$. In this way, we preserve the volatility of buyer utilities in every period $t$. In the random permutation model of OFM, the expectation of the data varies in each time interval. Hence, maintaining volatility captures the random permutation scenario in the evaluation.

• In a real marketplace, buyers arrive with different budgets, and it is essential to incorporate them in our OFM evaluation. In the dataset of [49], the budget of an impression type is calculated based on $C_j$, the total number of impressions of type $j$ that advertisers are willing to buy. We calculate the budget along similar lines. In our case, the budget of buyer $i$ is given by $b_i = \frac{\sum_j u_{ij}}{|\{j : u_{ij} > 0\}|}$.

• The goal of OFM is to handle utilities drawn from different distributions. Hence, we use different distributions for generating utilities, namely uniform and normal distributions. The uniform distribution is simple and widely used; the utility is generated uniformly in the interval $[0,1]$. According to the central limit theorem, aggregates of samples from non-heavy-tailed distributions converge to a normal distribution over time [70]. Hence, evaluation on a normal distribution is indicative of the behavior under other non-heavy-tailed distributions.

In summary, the AdX dataset is modified so that OFM can be evaluated under different probability distributions of buyer utility in the fixed-resource-set scenario.
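The dataset modification for the fixed-resource scenario can be sketched as follows. The sketch computes the per-period utility as the product of base utility and arrival rate, and reads the budget of a buyer as the sum of its utilities divided by the number of resources with positive utility, which is an assumption about the denominator. The function names, array shapes, and random inputs are hypothetical stand-ins for the AdX-derived data.

```python
import numpy as np

def period_utilities(base_utility, arrival_rate):
    """Return u[t, i, j] = base_utility[i, j] * arrival_rate[t, j]."""
    base_utility = np.asarray(base_utility, dtype=float)   # buyers x resources
    arrival_rate = np.asarray(arrival_rate, dtype=float)   # periods x resources
    return arrival_rate[:, None, :] * base_utility[None, :, :]

def budgets(base_utility):
    """Budget of buyer i: sum of utilities over the count of positive utilities."""
    base_utility = np.asarray(base_utility, dtype=float)
    total = base_utility.sum(axis=1)
    n_positive = (base_utility > 0).sum(axis=1)
    # Buyers with no positive utility get a budget of 0 (an assumption)
    return np.divide(total, n_positive, out=np.zeros_like(total), where=n_positive > 0)

rng = np.random.default_rng(0)
base = rng.uniform(0.0, 1.0, size=(4, 3))    # uniform utilities in [0, 1]
rates = rng.uniform(0.5, 1.5, size=(5, 3))   # stand-in for Ornstein-Uhlenbeck rates
u_t = period_utilities(base, rates)          # shape: periods x buyers x resources
b = budgets(base)
```

For the normal-distribution variant, `base` would be drawn with `rng.normal` instead; the mean and variance to use are not stated in the text and would have to be chosen separately.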

6.5.1.2 Varying resource set

Consider the scenario where the set of resources varies at every time instance, so that the OFM solution set is modified frequently. This scenario is not trivial, since the number of resources offered at the current time instance is revealed only after the current prices are predicted. Furthermore, it is typical of real-world cloud subletting, in which users monetize their unused or underutilized resources by subletting them to other users [138]. The service provider can act as a broker for subletting resources. In every period, interested users submit underutilized resources to the service provider, which sublets them for a specified period. Hence, the service provider lacks complete knowledge of the total number of resources offered at every instance.

The AdX dataset cannot be used for this scenario since both buyers and resources are fixed. Hence, we perform a trace-based simulation for the OFM evaluation in this scenario.

We use a Google cluster data trace [129] of around 12.5k machines collected over a period of 29 days in a Google data center. In this dataset, jobs arrive and VMs are allocated for their execution. Each job has a different CPU requirement; hence, different VMs are allocated, which eventually leads to different CPU usage and CPU demand. In other words, the CPU demand is not uniform across time intervals and depends on the demands of incoming jobs.

We use this demand information to simulate the demand behavior of resources in OFM.

We generate the number of resources offered at each time instance by randomly sampling CPU usage without replacement from the 41 GB dataset, since the random permutation model corresponds to sampling without replacement. Furthermore, we assume that service providers introduce more resources during periods of high demand. We perform time series analysis and use the Box-Jenkins method to build an ARIMA model that forecasts future CPU usage. OFM uses this forecast to set the prices for the current time instance.
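The sampling step can be sketched with NumPy as follows. The array `cpu_usage` is a synthetic stand-in for CPU usage values extracted from the trace, and the function name is hypothetical; how a sampled usage value is mapped to a number of offered resources is not specified here and is left out.

```python
import numpy as np

def sample_without_replacement(cpu_usage, n_periods, seed=0):
    """Draw n_periods distinct observations, i.e., a prefix of a random permutation."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(cpu_usage, dtype=float), size=n_periods, replace=False)

# Synthetic stand-in for the trace; real values would come from the cluster data
cpu_usage = np.linspace(0.0, 1.0, 1000)
sampled = sample_without_replacement(cpu_usage, n_periods=100)
```

Because `replace=False`, no observation of the trace is drawn twice, which matches the random permutation model.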

The results of the OFM evaluation for both scenarios are presented in the next subsection.

6.5.2 Results

6.5.2.1 Predicting number of offered resources

We use three types of time series models, namely AR (autoregressive), MA (moving average), and ARIMA. In the AR model, the output is regressed on the previous values. Similarly, in the MA model, the output is regressed on the residuals of the previous values. ARIMA combines AR and MA; in other words, it forecasts the current output by taking both previous values and residuals into account. Apart from the time series models, we evaluate two additional prediction methods. In the first method, the current prediction is the mean of all previous values. In the second method, the current prediction is the immediate past value. The results of the time series models and the additional approaches on the randomly sampled CPU usage are presented in Figure 6.2. We tested the CPU usage series for non-stationarity using the Dickey-Fuller test (a test for a stochastic process affecting the statistical properties of a time series). The sampled series is stationary at the 99% confidence level. We determined the autoregressive order (the number of past data points used) and the moving-average order statistically using the autocorrelation plots in Figure 6.1. The mean absolute error of the prediction approaches is presented in Table 6.2.
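The two additional prediction methods and the error metric can be sketched as follows, together with a simple AR model fitted by least squares for comparison. This is an illustrative sketch only: it does not implement the MA or ARIMA models, the least-squares AR fit is an assumed stand-in for the Box-Jenkins procedure, and all function names and the toy series are hypothetical.

```python
import numpy as np

def mean_of_history_forecast(series):
    """Forecast for time t (t >= 1) is the mean of series[0..t-1]."""
    x = np.asarray(series, dtype=float)
    return np.cumsum(x)[:-1] / np.arange(1, len(x))

def previous_value_forecast(series):
    """Forecast for time t (t >= 1) is series[t-1]."""
    return np.asarray(series, dtype=float)[:-1]

def ar_one_step_forecast(series, order):
    """In-sample one-step forecasts of an AR(order) model fit by least squares."""
    x = np.asarray(series, dtype=float)
    # Column k holds x[t-k-1] for t = order, ..., len(x) - 1
    lags = np.column_stack(
        [x[order - k - 1 : len(x) - k - 1] for k in range(order)]
    )
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, x[order:], rcond=None)
    return design @ coef

def mean_absolute_error(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Toy series, not data from the trace
toy = [2.0, 4.0, 6.0, 8.0]
mae_mean = mean_absolute_error(toy[1:], mean_of_history_forecast(toy))
mae_prev = mean_absolute_error(toy[1:], previous_value_forecast(toy))
mae_ar1 = mean_absolute_error(toy[1:], ar_one_step_forecast(toy, order=1))
```

The previous-value forecast needs only the last observation, whereas the AR fit requires solving a least-squares problem over the history, which illustrates the difference in computational effort between the approaches.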

It is evident from Table 6.2 that ARIMA outperforms the other approaches. However, the immediate-previous-value approach is not only computationally simpler than the other approaches but also close to the ARIMA forecast. For time-sensitive applications, the immediate-previous-value approach is an ideal candidate for predicting the number of offered

[Figure: autocorrelation plots; x-axis: Lags (0 to 200), y-axis: Correlation]