
The concept for static and dynamic resource management can be divided into three phases.

The purpose of the first and second phase is to determine a static assignment of services to servers. In the final third phase, services are also dynamically reassigned to servers according to their current resource demand. Unused servers are powered down to save energy. An overview of the three phases and the tasks that are performed within each phase is given in Figure 3.3.

[Figure 3.3: diagram of the three phases of resource management. Planning covers the pessimistic static phase (offline characterization) and the statistical static phase (online characterization); operating covers the dynamic phase (online optimization).]

• Pessimistic static resource management: determine maximal required resources by performing benchmarks; distribute services to servers guaranteeing maximal required resources all the time. Result: pessimistic static distribution.

• Statistical static resource management: model the demand behavior of services using observed data; create optimized static distribution of services to servers considering the demand behavior. Result: optimized static distribution.

• Dynamic resource management: adapt models to changes by measured resource demand; dynamically redistribute services to servers according to their demand; power on and off servers.

Figure 3.3: The three phases of the static and dynamic resource management concept and the tasks that are performed within each phase. The tasks shown in black are addressed within this thesis. Tools and concepts already exist for performing the task shown in light gray. Hence, it is discussed only briefly.

As in classical capacity planning [8], benchmarks must be performed during the first phase. They are needed to determine the resources that a service requires under the maximal expected workload defined in the SLOs. An algorithm then determines a distribution of the services to servers based on this information. The resulting distribution is very pessimistic: it provides the maximally required resources to all services all the time. It will work without violating any performance goals, assuming that the maximal workload is not exceeded.

In the second phase, services are deployed to servers according to that pessimistic distribution. Their resource demand behavior is observed while users do their normal work with them. Models that describe the resource demand behavior are characterized from these observations.

Finally, an optimized but still static distribution of services to servers is derived based on these models. Depending on the workload behavior, this new distribution can require fewer servers than the pessimistic one. This optimized static distribution will also work without any SLO violations under the assumption that the workload behavior does not change with respect to the observed one.

The third phase begins with the optimized static distribution. The data center now operates in its normal mode. The services are redistributed to servers according to their current resource demand using the models trained in phase two. Unused servers are switched off to save energy.

Furthermore, the models are adapted using ongoing measurements of the resource demand to account for changed workload behavior.

The tasks to be performed within each phase are elaborated in the following to identify the challenges that need to be addressed.

3.2.1 Pessimistic Static Resource Management

Hardly anything is known about the maximally required resources when new services are to be deployed in a data center. Nothing at all is known about the resource demand behavior. But this information is required for proper resource management, especially when different services are to share the same server, as targeted in this thesis.

Hence, benchmarks are typically performed on the services in an isolated environment first. The benchmarks simulate the workload that is maximally expected according to the SLOs. The resource capacity provided to the service can then be adjusted so that the performance goals, which are likewise defined by the SLOs, are achieved. This approach is already well known from ceiling-based capacity planning [8]. Hence, this task is not discussed in further detail within this thesis. Different concepts and tools [55, 122, 85] already exist for performing this task. A good overview is presented in [135].

Once the maximal required resources are found for each service, there is still no information present about the demand behavior at runtime. But this information is indispensable to determine the optimized static distribution and especially to perform dynamic resource management. Hence, the services must be observed while users are doing their normal work with them to learn about the demand behavior. Therefore, an algorithm is needed that distributes services to servers in a way that none of the SLOs related to throughput constraints (cf. Section 3.1.1) is violated. Any violation would invalidate the observed resource demand values.

Such an algorithm can only draw on the maximally required resources already determined. Hence, only a pessimistic distribution of services to servers is possible. This distribution guarantees each service its maximally required resources at all times. An algorithm that finds such a distribution with a minimal number of required servers is presented in Chapter 4.
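To make the idea of a pessimistic distribution concrete, the following minimal sketch packs services onto servers by reserving each service's benchmarked peak demand. It is not the algorithm of Chapter 4: it uses the well-known first-fit decreasing heuristic, which does not guarantee a minimal number of servers, and it assumes a single resource dimension and identical servers. The function name, the service names, and the demand values are hypothetical.

```python
from typing import Dict, List


def pessimistic_distribution(peak_demand: Dict[str, float],
                             server_capacity: float) -> List[List[str]]:
    """Assign services to servers by first-fit decreasing on peak demand.

    Each server reserves the full peak demand of every hosted service, so
    its capacity is never exceeded as long as no service exceeds its peak.
    Assumption: one resource dimension and identical servers.
    """
    servers: List[List[str]] = []  # services hosted on each server
    free: List[float] = []         # remaining capacity of each server

    # Place the largest services first; this usually wastes less capacity.
    for name in sorted(peak_demand, key=peak_demand.get, reverse=True):
        demand = peak_demand[name]
        if demand > server_capacity:
            raise ValueError(f"{name} does not fit on any single server")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                servers[i].append(name)
                free[i] -= demand
                break
        else:
            # No running server has enough spare capacity: use a new one.
            servers.append([name])
            free.append(server_capacity - demand)
    return servers


if __name__ == "__main__":
    # Hypothetical peak demands in percent of one server's capacity.
    peaks = {"web": 60, "db": 50, "mail": 30, "batch": 40, "cache": 20}
    print(pessimistic_distribution(peaks, server_capacity=100))
```

With these hypothetical values the peaks sum to 200 percent, so no distribution can use fewer than the two servers the heuristic finds here. In general the heuristic may use more servers than necessary.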

3.2.2 Optimized Static Resource Management

This phase aims to use knowledge about the demand behavior of the services to find a new distribution of services to servers. In the best case, this distribution requires fewer servers than the pessimistic one.

It was shown in [102, 117, 58] that typical services require their maximal resources only for a very small fraction of the time. Furthermore, different services rarely require their maximum all at the same time. Given these facts, services can be assigned to fewer servers than with the pessimistic approach. So-called statistical static resource management approaches are based on this idea.
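The effect described above can be illustrated with a small numerical sketch. It compares the capacity a pessimistic approach must reserve (the sum of the individual peaks) with the largest joint demand that actually occurs (the peak of the summed demand). The demand traces are invented for illustration and are not measurements from the cited studies.

```python
from typing import Dict, List


def sum_of_peaks(traces: Dict[str, List[int]]) -> int:
    """Capacity reserved when every service is granted its own peak."""
    return sum(max(trace) for trace in traces.values())


def peak_of_sum(traces: Dict[str, List[int]]) -> int:
    """Largest joint demand observed at any single point in time."""
    return max(sum(samples) for samples in zip(*traces.values()))


if __name__ == "__main__":
    # Hypothetical demand traces in percent of one server's capacity,
    # sampled at the same four points in time for every service.
    traces = {
        "web":   [20, 60, 30, 20],
        "batch": [50, 10, 10, 40],
        "mail":  [10, 20, 30, 10],
    }
    print(sum_of_peaks(traces))  # 140: more than one server when reserving peaks
    print(peak_of_sum(traces))   # 90: the joint demand fits on one server
```

In this invented example the services never peak together, so a single server would have sufficed for the observed demand, whereas reserving all peaks requires two.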

The aim is to use such an approach within this second phase to reduce the overall number of servers required for a given set of services. But it must be noted that such approaches overbook hardware resources. This can lead to performance reductions caused by resource shortages in some cases. To use these approaches in real data centers, it must be guaranteed that neither a certain probability nor a certain strength of performance reduction is exceeded. Both parameters must be defined as SLOs in the SLA between the client and the Service Provider.
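As a rough illustration of how these two parameters could be checked against observed data, the sketch below computes how often the summed demand on a server exceeds its capacity and how large the worst relative shortfall is. The definitions used here are assumptions for illustration, not the SLO definitions of this thesis: the probability is approximated by the observed frequency, and the strength is taken to be the fraction of demand that cannot be served. The sketch also ignores non-stationarity, so the observed frequency is only meaningful if future demand behaves like the observed demand.

```python
from typing import List, Tuple


def shortage_statistics(total_demand: List[float],
                        capacity: float) -> Tuple[float, float]:
    """Return (fraction of samples with a shortage, worst relative shortfall).

    Assumption: the relative shortfall (d - capacity) / d stands in for the
    strength of the performance reduction.
    """
    if not total_demand:
        raise ValueError("at least one demand sample is required")
    shortfalls = [(d - capacity) / d for d in total_demand if d > capacity]
    frequency = len(shortfalls) / len(total_demand)
    worst = max(shortfalls, default=0.0)
    return frequency, worst


def is_admissible(total_demand: List[float], capacity: float,
                  max_frequency: float, max_shortfall: float) -> bool:
    """Check both hypothetical SLO limits for one overbooked server."""
    frequency, worst = shortage_statistics(total_demand, capacity)
    return frequency <= max_frequency and worst <= max_shortfall


if __name__ == "__main__":
    # Hypothetical summed demand of all services on one server.
    demand = [80, 90, 125, 70, 100, 60, 95, 85, 75, 90]
    print(shortage_statistics(demand, capacity=100))
    print(is_admissible(demand, 100, max_frequency=0.1, max_shortfall=0.25))
```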

Appropriately modeling the resource demand of the services is essential to apply statistical approaches. In particular, missing stationarity and possible correlations between the resource demand of different services must be considered. Furthermore, SLOs are needed that describe the performance goals in an appropriate way. Finally, an algorithm is needed that distributes the services to servers based on the models and with respect to the SLOs. These challenges are addressed in Chapter 5 of this thesis.

3.2.3 Dynamic Resource Management

The dynamic resource management phase starts with the optimized static distribution of services to servers obtained at the end of phase two. It is assumed that all SLOs that concern throughput constraints are satisfied at all times. As discussed before, services do not have to be redistributed.

Services can now be consolidated onto fewer servers in times of lower overall resource demand. Unused servers can be switched off to save energy. When the resource demand increases, servers must be reactivated and VMs must be redistributed to prevent resource shortages. In the worst case, the whole static distribution must be restored.
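The potential of consolidation can be estimated with a simple lower bound, sketched below under strong assumptions: a single resource dimension, identical servers, and demand that can be split arbitrarily between servers. The bound states how many servers must be active at least in each time step; it is capped by the size of the static distribution, which is restored in the worst case. Keeping at least one server active is a further assumption made here. The function name and the numbers are hypothetical, and the sketch does not consider the time needed to migrate services or to power up servers.

```python
from typing import List


def active_server_lower_bound(total_demand: List[int], capacity: int,
                              static_servers: int) -> List[int]:
    """Lower bound on active servers per time step, capped by the static plan.

    Assumption: at least one server always stays on to host the services.
    """
    bounds = []
    for demand in total_demand:
        needed = (demand + capacity - 1) // capacity  # integer ceiling
        bounds.append(min(static_servers, max(1, needed)))
    return bounds


if __name__ == "__main__":
    # Hypothetical overall demand in percent of one server's capacity.
    demand = [250, 120, 40, 90, 180, 300]
    print(active_server_lower_bound(demand, capacity=100, static_servers=3))
```

A real distribution may need more active servers than this bound, because services cannot be split arbitrarily.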

The main challenge to be addressed is that redistributing services and powering up servers takes time. The dynamic scheduling approach must ensure that the resource demand of the
