
When comparing the original MPI code in Listing 7.1 to the EPOP conversion, it can be seen that once the program structure is created in the get_program() routine, the programmer no longer needs to track the branching structure of the application and can instead focus on the body of the computation and adaptation blocks of both elastic phases. In addition, the structure is exposed at the beginning of the application to the driver program;

this structure can then be shared with other components such as the resource manager, a performance analyzer, etc. Finally, the driver program can report application progress to the resource manager or a tracker without extra development effort by the application programmer.

7.3.3 Program Structure

The program structure is a collection of program element instances and the application data. The array of program elements included in the structure is traversed by the driver program. The data is passed to each computational block.

This array is created in the get_program() routine of an EPOP application, as seen in the code snippet presented in Listing 7.6. The structure of an EPOP program is overlaid on this flat array of program elements, which may not be modified at runtime. The driver program traverses it by modifying the program counter and calling each program element based on its type, as seen in the example code of the driver program in Listing 7.3.

When a computational block completes, the program counter is incremented by one except when it is computed by a branch program element. Drivers are in complete control of the execution of EPOP programs through their program counters.
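
As an illustration, a minimal driver loop could look as follows. The element types, field names, and function signatures used here are assumptions made for the sake of the example; the actual driver code appears in Listing 7.3.

typedef enum { RIGID_BLOCK, ELASTIC_BLOCK, BRANCH } element_type_t;

typedef struct {
    element_type_t type;          /* determines how the driver calls it */
    void (*body)(void *data);     /* computational block of the element */
    int  (*branch)(void *data);   /* returns the next program counter value */
} program_element_t;

void run_program(program_element_t *elements, int size, void *data)
{
    int pc = 0;                   /* program counter, driver-controlled */
    while (pc < size) {
        program_element_t *e = &elements[pc];
        if (e->type == BRANCH) {
            pc = e->branch(data); /* branch elements compute the next pc */
        } else {
            e->body(data);        /* execute the computational block */
            pc += 1;              /* then advance by one */
        }
    }
}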

typedef struct {
    void *data;
    int size;
    program_element_t *elements;
} program_t;

Listing 7.7: C code of the program structure.
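
Building on the program_t structure of Listing 7.7 and the hypothetical program_element_t from the sketch above, a get_program() routine could populate the structure as shown below; the phase bodies, the branch function and the exact signature of get_program() are assumptions made for this example.

static void init_body(void *data)   { /* hypothetical setup phase */ }
static void solver_body(void *data) { /* hypothetical elastic solver phase */ }
static int  loop_branch(void *data) { return 1; /* jump back to the solver */ }

static program_element_t elements[] = {
    { RIGID_BLOCK,   init_body,   NULL },
    { ELASTIC_BLOCK, solver_body, NULL },
    { BRANCH,        NULL,        loop_branch },
};

program_t *get_program(void *app_data)
{
    static program_t program;
    program.data     = app_data;
    program.size     = sizeof elements / sizeof elements[0];
    program.elements = elements;  /* fixed here; never modified at runtime */
    return &program;
}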

7.4 Additional Benefits of the EPOP Model and Driver Programs

The definition of phases in EPOP programs can help improve the quality of performance modeling techniques. The structure of programs is known before the application starts. A driver program can serialize the program structure of an EPOP application and transfer it to a resource manager or a monitoring program. A performance profile can then be populated by annotating the program structure with performance data. This can be done on completion or partially with updates at runtime. The types of the phases of an application are also known; because of this, performance modeling routines can be enabled selectively for EPs, while rigid phases may be ignored, especially when they do not loop.
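
For instance, a driver could annotate each program element with timing data as sketched below; the profile layout and the use of MPI_Wtime() for timing are assumptions, not the implementation described in this work.

#include <mpi.h>

typedef struct {
    double total_time;  /* accumulated wall-clock time of the element */
    long   visits;      /* number of times the driver executed it */
} element_profile_t;

/* wraps the execution of one computational block with timing */
void run_element_profiled(program_element_t *e, void *data,
                          element_profile_t *prof)
{
    double t0 = MPI_Wtime();
    e->body(data);
    prof->total_time += MPI_Wtime() - t0;
    prof->visits     += 1;
}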

Driver programs can also implement progress reporting transparently. A driver program can track phase changes and the iterations of phases that loop, and then transfer progress information to a resource manager. This can help increase the quality of resource adaptation and scheduling decisions, since the resource manager can better predict the future time at which an application will complete. The rate of progress can be measured at different application processes to evaluate their load balance. The collective rate of progress can also be used to evaluate the quality of resource adaptations, since it correlates well with efficiency metrics.
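
A possible shape of such a progress report is sketched below; the report layout and the send_progress() transport are assumptions made for illustration.

typedef struct {
    int  pc;          /* index of the current program element */
    long iteration;   /* iterations completed in the current looping phase */
} progress_t;

/* hypothetical transport, e.g., a message to the resource manager */
void send_progress(const progress_t *p);

/* called by the driver on each phase change, or every N loop iterations */
void report_progress(int pc, long iteration)
{
    progress_t p = { pc, iteration };
    send_progress(&p);
}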

A pausing mechanism can be added to driver programs. This can make preemption possible, and can also allow the timing of applications or of specific phases to be controlled.

For example, the start time of an application can be delayed. Timing control can be useful for schedulers that try to minimize the idle times of nodes, since application processes may be created early if there is sufficient free memory in the nodes of an allocation. These processes can then be released immediately after the preceding application completes.
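
One way to realize such a mechanism is a hook that the driver calls before dispatching each program element, as sketched below; the atomic flag and the way it is set (for example, by a message handler of the resource manager) are assumptions. Keeping processes paused before the first element is dispatched delays the start of the application, as described above.

#include <stdatomic.h>
#include <unistd.h>

static atomic_int paused;  /* set and cleared asynchronously, e.g., by a
                              message handler of the resource manager */

/* called by the driver before dispatching each program element */
static void maybe_pause(void)
{
    while (atomic_load(&paused))
        usleep(10 * 1000);  /* hold execution until released */
}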

8 Resource Management in High Performance Computing

Resource managers for distributed memory systems have traditionally differed significantly from their shared memory counterparts. In general, implementations have tried to solve very different problems, and therefore relied on entirely different approaches and scheduling algorithms.

In this chapter, a brief introduction to resource management in shared memory systems is provided. Afterwards, resource management is covered more specifically in the context of distributed systems. Towards the end of the chapter, the state of the art of resource management in HPC is summarized. Additionally, the changes necessary to support elastic execution in distributed memory HPC systems are identified. These are used as motivation for the development of the resource manager presented in Chap. 9. Finally, an overview of the implementation of the SLURM resource manager is provided, since it was used as the basis for the elastic resource manager presented in this work.

8.1 Resource Management in Shared Memory Systems

Early computers had no operating systems; they ran a single program until it completed. There was no need for resource management, since the program had exclusive access to all the resources of the machine. Subsequent developments in computer hardware and software made it possible to share resources among multiple programs in a single computer. Once resources could be shared, arbitration became necessary in cases of contention. Operating systems provide this arbitration and are the resource managers found in shared memory systems today.

Shared memory schedulers have been developed as part of operating systems. These execute tasks immediately and without queuing, with some exceptions: for example, a system may create tasks periodically based on its time-based job scheduler; however, once such a task is started, it runs immediately. Work is not postponed or queued. Compute and other resources are shared in time as necessary. There are exceptions, such as real-time systems, where space-sharing is still favored over time-sharing. Granting exclusive access to resources with space-sharing helps ensure that required deadlines are met in real-time systems.

Processing units first come to mind when thinking of hardware resources that need management. In addition, an operating system needs to arbitrate access to IO (such as hard drives and other long term storage devices), network, random access memory, audio and video devices, keyboard and mouse, etc. Current systems support the concurrent execution of tasks. While having large numbers of tasks running concurrently has the benefit of increasing resource utilization, it also has the disadvantage of creating resource contention.

The scheduler of an operating system controls the placement of tasks on processing resources and their timing to minimize contention. Different trade-offs are made by real-time, desktop and server oriented schedulers in shared memory systems. Schedulers can be optimized for real-time guarantees or for throughput. Real-time schedulers support the exclusive allocation of resources in space-sharing mode, while throughput optimized operating systems tend to operate in time-sharing mode exclusively. Desktop and server schedulers are throughput optimized. These differ in that resource utilization is maximized for servers, while a compromise between response time and resource utilization is targeted for desktop systems.

As mentioned before, HPC systems today are collections of nodes interconnected with a high-performance network. Each of these nodes is a shared-memory system with its own local operating system. Their schedulers are throughput optimized and operate in time-sharing mode. Time-sharing is achieved through preemption on these schedulers: started tasks use a processing element, such as a CPU, for a time slot before being interrupted by the operating system to give a time slot to another task. Most algorithms employ a round-robin strategy, where the time of computing resources is distributed equally among tasks of equal priority.
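
The following toy program illustrates the round-robin idea: each runnable task receives one fixed time slot per turn until its work is done. It is a conceptual sketch only, not an operating system scheduler.

#include <stdio.h>

#define NTASKS    3
#define TIME_SLOT 10                        /* ms of CPU granted per turn */

int main(void)
{
    int remaining[NTASKS] = { 25, 10, 40 }; /* ms of work left per task */
    int done = 0;

    while (done < NTASKS) {
        for (int t = 0; t < NTASKS; t++) {
            if (remaining[t] <= 0)
                continue;                   /* finished tasks get no slot */
            int run = remaining[t] < TIME_SLOT ? remaining[t] : TIME_SLOT;
            remaining[t] -= run;
            printf("task %d runs %d ms (%d ms left)\n", t, run, remaining[t]);
            if (remaining[t] == 0)
                done++;
        }
    }
    return 0;
}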