
price-per-GB are reached, it is quite possible that one technology will be replaced by another. An example would be that, instead of HDDs, only SSDs are purchased. However, this point seems to lie far in the future.

4.4.3 Evolution

Even though most of the WLCG’s infrastructure persists for several years, the WLCG is in flux. This becomes apparent when looking at the transition away from the MONARC model. There are several possible directions in which this development could continue.

One viable direction is to consolidate the storage further into bigger sites. None of the smaller sites would host any data anymore, except for caching. The extreme of that scenario would be a single big storage facility per continent. The motivation for this centralisation is that storage is the most manpower- and maintenance-intensive part of the WLCG. After such a consolidation, fewer experts and less personnel would be required for the upkeep of the storage, relieving especially the smaller sites.

Another path would be to shift the computing more and more into the Cloud. This would most probably happen at the site level. Each WLCG site has pledges it has to fulfil, and it does not matter where the pledged resources originate from; the resource origin could even be completely transparent from outside the site. A second option would be to spend budget at the WLCG level on Cloud computing, for example during peaks in demand. These considerations depend on whether Cloud infrastructure will be cheaper than the sites or the WLCG acquiring and running the infrastructure themselves. The situation can differ from site to site, depending on factors such as the country a site is located in. A summary of the viability and cost considerations of the Cloud can be found in Section 9.

4.5 ATLAS computing components

Below, some ATLAS-specific Grid components and implementations are briefly introduced, as they are relevant for an understanding of later chapters.

4.5.1 XRootD

XRootD is a system designed to enable access to data repositories [74]. It is used in HEP for concurrent access to repositories containing multiple petabytes of data. Scalability and high performance were important design parameters that were incorporated alongside features such as authentication [74]. Throughout this thesis, remote data access is done via XRootD.
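As a minimal sketch of what such remote access looks like from the user side (assuming a PyROOT installation; the host name and file path are placeholders):

# Minimal sketch of remote read access via XRootD (hypothetical host and path).
# The root:// protocol tells ROOT to use the XRootD client for remote I/O
# instead of the local file system.
import ROOT

url = "root://some-storage-element.example.org//atlas/rucio/some/dataset/file.root"
f = ROOT.TFile.Open(url)          # opens the file remotely via XRootD
if f and not f.IsZombie():
    f.ls()                        # list the contents of the remote file
    f.Close()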

4.5.2 Athena

Athena is the ATLAS control framework. It handles all levels of ATLAS data processing, such as simulation, reconstruction, and analysis [75] [76]. Throughout this thesis it was used to execute the different workflows that were tested and examined. Athena was designed with the goal of keeping data and algorithms, as well as transient and persistent data, separate.

Most notable is that Athena includes performance and resource monitoring, which was used in some cases to compare the performance of different VMs with each other [76].

Also worth noting is that Athena uses Python as a scripting language. This allows for job steering and configuration that can be understood and reproduced easily. In this thesis, multiple completely different workflows are used. Each of them involves tens of thousands of lines of code, and there are several hundred different software versions and millions of different input files. By providing the steering and configuration files for the different workflows (in the Appendix), each of the used workflows can be reproduced.
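As an illustration of this style of steering, a minimal, hypothetical jobOptions-style sketch is shown below; the property and service names follow the classic AthenaCommon configuration and are assumptions for illustration, not excerpts from the actual workflow configurations in the Appendix.

# Hypothetical jobOptions-style sketch (classic AthenaCommon configuration).
# Property and service names are assumptions for illustration; the real
# steering files for the workflows used in this thesis are in the Appendix.
from AthenaCommon.AppMgr import theApp, ServiceMgr as svcMgr

theApp.EvtMax = 100                               # process only 100 events
svcMgr.EventSelector.InputCollections = [         # input files for the job
    "root://some-storage-element.example.org//path/to/input.AOD.pool.root",
]
svcMgr.MessageSvc.OutputLevel = 3                 # INFO-level logging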

4.5.3 AthenaMP

AthenaMP (Athena Multi Process) is the framework that lets Athena run in a multi-core environment.

ATLAS puts considerable effort into not running into the hard RAM limit. Saving memory is one reason why many workflows have been parallelised within the multi-process framework AthenaMP. The idea behind this is that a single process forks into multiple processes, which share parts of their memory. The shared memory reduces the overall memory footprint on a multi-core machine by decreasing the redundancies.

An important setting inherent to AthenaMP is the number of parallel processes that it spawns. This is steered by setting the environment variable ATHENA_PROC_NUMBER. Setting it to zero results in single-process execution.

On the WLCG, this setting is used to set the number of parallel processes equal to the number of cores that a VM on the Grid has, which can deviate from one VM to another.

In Chapter 8, it is additionally used to deviate from this default and run a larger or smaller number of parallel processes.
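A minimal sketch of how a job could be launched with a chosen number of AthenaMP workers is shown below; the jobOptions file name is a placeholder.

# Sketch: launching an Athena job with a given number of AthenaMP worker
# processes by setting ATHENA_PROC_NUMBER in the job environment.
# The jobOptions file name is a placeholder for illustration.
import os
import subprocess

env = os.environ.copy()
env["ATHENA_PROC_NUMBER"] = "8"    # fork 8 worker processes; "0" = single-process

subprocess.run(
    ["athena.py", "jobOptions.py"],
    env=env,
    check=True,
)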

4.5.4 PanDA

The Production and Distributed Analysis (PanDA) workload management system is responsible for processing Monte Carlo simulations, performing the data reprocessing, and executing the user and group production jobs [77].

Figure 4.10 shows how jobs and production jobs are submitted by the users and the production managers. All job information and the task queue are then handled centrally by PanDA [78]. This means PanDA takes care of the entire scheduling.

Figure 4.10: Schematic of the PanDA system [78]; DQ2 has since been replaced by Rucio.

Depending on the job requirements, the jobs are subsequently scheduled to matching available resources. This is done using the pilot model [79] [80] [81]. Pilot jobs are basically “place-holders” for the actual payload and are sent by the pilot factory to a batch system or Grid site. A pilot prepares the computing element, then pulls the job, executes it, and cleans up afterwards. This means the pilot job also handles the data stage-in and stage-out. All tests performed in this thesis were run in a controlled environment, therefore outside of the Grid, PanDA, and the pilot model. This is important to keep in mind for data staging considerations, as data staging is usually done by the pilot, see Subsection 5.3.2.
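A purely illustrative sketch of this pilot life cycle is given below; the dispatcher URL, job fields, and helper functions are hypothetical and do not correspond to the actual PanDA pilot code.

# Purely illustrative sketch of the pilot model: a "place-holder" job that
# prepares the node, pulls a real payload from the workload management
# system, runs it, and cleans up. All names (URL, helpers) are hypothetical.
import shutil
import subprocess
import tempfile

import requests

DISPATCHER_URL = "https://panda.example.org/getJob"   # hypothetical endpoint


def run_pilot():
    workdir = tempfile.mkdtemp(prefix="pilot_")        # prepare the computing element
    try:
        job = requests.get(DISPATCHER_URL, timeout=60).json()   # pull a job
        stage_in(job["input_files"], workdir)                   # data stage-in
        subprocess.run(job["payload_command"], cwd=workdir, check=True)
        stage_out(job["output_files"], workdir)                 # data stage-out
    finally:
        shutil.rmtree(workdir, ignore_errors=True)              # clean up


def stage_in(files, workdir):
    ...   # e.g. copy input files from a storage element into workdir


def stage_out(files, workdir):
    ...   # e.g. copy produced outputs back to a storage element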

Another important aspect of PanDA is its monitoring capability. It collects a plethora of metrics for each job, including, for example, the CPU consumption time, the job duration, and the input size. This information can be accessed from the web and is stored and replicated to multiple places, one of which is the analytix cluster at CERN IT. The analytix cluster was used to perform data analytics in Chapter 7.

4.5.5 Rucio

Rucio is the new version of the ATLAS Distributed Data Management (DDM) system [82] [83]. It evolved from Don Quijote 2, which was previously used. It handles the accounts, the distributed storage systems, and the distribution of the ATLAS data, including files and datasets. For easy usage it has a CLI and a web interface, which make the replication of datasets easier for users. Rucio was used to locate, move, and replicate input datasets for all the workflows in this thesis that required input data. It was especially important for moving the data when investigating the workflow performance in dependence of the input data location, see Subsection 8.2.2.
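As an illustration of such operations, a minimal sketch using the Rucio Python client is given below; the scope, dataset name, and RSE expression are placeholders, and the exact client calls may vary between Rucio versions.

# Minimal sketch of locating and replicating a dataset with the Rucio Python
# client. Scope, dataset name, and RSE expression are placeholders; the exact
# client API may differ between Rucio versions.
from rucio.client import Client

client = Client()

scope, name = "mc16_13TeV", "some.example.dataset.DAOD"   # hypothetical dataset

# Locate existing replicas of the files in the dataset.
for replica in client.list_replicas([{"scope": scope, "name": name}]):
    print(replica["name"], list(replica["rses"].keys()))

# Request one additional copy at a destination site (hypothetical RSE expression).
client.add_replication_rule(
    dids=[{"scope": scope, "name": name}],
    copies=1,
    rse_expression="SOME-SITE_LOCALGROUPDISK",
)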

4.5.6 JEDI

The Job Execution and Definition Interface (JEDI) is a PanDA component that was implemented in order to provide workload management at the task level [84]. It translates user-requested task definitions from the Database Engine For Tasks (DEFT) into jobs that are then executed by PanDA.

4.5.7 CVMFS

The CernVM File System (CVMFS) is used across the HEP experiments to access their software and conditions data [85] [86]. It is a read-only file system capable of delivering the necessary data via HTTP, making use of caching, to all different kinds of VMs located on the WLCG. It effectively acts as a cache for the ATLAS software and data.
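As a small illustration, the sketch below checks from within a job that the ATLAS CVMFS repository is mounted; the standard /cvmfs/atlas.cern.ch mount point is assumed, and the listing is for illustration only.

# Small sketch: check that the ATLAS CVMFS repository is mounted and list a
# few entries. The mount point is the standard /cvmfs/atlas.cern.ch path;
# the subdirectory layout below it is not guaranteed here.
import os

repo = "/cvmfs/atlas.cern.ch"

if os.path.isdir(repo):
    # Accessing the directory triggers CVMFS to fetch (and cache) the
    # corresponding catalogue and files over HTTP on first use.
    print(sorted(os.listdir(repo))[:10])
else:
    print("CVMFS repository not mounted on this VM")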

A short investigation into the impact of the CVMFS cache on a workflow was done in Subsection 5.3.2. Throughout this thesis, all VMs were using CVMFS.

4.5.8 Tags

The TAG data is metadata containing information about key quantities of events, which makes it easier and faster to select specific events for a physics analysis [76] [87]. Individual physics events can be identified and selected via their TAG data, which is stored in a relational database.

4.5.9 AMI

The ATLAS metadata is accessed via the ATLAS Metadata Interface (AMI). It enables the aggregation of distributed metadata and its retrieval in web applications [88]. It is the tool for dataset selection within ATLAS [89]. The AMI components include the Tag Collector, which manages the various releases of the ATLAS software [88]. The processing history of datasets is saved using the AMI-Tags interface [88].