
The creation of a more flexible infrastructure for resource management may be the best long-term option. New resource management research, such as that of the Flux [31] project, may benefit resource-elasticity greatly. A more scalable, modular and hierarchical approach to resource management may be necessary to better support resource-elasticity at exascale.

Better performance models and analysis techniques can be added to produce the necessary ranges that the Elastic Runtime Scheduler (ERS) takes as input. These could consider additional performance metrics, progress reports and adaptation measurements. Since the design is modular, multiple performance models may be implemented in the future.
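
One way such a model could supply a range to the scheduler is sketched below in C; the node_range type, the range_from_model function and the Amdahl-style efficiency estimate are illustrative assumptions and not the actual ERS interface.

    /* Sketch of a performance model plug-in producing a node-count range for
     * the runtime scheduler. All names and the speedup model are assumptions. */
    typedef struct {
        int min_nodes;   /* smallest allocation the model considers useful */
        int max_nodes;   /* largest allocation meeting the efficiency floor */
    } node_range;

    /* Amdahl-style speedup estimate; serial_fraction is measured on earlier runs. */
    static double est_speedup(double serial_fraction, int nodes) {
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes);
    }

    /* Largest node count whose parallel efficiency stays above min_efficiency. */
    node_range range_from_model(double serial_fraction, int system_nodes,
                                double min_efficiency) {
        node_range r = { 1, 1 };
        for (int n = 1; n <= system_nodes; n++) {
            double efficiency = est_speedup(serial_fraction, n) / n;
            if (efficiency >= min_efficiency)
                r.max_nodes = n;   /* efficiency decreases with n; keep the last n */
        }
        return r;
    }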

Energy optimization will also be of greater importance at exascale. Energy metrics need to be measured by the infrastructure, and new performance models that consider these metrics need to be added. Multi-objective optimizations that trade off performance and energy need to be developed in the future. Additionally, strategies for system-wide power-level stabilization and power capping will be required.
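
As a sketch of what such a multi-objective formulation could look like (the scalarization weight, the reference values and the power model are assumptions, not part of the presented infrastructure), one option is to minimize a weighted combination of normalized runtime and energy per job under a system-wide power cap:

    \min_{\{n_j,\, f_j\}} \; \sum_{j} \left[ \alpha\, \frac{T_j(n_j, f_j)}{T_j^{\mathrm{ref}}} + (1-\alpha)\, \frac{E_j(n_j, f_j)}{E_j^{\mathrm{ref}}} \right]
    \quad \text{subject to} \quad \sum_{j} P_j(n_j, f_j) \le P_{\mathrm{cap}}, \qquad 0 \le \alpha \le 1

Here n_j and f_j are the node count and operating frequency assigned to job j, alpha steers the runtime-energy trade-off, and the constraint expresses system-wide power capping.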

16.3 Elastic Resource Management

Machine learning and other history-based techniques may be added to the scheduler and its performance model. These techniques have great potential in optimization problems such as scheduling. Given the highly dynamic nature of the presented system and the high cost of adaptations due to distributed memory, any technique that improves the quality of predictions is of great importance.
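
As a minimal example of a history-based estimator that could back such predictions (an illustrative sketch; the type and function names are not part of the existing scheduler), an exponential moving average over past measurements already improves on using only the latest sample:

    /* Simple history-based predictor for per-job metrics such as adaptation
     * cost or iteration time. Names and model are illustrative assumptions. */
    typedef struct {
        double value;    /* current estimate */
        double alpha;    /* smoothing factor in (0, 1]; higher = more reactive */
        int    samples;  /* observations folded in so far */
    } ewma_predictor;

    void ewma_observe(ewma_predictor *p, double measurement) {
        if (p->samples == 0)
            p->value = measurement;                       /* seed with first sample */
        else
            p->value = p->alpha * measurement + (1.0 - p->alpha) * p->value;
        p->samples++;
    }

    double ewma_predict(const ewma_predictor *p) {
        return p->samples > 0 ? p->value : 0.0;           /* 0.0 = no history yet */
    }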

The pattern detection technique should continue to be improved. Currently it is effective at the detection of SPMD patterns, but it should be extended to handle other patterns as well. Master-worker patterns can be supported by representing them as separate SPMD blocks that are coupled. Additionally, support for arbitrary MPMD patterns is desirable.
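
A possible representation of such coupled blocks is sketched below; the structures and field names are hypothetical and only illustrate how a master-worker pattern could be described to the scheduler as two coupled SPMD blocks:

    /* Hypothetical descriptor for a master-worker pattern expressed as two
     * coupled SPMD blocks; not part of the existing detector. */
    typedef struct {
        int    block_id;         /* identifier of the detected SPMD region */
        int    process_count;    /* ranks currently executing this block */
        double mean_iter_time;   /* measured per-iteration time in seconds */
    } spmd_block;

    typedef struct {
        spmd_block master;       /* single-rank block that distributes work */
        spmd_block workers;      /* block executed by the remaining ranks */
        double     coupling;     /* messages exchanged per worker iteration */
    } coupled_pattern;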

Resource management may be performed on intra-node resources, such as cores and memory. Memory should be tracked as a resource by the scheduler and the resource manager. If memory usage is known, then the minimal number of nodes specified by users can be taken as a hint instead of as a fixed constraint. This would further help the scheduler minimize idle node counts.
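
For example, if per-process memory footprints were tracked (a sketch under that assumption; the job_desc type and min_nodes_from_memory function are hypothetical), the scheduler could derive the true lower bound on nodes and treat the user-specified minimum merely as a hint:

    #include <stdint.h>

    /* Hypothetical job description with tracked memory requirements. */
    typedef struct {
        int      user_min_nodes;    /* minimum requested by the user (hint only) */
        int      processes;         /* number of MPI processes */
        uint64_t mem_per_process;   /* bytes, measured or declared */
    } job_desc;

    /* Smallest node count whose aggregate memory still fits the job. When
     * memory usage is tracked, this bound can replace the user minimum. */
    int min_nodes_from_memory(const job_desc *job, uint64_t mem_per_node) {
        uint64_t total = job->mem_per_process * (uint64_t)job->processes;
        uint64_t nodes = (total + mem_per_node - 1) / mem_per_node;   /* ceiling */
        return nodes > 0 ? (int)nodes : 1;
    }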

Topology optimizations should be explored in the future. For this, the SRUN program should first be extended with a migration function. After that, full migrations from fragmented allocations to dense allocations can be performed on jobs so that their network performance improves. The MPI library may be extended with communication pattern detection mechanisms to support this.
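
One way to obtain the required communication information, without committing to a particular library design, is the standard MPI profiling interface; the sketch below (an assumption for illustration, restricted to MPI_COMM_WORLD and point-to-point sends) accumulates a per-destination byte count that a migration policy could later use to choose dense target allocations:

    #include <mpi.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Bytes sent from this rank to every other rank; the sketch assumes all
     * traffic goes through MPI_COMM_WORLD. */
    static uint64_t *sent_bytes = NULL;

    static void ensure_matrix(void) {
        if (!sent_bytes) {
            int size;
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            sent_bytes = calloc((size_t)size, sizeof(uint64_t));
        }
    }

    /* Interpose MPI_Send through the profiling interface (PMPI) and record
     * the traffic before forwarding the call to the library. */
    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm) {
        int type_size;
        ensure_matrix();
        MPI_Type_size(type, &type_size);
        sent_bytes[dest] += (uint64_t)count * (uint64_t)type_size;
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }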

The quality of the time balancing operation in the runtime scheduler depends on the accuracy of the remaining time estimation of the job. This estimation is currently provided by users and is very unreliable. New modeling techniques that predict the remaining run times of jobs should be developed. Additionally, progress reporting APIs can be added to allow applications to report their progress and an estimation of their remaining time. The EPOP model can include a way to report the current iteration number and bound of the loops in Elastic-Phases (EPs).
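
A sketch of what such a report and the derived estimate could look like is given below; the ep_progress structure and the ep_estimate_remaining function are hypothetical and not part of the current EPOP interface:

    /* Hypothetical progress report from an Elastic-Phase: current iteration,
     * loop bound, and the time spent so far in this EP. */
    typedef struct {
        int    iteration;
        int    bound;
        double elapsed_seconds;
    } ep_progress;

    /* Remaining-time estimate the runtime scheduler could derive from a
     * report: average iteration cost times the iterations still left. */
    double ep_estimate_remaining(const ep_progress *p) {
        if (p->iteration <= 0)
            return -1.0;   /* no data yet; fall back to the user estimate */
        double avg_iter = p->elapsed_seconds / p->iteration;
        return avg_iter * (p->bound - p->iteration);
    }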

Bibliography

[1] Standard for Information Technology–Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE Std 1003.1, 2016 Edition (incorporates IEEE Std 1003.1-2008, IEEE Std 1003.1-2008/Cor 1-2013, and IEEE Std 1003.1-2008/Cor 2-2016), pages 1–3957, Sept 2016.

[2] Caliper: Application Introspection System. http://computation.llnl.gov/projects/caliper, 2017. [Online].

[3] Charm++: Parallel Programming with Migratable Objects. http://charm.cs.illinois.edu/research/charm, 2017. [Online].

[4] Clang: a C language family frontend for LLVM. https://clang.llvm.org/, 2017. [Online].

[5] GNU Hurd. https://www.gnu.org/software/hurd/hurd.html, 2017. [Online].

[6] MPICH: High-Performance Portable MPI. http://www.mpich.org, 2017. [Online].

[7] Open MPI: Open Source High Performance Computing. https://www.open-mpi.org/, 2017. [Online].

[8] OpenFabrics Alliance. http://openfabrics.org/, 2017. [Online].

[9] OpenMP: An API for multi-platform shared-memory parallel programming in C/C++ and Fortran. http://www.openmp.org, 2017. [Online].

[10] Parallel Virtual Machine (PVM). http://www.csm.ornl.gov/pvm/, 2017. [Online].

[11] SchedMD. http://www.schedmd.com/, 2017. [Online].

[12] Simple Linux Utility For Resource Management. http://slurm.schedmd.com/, 2017. [Online].

[13] SuperMUC Petascale System. https://www.lrz.de/services/compute/supermuc/, 2017. [Online].

[14] The Barrelfish Operating System. http://www.barrelfish.org/, 2017. [Online].

[15] The FreeBSD Project. http://www.freebsd.org, 2017. [Online].

[16] The Linux Kernel. http://www.linux.org/, 2017. [Online].

[17] The LLVM Compiler Infrastructure. http://llvm.org/, 2017. [Online].

[18] The MPI 4.0 standardization efforts. http://mpi-forum.org/mpi-40/, 2017. [Online].

[19] Top 500 Supercomputers. http://www.top500.org/, 2017. [Online].

[20] Vampir - Performance Optimization. https://www.vampir.eu/, 2017. [Online].

[21] X10: Performance and Productivity at Scale. http://x10-lang.org/, 2017. [Online].

[22] Umut A. Acar, Arthur Chargueraud, and Mike Rainey. Scheduling parallel programs by work stealing with private deques. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pages 219–228, New York, NY, USA, 2013. ACM.

[23] Bilge Acun, Abhishek Gupta, Nikhil Jain, Akhil Langer, Harshitha Menon, Eric Mikida, Xiang Ni, Michael Robson, Yanhua Sun, Ehsan Totoni, Lukasz Wesolowski, and Laxmikant Kalé. Parallel programming with migratable objects: Charm++ in practice. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’14, pages 647–658, Piscataway, NJ, USA, 2014. IEEE Press.

[24] T. Agarwal, A. Sharma, A. Laxmikant, and L. V. Kalé. Topology-aware task mapping for reducing communication contention on large parallel machines. In Proceedings 20th IEEE International Parallel Distributed Processing Symposium, pages 10–20, April 2006.

[25] A. M. Agbaria and R. Friedman. Starfish: fault-tolerant dynamic MPI programs on clusters of workstations. In Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469), pages 167–176, 1999.

[26] Kunal Agrawal, Yuxiong He, Wen Jing Hsu, and Charles E. Leiserson. Adaptive scheduling with parallelism feedback. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’06, pages 100–109, New York, NY, USA, 2006. ACM.

[27] Xavier Aguilar, Karl Fürlinger, and Erwin Laure. MPI trace compression using event flow graphs. In Euro-Par 2014 Parallel Processing: 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings, pages 1–12. Springer International Publishing, 2014.

[28] Xavier Aguilar, Karl Fürlinger, and Erwin Laure. Automatic on-line detection of MPI application structure with event flow graphs. In Euro-Par 2015: Parallel Processing: 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24-28, 2015, Proceedings, pages 70–81, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.

[29] Xavier Aguilar, Karl Fürlinger, and Erwin Laure. Visual MPI performance analysis using event flow graphs. Procedia Computer Science, 51:1353–1362, 2015.

[30] Xavier Aguilar, Karl Fürlinger, and Erwin Laure. Event flow graphs for MPI performance monitoring and analysis. In Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, pages 103–115, Cham, 2016. Springer International Publishing.

[31] Dong H. Ahn, Jim Garlick, Mark Grondona, Don Lipari, Becky Springmeyer, and Martin Schulz. Flux: A next-generation resource management framework for large HPC centers. In 10th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems. IEEE Computer Society, 2014.

[32] Michail Alvanos, Montse Farreras, Ettore Tiotto, and Xavier Martorell. Automatic communication coalescing for irregular computations in UPC language. In Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’12, pages 220–234, Riverton, NJ, USA, 2012. IBM Corp.

[33] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009.

[34] Axel Auweter, Arndt Bode, Matthias Brehm, Luigi Brochard, Nicolay Hammer, Herbert Huber, Raj Panda, Francois Thomas, and Torsten Wilde. A case study of energy aware scheduling on SuperMUC. In Supercomputing: 29th International Conference, ISC 2014, Leipzig, Germany, June 22-26, 2014. Proceedings, pages 394–409, Cham, 2014. Springer International Publishing.

[35] E. Ayguade, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang. The design of OpenMP tasks. IEEE Transactions on Parallel and Distributed Systems, 20(3):404–418, March 2009.

[36] Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Jayesh Krishna, Ewing Lusk, and Rajeev Thakur. PMI: A scalable parallel process-management interface for extreme-scale systems. In Recent Advances in the Message Passing Interface: 17th European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany, September 12-15, 2010. Proceedings, pages 31–41, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[37] Richard F Barrett, Philip C Roth, and Stephen W Poole. Finite difference stencils implemented using Chapel. Oak Ridge National Laboratory, Tech. Rep. ORNL Technical Report TM-2007/122, 2007.

[38] Adam Beguelin, Jack Dongarra, Al Geist, Robert Manchek, Keith Moore, and Vaidy Sunderam. PVM and HeNCE: Tools for heterogeneous network computing. In Software for Parallel Computation, pages 91–99, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg.

[39] Francine Berman, Richard Wolski, Henri Casanova, Walfredo Cirne, Holly Dail, Marcio Faerman, Silvia Figueira, Jim Hayes, Graziano Obertelli, Jennifer Schopf, Gary Shao, Shava Smallen, Neil Spring, Alan Su, and Dmitrii Zagorodnov. Adaptive computing on the Grid using AppLeS. IEEE Trans. Parallel Distrib. Syst., 14(4):369–382, April 2003.

[40] Maciej Besta and Torsten Hoefler. Fault tolerance for remote memory access programming models. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, pages 37–48. ACM, 2014.

[41] Abhay Bhadani and Sanjay Chaudhary. Performance evaluation of web servers using central load balancing policy over virtual machines on cloud. In Proceedings of the Third Annual ACM Bangalore Conference, COMPUTE ’10, pages 16:1–16:4, New York, NY, USA, 2010. ACM.

[42] Milind Bhandarkar, L. V. Kalé, Eric de Sturler, and Jay Hoeflinger. Adaptive load balancing for MPI programs. In Computational Science - ICCS 2001: International Conference San Francisco, CA, USA, May 28-30, 2001 Proceedings, Part II, pages 108–117, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.

[43] Abhinav Bhatele, Eric Bohm, and Laxmikant V. Kalé. Optimizing communication for Charm++ applications by reducing network contention. Concurrency and Computation: Practice and Experience, 23(2):211–222, 2011.

[44] A. Bhatele and L. V. Kalé. Application-specific topology-aware mapping for three dimensional topologies. In 2008 IEEE International Symposium on Parallel and Distributed Processing, pages 1–8, April 2008.

[45] Wesley Bland, Aurelien Bouteiller, Thomas Herault, Joshua Hursey, George Bosilca, and Jack J. Dongarra. An evaluation of user-level failure mitigation support in MPI. In Recent Advances in the Message Passing Interface: 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, pages 193–203, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[46] J. Blazewicz, M. Machowiak, G. Mounié, and D. Trystram. Approximation algorithms for scheduling independent malleable tasks. In Euro-Par 2001 Parallel Processing: 7th International Euro-Par Conference Manchester, UK, August 28–31, 2001 Proceedings, pages 191–197, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.

[47] Aurelien Bouteiller, George Bosilca, and Jack J Dongarra. Plan B: Interruption of ongoing MPI operations to support failure recovery. In Proceedings of the 22nd European MPI Users’ Group Meeting, page 11. ACM, 2015.

[48] B. Brandfass, T. Alrutz, and T. Gerhold. Rank reordering for MPI communication optimization. Computers and Fluids, 80:372–380, 2013. Selected contributions of the 23rd International Conference on Parallel Fluid Dynamics ParCFD2011.

[49] Matthias Braun, Sebastian Buchwald, Manuel Mohr, and Andreas Zwinkau. Dynamic X10: Resource-aware programming for higher efficiency. Technical Report 8, Karlsruhe Institute of Technology, 2014. X10 ’14.

[50] J. Buisson, O. Sonmez, H. Mohamed, W. Lammers, and D. Epema. Scheduling malleable applications in multicluster systems. In 2007 IEEE International Conference on Cluster Computing, pages 372–381, Sept 2007.

[51] Hans-Joachim Bungartz, Christoph Riesinger, Martin Schreiber, Gregor Snelting, and Andreas Zwinkau. Invasive computing in HPC with X10. In Proceedings of the Third ACM SIGPLAN X10 Workshop, X10 ’13, pages 12–19, New York, NY, USA, 2013. ACM.

[52] D. Buntinas, G. Mercier, and W. Gropp. Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem. In Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on, volume 1, pages 10 pp.–530, May 2006.

[53] Darius Buntinas, Guillaume Mercier, and William Gropp. Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem. Parallel Computing, 33(9):634–644, 2007. Selected Papers from EuroPVM/MPI 2006.

[54] Lynn Elliot Cannon. A Cellular Computer to Implement the Kalman Filter Algorithm. PhD thesis, Bozeman, MT, USA, 1969. AAI7010025.

[55] T. Cao, Y. He, and M. Kondo. Demand-aware power management for power-constrained HPC systems. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 21–31, May 2016.

[56] F. Cappello and D. Etiemble. MPI versus MPI+OpenMP on the IBM SP for the NAS benchmarks. In Supercomputing, ACM/IEEE 2000 Conference, pages 12–12, Nov 2000.

[57] T. E. Carroll and D. Grosu. Incentive compatible online scheduling of malleable parallel jobs with individual deadlines. In 2010 39th International Conference on Parallel Processing, pages 516–524, Sept 2010.

[58] T. L. Casavant and J. G. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, 14(2):141–154, Feb 1988.

[59] Márcia C. Cera, Yiannis Georgiou, Olivier Richard, Nicolas Maillard, and Philippe O. A. Navaux. Supporting malleability in parallel architectures with dynamic CPUSETs mapping and dynamic MPI. In Proceedings of the 11th International Conference on Distributed Computing and Networking, ICDCN’10, pages 242–257, Berlin, Heidelberg, 2010. Springer-Verlag.

[60] S. Chakraborty, H. Subramoni, A. Moody, A. Venkatesh, J. Perkins, and D. K. Panda. Non-blocking PMI extensions for fast MPI startup. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 131–140, May 2015.

[61] S. Chakraborty, H. Subramoni, J. Perkins, A. Moody, M. Arnold, and D. K. Panda. PMI extensions for scalable MPI startup. In Proceedings of the 21st European MPI Users’ Group Meeting, EuroMPI/ASIA ’14, pages 21:21–21:26, New York, NY, USA, 2014. ACM.

[62] Sayantan Chakravorty, Celso L. Mendes, and Laxmikant V. Kalé. Proactive fault tolerance in MPI applications via task migration. In High Performance Computing - HiPC 2006: 13th International Conference, Bangalore, India, December 18-21, 2006. Proceedings, pages 485–496, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.

[63] Bradford L Chamberlain, David Callahan, and Hans P Zima. Parallel programmability and the Chapel language. The International Journal of High Performance Computing Applications, 21(3):291–312, 2007.

[64] Bradford L. Chamberlain, Steven J. Deitz, David Iten, and Sung-Eun Choi. User-defined distributions and layouts in Chapel: Philosophy and framework. In Proceedings of the Second USENIX Conference on Hot Topics in Parallelism, HotPar’10, pages 12–12, Berkeley, CA, USA, 2010. USENIX Association.

[65] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: An object-oriented approach to non-uniform cluster computing. SIGPLAN Not., 40(10):519–538, October 2005.

[66] Yanpei Chen, Sara Alspaugh, and Randy Katz. Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. Proc. VLDB Endow., 5(12):1802–1813, August 2012.

[67] I-Hsin Chung, Che-Rung Lee, Jiazheng Zhou, and Yeh-Ching Chung. Hierarchical mapping for HPC applications. Parallel Processing Letters, 21(03):279–299, 2011.

[68] J. Cohen. Graph twiddling in a MapReduce world. Computing in Science Engineering, 11(4):29–41, July 2009.

[69] Isaías A. Comprés Ureña, Michael Gerndt, and Carsten Trinitis. Wait-free message passing protocol for non-coherent shared memory architectures. In Recent Advances in the Message Passing Interface: 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, pages 142–152, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[70] Isaías A. Comprés Ureña, Ao Mo-Hellenbrand, Michael Gerndt, and Hans-Joachim Bungartz. Infrastructure and API extensions for elastic execution of MPI applications. In Proceedings of the 23rd European MPI Users’ Group Meeting, EuroMPI 2016, pages 82–97, New York, NY, USA, 2016. ACM.

[71] Isaías A. Comprés Ureña, Michael Riepen, Michael Konow, and Michael Gerndt. RCKMPI - lightweight MPI implementation for Intel’s single-chip cloud computer (SCC). In EuroMPI, volume 6960 of Lecture Notes in Computer Science, pages 208–217. Springer, 2011.

[72] Isaías A. Comprés Ureña, Michael Riepen, Michael Konow, and Michael Gerndt. Invasive MPI on Intel’s Single-Chip Cloud Computer. In Architecture of Computing Systems - ARCS 2012: 25th International Conference, Munich, Germany, February 28 - March 2, 2012. Proceedings, pages 74–85, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[73] G. Contreras and M. Martonosi. Characterizing and improving the performance of Intel threading building blocks. In 2008 IEEE International Symposium on Workload Characterization, pages 57–66, Sept 2008.

[74] David Cunningham, David Grove, Benjamin Herta, Arun Iyengar, Kiyokuni Kawachiya, Hiroki Murata, Vijay Saraswat, Mikio Takeuchi, and Olivier Tardieu. Resilient X10: Efficient failure-aware programming. SIGPLAN Not., 49(8):67–80, February 2014.

[75] D. Bailey et al. The NAS parallel benchmarks. Technical Report RNR-91-002, NAS Systems Division, January 1991.

[76] L. Dagum and R. Menon. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46–55, Jan 1998.

[77] Robert I. Davis and Alan Burns. A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv., 43(4):35:1–35:44, October 2011.

[78] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.

[79] Jeffrey Dean and Sanjay Ghemawat. MapReduce: A flexible data processing tool. Commun. ACM, 53(1):72–77, January 2010.

[80] James Dinan, Pavan Balaji, Ewing Lusk, P. Sadayappan, and Rajeev Thakur. Hybrid parallel programming with MPI and Unified Parallel C. In Proceedings of the 7th ACM International Conference on Computing Frontiers, CF ’10, pages 177–186, New York, NY, USA, 2010. ACM.

[81] S. G. Domanal and G. R. M. Reddy. Load balancing in cloud computing using modified throttled algorithm. In 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pages 1–5, Oct 2013.

[82] S. G. Domanal and G. R. M. Reddy. Optimal load balancing in cloud computing by efficient utilization of virtual machines. In 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS), pages 1–4, Jan 2014.

[83] Richard A. Dutton and Weizhen Mao. Online scheduling of malleable parallel jobs. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, PDCS ’07, pages 136–141, Anaheim, CA, USA, 2007. ACTA Press.

[84] E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson. An application of bin-packing to multiprocessor scheduling. SIAM Journal on Computing, 7(1):1–17, 1978.

[85] Deepak Eachempati, Hyoung Joon Jun, and Barbara Chapman. An open-source compiler and runtime implementation for Coarray Fortran. In Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, PGAS ’10, pages 13:1–13:8, New York, NY, USA, 2010. ACM.

[86] J. Ekanayake, S. Pallickara, and G. Fox. MapReduce for data intensive scientific analyses. In 2008 IEEE Fourth International Conference on eScience, pages 277–284, Dec 2008.

[87] Tarek El-Ghazawi and Lauren Smith. UPC: Unified Parallel C. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC ’06, New York, NY, USA, 2006. ACM.

[88] M. Etinski, J. Corbalan, J. Labarta, and M. Valero. Parallel job scheduling for power constrained HPC systems. Parallel Computing, 38(12):615–630, 2012.

[89] Yoav Etsion and Dan Tsafrir. A short survey of commercial cluster batch schedulers. School of Computer Science and Engineering, The Hebrew University of Jerusalem, 44221:2005–13, 2005.

[90] Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn. Parallel job scheduling — a status report. In Proceedings of the 10th International Conference on Job Scheduling Strategies for Parallel Processing, JSSPP’04, pages 1–16, Berlin, Heidelberg, 2005. Springer-Verlag.

[91] Dror G. Feitelson, Larry Rudolph, Uwe Schwiegelshohn, Kenneth C. Sevcik, and Parkson Wong. Theory and practice in parallel job scheduling. In Job Scheduling Strategies for Parallel Processing: IPPS ’97 Workshop Geneva, Switzerland, April 5, 1997 Proceedings, pages 1–34, Berlin, Heidelberg, 1997. Springer Berlin Heidelberg.

[92] Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. Clearing the clouds: A study of emerging scale-out workloads on modern hardware. SIGPLAN Not., 47(4):37–48, March 2012.

[93] Lance Fortnow. The status of the P versus NP problem. Commun. ACM, 52(9):78–86, September 2009.

[94] Andrew Friedley, Greg Bronevetsky, Torsten Hoefler, and Andrew Lumsdaine. Hybrid MPI: Efficient message passing for multi-core systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pages 18:1–18:11, New York, NY, USA, 2013. ACM.

[95] Karl Fürlinger and David Skinner. Capturing and visualizing event flow graphs of MPI applications. In Euro-Par 2009 – Parallel Processing Workshops: HPPC, HeteroPar, PROPER, ROIA, UNICORE, VHPC, Delft, The Netherlands, August 25-28, 2009, Revised Selected Papers, pages 218–227, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[96] Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, and Timothy S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Recent Advances in Parallel Virtual Machine and Message Passing Interface: 11th European PVM/MPI Users’ Group Meeting Budapest, Hungary, September 19 - 22, 2004. Proceedings, pages 97–104, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.

[97] G. Galante and L. C. E. d. Bona. A survey on cloud computing elasticity. In 2012 IEEE Fifth International Conference on Utility and Cloud Computing, pages 263–270, Nov 2012.

[98] Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990.

[99] GA Geist, James A Kohl, and Phil M Papadopoulos. PVM and MPI: A comparison
