
PART 1 – The Empirical Picture

4.18 TeraGrid

Case Overview

What does the project mainly do? TeraGrid is a national e-Infrastructure that provides a distributed set of high-capability computational, data-management and visualization resources to academic users.

Motivations for setting it up: A 1999 report by the President's Information Technology Advisory Committee (PITAC) suggested that for the US to retain its leading role in basic research, scientists and engineers needed appropriate access to the most powerful computers, which at the time operated at the teraflop level (10¹² operations per second). Along with the growing interest in Grid systems and a more specific focus on data grids, these discussions also drew attention to the problem of managing, interoperating, analyzing and visualizing an exponentially growing amount of data from scientific instruments.

Funders identified Grid computing as a technological infrastructure that could meet needs going beyond the individual technical elements of computing, data and storage technologies, moving toward a more holistic facility: a seamless, balanced, integrated computational and collaborative environment that supports scientific research.

Main goals of the project: Three objectives guide the project: (1) “petascale science”, the use of extremely high-end computational capabilities to advance computational science in multiple fields; (2) empowering science leaders through “science gateways” methodologies (see “user recruitment”); and (3) providing a reliable, general-purpose set of e-Infrastructure services and resources. More recently, TeraGrid has stated that its aim is to enable science that could not be done without TeraGrid: to broaden the user base, simplify users’ lives, improve operations, and enable connections to external resources.

Project maturity: With continuous streams of funding since 2001, a partnership that includes some of the world’s most experienced institutions in supercomputing provision, and five years of production, TeraGrid is a very mature e-Infrastructure provider.

Project funding: Over the past eight years the National Science Foundation has directly and indirectly awarded approximately $250 million to the TeraGrid. Of the $12.1 million allotted to the Grid Infrastructure Group (GIG), the largest share went to outreach and user support (44%), followed by basic infrastructure, resources and services (24%); management, finance and administration (14%); science gateways (11%); and CI development (7%). Reflecting their different mandate, most of the $31.1 million to the Resource Providers (RPs) supported basic infrastructure, resources and services (56%), with the remainder going to user support and science outreach (26%), management, finance and administration (12%), science gateways (4%) and CI development (2%).

Organizational Structure

Size and composition: TeraGrid began operation in 2001 as a partnership among four RPs: University of Chicago/ANL, California Institute of Technology (Caltech), the National Center for Supercomputing Applications (NCSA), and the San Diego Supercomputer Center (SDSC). As displayed in Figure 1, organizational membership has grown over the years to include eleven resource providers, the Grid Infrastructure Group (based at the University of Chicago/ANL), as well as four Software Integration partners. As noted elsewhere, this expansion was not planned at the outset and was mostly a result of subsequent NSF awards made to additional sites. The combined resources support a staff of approximately 130 full-time equivalent positions.

Governance: TeraGrid follows a matrix approach to the distribution of work: individuals responsible for particular areas or tasks are not necessarily the direct supervisors of those who work on those tasks, and team members are often located across several sites. Two main entities lead the project: the TeraGrid Forum and the Grid Infrastructure Group. A recently established body, the Science Advisory Board, plays an external evaluation and consulting role.

Managing internal and external relations

Management of the project: The Resource Provider (RP) Forum is responsible for setting policy and governance for the project; it consists of principal investigators from the RPs and the GIG. An elected Chairperson, who is funded through the GIG, leads the Forum. Working closely with the RPs that implement and support resources and services, the GIG is charged with providing coordination, operation, software integration, management and planning for the TeraGrid. Work is divided across various subject areas, each with its own Area Director. Area Directors manage, oversee, coordinate and maintain TeraGrid activities within their area.

Working groups consist of teams of experts drawn from the partner sites, with the GIG management team providing general oversight and management. Different organizational cultures, divergent goals and competition among TeraGrid organizations make collaboration challenging. To address these challenges, TeraGrid has recently implemented project management processes that support a clearer division of labour, and has bolstered communication and coordination mechanisms to help synchronize its inter-organizational activities.

Users: According to the NSF Cyberinfrastructure Allocation Policy, individuals eligible for resource allocations are those who are a “researcher or educator at a U.S. academic or non-profit research institution.” In recent years, with the expansion of the pool of eligible PIs, many more people are eligible to use, and do use, TeraGrid services: the user base grew from fewer than 1,000 users in October 2005 to over 4,000 at the end of 2008 (TG annual report 2009).

The number of active PIs in 2008 was about 1,500. A breakdown of all active users shows that most are graduate students (36%), followed by faculty (22.3%) and postdoctoral researchers (12.7%).

While the stated number of industrial users is negligible, individual computing centres may have separate undisclosed provision contracts with commercial clients.

User recruitment: TeraGrid uses traditional publicity mechanisms to attract users: a project website, press releases, and public news announcements directed at the served scientific community (TeraGrid Science Highlights and International Science Grid this Week), as well as dedicated, often large-scale training events at which participation is partially supported. In addition, to broaden its direct reach to users, TeraGrid has in recent years implemented two novel mechanisms: Science Gateways and Campus Champions. Science Gateways enable users to keep their familiar work environment while porting their applications to the Grid. Campus Champions enlists technology leaders on a campus who advocate the use of TeraGrid in their local community.

Drivers and barriers to adoption: Access to the unique resources TeraGrid offers is the main driver of adoption for researchers who need high-end computation or data visualization resources. No less important, according to some informants, NSF channels funded research toward this e-Infrastructure. However, even with these carrots and sticks, both our informants and past analyses have indicated several barriers to use. These barriers can be categorized by two distinct user populations: (1) the highly computer savvy, and (2) those less familiar with the operation of supercomputers or e-Infrastructure computing resources. The first group has repeatedly complained about the functioning of TeraGrid, claiming that the system is unreliable at times and that they often need to wait a long time for their jobs to reach the top of the processing queue. Technical design constitutes perhaps a more considerable barrier for the second, larger group of scientists. Since these users are less computer savvy, they have little tolerance for cumbersome interfaces and for software that requires much time to master.

Challenges in interdisciplinary collaboration: N/A

Collaboration with other organizations: TeraGrid has limited external relationships with other e-Infrastructure providers. These include the US Open Science Grid (OSG), and international collaborations, mainly at the level of sharing knowledge and experience, with Enabling Grids for E-sciencE (EGEE), the National Research Grid Initiative (NAREGI), the Distributed European Infrastructure for Supercomputing Applications (DEISA) and others, on occasions that bring e-Infrastructure providers together, such as the annual Supercomputing conference or the more specialized meetings that providers tend to organize.

Technology

Main technologies, resources and services: As of 2009, TeraGrid's hardware capacity included 161,000 processor cores across 22 systems, offering more than a petaflop of computing capability and more than 30 petabytes of online and archival data storage, with rapid access and retrieval over high-performance networks. Another major service TeraGrid provides is Science Gateways. In addition to supporting individual gateway projects, TeraGrid personnel provide and develop general services for all projects, among them help desk support, documentation, SimpleGrid for basic gateway development and teaching, gateway hosting services, a gateway software registry, and security tools including the Community Shell, credential management strategies, and attribute-based authentication.
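As a rough back-of-envelope check (illustrative arithmetic of ours, not a figure from the report), the stated aggregates imply an average per-core performance of a few gigaflops, consistent with processors of that era:

```latex
% Implied average per-core performance from the stated totals
% (illustrative arithmetic; the report gives only the aggregates).
\[
\frac{1\ \text{PFLOPS}}{161{,}000\ \text{cores}}
  = \frac{10^{15}\ \text{flop/s}}{1.61 \times 10^{5}\ \text{cores}}
  \approx 6.2\ \text{GFLOPS per core}
\]
```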

Role of technology development: See main technologies and interoperability.

Data sharing: N/A

Interoperability with similar or connecting infrastructures: TeraGrid development efforts aim to provide transparent use of the project’s distributed resources among the heterogeneous set of computers and devices found in participating sites. Toward that end, sites have developed a Coordinated TeraGrid Software and Services (CTSS) Capability Kits, which are defined as

“collections of software related by users-oriented HPC [high-performance computing] tasks.”

Examples of CTSS Kits include: Remote Login, Remote Compute, Data Movement and Science Workflow Support. TeraGrid representatives have worked with the Open Science Grid on interoperability across the two e-Infrastructures, specifically MPI parallel job submission through Globus. In addition, senior TeraGrid members have participated in the Grid

Interoperation Now group, which, under the auspices of the Open Grid Forum, aims to develop and demonstrate interoperation among the major e-Infrastructure providers.
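To make “MPI parallel job submission” concrete, the sketch below shows a minimal MPI program of the kind such cross-grid submission targets. It is illustrative only, assuming nothing beyond a standard MPI installation; it is not taken from TeraGrid or CTSS documentation.

```c
/* Minimal MPI job of the kind cross-grid submission targets.
 * Illustrative sketch, not TeraGrid code.
 * Build with an MPI compiler wrapper, e.g.: mpicc -o hello_mpi hello_mpi.c
 * Run locally with, e.g.:                   mpirun -np 4 ./hello_mpi     */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    MPI_Get_processor_name(name, &name_len); /* node this rank runs on    */

    printf("rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();                          /* shut down the MPI runtime */
    return 0;
}
```

In a Globus-mediated submission, a job description would request that such an executable be launched as an MPI job with a given process count; the interoperability work referenced above concerns making that request behave consistently on both TeraGrid and OSG resources.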

Contribution

Main contributions of project: Advancing the set of technologies required to integrate distributed, heterogeneous supercomputers and other high-end computers into a cohesive and persistent fabric is one of the most direct and important outcomes of the project. Another less direct, but nonetheless important, contribution of this work is that collaboration across sites that did not traditionally work together has created the social and organizational fabric that has enabled important technology advancements. These relationships are likely to sustain additional collaborative research partnerships, particularly in 2011, when the next funding program will be implemented. TeraGrid also offers a significant improvement in the resources available to scientists in fields that have traditionally relied upon advanced computational infrastructure to advance their research, including high energy physics and climatology.

Challenges: There are currently no sustainability mechanisms in place that would enable TeraGrid development to continue should funding cease. In fact, informants note that without continued, persistent streams of funding, many of TeraGrid’s efforts will be terminated. Perhaps more challenging is the development of commodity commercial alternatives based on cloud computing. Should comparable resources be offered through these vendors at a lower operational cost, the prospects for continued investment in TeraGrid are limited.

Informants’ recommendations to policy makers

Informants suggested operating longer funding cycles. While a more dynamic funding rhythm suits scientific research, it is less efficient for infrastructure construction, especially across multiple organizations, because such construction requires a much longer time horizon. In addition, informants recommended more direct involvement of program officers, which would allow funders to gain a clearer understanding of the complexities involved in the projects they fund and to recognize the individual contributions each partner makes.

SWOT analysis

Table 4-36: TeraGrid strengths and weaknesses

Strengths:

- Long-term funding: Long-term funding is secured.
- Sustainability: In 2010, TeraGrid will continue as TeraGrid Extreme Digital Resources for Science and Engineering, likely through a different mixture of participating organizations.
- User recruitment: The infrastructure has a strategy for recruiting new users.
- Involvement of current users: After recognizing that a “build it and they will come” approach is untenable, TeraGrid has moved to an innovative three-pronged strategy that includes marketing and information dissemination, a novel Science Gateways program that minimizes the changes users must make in adopting the TeraGrid, and Campus Champions, which leverages the local presence of technology advocates on university campuses. These programs have managed to attract users who were not traditionally associated with supercomputing but require high-end computation and data resources.

Weaknesses:

- Organizational embedding: While strongly embedded in participating institutions, continual competition for grants, especially the upcoming TeraGrid Extreme Digital Resources award, weakens the overall commitment to the project.
- Institutionalised links: Aside from efforts to collaborate with the Open Science Grid, there are no established interoperation mechanisms, only exchanges of knowledge and practices.
- External use of software, tools: TeraGrid has a very large number of users. At the same time, most of its developments serve the participating TeraGrid sites; there is no evidence to suggest that they are used elsewhere.

Table 4-37: TeraGrid opportunities and threats

- Funding of member organizations: Although the organizational composition is likely to change, the allocation of the next round of public funding has been ensured. Nevertheless, until the winner of the bid is announced, there is fierce competition among current collaborators, which clouds day-to-day operation.
- Technology monitoring: TeraGrid involves some of the world’s most renowned experts on distributed academic computing. However, there is no indication that efforts are being made to consider alternative technologies, such as clouds.
- Competition with other infrastructures or technologies: TeraGrid is a highly complex and specialized operation, and there are no alternative e-Infrastructure technologies that could be implemented across participating sites to support TeraGrid’s level of provision. Still, cloud computing poses a significant threat, should it become publicly available and able to serve the specialized needs of supercomputing/high-end computation and data scientific users.
- Security risks: There is a stream of research and development on security, including identity management and advanced authentication mechanisms.
- Change of user communities and fields: It does not seem likely that the need for the high-end distributed resources TeraGrid provides will quickly expand beyond the communities that are currently served.


5 Multi-case comparison

