
C3-Grid

In the document Final Report (pages 39-44)

PART 1 – The Empirical Picture

4.1 C3-Grid

Case Overview

What does the project do mainly? C3-Grid links distributed data archives in several German institutions for earth system sciences. With the help of Grid technologies it creates an infrastructure which provides tools for effective data discovery, data transfer and processing for scientists in climate research. This can increase productivity in scientific work by climate scientists [5].

Motivations for setting it up: The original motivation for initiating the project was improving access to data needed for simulations. Until then no overview of the existing data archives of earth science existed; accessing proprietary data at other institutions was even less feasible.

And if access was possible, researchers faced the problem that the format and structure of the data from other disciplines were completely different. It was quickly realized that Grid technology could solve the problem of connecting the distributed data repositories.

Main goals of the project: The mission for C3-Grid was to build a collaborative environment to facilitate data discovery, data access and data processing (Kindermann and Stockhause 2008).

The C3-Grid user accesses data from simulations and observational data stored in institutionally and geographically distributed archives (e.g. WDCC/Hamburg, Pangaea/Bremen, DLR, DWD and others). Access to the data is provided via one portal and the data come in a standardized format. An integrated data management system supports typical workflows.

Project maturity: The project started in September 2005 and officially ended in February 2009 after an extension in August 2008. The C3-Grid is a founding member of the German D-Grid initiative. A follow-up project is currently being evaluated for funding. To date a working prototype of the Grid has been implemented. Making the Grid ready for reliable production service will require a further three years of development. The software is not yet fully stable; during access peaks the system frequently becomes overloaded. A further task is to improve international operability; for example, tools allowing a first data analysis will be implemented to give users a first impression of the data.

Up to now there are around 50 users. Most of them are scientists working in Germany. In the future the Grid will be opened for scientists from other countries as well. A main target of the successor project is to expand the Grid architecture and functionality in a way that enables uncomplicated access for scientists from all over the world.

Project funding: C3-Grid is part of the German D-Grid initiative. The D-Grid Initiative (German Grid Initiative) builds a sustainable Grid infrastructure for education and research in Germany.

The German Ministry for Education and Research funded D-Grid and C3-Grid accordingly. The funding was awarded for personnel costs only. Hardware and other infrastructure had to be provided by the participating organizations. Consequently a significant proportion of the costs were borne by the participating institutes.

4 The authors of this section are: Franz Barjak, Oliver Bendel, Erica Coslor, Kathryn Eccles, Tobias Hüsing, Zack Kertcher, Eric Meyer, Simon Robinson, Ralph Schroeder and Gordon Wiegand

5 This description is based on 90 minutes of face-to-face and telephone interview time with 3 informants as well as documents available on the C3-Grid website (http://www.c3grid.de) and some published material as cited.


Organizational Structure

Size and composition: C3-Grid consists of eight project partners, plus six associated partners. This is the core group of official project participants. The consortium consists of eight data-providing institutions. The task of this group is to provide and arrange the data, including e.g. descriptions of the data with metadata. Eight further members of the consortium, so-called operators, represent the users. These are mainly universities and other scientific institutes. Two further members of the consortium are responsible for informatics. Additionally there are three associated academic partners (all universities) and three industrial partners.

Governance: The Alfred-Wegener-Institute for Polar- and Marine Research (AWI), Bremerhaven, coordinates the project. Each group of project partners has its specific task. The role of the domain scientists is to specify the requirements and to provide the domain-specific applications, like diagnostic scripts. The data providers contribute processed data. The computer science partners have to supply all the middleware.

Managing internal and external relations

Management of the project: Where possible, decisions are made by consensus. Controversial issues are discussed, with the AWI holding the final decision-making power. To date all decisions have been taken jointly by all partners. The project partners meet roughly every six months.

Users: Up to now (May 2009) around 50 users have used the C3-Grid. In the terminology of the project, users are always individual scientists. All users so far are based in Germany, but in the future the Grid will be opened to scientists from all over the world. Although the users themselves are based in Germany, the projects in which they are involved are typically international. Hence the impact is not limited to Germany. The formal extension of the user community to other countries is scheduled for the end of the project. Since the Grid has not yet reached its intended production functionality, users still need a good knowledge of the technology. For many purposes it is not yet possible to use the Grid routinely. Hence not even within Germany are all potential users included.

User recruitment: After different strategies had been tried with little success, it turned out that the most effective way to raise interest in C3-Grid among scientists is to visit their institutions. C3-Grid project members therefore travel to institutions with potential interest, which makes the user-recruitment strategy more flexible and thus more user-oriented. Presentations at specialist conferences also turned out to be a powerful way to motivate scientists to become involved in C3-Grid.

Drivers and barriers to adoption: The main driver for a scientist to join C3-Grid is the easy access to huge data archives coming from both real measurement and simulations. Since the data are distributed over many institutions, C3-Grid is the only possible way to get the data.

Finding demand for the Grid within the community is most difficult. Because using Grid technology is not as trivial as using other software, a certain amount of insight into the technology is required on the users’ side. From the viewpoint of the user the C3-Grid is not a black box, but a "grey box" (C3-Grid interview). Potential users therefore need to acquire specific knowledge and to have an inclination towards computational research.

Challenges in interdisciplinary collaboration: Different scientific cultures are a major problem in the project management. To address this, interdisciplinary task forces were set up, which held face-to-face meetings to discuss the appropriate way to proceed. A good deal of the project managers' workload concerns the coordination of the different disciplines. The coordinator estimates that approximately 20% of the overall workload of each project member is needed to find a common basis with other project members. An additional 20% of the workload is required for rework in cases where the presumed mutual understanding turned out to be only apparent.

Collaboration with other organizations: Many of the relevant German organizations which would qualify as partners are already included in the project consortium. This is especially true of all earth science institutions. The collaboration with partners from computer science is broad in scope as well. For example, C3-Grid is embedded in the D-Grid initiative, which ensures an exchange of experience concerning grids. Furthermore there is a commitment to the worldwide Grid community, e.g. the project collaborates with EGEE.

Technology

Main technologies, resources and services and the role of technology development: From a technical point of view the aim of the project was to "gridify" existing diagnostic workflows and to provide the Grid itself. C3-Grid did not extend the methods of earth system sciences; it was “only” focused on the technical, i.e. infrastructural aspects. Hence, many of the tools of the project are middleware. Existing tools were used as much as possible, but many had to be developed anew. The key challenges were to enable data discovery with automatic metadata generation, to ease data access by bridging heterogeneity, to support data processing through workflow composition, and to organize access to resources with a consistent security infrastructure.
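The idea of bridging heterogeneity for data discovery can be illustrated with a minimal sketch. It assumes two hypothetical archives whose metadata records use different field names; the archive labels, field names, records and the functions `normalize` and `discover` are invented for illustration and do not reflect the actual C3-Grid metadata profile or middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Common metadata schema used for discovery (hypothetical)."""
    archive: str
    title: str
    variable: str
    start_year: int
    end_year: int


# Hypothetical mapping from the common schema to each archive's own field names.
FIELD_MAPS = {
    "archive_a": {"title": "name", "variable": "param",
                  "start_year": "from", "end_year": "to"},
    "archive_b": {"title": "dataset_title", "variable": "variable_name",
                  "start_year": "begin", "end_year": "end"},
}


def normalize(archive: str, raw: dict) -> CommonRecord:
    """Translate one archive-specific record into the common schema."""
    fields = FIELD_MAPS[archive]
    return CommonRecord(
        archive=archive,
        title=str(raw[fields["title"]]),
        variable=str(raw[fields["variable"]]).lower(),
        start_year=int(raw[fields["start_year"]]),
        end_year=int(raw[fields["end_year"]]),
    )


def discover(records, variable: str, year: int):
    """Return all records that cover the given variable and year."""
    return [r for r in records
            if r.variable == variable.lower()
            and r.start_year <= year <= r.end_year]


# Invented example records, one tuple per (archive, raw metadata).
RAW = [
    ("archive_a", {"name": "Model run 1", "param": "Temperature",
                   "from": "1950", "to": "2000"}),
    ("archive_b", {"dataset_title": "Station series",
                   "variable_name": "temperature", "begin": 1990, "end": 2005}),
    ("archive_b", {"dataset_title": "Rain gauge series",
                   "variable_name": "precipitation", "begin": 1980, "end": 2005}),
]

CATALOGUE = [normalize(archive, raw) for archive, raw in RAW]
HITS = discover(CATALOGUE, "temperature", 1995)
```

Once every record is expressed in one schema, a single query can be answered across all archives, which is the effect a common portal with standardized metadata is meant to achieve.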

Data sharing: Although data sharing is the core of the project, one lesson learned was that sophisticated access rights management has to be implemented. In May 2009 a new internal project started to implement a new access rights management system.

Interoperability with similar or connecting infrastructures: As C3-Grid forms part of the D-Grid initiative, collaboration with other German grids is wide-ranging. C3-Grid is considered to be one of the most important Grid technology development projects in this initiative. The connection to EGEE is rather loose. Since the two projects use different middleware, the main purpose of the collaboration is to ensure the re-usability of the tools by making them compatible. C3-Grid is not only an early but also a successful project within the Grid community. Since it is well documented and has published many of the preliminary tools, C3-Grid has become a model at least within the European Grid community.

Contribution

Main contributions of project: The impact of C3-Grid on the earth science community is substantial. It has enabled the simultaneous analysis of data from different sources, which has led to new insights into the interaction of earth subsystems. Furthermore there is a strong impact on the methodology of earth science. It is generally accepted in the community that local data management at petabyte scale is no longer possible. As pointed out, C3-Grid has become a model at least within the European Grid community.

Challenges

The submission of an application for a follow-up project to the German Federal Ministry for Education and Research is planned for May 2009. The aim is to advance the Grid from prototype to production status. The software has to be stabilized and scalability needs to be achieved. In terms of content the work is done, but it still needs extensive testing. A further task is to improve international interoperability. Pre-processing of the data has to be improved in order to reduce the volume of data transmitted with every download. Users with limited internet download speed, e.g. in developing countries, can only handle customized data sets. Hence the functionality of C3-Grid will be broadened. A third task is to review and revise the access rights management in the Grid. The current version is not elaborate enough and needs to be refined. An additional task to improve interoperability is the integration of C3-Grid into partner grids like the Earth System Grid (ESG, http://www.earthsystemgrid.org/) and the NERC DataGrid (http://ndg.nerc.ac.uk/). From the viewpoint of the project management the most important task is to find better communication solutions, especially to improve communication between members of different disciplines.

Informants’ recommendations to policy makers

Not covered in C3-Grid interviews.

SWOT analysis

Table 4-1: C3-Grid strengths and weaknesses

Long-term funding
Strengths: The funding of the follow-up project by the Ministry for Education and Research is very likely, but the final commitment remains to be made.
Weaknesses: The funding of the project by the ministry covers only the manpower costs. The costs for the hardware are contributed by the participating institutions. So it is still somewhat unclear what happens if one of the institutions should withdraw from the project.

Sustainability
Strengths: Since the participating institutions switched their data storage step by step from local to Grid archives, it is not easy to switch back. Once the commitment to participate is made, it is hard or even impossible to step back.
Weaknesses: Even though it is an integral part of the project to open the Grid to an international community, it is currently still restricted to German scientists.

User recruitment
Strengths: Users are recruited by visits and presentations at expert conferences. Different strategies of recruitment have been tried, so it is likely that the most effective way could be found.
Weaknesses: The strategies might work well for the German community, but it will be expensive and demanding to recruit users internationally by personal visits. New strategies have to be established.

Involvement of current users
Strengths: Current users are mostly highly committed. Many important projects within the earth sciences are no longer realizable without Grid technology. There is a vital necessity to stick with this technology to acquire prestigious projects and to publish in high-impact journals.
Weaknesses: Since the Grid is not fully operable yet, it is still a problem to open up Grid technology to scientists who are not computer-savvy. Using the Grid still requires specialist knowledge, and this discourages potential users, as the effort needed to make the Grid do what it should do is too high.

Organizational bedding
Strengths: All involved institutions have a long tradition as research institutions. Many of them are flagships of the German research system.
Weaknesses: (none listed)

Institutionalised links
Strengths: As the figurehead of the German Grid community, C3-Grid is at least very well anchored within German and European Grid projects. Furthermore there are at least loose affiliations with most Grid projects all over the world.
Weaknesses: (none listed)

External use of software, tools
Strengths: A significant part of the work of C3-Grid was to develop middleware and Grid standards. Many younger Grid projects in Germany have adopted the technology and tools.
Weaknesses: The core of the Grid technology is the middleware. C3-Grid, like all D-Grid projects, uses gLite as middleware. But from an international point of view much more research is done on Globus, an alternative to gLite. Globus is used in many other paradigmatic projects.

Table 4-2: C3-Grid opportunities and threats

Funding of member organizations
Opportunities: All member organizations are major research institutes or universities. Their funding is guaranteed for the future.
Threats: (none listed)

Technology monitoring
Opportunities: Within the earth science community C3-Grid is setting standards. The project is being presented and discussed at all major conferences in the affiliated fields. Furthermore, C3-Grid is an active member of the Grid community, so developments in this field won’t be missed.
Threats: The purchase and maintenance of the hardware is the responsibility of the participating organizations. Hence, it is not guaranteed that all use the same high standards of hardware. There is no obligation to adopt the best technology. But up to now this is more a theoretical problem.

Competition with other infrastructures or technologies
Opportunities: C3-Grid is very well embedded in D-Grid. Hence not only C3-Grid but other projects as well help to improve the tools in use.
Threats: As mentioned, the middleware used, gLite, is different from the middleware Globus used by other major projects.

Security risks
Opportunities: Up to now no security risks are known.
Threats: A more sophisticated access rights management system has to be developed. A separate project proposal has been developed and submitted for funding.

Change of user communities and fields
Opportunities: The current trend within the earth sciences is to develop models with huge databases. These databases can only be handled with Grid technology. It is conceivable that more and more scientists will use the Grid.
Threats: Since more scientists will use the Grid, it has to become more user-friendly. It has to work like ordinary software, which means that no highly specialized skills should be necessary to use it. Furthermore the Grid will be opened to researchers from all over the world, including countries without good internet access. Hence, access to data has to be simplified in a way that only the data really needed is downloaded. A more sophisticated pre-processing of the data has to be developed and implemented.
