(1)

H.Pfeiffenberger, D.Carlson, APE2014, 2014-01-29, Berlin 1

Who is afraid of Data Publishing – The ESSD Experience

Hans Pfeiffenberger, Dave Carlson

Alfred-Wegener-Institute for Polar and Marine Research, Helmholtz Association - Germany

APE2014, 2014-01-29, Berlin

(2)

Who should be Scared of Data Publishing?

Everybody!!

•  At least those who don’t like a tough challenge

Respectful (Jan Brase)

recognize

(3)

Who should really be Afraid of Data Publishing?

•  Those who

-  Invented their data (Stapel),

-  Selected data with a bias (notorious: clinical trials),

-  Read the wrong thing, or too much, from their data (Reinhart/Rogoff)

•  Those who build business models on a monopoly on knowledge or facts, e.g.

-  Non-OA publishers

-  Institutes which consider data collections as “their” capital

(4)

Royal Society: Science as an Open Enterprise (2012)

•  Open enquiry has been at the heart of science since the first scientific journals were printed in the seventeenth century. …

•  Science's capacity for self-correction comes from this openness to scrutiny and challenge.

•  RS applied this to data: Intelligent Openness

(5)

Scared in the 17th Century

Hooke, published in 1676 as an anagram:

“ceiiinosssttuv”

Resolved in 1678 in a booklet (“Ut tensio, sic vis”)

(6)

Meitner-Hahn-Strassmann uranium experiment, Berlin-Dahlem, 1938

The last big discovery by a small group with a lab notebook?

(7)

What we do today:

ARGO, the biggest experiment in the world

(8)

ARGO is not Scared of “Data Publishing”!

What is really fascinating: there are

•  More than 3,000 buoys

•  from more than 30 countries and lots of companies, and yet there is:

•  Co-ordinated (quality) data management

-  One (“published”) standard for instruments

-  One (“published”) standard for formats

-  One (“published”?) standard for processing

-  Open access to data, with (almost) no delay

(9)

H. Pfeiffenberger GeoSim Seminar, 2013-02-15, Potsdam 9

The Dangers of Working in Closed Silos –

“Does computation threaten the scientific method?”

•  “using the same processed data from eight other companies, the same algorithms in the same programming language, using the same input data, just coded independently …”

•  L. Hatton, A. Giordani, ISGTW

(10)

Data Publishing Challenge #1

•  Quality of Data

-  Royal Society, “Intelligent Openness” (2012): Data need to be “… assessable. Recipients need to be able to make some judgment or assessment of what is communicated.”

-  “Guidelines on Data Management in Horizon 2020” (2013): “… are data provided in a way that judgments can be made about their reliability and the competence of those who created them)”

(11)

Earth System Science Data (ESSD), established 2008

Advisory Board:

Paul J. Crutzen

Sydney Levitus

Alexander Petrovich Lisitzin

Editors in Chief:

David Carlson

Hans Pfeiffenberger

Publishing House

Copernicus Publications – OA Publisher, EGU

(12)

Estimate of Error and Data Provenance

Require Estimate of Error and Data Provenance - No fancy interpretations!!

(13)

2013: CO above Troll Station, Original Data

(14)

2013: CO above Troll Station, Original Data

(15)

2013: CO above Troll Station, Original Data

(16)

Data Publishing Challenge #2

•  Citability / Cite-worthy-ness / Reputation

-  NSF Proposal Preparation Instructions (2013): Proposals / PIs’ CVs must contain: “A list of: (i) up to five products … Acceptable products must be citable and accessible including but not limited to publications, data sets, software, ...”

-  DFG “Rules of Good Scientific Practice” (2013), Recommendation 12 on authorship: contribution may be “preparation … of data”

(17)

200 Data References?

A huge amount of work to find, assess and collate (quality) data:

24 out of 43 text pages are source data references!

(18)

The data are out there

Reviewer: “no effort appears to have been made to engage the specialist scientists who have spent months or years at sea collecting such data.” The reviewer did not know that:

The authors asked 164 potential contributors and got answers from 13!

Does citation already work as an incentive?

(19)

Data Publishing Challenge #3

•  Linking text and data

-  The Lancet “Reducing waste from incomplete or unusable reports of biomedical research” (2014)

- “… studies of published trial reports showed that … 40–89% were non-replicable”

- Offered a long laundry list of “Components of study documentation” to be published

… and much more

(20)

Now this laundry list is really scary!

1 The protocol and related documents, such as details submitted for study registration

3 Supplementary materials, such as education materials for patients, clinician training resources, and videos

7 The primary data, data manuals, and statistical code for analyses

9 Reliable and stable bidirectional linkages between all these elements

(21)

2012: Nature CC & ESSD; Carbon data aggregation at global scale

(22)

2012: Nature CC & ESSD; Carbon data aggregation at global scale

(23)

2012: Nature CC & ESSD; Carbon data aggregation at global scale

(24)

Linking Text and Data

Data (in repository)

Article in data journal

Article in “classical” journal
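
The three objects on this slide, together with the “bidirectional linkages” the Lancet list asks for, can be sketched as a tiny data model. This is only an illustration in Python: the `Record` class, the `link` and `is_bidirectional` helpers and the `example:` identifiers are hypothetical, and which pairs are linked is an assumption, not a description of ESSD's or any repository's actual infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A citable object: a dataset, a data-journal article, or a classical article."""
    identifier: str  # hypothetical identifier, standing in for e.g. a DOI
    kind: str        # "dataset", "data_article" or "research_article"
    links: set[str] = field(default_factory=set)


def link(a: Record, b: Record) -> None:
    """Store the relation on both sides so it can be followed from either end."""
    a.links.add(b.identifier)
    b.links.add(a.identifier)


def is_bidirectional(records: list[Record]) -> bool:
    """Return True if every link points to a known record that links back."""
    by_id = {r.identifier: r for r in records}
    return all(
        target in by_id and r.identifier in by_id[target].links
        for r in records
        for target in r.links
    )


# Hypothetical records for the three boxes on the slide.
dataset = Record("example:dataset", "dataset")
data_article = Record("example:data-article", "data_article")
research_article = Record("example:research-article", "research_article")

# Assumed linkage: each object references the other two.
link(dataset, data_article)
link(data_article, research_article)
link(research_article, dataset)

records = [dataset, data_article, research_article]
print(is_bidirectional(records))
```

A one-way reference, such as an article citing a dataset whose landing page never mentions the article, is exactly the case `is_bidirectional` would flag.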

(25)

But we do not despair!

(26)

Conclusions

•  Socio-cultural change is on the way (may need just a few more decades)

-  Need for change/quality is recognized (Lancet)

-  NSF “5 products” rule offers the way out of the metrics dungeon

•  “Technical” challenges remain, e.g.

-  Repositories for computer code etc.

-  Quality assessment for “protocols” etc.

-  Bidirectional Linking of Everything Open (b-LEO)

-  And, I did not even mention versioning …
