
CENTRE OF BIODIVERSITY AND SUSTAINABLE LAND USE

SECTION: BIODIVERSITY, ECOLOGY AND NATURE CONSERVATION

THE ASSEMBLY OF ISLAND FLORAS FROM A MACROECOLOGICAL PERSPECTIVE

DISSERTATION

submitted for the award of the doctoral degree in mathematics and natural sciences

"Doctor rerum naturalium"

of the Georg-August-Universität Göttingen

within the doctoral programme "Biodiversity, Ecology and Evolution"

of the Georg-August University School of Science (GAUSS)

submitted by

Christian König

from Chemnitz

Göttingen, 2018


Thesis committee:

Prof. Dr. Holger Kreft, Biodiversity, Macroecology & Biogeography, Georg-August-Universität Göttingen

Prof. Dr. Erwin Bergmeier, Abteilung Vegetationsanalyse & Phytodiversität, Georg-August-Universität Göttingen

Dr. Patrick Weigelt, Biodiversity, Macroecology & Biogeography, Georg-August-Universität Göttingen

Members of the examination board

Reviewer: Prof. Dr. Holger Kreft, Biodiversity, Macroecology & Biogeography, Georg-August-Universität Göttingen

Second reviewer: Dr. Erwin Bergmeier, Abteilung Vegetationsanalyse & Phytodiversität, Georg-August-Universität Göttingen

Further members of the examination board:

Prof. Dr. Dirk Hölscher, Prof. Dr. Mark Maraun, Prof. Dr. Teja Tscharntke, Prof. Dr. Stefan Scheu

Date of the oral examination: 25 October 2018


Biodiversity starts in the distant past and it points toward the future.

Frans Lanting


Thesis abstract

Islands have always played a central role in ecology and biogeography. On the one hand, island biotas are ecologically unique: they feature exceptionally high rates of endemism and remarkable evolutionary adaptations while being generally poor in species. On the other hand, the geographical, climatic, and geological diversity of islands across the globe facilitates the detailed study of abiotic and biotic factors that have shaped these extraordinary assemblages. Many findings from island biogeography have led to general ecological insights, e.g. the dynamic regulation of species diversity via immigration, extinction and speciation. Today, the increasing availability of ecological data allows going beyond species numbers and resolving the identities, functional traits and phylogenetic relationships of individual species at the global scale. This opens new and promising avenues of inquiry in island biogeography and bears great potential for understanding the ecological processes shaping island biodiversity at a deeper level.

The objective of the present thesis is twofold. First, I aim to identify and address challenges in the utilization of global plant diversity data that currently impede the effectiveness of macroecological approaches in (island) biogeographical research. Second, I endeavour to utilize these insights to conduct large-scale, data-driven analyses of plant diversity that examine the ecological and biogeographical mechanisms underlying the assembly of island floras. Consequently, the chapters of this thesis are arranged into a conceptual part (Chapters 1 and 2) and an empirical part (Chapters 3 and 4).

In Chapter 1, I develop a novel conceptualization of ecological data types according to their domain and resolution. Focusing on data from two domains, species distributions and functional traits, I show that existing digital infrastructures are generally more advanced for disaggregated data types (e.g. point occurrence records, vegetation plots and individual-level trait measurements) than for aggregated data types (e.g. regional checklists or species-level functional traits). I discuss the need for the integration of aggregated data types into the macroecological data landscape and demonstrate the potential of this approach with three case studies. In Chapter 2, I present the GIFT database, a novel resource for macroecological analyses of global plant diversity. GIFT implements many of the concepts outlined in Chapter 1 and achieves nearly global coverage in terms of plant distributions and several key functional traits. The chapter provides extensive information on the design and internal processing workflows of the database, and describes the geographical, taxonomic and functional coverage of GIFT.

In Chapter 3, I use data from GIFT to assess global patterns in the beta diversity of island and mainland floras. To this end, I model species turnover, i.e. the richness-insensitive component of beta diversity, as a function of pairwise geographical distance and climatic differences between floristic regions. I show that, on average, island floras are more similar to each other than mainland floras and that species turnover among island assemblages is mostly determined by climatic conditions rather than by geographic distance. These findings suggest that island floras sample largely from a limited set of widespread, dispersive species, while less dispersive taxonomic groups tend to be rare on islands and hence contribute little to species turnover. This interpretation is substantiated by the turnover patterns observed for varyingly dispersive taxonomic and functional groups, and represents a strong basis for the quantitative evaluation of dispersal and environmental filters during island colonization.
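To illustrate what a richness-insensitive turnover measure looks like in practice, the following minimal Python sketch compares Simpson dissimilarity with Sørensen dissimilarity for two toy floras. The choice of the Simpson index, the function names and the species lists are assumptions made for illustration only; the abstract does not name the index used in Chapter 3.

```python
def simpson_dissimilarity(flora_a, flora_b):
    """Turnover between two species lists, insensitive to richness differences.

    beta_sim = min(b, c) / (a + min(b, c)), where a is the number of shared
    species and b, c are the numbers of species unique to each flora.
    """
    set_a, set_b = set(flora_a), set(flora_b)
    shared = len(set_a & set_b)
    unique_min = min(len(set_a - set_b), len(set_b - set_a))
    if shared + unique_min == 0:
        raise ValueError("undefined when at least one flora is empty")
    return unique_min / (shared + unique_min)


def sorensen_dissimilarity(flora_a, flora_b):
    """Total dissimilarity, which also responds to richness differences."""
    set_a, set_b = set(flora_a), set(flora_b)
    shared = len(set_a & set_b)
    unique_total = len(set_a ^ set_b)
    if shared + unique_total == 0:
        raise ValueError("undefined when both floras are empty")
    return unique_total / (2 * shared + unique_total)


# Hypothetical example: a small island flora nested within a larger mainland flora.
island = ["sp1", "sp2", "sp3"]
mainland = ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6", "sp7", "sp8", "sp9"]

nested_turnover = simpson_dissimilarity(island, mainland)
nested_total = sorensen_dissimilarity(island, mainland)

# Hypothetical example: two equally rich floras that share half of their species.
replaced_turnover = simpson_dissimilarity(["a", "b", "c", "d"], ["c", "d", "e", "f"])
```

In the nested case the Simpson index reports no turnover although the two floras differ strongly in richness, whereas the Sørensen index does register a difference. This is the sense in which a turnover measure can be called richness-insensitive.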

In Chapter 4, I further examine ecological filters during island colonization by providing the first global, quantitative analysis of island disharmony – a phenomenon that describes the biased representation of higher taxa on islands compared to their mainland source regions. I develop a novel method for the statistical estimation of island-specific source regions as well as two measures that quantify the overall compositional disharmony of an island flora and the global over- or under-representation of individual families on islands. Analyzing these two measures as a function of island- and family-specific characteristics, respectively, reveals that the overall disharmony of island floras is closely linked to island area, isolation, and climatic conditions, whereas the global over- or under-representation of individual families shows little systematic variation with family-level functional traits or family size. These findings provide a comprehensive basis for understanding the island- and taxon-specific factors that determine assembly processes on islands, but at the same time highlight the need for a stronger utilization of functional and phylogenetic approaches in island biogeography.

In conclusion, the present thesis makes several important contributions to the fields of macroecology and island biogeography. In a broader context, I identify aggregated data types as a rich, but under-utilized source of plant diversity information with great potential for improving global data coverage and representativeness. The effectiveness of a targeted integration of aggregated data is demonstrated by the GIFT database, which describes global plant diversity at an unprecedented level of completeness and constitutes an invaluable resource for future macroecological research. In a more specific context, my research on the beta diversity and disharmony of island floras provides comprehensive new insights into fundamental ecological processes that govern the assembly of island floras. In particular, the proposed method for a statistical estimation of island source regions as well as my findings on the relative roles of dispersal, environmental and biotic filtering address key problems in island biogeography.


Zusammenfassung (Summary)

Islands have always held a special place in ecological and biogeographical research. On the one hand, islands harbour ecologically fascinating species assemblages, which are often characterized by a high degree of endemism and extraordinary adaptations combined with a relatively low total number of species. On the other hand, the geographical, climatic and geological diversity of the world's more than 100,000 islands enables comparative studies of the ecological and evolutionary factors that have contributed to the emergence of these assemblages. The findings of island biogeography are not restricted to islands, but have repeatedly led to fundamental insights into ecological relationships, such as the dynamic regulation of species diversity through immigration, extinction and speciation.

Today, the increasing availability of ecological data also makes it possible to go beyond species richness and to resolve the identities, functional traits and phylogenetic relationships of individual species. As a result, macroecological methods are becoming increasingly relevant in island biogeographical research and promise a deeper understanding of ecological processes on islands. However, several hurdles remain to be overcome along the way.

With the present thesis I pursue two goals. First, I aim to identify current deficits in the availability and usability of biodiversity data and thereby contribute to a generally more effective use of macroecological approaches in island biogeography. Second, I aim to apply the resulting insights to the analysis of global plant diversity patterns in order to gain new insights into the origin and composition of island floras. Accordingly, the four research chapters are divided into a conceptual part (Chapters 1 and 2) and an empirical part (Chapters 3 and 4).

In Chapter 1, I develop a classification of ecological data types based on data resolution and domain. I show that, in the key domains of species distributions and functional traits, the existing digital infrastructure is considerably more mature for disaggregated data types (e.g. point occurrences, vegetation plots, individual trait measurements) than for aggregated data types (e.g. regional checklists or mean values of functional traits). I then discuss the need for a stronger integration of aggregated data types into the macroecological data landscape and demonstrate the potential of such an approach by means of three macroecological case studies. In Chapter 2, I present GIFT, a new database for the macroecological analysis of plant diversity. GIFT implements many of the concepts for the integration of global biodiversity data developed in Chapter 1 and achieves nearly global coverage with respect to floristic distribution data and certain functional traits of plants. The chapter compiles comprehensive information on the structure of the database, explains automated workflows for processing ecological data, and presents detailed statistics on the geographical, taxonomic and functional coverage of GIFT.

In Chapter 3, I use GIFT to investigate global patterns in the compositional similarity of island and mainland floras. To this end, I model species turnover, i.e. the component of the beta diversity between two floras that is unaffected by total species richness, as a function of geographical distance and climatic variables. I show that island floras are, on average, more similar to each other than mainland floras, and that species turnover on islands is determined less by geographical distance than by climatic conditions. The results suggest that islands are predominantly colonized by a limited group of species that can reliably disperse over long distances, whereas species with poorer dispersal abilities are only rarely represented on islands and therefore contribute little to species turnover. This interpretation is supported by the corresponding behaviour of taxonomic and functional groups that differ in their dispersal ability, and provides an important basis for the quantitative assessment of dispersal and environmental filters during the colonization of islands.

In Chapter 4, I likewise examine ecological filter effects during the colonization of islands and present the first global, quantitative analysis of island "disharmony", a concept that describes the proportional over- or under-representation of certain taxa on islands compared to their source regions on the mainland. To this end, I develop a new approach for the statistical estimation of the geographical source regions of island floras, as well as two measures that quantify the disharmony of a flora as a whole and the global relative frequency of individual plant families on islands. Analysing these measures as a function of island- and family-specific characteristics, respectively, shows that the overall disharmony of island floras is strongly determined by island area, isolation and climate, whereas the representation of individual families can hardly be predicted from functional traits or family size. These results make important contributions to understanding the island- and taxon-specific factors that shape the composition of island floras. At the same time, the study highlights the high potential of a stronger integration of functional and phylogenetic approaches into macroecological (island) research.

In summary, the present dissertation makes several important contributions to macroecological and island biogeographical research. In a broader context, I identify aggregated data types as a rich but neglected source of information on global plant diversity, which can contribute substantially to improved data coverage and representativeness. The GIFT database presented here demonstrates the potential of a stronger integration of aggregated data types into macroecological research and captures global plant diversity to an extent that is, in part, unprecedented. GIFT will therefore continue to serve as a basis for important macroecological analyses. In the specific context of island biogeography, my research on beta diversity and disharmony provides new insights into fundamental ecological processes in the origin and development of island floras. In particular, the method I developed for estimating the geographical origins of island species assemblages, as well as my findings on the relative contributions of dispersal, environmental and interaction filters during island colonization, represent important advances in core areas of island biogeography.


Table of contents

Thesis abstract

Zusammenfassung (Summary)

Table of contents

List of figures

List of tables

Author Contributions

General Introduction

Historical biogeography and the significance of islands

The macroecological approach

Study outline

1 Global integration of plant diversity data – the significance of data resolution and domain

1.1 Abstract

1.2 Introduction

1.3 Data as key to global plant ecology

1.3.1 Data domains, types and resolution

1.3.2 Data collection and processing

1.3.3 Data mobilization

1.3.4 Data imputation

1.3.5 Data sharing

1.3.6 Data integration

1.4 Case studies

1.4.1 Global patterns in plant growth form

1.4.2 The latitudinal gradient in seed mass revisited

1.4.3 A global assessment of insular woodiness

1.5 Conclusion and future directions

2 GIFT – A Global Inventory of Floras and Traits for macroecology and biogeography

2.1 Abstract

2.2 Introduction

2.3 Content and structure of GIFT

2.3.1 Overview

2.3.2 Checklists

2.3.3 Species names and taxonomic standardization

2.3.4 Taxonomic backbone and phylogeny

2.3.5 Functional traits

2.3.6 Geographic regions

2.3.7 Versioning

2.4 Current state

2.4.1 Geographic coverage

2.4.2 Taxonomic coverage

2.4.3 Trait coverage

2.4.4 Web interface

2.5 Applications and outlook

3 Dissecting global turnover in vascular plants

3.1 Abstract

3.2 Introduction

3.3 Methods

3.3.1 Species data

3.3.2 Abiotic data

3.3.3 Compositional similarity

3.3.4 Analysis

3.4 Results

3.5 Discussion

3.5.1 Turnover as result of filtering processes

3.5.2 The role of species attributes for turnover

3.5.3 The origin of beta diversity

3.5.4 Methodological strengths and limitations

3.6 Conclusion

4 Source pools and disharmony of the world’s island floras

4.1 Abstract

4.2 Introduction

4.3 Methods

4.3.1 Data collection

4.3.2 Compositional disharmony

4.3.3 Representational disharmony

4.4 Results

4.4.1 Source region estimation

4.4.2 Compositional and representational disharmony

4.5 Discussion

4.5.1 A new approach for estimating floristic source regions

4.5.2 Determinants of compositional and representational disharmony

4.5.3 Disharmony – a necessarily vague concept?

4.5.4 Conclusion

General Discussion

Summary and contribution of this thesis

Challenges and future perspectives

References

Acknowledgements

Selbstständigkeitserklärung (Declaration of independent work)

Appendix

A1 Supplementary information to Chapter 1

A2 Supplementary information to Chapter 2

A3 Supplementary information to Chapter 3

A4 Supplementary information to Chapter 4


List of figures

Plate 1: Seminal works by early biogeographers on the distribution of plant diversity

Figure 1.1: Selected biodiversity data types, arranged according to their primary domain (species distributions vs. functional traits) and informational resolution (disaggregated vs. aggregated)

Figure 1.2: Comparison of logical and statistical data imputation

Figure 1.3: The global composition in plant growth form

Figure 1.4: Latitudinal gradient in seed mass for 519,812 species-sites combinations

Figure 1.5: Proportions of woody vs non-woody species and Raunkiær life forms among seed plants on twelve oceanic islands

Figure 2.1: Conceptual framework of the Global Inventory of Floras and Traits database (GIFT)

Figure 2.2: Simplified structure of the Global Inventory of Floras and Traits database (GIFT)

Figure 2.3: Trait processing in GIFT

Figure 2.4: Frequency distributions of 2007 geographic regions in GIFT

Figure 2.5: Spatial coverage of checklist data currently stored in GIFT

Figure 2.6: Taxonomic coverage of distribution data in GIFT at the family level

Figure 2.7: Geographical trait coverage of GIFT

Figure 3.1: Framework of this study for analyzing global turnover of vascular plants

Figure 3.2: Distance decay of similarity for different subsets

Figure 3.3: Turnover partitioning for taxonomic and functional groups

Figure 3.4: Predicted compositional similarity of vascular plants

Figure 4.1: Schematic representation of the quantification of compositional and representational disharmony

Figure 4.2: Exemplary comparison of empirically reconstructed and statistically modelled source pools

Figure 4.3: Global patterns in floristic disharmony from an island- and a taxon-centred perspective

Figure A1.1: Geographical coverage of GIFT for native vascular plant checklists

Figure A1.2: Main module of the directed graph used for hierarchical trait derivation in GIFT

Figure A2.2: Geographical summary of selected environmental variables in GIFT

Figure A2.3: Spatial coverage of floristic subsets in GIFT

Figure A2.4: Taxonomic trait coverage of GIFT

Figure A3.1: Summary of the operational geographical units (OGUs) analysed in Chapter 4

Figure A3.2: Graph structure used to derive plant growth form

Figure A3.3: Pairwise correlation of predictor variables used in GDM

Figure A3.4: Distance decay for taxonomic and functional groups (island vs. mainland comparison)

Figure A3.5: Turnover partitioning for taxonomic and functional groups (mainland vs. island comparison)

Figure A3.6: GDM transformation functions

Figure A4.1: Representational disharmony (Drep) as a function of the ratio between mean proportional representation in island- and mainland floras

Figure A4.2: Relationship of representational disharmony (Drep) and family-level functional traits

Figure A4.3: Representational disharmony (Drep) of 450 vascular plant families

Figure A4.4: Correlation between compositional disharmony (Dcomp) and log10(species richness)

Figure A4.5: Determination of threshold value for weights vector W, exemplarily for six islands


List of tables

Table 2.1: Current coverage of GIFT for selected major plant groups

Table 4.1: Statistical model results for compositional disharmony (Dcomp) and representational disharmony (Drep)

Table A2.1: Functional traits in GIFT

Table A2.2: Links between parent traits and derived traits used in the hierarchical trait derivation in GIFT

Table A2.3: Groups of physical geographical, environmental and socio-economic variables in GIFT

Table A3.1: Distance decay model summaries

Table A3.2: Data references for Chapter 3

Table A4.1: Data references for Chapter 4


Author Contributions

Chapter 1: Global integration of plant diversity data – a functional perspective

Christian König1, Patrick Weigelt1, Julian Schrader1, Amanda Taylor1, Jens Kattge2 and Holger Kreft1.

HK and CK conceived the project. PW and CK collected data and developed the database. CK designed the case studies and performed the analyses. CK led the writing with major contributions from all authors.

Manuscript under review at PLOS Biology.

Chapter 2: GIFT - A Global Inventory of Floras and Traits for macroecology and biogeography

Patrick Weigelt1, Christian König1 and Holger Kreft1.

PW and HK conceived the idea of the GIFT database. All authors led the collection of checklist and trait data. PW and CK developed the workflows for importing and processing data in GIFT and for calculating derived variables. PW and CK performed the analyses presented in this manuscript and all authors contributed to writing the manuscript.

Manuscript under review at Ecology and Evolution.

Chapter 3: Dissecting global turnover in vascular plants.

Christian König1, Patrick Weigelt1 and Holger Kreft1.

All authors conceived the project. CK performed the analyses and led the writing with significant contributions from PW and HK.

Published by John Wiley & Sons Ltd.: König, C., Weigelt, P. & Kreft, H. (2017) Dissecting global turnover in vascular plants. Global Ecology and Biogeography, 26, 228–242. Definitive version available under: https://onlinelibrary.wiley.com/doi/abs/10.1111/geb.12536

Chapter 4: Source pools and disharmony of the world’s island floras

Christian König1, Patrick Weigelt1, Amanda Taylor1, Anke Stein3, Wayne Dawson4, Franz Essl5,6, Jan Pergl7, Petr Pyšek7,8, Mark van Kleunen3,9, Marten Winter10, Cyrille Chatelain11, Jan Wieringa12,13, Pavel Krestov14 and Holger Kreft1.

HK, PW and CK conceived the project. CK designed the methodology, performed the analyses and led the writing with significant contributions from all authors.

Unpublished manuscript.


Author affiliations

1 Biodiversity, Macroecology & Biogeography Group, University of Goettingen, 37077 Goettingen, Germany

2 Research Group Functional Biogeography, Max Planck Institute for Biogeochemistry, Jena, Germany

3 Ecology, Department of Biology, University of Konstanz, D-78457 Konstanz, Germany

4 Department of Biosciences, Durham University, South Road, Durham DH1 3LE, United Kingdom

5 Division of Conservation Biology, Vegetation and Landscape Ecology, University of Vienna, 1030 Wien, Austria

6 Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Matieland 7602, South Africa

7 Czech Academy of Sciences, Institute of Botany, Department of Invasion Ecology, CZ-252 43 Průhonice, Czech Republic

8 Department of Ecology, Faculty of Science, Charles University, Viničná 7, CZ-128 44 Prague, Czech Republic

9 Zhejiang Provincial Key Laboratory of Plant Evolutionary Ecology and Conservation, Taizhou University, Taizhou 318000, China

10 German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany

11 Conservatoire et Jardin botaniques de la Ville de Genève, 1202 Genève, Switzerland

12 Naturalis Biodiversity Centre, Darwinweg 2, 2333 CR Leiden, Netherlands

13 Wageningen University, Biosystematics Group, Droevendaalsesteeg 1, 6708 PB Wageningen, Netherlands

14 Botanical Garden-Institute of the Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, 690024, Russia


General Introduction

Historical biogeography and the significance of islands

Terrestrial plant life has endured more than 400 million years of geological, environmental, and geographical change (Morris et al., 2018). The effects of this eventful past are preserved in the complex distribution and striking variation of today’s plant diversity. Among the three to four hundred thousand species of extant vascular plants (Christenhusz & Byng, 2016; Willis, 2017), examples range from minuscule aquatic herbs (Wolffia arrhiza, Díaz et al., 2016) to giant forest trees (Sequoia sempervirens, Díaz et al., 2016), from narrow-ranged endemics (Erica capensis, Helme & Trinder-Smith, 2006) to global cosmopolitans (Phragmites australis, Eller et al., 2017), and from ancient evolutionary relics (Amborella trichopoda, Poncet et al., 2013) to members of recent radiations (Lupinus semperflorens, Hughes & Eastwood, 2006).

Understanding how such diversity patterns vary in space and time is among the most fundamental questions in ecology (Pennisi, 2005; Sutherland et al., 2013). The respective scientific discipline, focusing on the systematic investigation of spatiotemporal variations in biodiversity, is termed biogeography (Lomolino et al., 2016).

In the 18th century, early naturalists started to realize that the spatial distribution of species is highly structured. Carl Linnaeus (1707-1778) noted that species are adapted to certain environments and do not occur outside their preferred range of conditions. Georges-Louis Leclerc, Comte de Buffon (1707-1788) added to this observation that distant locations generally harbor distinct sets of species, irrespective of their climatic and environmental similarity (Lomolino et al., 2016). Subsequently, eminent researchers such as Johann Reinhold Forster (1729-1798), Sir Joseph Banks (1743-1820), Augustin-Pyrame de Candolle (1778-1841) and, especially, Alexander von Humboldt (1769-1859) further consolidated the emerging field of biogeography, documenting latitudinal and elevational gradients in species diversity, defining biogeographic regions, and expressing first ideas of mutual interactions influencing the distribution of species (Lomolino et al., 2016; see also Plate 1). These contributions greatly helped to understand the relationship between species distributions and contemporary environmental conditions, but could not sufficiently explain biogeographical patterns such as the abrupt faunal change within the Malay Archipelago or the unique biotas of oceanic islands. These and other observations were finally put into perspective by Charles Darwin (1809-1882) and Alfred Russel Wallace (1823-1913). Their independent discovery of evolution by means of natural selection (Darwin & Wallace, 1858; Darwin, 1859) provided the key to understanding species distributions, and in fact species themselves, as the current endpoints in a series of past geological, climatic and ecological dynamics. It is not a coincidence that the ideas of both Darwin and Wallace were substantially inspired by observations they had made on islands.


Plate 1: Seminal works by early biogeographers on the distribution of plant diversity.

Top: Elevational zonation of the Ecuadorian Andes including Mt. Chimborazo (Humboldt, 1805-1834). Left: Global floristic regionalization (Grisebach, 1866). Grisebach acknowledged the uniqueness of island floras by placing them in a separate category (zone 24: “Oceanische Inselfloren”, i.e. oceanic island floras).

Islands are exceptionally informative subjects of biogeographical research. They are characterized by isolated, comparatively simple biotas and well-defined geographical boundaries (Gillespie, 2007), and feature a large range of climatic (e.g. temperature, precipitation, seasonality), geographical (e.g. area, elevation, isolation) and historical (e.g. island age, geological origin, Pleistocene impacts) conditions (Weigelt et al., 2013). This makes islands ideal model systems for studying evolutionary, ecological and biogeographical processes at large spatial scales, where experimental manipulations are infeasible (Vitousek, 2002; Whittaker & Fernández-Palacios, 2007; Whittaker et al., 2017). Moreover, islands disproportionately contribute to global biodiversity (Myers et al., 2000; Barthlott et al., 2005) and feature some of the highest endemism rates worldwide (Kier et al., 2009) while being known hotspots of biological invasions and species extinctions (Sax & Gaines, 2008; van Kleunen et al., 2015). For these reasons, islands are highly relevant study systems from both a methodological and an ecological point of view.

The unique properties of islands inspired another seminal work that remains relevant to this day: the equilibrium theory of island biogeography (ETIB; MacArthur & Wilson, 1963, 1967). Similar to the theory of evolution by natural selection, the ETIB laid out a radically new perspective that describes complex biotic patterns as the outcome of only a few fundamental processes. According to the ETIB, the species number of an island arises dynamically from opposing rates of immigration and extinction that vary with island isolation and area, respectively. The simple yet elegant mathematical formulation of the model prompted a shift towards a more quantitative approach to ecology and biogeography (Simberloff, 1969; Levin, 1974b; Connor & McCoy, 1979; Hubbell, 2001). In fact, the simplicity of the ETIB was the key to its immense success, as it provided a generalizable framework for predicting species richness across different taxa and geographical settings (Simberloff, 1974; Santos et al., 2016), including insular habitats on the mainland such as mountain tops (Brown, 1971), lakes (Browne, 1981), or forest fragments (Harris, 1984).
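The equilibrium logic of the ETIB can be made concrete with a small numerical sketch. It assumes linear immigration and extinction rates, a common textbook simplification rather than the exact rate curves of MacArthur & Wilson; the function name, parameter names and all numbers are hypothetical and serve illustration only.

```python
def equilibrium_richness(pool_size, max_immigration, max_extinction):
    """Species number at which immigration and extinction balance.

    Assumed linear rates (illustrative simplification):
      immigration(S) = max_immigration * (1 - S / pool_size)
      extinction(S)  = max_extinction * S / pool_size
    Setting both equal and solving for S gives
      S* = pool_size * max_immigration / (max_immigration + max_extinction)
    """
    if max_immigration < 0 or max_extinction < 0:
        raise ValueError("rates must be non-negative")
    if max_immigration + max_extinction == 0:
        raise ValueError("at least one rate must be positive")
    return pool_size * max_immigration / (max_immigration + max_extinction)


# Hypothetical mainland pool of 1000 species.
# Isolation is assumed to lower the immigration rate,
# area is assumed to lower the extinction rate.
near_small = equilibrium_richness(1000, max_immigration=10.0, max_extinction=5.0)
far_small = equilibrium_richness(1000, max_immigration=2.0, max_extinction=5.0)
near_large = equilibrium_richness(1000, max_immigration=10.0, max_extinction=1.0)
```

Under these assumptions the remote island settles at a lower equilibrium richness than the near island of equal size, and the large island at a higher one, which mirrors the qualitative predictions described above.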

Many aspects of island biodiversity, however, remained beyond the scope of the ETIB and its extensions. In particular, compositional and morphological features of island biota proved notoriously hard to predict from analytical models, as they result from a complex interplay of island- and taxon-specific characteristics, evolutionary dynamics, and stochastic events (Whittaker & Fernández-Palacios, 2007). Scientific progress on these more intricate aspects of island biodiversity was therefore based on natural-historical observations and conceptual models. The work of Carlquist (1965, 1966a, 1966b, 1966c, 1966d, 1974) in particular greatly advanced the understanding of assembly processes on islands. According to Carlquist, species immigration and extinction are characterized by selective ecological filters (Carlquist, 1965): on the one hand, dispersal filtering prevents species with poor dispersal abilities from crossing the open sea; on the other hand, environmental filtering prevents the establishment of species that cannot persist under the predominant biotic and abiotic conditions of the island. Successful colonizers find themselves in a new ecological and evolutionary arena and, given a sufficient amount of time and reproductive isolation from the mainland, potentially diversify and/or adapt to the local conditions. This sequential view of assembly processes has helped to understand many peculiar features of island biota such as the over- or under-representation of certain taxa (Carlquist, 1965; Hoekstra & Fagan, 1998) or common evolutionary trends (e.g. insular woodiness or loss of dispersal capacity; Carlquist, 1966b, 1970; Whittaker & Fernández-Palacios, 2007). Furthermore, it provided a framework for deriving testable hypotheses regarding the taxonomic, functional, and phylogenetic composition of island biota (Midway & Hodge, 2012).

Biogeographical research has impacted our understanding of the natural world in many ways, and islands have played a central role in this process. Due to their geographical isolation and ecological simplicity, islands represent excellent study systems, which helped uncover fundamental mechanisms of evolution (natural selection) and community assembly (immigration, extinction and speciation). However, the two classical research paradigms in (island) biogeography, natural history (Humboldt, 1805-1834; Wallace, 1881; Carlquist, 1965) and mathematical modelling (Arrhenius, 1921; MacArthur & Wilson, 1967; Hubbell, 2001), have been unable to fully bridge the gap between detailed descriptions and robust generalizations. Consequently, a novel approach – rigorously quantitative yet capable of resolving the complexities of ecological systems – was required.


The macroecological approach

Macroecology seeks to understand ecological phenomena at large spatiotemporal scales by analyzing emergent statistical patterns in the distribution, abundance and diversity of organisms (Brown & Maurer, 1989; Brown, 1995; Kent, 2005). This data-driven approach offers a powerful toolkit for island biogeographical research (Kueffer et al., 2014). Kreft et al. (2008), for example, analyzed the effects of area, isolation, climate, topography and geology on the number of native vascular plant species in 1458 island and mainland floras, providing the first quantitative synthesis on the drivers of insular species richness. Their finding that on islands, but not on the mainland, area is the most important predictor of species richness showed that area-mediated effects on species richness – e.g. speciation rate, extinction rate, or carrying capacity – differ in strength across geographical settings. Macroecological approaches have also been critical for testing theoretical frameworks such as the general dynamic model of island biogeography (Whittaker et al., 2008), which postulates that rates of key ecological processes on islands vary over geological timescales. The major prediction of this model, that species richness follows a hump-shaped relationship with island age, has been empirically confirmed for multiple archipelagos and taxa (Whittaker et al., 2008; Cameron et al., 2013; Lenzner et al., 2017).
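A hump-shaped richness–age relationship of this kind is commonly tested with a regression that includes island age, its square, and log-transformed area. The following is a sketch of one widely used parameterization; the symbols (S for species richness, T for island age, A for island area, a to d for fitted coefficients) are notation chosen here for illustration and are not taken from this thesis.

```latex
S = a + b\,T + c\,T^{2} + d\,\log A, \qquad b > 0,\; c < 0
```

With a negative quadratic coefficient, predicted richness at a fixed area peaks at an intermediate age:

```latex
\frac{\partial S}{\partial T} = b + 2cT = 0 \quad\Longrightarrow\quad T^{*} = -\frac{b}{2c}
```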

The power of the macroecological approach is manifest most clearly when looking not just at species numbers, but also at species composition. Knowing which species occur in a given geographical area, and not just how many, opens up entirely new research avenues. Species identities establish a link to the wealth of species-specific information on functional traits, taxonomic and phylogenetic relationships, biotic interactions, and abiotic preferences that constitute the basis for a statistical (i.e. macroecological) characterization of species assemblages. This makes aspects of island biodiversity that used to be too complex for analytical models tangible. In recent years, the potential of species-level macroecological approaches has been demonstrated by numerous studies, for example on the beta diversity (Stuart et al., 2012; Cabral et al., 2014), functional characteristics (Santos et al., 2015; Whittaker et al., 2014), or phylogenetic structure (Cardillo et al., 2008; Weigelt et al., 2015) of island biotas. However, many fundamental questions in island biology and biogeography remain to be addressed (Patiño et al., 2017).

The focus on statistical patterns makes macroecology a particularly data-intensive discipline, whose capacity to produce novel ecological insights is highly dependent on the availability and quality of ecological data (Kueffer et al., 2014). The rise of macroecology within the last two decades (Beck et al., 2012) has been enabled and accompanied by the rapid growth of ecological databases. Today, unprecedented amounts of data on the spatial distribution (Global Biodiversity Information Facility, GBIF, 2018; Map of Life, Jetz et al., 2012), functional traits (TRY, Kattge et al., 2011a), taxonomic affiliations (TPL, The Plant List, 2013; TNRS, Boyle et al., 2013) and (phylo-)genetic relationships (GenBank, Benson et al., 2005; TreeBASE, Piel et al., 2009) of plant species are available. Moreover, modern geospatial data products allow for a global characterization of abiotic, biotic and socioeconomic variables with high accuracy and at high spatiotemporal resolutions (e.g. Karger et al., 2017; Hengl et al., 2017; Copernicus Global Land Service, 2018).

Despite these developments, our knowledge of biodiversity continues to be limited by the lack of ecological data (Taugourdeau et al., 2014; Hortal et al., 2015). Some data limitations are inevitable and arise from fundamental constraints (e.g. in terms of money, time, labor, etc.) on the spatiotemporal resolution at which biodiversity can be measured (Hortal, 2008), but others can be overcome by a coordinated utilization and integration of existing data resources. One potential area of improvement is the common practice of using local (i.e. highly resolved) diversity data such as point occurrences or vegetation plots to address questions at continental or global scales (see e.g. Moles et al., 2007; Moles et al., 2009; Morueta-Holme et al., 2013; Vellend et al., 2013). This mismatch in scales entails two pitfalls that may compromise the reliability of ecological inferences. First, highly resolved diversity data are particularly affected by the above-mentioned constraints on the ability to measure biodiversity, and therefore exhibit severe deficits in terms of large-scale geographical, temporal and taxonomic coverage (Gonzalez et al., 2016; Meyer et al., 2016). Second, highly resolved diversity data reflect local ecological processes and do not scale up to large geographical extents, where other factors such as climate and biogeographical history regulate biodiversity (Huston, 1999; Hortal, 2008, but see e.g. Azaele et al., 2015). A viable way to overcome these drawbacks is to align the scale of the analyzed data with that of the research question, which emphasizes a stronger utilization of relatively coarse-grained, but sufficiently complete and representative diversity data to address macroecological problems.

Study outline

With the present thesis, I aim to elucidate the assembly of island floras from a macroecological perspective, with a particular focus on the taxonomic and functional composition of island plant assemblages. The four research chapters recapitulate major steps towards this objective.

In Chapter 1, I provide a general perspective on the opportunities and challenges of data integration for macroecological research. I examine the availability, applicability and utilization of different types of plant diversity data and show that (1) the macroecological data landscape is dominated by disaggregated data (e.g. point occurrence records, trait measurements) as opposed to aggregated data (e.g. species checklists, taxonomic monographs), and that (2) major data providers mostly focus on a single domain of data (e.g. distributions, functional traits, genetic sequences). I argue that a stronger integration of data across domains and different levels of aggregation has considerable potential for improving data coverage and representativeness at global scales. I describe generalizable strategies for the effective collection, mobilization, imputation and integration of ecological data with a particular focus on plant distributions and functional traits. Finally, I present three case studies that highlight the potential of macroecological data integration for answering fundamental ecological and (island) biogeographical questions.

In Chapter 2, I present the Global Inventory of Floras and Traits (GIFT) database. GIFT represents the basis of all empirical studies in this thesis (Chapters 3 and 4, case studies in Chapter 1) and implements many concepts and ideas outlined in Chapter 1, in particular the utilization of aggregated data (e.g. species checklists and Floras) and the integration of data from multiple domains (e.g. species distributions, functional traits, taxonomic and phylogenetic information, geographical characteristics). The chapter provides detailed information on the technical design, processing workflows and data coverage of GIFT.

In Chapter 3, I assess the drivers of species turnover among vascular plant assemblages on islands and the mainland. I use generalized linear models to compare the distance decay of similarity, i.e. species turnover as a function of geographic distance among sites, for different taxonomic and functional plant groups (angiosperms, gymnosperms, pteridophytes, trees, shrubs, herbs) on islands and the mainland. I then apply generalized dissimilarity models to quantify the unique effects of geographic distance and climatic variables in creating species turnover among island and mainland assemblages, respectively. Finally, I present a global prediction of species turnover across a high-resolution equal-area grid.
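The idea of distance decay can be made concrete with a small sketch. It is deliberately simplified and is not the method used in Chapter 3: it swaps the generalized linear models mentioned above for an ordinary least-squares fit of log similarity on distance, uses Jaccard similarity as an assumed similarity measure, and runs on invented checklists. All site names, coordinates and species labels are hypothetical.

```python
import math
from itertools import combinations


def jaccard_similarity(a, b):
    """Shared species divided by all species found in either checklist."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def distance_decay_slope(sites):
    """Least-squares slope of log(similarity) on distance over all site pairs.

    `sites` maps a site name to ((lat, lon), species set). Pairs without
    shared species are skipped because log(0) is undefined.
    """
    xs, ys = [], []
    for (_, (p, sp_a)), (_, (q, sp_b)) in combinations(sites.items(), 2):
        sim = jaccard_similarity(sp_a, sp_b)
        if sim > 0:
            xs.append(great_circle_km(p, q))
            ys.append(math.log(sim))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented toy checklists along the equator (hypothetical data).
sites = {
    "A": ((0.0, 0.0), {"sp1", "sp2", "sp3", "sp4"}),
    "B": ((0.0, 1.0), {"sp1", "sp2", "sp3", "sp5"}),
    "C": ((0.0, 5.0), {"sp1", "sp5", "sp6", "sp7"}),
}
slope = distance_decay_slope(sites)
```

A negative slope indicates that assemblages become less similar with increasing distance; comparing slopes between groups of sites (e.g. islands versus mainland) is the kind of contrast the chapter addresses with more appropriate models.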

In Chapter 4, I address the phenomenon of island disharmony, the biased representation of higher taxa on islands compared to their mainland source regions. I present a novel method for identifying island-specific species source regions and develop a measure that quantifies the compositional disharmony of a given island flora. I analyze this measure for 320 islands as a function of important island biogeographical variables (distance from the mainland, area, geological origin, climatic conditions), providing the first global, quantitative assessment of island disharmony to date. Furthermore, I analyze the global over- or under-representation of 450 vascular plant families on islands as a function of family-specific characteristics that presumably affect colonization success (range size, species number and age; functional traits related to dispersal ability, reproduction and life-history).


1 Global integration of plant diversity data – the significance of data resolution and domain

Christian König, Patrick Weigelt, Julian Schrader, Amanda Taylor, Jens Kattge and Holger Kreft

1.1 Abstract

Recent years have seen an explosion in the availability of biodiversity data describing the distribution, function, and evolutionary history of life on earth. Integrating these heterogeneous data remains a challenge due to large variations in observational scales, collection purposes and terminologies. While seminal projects for the integration of disaggregated biodiversity data (e.g. point occurrence records, trait measurements) have been established, aggregated data types (e.g. Floras, taxonomic monographs) have received less attention, leaving a major source of information on global biodiversity largely untapped.

Focusing on plant distributions and functional traits, we here demonstrate the synergies arising from a tighter integration of biodiversity data across domains and resolutions. To this end, we lay out effective strategies for data collection, mobilization, imputation, and sharing, and summarize existing frameworks for scalable and integrative biodiversity research. In three case studies related to the global distribution of plant growth forms, the latitudinal gradient of seed mass, and the global prevalence of insular woodiness, we highlight the potential of aggregated data for biodiversity research and for improving the representativeness and completeness of biodiversity data in general. Our results show the need for a more extensive use of available data resources to achieve a picture of global biodiversity that is both precise and general.


1.2 Introduction

Minimizing the negative ecological impacts of habitat loss (Watson et al., 2016), climate change (Pachauri et al., 2014), and species invasion (Seebens et al., 2017) is one of the major challenges of this century and requires a detailed understanding of global biodiversity (Kerr et al., 2007; Barnard & Thuiller, 2008). In this context, vascular plants constitute a critical group, as they are key providers of biochemical energy and habitat structure. At the same time, the sheer magnitude of plant diversity renders an exhaustive assessment of even its most basic dimensions, e.g. the number of extant species, difficult (Brown & Lomolino, 1998). This effect is further amplified when looking at more complex, often interdependent aspects such as species distributions, functional traits, or phylogenetic relationships, and becomes increasingly pervasive at small informational grain sizes (Hortal et al., 2015). Despite these existing shortfalls in our knowledge of global plant diversity, recent years have seen an explosion in both the availability (Kattge et al., 2011a; GBIF, 2018; Maitner et al., 2018) and large-scale utilization (Zanne et al., 2014; Díaz et al., 2016; König et al., 2017; Butler et al., 2017; Smith & Brown, 2018) of plant diversity data. This data-driven paradigm has been recognized as key for reducing the shortfalls in biodiversity knowledge and building a sufficiently robust understanding of global biodiversity to address the pressing challenges imposed by global change (Kelling et al., 2009; Hampton et al., 2013).

Biogeography is a key discipline for the integration of heterogeneous biodiversity data, as it brings together the two principal dimensions of ecology – the organism and the environment – at large spatiotemporal scales. Biogeographical data can therefore be integrated with a variety of organismic (e.g. taxonomic, functional, phylogenetic) and environmental (e.g. climate, soil, topography) information. A particularly promising branch of biogeography is functional biogeography. Functional biogeography focuses on documenting and understanding the geographical variation in traits, utilizing ideas, concepts, and methods from a variety of disciplines including ecosystem ecology, evolutionary biology, earth sciences, and ecoinformatics (Violle et al., 2014). In particular, functional biogeography adds a spatial dimension to functional ecology and is thus relevant for a variety of research areas, in which adopting a functional perspective has stimulated substantial scientific progress, e.g. community ecology (McGill et al., 2006; Stegen & Swenson, 2009), biodiversity research (Petchey & Gaston, 2002; Lamanna et al., 2014), ecosystem ecology (Díaz et al., 2007; Bello et al., 2010), or conservation biology (Cadotte et al., 2011; Ostertag et al., 2015). Moreover, the integration of species distributions and functional traits opens up new and interesting research questions: How are different aspects of functional diversity distributed in space? Is there a consistent relationship between functional diversity and ecosystem functioning across habitats, ecosystems, or biomes? Which functional properties are particularly sensitive to climate and land-use changes, and where do they occur most frequently?

Data-driven functional biogeography – and biodiversity research in general – has to bridge the gap between fine-scale precision and global representativeness. This gap is reflected by the variety of existing data types, ranging from highly resolved point occurrence records and trait measurements to relatively coarse, but also more representative data types such as Floras and taxonomic monographs. Consequently, the integration of biodiversity data across multiple resolutions is crucial for overcoming the deficits of individual data types and constitutes a key requirement for developing a deeper understanding of global biodiversity (Jetz et al., 2012). This poses new scientific challenges, e.g. with respect to data sharing and collaborative research (Hampton et al., 2015; Michener, 2015a), the representativeness of large-scale datasets (Engemann et al., 2015; Meyer et al., 2016), or the effective integration of multiple data types (Jetz et al., 2012; La Salle et al., 2016).

Focusing on plant distributions and functional traits, our aim here is to help address these challenges in order to realize the full potential of plant diversity data. First, we characterize common data types with respect to their informational resolution and domain, and highlight general trade-offs across biodiversity data. Based on that, we outline strategies for the effective utilization and integration of plant diversity data across domains and resolutions.

We provide suggestions for improving data collection, identify potentials for data mobilization, and describe methods for filling data gaps through imputation. Furthermore, we discuss methodological, sociocultural, and information technological barriers that currently impede the large-scale integration of biodiversity data. We present three case studies based on the Global Inventory of Floras and Traits database (see Box A1.1, Chapter 2), a novel resource for functional biogeography, to demonstrate that integrating even a selection of aggregated data types allows tackling fundamental questions in ecology and biogeography related to (1) the global distribution of plant growth forms, (2) the latitudinal gradient in seed mass and (3) the prevalence of insular woodiness on oceanic islands.

1.3 Data as key to global plant ecology

1.3.1 Data domains, types and resolution

Biodiversity science can be organized into different domains that cover distinct spheres of knowledge, e.g. of the taxonomic classification, geographical distribution, functional traits or abiotic tolerances of organisms (Hortal et al., 2015). A domain is typically associated with a set of domain-specific data types (Figure 1.1). Species distributions, for example, can be represented by point occurrences, plot networks, checklists, or expert range maps.

Functional trait data may come in the form of field measurements for individual plants, or as aggregated values for populations, species, or higher taxonomic groups (e.g. genera or families). In addition, some biodiversity data types combine information from multiple domains, e.g. regional Floras representing a source of both distributional and functional information.


Figure 1.1: Selected biodiversity data types, arranged according to their primary domain (species distributions vs. functional traits) and informational resolution (disaggregated vs. aggregated).

Existing projects that integrate global plant diversity data are often domain-specific (e.g. Map of Life: Jetz et al., 2012; TRY: Kattge et al., 2011a; GBIF, 2018) or focus on the disaggregated end of the data spectrum (e.g. BIEN: Enquist et al., 2016). Complementing the ecological data landscape with aggregated data (e.g. GIFT, see Chapter 2) creates strong synergies and facilitates biodiversity data integration across domains and resolutions.

Across different data types, there is a trade-off between high informational resolution on the one hand, and completeness and representativeness on the other (Rondinini et al., 2006). This trade-off is important, because data resolution affects the precision (i.e. certainty) of ecological inferences, whereas data representativeness affects their accuracy (i.e. correctness) (Walther & Moore, 2005; Hortal et al., 2015). Disaggregated data, e.g. point occurrences or trait measurements, generally have a high informational resolution, which is necessary to address questions at the level of populations or communities (Bolnick et al., 2011; Meyer et al., 2018).

However, at macroecological scales, disaggregated data often exhibit deficits in terms of completeness and representativeness (Schrodt et al., 2015; Engemann et al., 2015; Meyer et al., 2016). In contrast, aggregated data, e.g. regional floras and checklists, or taxonomic monographs, provide a mostly complete and representative account of their subject region or taxon (Frodin, 2001; Farjon, 2010) but are limited in their capacity to resolve fine-grained ecological information (Figure 1.1).

Major projects for biodiversity data integration focus primarily, though not exclusively, on the disaggregated end of the data spectrum, e.g. the Global Biodiversity Information Facility (GBIF) for species occurrence records, TRY for primary trait data or the Botanical Information and Ecology Network (BIEN) for primary data on New World plant distributions and functional traits (see also Figure 1.1). A systematic compilation of existing aggregated plant diversity data to complement these initiatives is still missing. GIFT, the Global Inventory of Floras and Traits database (Chapter 2), is a contribution towards filling this gap and building a robust baseline for global plant diversity research.

[Figure 1.1, labels. Axes: domain (distributions vs. traits) and resolution, from disaggregated (precision: high; completeness and representativeness: low) to aggregated (precision: low; completeness and representativeness: high). Data types shown: point occurrence records, plot data, regional checklists, expert range maps, Floras, taxonomic monographs, trait measurements (intra-individual level), trait measurements (individual level), aggregated traits (species level), aggregated traits (higher taxon level).]

1.3.2 Data collection and processing

The integration of biodiversity data starts in the field – with the primary biodiversity data collected in surveys, experiments, citizen science projects and other research efforts. Such data is usually specifically tailored to answer a particular research question. Thus, robust ecological generalizations require large quantities of (disaggregated) primary or (aggregated) derived data that is organized and integrated in comprehensive biodiversity databases. The quality and coverage of such databases can be greatly improved when primary research projects put strong emphasis on the utility and re-usability of collected data for secondary scientific purposes (Michener & Jones, 2012).

The utility of primary data for data integration efforts can be increased in several ways. First, focusing on regions, ecosystems, plant groups, or functional traits that are currently underrepresented in global biodiversity databases increases the general interest in the collected data as well as the study itself. Coverage analyses based on integrated biodiversity resources can provide guidance by identifying knowledge gaps and setting research priorities (Meyer et al., 2016). Second, cross-institutional coordination of research projects creates synergies through standardized methods and complementary research foci. Research networks such as the International Long Term Ecological Research Network (ILTER, see Vanderbilt & Gaiser, 2017) provide an ideal framework to utilize these synergetic effects (Peters et al., 2014b). Third, an efficient study design helps to maximize the data output given the available resources. This can be aided, for instance, by statistical power analyses (Johnson et al., 2015), optimizing study logistics and surveying effort (Moore & McCarthy, 2016), and cooperating closely with local field guides and botanists (Elbroch et al., 2011). Throughout the process of data collection, digital solutions such as Open Data Kit (Brunette et al., 2013) can help to conveniently enter, cross-check, annotate and aggregate field data. This increases data integrity and provides crucial meta-information for later quality assessments and integration efforts.

The re-usability of primary data can be ensured by adopting existing data standards and protocols. The Plant List (2013) provides a widely-accepted basis for resolving and standardizing plant species names. Software packages such as taxonstand (Cayuela et al., 2012), taxize (Chamberlain & Szöcs, 2013) or the taxonomic name resolution service (Boyle et al., 2013) help to utilize The Plant List and other authoritative taxonomic resources to resolve thousands of species names at a time. With respect to functional traits, defined measurement protocols (Pérez-Harguindeguy et al., 2013) and terminologies (Garnier et al., 2017) facilitate interoperability across research projects. The exchange of diversity data is supported by data standards like the Darwin Core Archive (Wieczorek et al., 2012) or the Humboldt Core Archive (Guralnick et al., 2017). Finally, innovative publishing frameworks such as the Biodiversity Data Journal (Pensoft, 2017) or the GBIF Integrated Publishing Toolkit (GBIF, 2018) allow for a quick publication of standardized and easily accessible datasets.

1.3.3 Data mobilization

The increasing digitization of scientific collections and literature has set ecology up for the age of “Big Data” (Hampton et al., 2013). The Global Biodiversity Information Facility (GBIF, 2018), for example, currently provides access to more than 208 million occurrence records of vascular plants, 62 million of which are derived from preserved herbarium specimens. While this is a substantial achievement, specimen records encode more than just distributional information (Beaman & Cellinese, 2012). In particular, the (semi-)automated extraction of traits from herbarium specimens represents an area of largely unused potential.

Standardized measurements on collected plant material may be incorporated into digitization workflows, potentially yielding thousands of geographically defined records of e.g. specific leaf area (Queenborough & Porras, 2014) or phenological plant information (Gallinat et al., 2018). Also, images of already digitized specimens can be used to retrieve certain functional traits, e.g. leaf size (Corney et al., 2012). Nonetheless, the set of traits that can be (non-destructively) obtained from herbarium specimens excludes many important characteristics, e.g. plant growth form, vegetative height, or stem specific density.

Another way to mobilize substantial amounts of ecological data – mainly from the aggregated end of the data spectrum – lies in the botanical literature. Generations of botanists have produced thousands of Floras, species checklists, and taxonomic monographs. Vascular plants are among the most intensively studied groups, and with some exceptions, almost any region on earth has been subject to some form of floristic inventory (Frodin, 2001). Such resources provide expert-validated distributional information, often including the biogeographical status of the listed species (e.g. endemic, native, introduced). Moreover, descriptions of general morphology, life history, flowers, fruits, seeds, phenology and other features of the covered taxa are often available. Massive efforts to make biodiversity literature digitally available and searchable are underway (e.g. www.biodiversitylibrary.org, www.plantsoftheworldonline.org) and machine learning algorithms are becoming increasingly successful at extracting information from loosely structured text data (Collobert et al., 2011; LeCun et al., 2015). Considering the wealth of information contained in published floristic literature, the development of general, scalable methods for data extraction seems to be central for improving the coverage of biodiversity databases. Machine learning techniques such as recurrent neural networks might be particularly suited for this task. First studies using machine learning to extract trait information from floristic descriptions show promising results (Hoehndorf et al., 2016).


1.3.4 Data imputation

Data imputation is a technique where missing or inconsistent data items are replaced with estimated values (OECD, 2013) and represents an inexpensive yet powerful way to improve data coverage in ecological datasets. A conceptual distinction can be made between logical and statistical imputation methods (Figure 1.2).

Figure 1.2: Comparison of logical and statistical data imputation. Logical imputation infers a limited quantity of highly certain data (e.g. deducing woodiness status from growth form), whereas statistical imputation yields large quantities of less certain data (e.g. predicting a suite of functional traits from sparse records).

Logical imputation uses unequivocal relationships among data to infer new values. This is possible either when data is categorically nested, e.g. trees always being woody (Beentje, 2016), or linked by mathematical relationships, e.g. leaf mass per unit area (LMA) being the inverse of specific leaf area (SLA). While the considerations underlying logical imputation seem rather trivial, the approach has yet to be widely used for complementing plant diversity data. Applications of logical imputation include, for example, (1) the propagation of information from complex functional traits to simpler ones (see Figure A1.2), (2) the imputation of species-level traits when a higher taxon is known to be uniform with respect to that trait, or (3) the improvement of regional species checklists based on geographically nested occurrence records or plot data. The main advantage of logical imputation is that the results can be treated with the same certainty as the underlying data. This makes it a particularly suitable approach for building and extending repositories of primary data. At the same time, logical imputation helps to harmonize data that uses differing terminologies by embedding it in a logical hierarchy (e.g. bee-pollination, insect-pollination, and animal-pollination form nested subsets of pollination syndromes). However, considering that such clear hierarchical relationships are scarce among biodiversity data, the gap-filling potential of logical imputation is limited.
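The two kinds of relationship named above, categorical nesting and a bijective mathematical link, can be sketched in a few lines. This is a minimal illustration and not the implementation used in GIFT; the record layout, field names (`growth_form`, `woody`, `sla`, `lma`) and the `PARENT` lookup are hypothetical, while the example relationships themselves (tree implies woody, LMA = 1/SLA, nested pollination syndromes, Yasuni National Park within Ecuador) come from the text and Figure 1.2.

```python
# Child category -> next broader category (hypothetical lookup structure;
# the example chains are the ones mentioned in the text).
PARENT = {
    "bee-pollination": "insect-pollination",
    "insect-pollination": "animal-pollination",
    "Yasuni National Park": "Ecuador",
}


def ancestors(category, parent=PARENT):
    """All broader categories implied by `category`, nearest first."""
    implied = []
    while category in parent:
        category = parent[category]
        implied.append(category)
    return implied


def impute_record(record):
    """Return a copy of a species record with logically deducible gaps filled."""
    filled = dict(record)
    # Categorical nesting: a tree is always woody.
    if filled.get("woody") is None and filled.get("growth_form") == "tree":
        filled["woody"] = True
    # Bijective relationship: LMA is the inverse of SLA (reciprocal units).
    sla, lma = filled.get("sla"), filled.get("lma")
    if lma is None and sla:
        filled["lma"] = 1.0 / sla
    elif sla is None and lma:
        filled["sla"] = 1.0 / lma
    return filled


# Invented example record: growth form and SLA known, the rest missing.
example = impute_record(
    {"growth_form": "tree", "woody": None, "sla": 0.025, "lma": None}
)
```

Values that cannot be deduced stay missing, which reflects the limited gap-filling potential noted above: a record with no growth form and no leaf trait is returned unchanged.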

                         | Logical imputation | Statistical imputation
Data relationship        | Hierarchical (one-to-many) or bijective (one-to-one) | Correlative (many-to-many)
Imputation method        | Logical deduction | Statistical prediction
Gap-filling potential    | Limited | Very high
Certainty of results     | Very high (depending on correctness of input data and specified relationships) | Variable (depending on correlative structure of input data and model performance)
Applications (examples)  | Hierarchical deduction of categorical traits (“tree” → “woody”) or occurrence information (“occurs in Yasuni National Park” → “occurs in Ecuador”) | Bayesian Hierarchical Probabilistic Matrix Factorization (Schrodt, 2015), Multiple Imputation by Chained Equations (Azur, 2011)
