
The availability, quality and interoperability of data are paramount to the progress of biogeography and ecology as increasingly data-intensive disciplines (Michener & Jones, 2012; Hampton et al., 2013; Franklin et al., 2017). Here, we demonstrated how the explicit consideration of data resolution offers new perspectives on the compilation and integration of plant diversity data. Our results show that a near-complete collection of coarse-grained plant distributions and basic functional traits is within reach when the full potential of data mobilization and imputation is exploited. This offers new opportunities for plant diversity research in general.

Currently, studies and projects for the integration of global plant diversity are mostly based on disaggregated data. While this approach has been a highly successful line of research (Swenson et al., 2012; Moles et al., 2014; Díaz et al., 2016), the pervasiveness of biases and gaps in disaggregated biodiversity data is of increasing concern to ecologists (Boakes et al., 2010; Engemann et al., 2015; Sandel et al., 2015; Meyer et al., 2016). We have shown that systematic utilization of aggregated data can help address this problem. First, aggregated data provide a coarse but more complete and less biased picture of geographical variation in taxonomic, functional and phylogenetic diversity. This offers much-needed baselines against which the completeness of disaggregated data can be evaluated in order to quantify and map gaps in global biodiversity knowledge (Hortal et al., 2015; Franklin et al., 2017). Second, aggregated data provide prior information about the geographical and statistical distribution of more highly resolved, but potentially incomplete or biased ecological variables. This knowledge can be used, for instance, to inform functional biogeographical analyses (see case study 2), to improve species distribution and niche models (Merow et al., 2016), or to parametrize ancestral state reconstructions (Pagel et al., 2004) and dynamic global vegetation models (Scheiter et al., 2013). Third, aggregated data capitalize on expert knowledge to compensate for the varying availability and quality of primary (disaggregated) data.

Consequently, aggregated data types are not mere compilations of disaggregated data, but provide valuable additional information, e.g. reliable species absences or uniform functional traits for higher taxa. The potential of utilizing aggregated biodiversity data extends to other clades for which a wealth of literature exists, e.g. mammals, birds, or many groups of arthropods.

Data integration is potent not only across resolutions, but also across domains. Satellite-borne, multispectral imagery is a crucial component of biodiversity research, providing global high-resolution data on, e.g., net primary productivity, vegetation cover or canopy height (Kuenzer et al., 2014). Advanced instruments will soon enable the derivation of similar data products for selected functional traits, which will help track changes in the biosphere at increasing spatial and temporal resolutions. Nevertheless, the identification of individual plants from space remains impossible for most practical purposes, which highlights the need for integrating in-situ and satellite-borne data to address ecological questions at global scales (Jetz et al., 2016). Vegetation plot databases are another key source of plant diversity data, holding crucial information on species abundances and co-occurrences. BIEN demonstrates how the integration of specimen and plot data with taxonomic, functional and phylogenetic information helps bridge the gap between local-, regional- and continental-scale ecological processes (Blonder et al., 2015; Engemann et al., 2016). Furthermore, biogeographical analyses could benefit from integrating contemporary species distributions with fossil records and phylogenies, and conservation planning could be aided by bringing together ecological, environmental and socioeconomic data within a consistent framework. The potential of cross-domain data integration remains to be fully explored.

The unparalleled pressure on our global biosphere renders a full utilization of all available biodiversity data imperative. Rapid advancements in information technology have brought down the technological barriers to this objective. It is now up to ecologists to keep pace with this development, and to work collaboratively on creating infrastructures for the integration of biodiversity data that bridge the gap between fine-scale precision and global representativeness.

2 GIFT – A Global Inventory of Floras and Traits for macroecology and biogeography

Patrick Weigelt, Christian König and Holger Kreft

2.1 Abstract

To understand the evolutionary history and geographic distribution of plant life on Earth, we need to integrate high-quality and global-scale distribution data with functional and phylogenetic information. Large-scale distribution data for plants are, however, often restricted to either certain taxonomic groups or geographic regions. For example, range maps only exist for a small subset of all plant species and digitally available point-occurrence information is strongly biased geographically and taxonomically. An alternative, currently rarely used source of information is represented by regional Floras and checklists, which contain highly curated information about the species found in clearly defined areas, and which together cover virtually the entire global land surface. Here we report on our recent efforts to mobilize this information for macroecological and biogeographical analyses in the GIFT database, the Global Inventory of Floras and Traits. GIFT integrates species distributions of land plants (focusing on vascular plants) with trait and phylogenetic information as well as region-level geographic, environmental and socio-economic data.

GIFT currently holds species lists for 2,893 regions across the whole globe including ~315,000 taxonomically standardized species names (i.e. c. 80% of all known land plant species) and ~3 million species-by-region occurrences. In addition, GIFT contains information about the floristic status (native, endemic, alien and naturalized) and takes advantage of the wealth of trait information in the regional Floras, complemented by data from global trait databases. Utilizing hierarchical and taxonomic trait imputation, GIFT holds information for 83 functional traits and more than 2.3 million trait-by-species combinations and achieves unprecedented coverage in categorical traits such as woodiness (~233,000 spp.) or growth form (~213,000 spp.). Here we present the structure, content and automated workflows of GIFT and a corresponding web-interface (http://gift.uni-goettingen.de) as proof of concept for the feasibility and potential of mobilizing aggregated biodiversity data for global macroecological and biogeographical research.
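As a rough illustration of what taxonomic trait imputation can look like, the following Python sketch fills a missing categorical trait value from the species' genus, but only when all congeners with a known value agree. The function name `impute_from_genus`, the unanimity rule and the toy values are assumptions made for this example; they are not taken from the GIFT workflows, whose actual imputation rules are not specified here.

```python
from collections import defaultdict


def impute_from_genus(species_to_genus, known_traits):
    """Fill missing categorical trait values from a unanimous genus-level value.

    Hypothetical helper: a species without a known value receives the value
    of its genus only if every congener with a known value agrees.
    """
    # Collect the set of observed trait values per genus.
    values_by_genus = defaultdict(set)
    for species, value in known_traits.items():
        values_by_genus[species_to_genus[species]].add(value)

    imputed = dict(known_traits)
    for species, genus in species_to_genus.items():
        if species in imputed:
            continue  # observed values are never overwritten
        values = values_by_genus.get(genus, set())
        if len(values) == 1:  # unanimous genus: propagate the value
            imputed[species] = next(iter(values))
    return imputed


# Toy data for illustration only.
species_to_genus = {
    "Quercus robur": "Quercus",
    "Quercus ilex": "Quercus",
    "Quercus suber": "Quercus",
    "Euphorbia peplus": "Euphorbia",
    "Euphorbia pulcherrima": "Euphorbia",
    "Euphorbia esula": "Euphorbia",
}
known_woodiness = {
    "Quercus robur": "woody",
    "Quercus ilex": "woody",
    "Euphorbia peplus": "non-woody",
    "Euphorbia pulcherrima": "woody",
}

result = impute_from_genus(species_to_genus, known_woodiness)
print(result.get("Quercus suber"))    # value propagated from the genus
print(result.get("Euphorbia esula"))  # None: the genus is not uniform
```

In this sketch a genus with conflicting values yields no imputation at all, which trades coverage for reliability; a real workflow might instead move up to the family level or apply a majority threshold.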

2.2 Introduction

Worldwide, about 382,000 vascular plant species form the basis of our terrestrial biosphere and provide key ecosystem services to humanity (Willis, 2017). Despite the long history of botanical exploration of our planet, the global distribution is only known for a subset of all plant species at comparatively coarse spatial grains (e.g. WCSP, 2014). In contrast to smaller and better known taxa like birds and mammals (BirdLife International, 2018; IUCN, 2018), high-quality species-level range maps or atlas data of plants are only available for certain well-studied groups (e.g. conifers in Farjon & Filer, 2013; cacti in Barthlott et al., 2015) or confined regions (e.g. Europe in Tutin et al., 1964–1980). Many research questions at the forefront of biogeography and macroecology, however, require a detailed knowledge of global plant distributions and, additionally, of species-level functional traits and phylogenetic relationships (e.g. Morueta-Holme et al., 2013; Weigelt et al., 2015; König et al., 2017).

Several national and international initiatives focus on mobilizing and aggregating plant distribution information. For instance, the Global Biodiversity Information Facility (GBIF, 2018) provides access to ~214 million point occurrences of vascular plant species from herbarium records and observations. These records are invaluable for plant ecology and conservation-related research, as they provide information about key aspects of species identity, time and place (Powney & Isaac, 2015). However, taxonomic, geographical and temporal biases (Hortal et al., 2015; Meyer et al., 2016) as well as the lack of important meta-information, such as the floristic status at a given location (native, non-native, naturalized, etc.), limit their usefulness for macroecological research. Floras and checklists are an alternative source of information. In contrast to point occurrences, they present highly curated accounts of the plant species known to occur in a certain region. Floras and checklists are often based on decades to centuries of exploration and regional botanical work, and have profited from the expertise of generations of botanists. They aim at providing (near-)complete floristic inventories for a given region and thus provide information on species presences and their floristic status, and additionally allow for the inference of local species absences (Lobo et al., 2010; Jetz et al., 2012). So far, extensive compilations of plant checklists exist only for certain geographic regions (e.g. Ulloa Ulloa et al., 2017), taxonomic groups (e.g. Flann, 2009; WCSP, 2014), functional types (e.g. BGCI, 2017), or, for example, naturalized alien plants (van Kleunen et al., 2015; Pyšek et al., 2017).

In light of the increasing availability of biodiversity data, it is a major challenge to integrate various data types and to link data from different ecological domains representing species distributions, functional traits, phylogenetic relationships or environmental characteristics for analyses and cross-validation (see Chapter 1). Initiatives that integrate different types of distribution data with additional biotic or abiotic information are currently most comprehensive for particular geographic regions (e.g. BIEN for the Americas; Enquist et al., 2016) or other taxa (e.g. Map of Life for vertebrates; Jetz et al., 2012). However, the wealth of aggregated information in regional Floras and checklists (Frodin, 2001) allows for a near-global characterization of plant distributions. In combination with functional traits from the botanical literature or large trait databases (e.g. Kattge et al., 2011a; Royal Botanic Gardens Kew, 2016) and ever-growing species-level phylogenies (e.g. Smith & Brown, 2018), this represents a promising basis for macroecological and biogeographic research.

Here, we present GIFT, the Global Inventory of Floras and Traits database, a new resource designed to integrate species distribution data and functional traits of vascular plants from regional Floras and checklists with phylogenetic information and geographic, environmental, and socio-economic characteristics (Figure 2.1). As such, the database architecture, workflows, and data of GIFT facilitate a wide array of macroecological and biogeographical analyses and may help to extend and validate other plant distribution and trait data resources.

The general concepts outlined here may serve as a model for aggregated species checklist and trait databases for other major taxonomic groups.

Figure 2.1: Conceptual framework of the Global Inventory of Floras and Traits database (GIFT).

The core information in GIFT consists of species occurrences in geographic regions (islands, political units, protected areas, biogeographical regions) based on Floras and checklists. At the level of the geographical regions, this information is linked to physical geographic, bioclimatic, and socioeconomic properties. At the level of the species, functional traits, taxonomic placement, and phylogenetic relationships are linked. This integration of species distribution data in the form of full regional inventories and regional and species characteristics allows for a wide variety of macroecological and biogeographical analyses of taxonomic, phylogenetic, and functional diversity as well as for the refinement and validation of other plant distribution and trait datasets.
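The linkage described in the caption can be sketched in a few lines of Python: species-by-region occurrences form the core table, with region-level and species-level attributes attached through shared keys. All names here (`Occurrence`, `native_richness`, `woody_share`, the region and species identifiers, and the values) are hypothetical and chosen for illustration only; they do not reflect the actual GIFT schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One species-by-region record from a Flora or checklist (hypothetical)."""
    species: str
    region: str
    native: bool


# Core table: species occurrences per region (toy records).
occurrences = [
    Occurrence("sp_1", "island_a", True),
    Occurrence("sp_2", "island_a", True),
    Occurrence("sp_3", "island_a", False),  # alien on the island
    Occurrence("sp_1", "country_b", True),
    Occurrence("sp_2", "country_b", True),
    Occurrence("sp_3", "country_b", True),
    Occurrence("sp_4", "country_b", True),
]

# Region-level attribute, keyed by region (toy values).
region_area_km2 = {"island_a": 120.0, "country_b": 45000.0}

# Species-level trait, keyed by species; sp_4 has no known value.
woodiness = {"sp_1": "woody", "sp_2": "non-woody", "sp_3": "woody"}


def native_species(occurrences):
    """Group native species by region."""
    by_region = {}
    for occ in occurrences:
        if occ.native:
            by_region.setdefault(occ.region, set()).add(occ.species)
    return by_region


def native_richness(occurrences):
    """Number of native species per region."""
    return {r: len(spp) for r, spp in native_species(occurrences).items()}


def woody_share(occurrences, woodiness):
    """Share of woody species among natives with a known woodiness value."""
    shares = {}
    for region, spp in native_species(occurrences).items():
        known = [woodiness[s] for s in spp if s in woodiness]
        if known:
            shares[region] = known.count("woody") / len(known)
    return shares


richness = native_richness(occurrences)
share = woody_share(occurrences, woodiness)
for region in sorted(richness):
    print(region, region_area_km2[region], richness[region], round(share[region], 2))
```

Because the occurrence table carries the floristic status, the same records can yield native or alien richness per region, and the trait lookup turns a taxonomic summary into a functional one without changing the core table.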