Proceedings of TDWG: Conference Abstract
Setting up an Interdisciplinary Data Infrastructure: Why Cooperation between Domain Experts and Computer Scientists Matters - An Experience Report from the GFBio Project
Birgitta König-Ries, Dagmar Triebel§, Robert Huber|, Falko Glöckler, Anton Güntsch#, Janine Felden|, Felicitas Löffler, Jana Hoffmann
‡ Friedrich-Schiller-Universität Jena, Jena, Germany
§ Botanische Staatssammlung München, Munich, Germany
| MARUM, Universität Bremen, Bremen, Germany
¶ Museum für Naturkunde Berlin, Berlin, Germany
# Freie Universität Berlin, Berlin, Germany
Open Access

Abstract

The German Federation for Biological Data (GFBio; Diepenbroek et al. 2014) is implementing a national infrastructure for the preservation, integration, and publication of biological data collected in German research projects. GFBio is built upon an archive infrastructure consisting of nine data centers, including PANGAEA and the major German natural science collections (German Federation for Biological Data (GFBio) 2017a). Creating and running GFBio requires close collaboration within a highly interdisciplinary consortium. Bringing together experts from the collections, scientists from the relevant fields, biodiversity informaticians, and computer scientists proved essential for designing and building this system.

GFBio is currently in its second funding phase. Essential services required for the operation of the future infrastructure have been successfully implemented. The technologies and tools developed so far use globally accepted standards as well as innovative concepts, e.g., for data visualisation and semantic integration.

A portal (https://www.gfbio.org) provides a common point of access to all GFBio services: data submission, data discovery, data visualisation and analysis, a terminology service, and a help desk. In addition, archived research data is shared with international information infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Biological Collection Access Service (BioCASE).

Because the data centers use different systems and therefore rely internally on different data structures (German Federation for Biological Data (GFBio) 2017b), the search functionality integrated into the portal is a good example of collaboration between teams with different expertise.

Since the aim was to provide an integrated, faceted search, the partners had to agree on common fields that can feed the facets. The GFBio data centers therefore agreed on ABCD 2.06 (Access to Biological Collection Data) as a common standard and specified thirty elements for data exchange. This required bringing together (1) domain experts to define which facets they consider useful for an effective search, (2) computer scientists to provide the implementation based on Elasticsearch (Elasticsearch 2017), (3) biodiversity informaticians to define mappings between different standards, and (4) data curators from the GFBio data centers and long-term repositories to negotiate the set of mandatory fields. High-quality data delivered through the publishing pipelines established at each data center served as the starting point for broader research data management workflows.
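A minimal Python sketch of how these pieces could fit together, under clearly hypothetical assumptions: per-center field mappings normalize heterogeneous records to a shared field set, a check enforces a negotiated set of mandatory fields, and facet counts are derived from the normalized records. The field names, the mandatory set, the data center identifiers, and the helpers `to_common`, `facet_counts`, and `facet_query` are all invented for illustration. They do not reproduce the thirty ABCD 2.06 elements GFBio actually agreed on, nor GFBio's real mappings or queries. `facet_query` only builds a generic Elasticsearch `terms` aggregation request body, assuming each facet field is indexed as a `keyword` field.

```python
from collections import Counter
from typing import Dict, Iterable, List

# HYPOTHETICAL common schema: a few fields standing in for the agreed
# ABCD-based exchange elements (this is not the real list of thirty).
COMMON_FIELDS = ["title", "data_center", "taxon", "country", "license"]
# HYPOTHETICAL mandatory subset, as negotiated by data curators.
MANDATORY_FIELDS = {"title", "data_center", "license"}

# HYPOTHETICAL per-center mappings: internal field name -> common field name.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "center_a": {"Name": "title", "Species": "taxon",
                 "Country": "country", "Licence": "license"},
    "center_b": {"dataset_title": "title", "scientific_name": "taxon",
                 "country_code": "country", "rights": "license"},
}


def to_common(center: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a center's internal fields to the common schema.

    Unmapped fields are dropped; a ValueError is raised if any
    mandatory common field is missing after mapping.
    """
    mapping = FIELD_MAPPINGS[center]
    doc = {mapping[key]: value for key, value in record.items() if key in mapping}
    doc["data_center"] = center
    missing = MANDATORY_FIELDS - set(doc)
    if missing:
        raise ValueError(f"{center}: missing mandatory fields {sorted(missing)}")
    return doc


def facet_counts(docs: Iterable[Dict[str, str]], field: str) -> Dict[str, int]:
    """Count the values of one common field, i.e. what a search facet displays."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


def facet_query(fields: List[str]) -> dict:
    """Build an Elasticsearch request body with one terms aggregation per facet.

    Assumes every facet field is mapped as a keyword field in the index.
    """
    return {"size": 0, "aggs": {f: {"terms": {"field": f}} for f in fields}}


if __name__ == "__main__":
    docs = [
        to_common("center_a", {"Name": "Moss survey", "Species": "Sphagnum fuscum",
                               "Country": "DE", "Licence": "CC BY 4.0"}),
        to_common("center_b", {"dataset_title": "Plankton counts",
                               "scientific_name": "Calanus finmarchicus",
                               "country_code": "DE", "rights": "CC0"}),
    ]
    print(facet_counts(docs, "country"))  # {'DE': 2}
    print(facet_query(["country", "license"]))
```

In a deployed system the counting would be done by Elasticsearch itself when such a request body is sent to the index's `_search` endpoint; `facet_counts` merely mimics that result locally so the sketch runs without a cluster. Keeping the mappings as plain data, separate from the search code, mirrors the division of labor described above: informaticians and curators can maintain the mappings and mandatory fields without touching the search implementation.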

As a result, primary collection and research data are available with metadata and data units that conform to the ABCD community standard and are ready to be reused in line with the FAIR data principles (Wilkinson et al. 2016): Findable, Accessible, Interoperable, and Re-usable. Interdisciplinary cooperation has thus been key to the success of the GFBio data portal.

Keywords

data curation, data management, research data, data archiving, FAIR principles, data portal, data standards, Access to Biological Collection Data, GFBio

Presenting author

Felicitas Löffler, Jana Hoffmann

References
