NFDI4DataScience – National Research Data Infrastructure for Data Science

Project duration: October 2021 – September 2026

The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle in Data Science and Artificial Intelligence.

The past years have seen a paradigm shift, with computational methods increasingly relying on data- and often deep learning-based approaches, leading to the establishment of Data Science as a discipline driven by advances in the field of Computer Science. Transparency, reproducibility and FAIRness have become crucial challenges for Data Science and Artificial Intelligence due to the complexity of Data Science methods, often relying on a combination of code, models and data. NFDI4DS will promote FAIR and open research data infrastructures supporting all involved digital artifacts such as code, models, data, or publications through an integrated approach.

The overarching objective of NFDI4DS is the development, establishment, and sustainment of a national research data infrastructure for the Data Science and Artificial Intelligence research community. The key idea is to increase the transparency, reproducibility and fairness of projects by making all digital artifacts available, interlinking them, and offering additional tools and services.

NFDI4DS will represent the Data Science and Artificial Intelligence research community in Germany, an interdisciplinary community rooted in Computer Science. In the initial phase, NFDI4DS will focus on four application areas: language technology, biomedical research, information sciences and social sciences.

Challenges

  • Transparency: Due to the interdependencies of computational methods and models (code, data, models, ontologies), transparency about the inner workings of models, the data used for training at any stage, and the inherent limitations and biases of methods and models is essential.
  • Reproducibility: Recent studies in Computer Science have documented that the paradigm shift towards data-driven and neural methods, combined with a lack of transparency (e.g. about data provenance or train/test splits), has led to fundamental reproducibility issues. These issues affect key Computer Science areas as well as other scientific disciplines that rely on big data and computational methods, to the extent that it has become challenging to define the actual state of the art in these areas.
  • FAIRness: Computational methods tend to learn and reinforce biases prevalent in the data they use, leading to concerns about discrimination and other ethical issues in state-of-the-art computational methods. In everyday life, we are starting to see systems that use algorithms and data to automatically make decisions affecting people's lives, such as guiding cars with autopilot or producing assessments for criminal justice. Algorithms draw conclusions from data; however, inconclusive, inscrutable, and misguided evidence may lead to bias and unjustified actions. Recent examples include the unmasking of "Clever Hans" strategies in computer vision, disparities in gender classification, and racial bias in population health management.

Solutions

  • Increase the transparency, reproducibility and fairness of projects
  • Promote a FAIR and open research data infrastructure for the research community
  • Support all steps of the complex and interdisciplinary research data lifecycle
  • Make all digital artifacts (such as code, models, data, or publications) available, interlink them, and offer additional tools and services
  • Develop, establish, and sustain a national research data infrastructure for the research community
  • Represent the Data Science and Artificial Intelligence community in Germany
  • Focus on four application areas in the initial phase: language technology, biomedical research, information sciences and social sciences

Partners

  • Fraunhofer-Gesellschaft (Fraunhofer FOCUS, FIT)
  • Leibniz-Universität Hannover
  • Leibniz-Zentrum für Informatik (Schloss Dagstuhl)
  • Leibniz-Informationszentrum Technik und Naturwissenschaften (TIB)
  • Universität zu Köln
  • Leibniz-Institut für Sozialwissenschaften (GESIS)
  • TU Berlin
  • RWTH Aachen
  • TU Dresden
  • Universität Leipzig
  • Informationszentrum Lebenswissenschaften (ZB MED)
  • Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
  • Leibniz-Institut für Informationsinfrastruktur (FIZ Karlsruhe)
  • Universität Hamburg
  • Leibniz-Informationszentrum Wirtschaft (ZBW)

Funding: DFG (Deutsche Forschungsgemeinschaft)