Data-driven science requires not only fast storage systems but also strategies to manage data efficiently within and across data centers. Big data tools can satisfy the need to search data based on user-specific metadata. However, a zoo of tools is available, and no single tool can meet all the requirements of an HPC system in a data center. Data lakes, for example, are a reasonable approach, but alternative concepts and tools also need to be considered. A uniform and consistent view of the millions of scientific data files on HPC systems, together with their efficient processing, is required to maximize exploitability and prevent segmented data silos between users or projects.

Project Goals

The aim of the project is to critically investigate the state of practice of data management concepts at NHR centers and to advance joint developments and training in the field of data management. We expand previous activities on using data lakes for HPC systems with a broad data-centric view, which should ultimately fuel data exchange between centers. Over a period of one year, we will:

  a) Investigate and develop methods for efficient data handling at NHR centers. In particular, we explore the suitability and performance of existing (general and domain-specific) research data management solutions for HPC systems.
  b) Develop a concept for data exchange between centers. This involves performance aspects of data transfer, with a focus on network tests between centers, including the testing of tools and optimizations, as well as organizational aspects, e.g., user identity management and data permissions for the transfers.
  c) Investigate the performance of storage systems and compare it across centers. The goal is to expand the previously conducted tests involving HPC file systems and object storage systems and to exchange experience and performance results within the NHR.
  d) Form communities and create training material for typical use cases. For the aforementioned efforts, we organize workshops and create training materials for the NHR centers.

GWDG’s Role in the Project

GWDG organizes this project and performs all tasks in close collaboration with the involved partners.

Project Partners

  • Zuse Institute Berlin
  • Technische Universität Dresden (TUD)
  • RWTH Aachen

Deliverables

Storage Report

Contact

Hendrik Nolte

Duration

01.01.2023 - 31.12.2023

Funded by

NHR Zukunftsprojekte