Theses Topics

The open thesis topics are listed below.

Token Management for an API to utilise HPC resources in generic workflows

The recently developed HPC API (HPCSerA) allows HPC jobs to be submitted from outside the HPC system itself. For that, one token per user is currently created manually and stored on the API server. The thesis should investigate proper token management, e.g. a token for each compute side or workflow system, and a self-service for token generation and deletion. It should also evaluate whether and how the token management can be integrated into the Identity Management (IdM) system of a data center, using the GWDG as a use case. The token management should be implemented as a prototype and tested.
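To make the idea of per-workflow tokens with self-service creation and deletion more concrete, here is a minimal Python sketch. It is not how HPCSerA stores tokens. The `TokenStore` class, its method names, and the in-memory dictionary are hypothetical and only illustrate one possible design: one token per (user, scope) pair, with only a hash of the token kept on the server.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for the API server's token storage."""

    # Maps (user, scope) to the SHA-256 hex digest of the issued token,
    # so the plain token never has to be stored on the server.
    _hashes: dict = field(default_factory=dict)

    def create(self, user: str, scope: str) -> str:
        """Issue a new random token for one user and one scope (e.g. a workflow system)."""
        token = secrets.token_urlsafe(32)
        self._hashes[(user, scope)] = hashlib.sha256(token.encode()).hexdigest()
        return token

    def verify(self, user: str, scope: str, token: str) -> bool:
        """Check a presented token against the stored hash for this user and scope."""
        expected = self._hashes.get((user, scope))
        if expected is None:
            return False
        candidate = hashlib.sha256(token.encode()).hexdigest()
        # Constant-time comparison of the two hex digests.
        return secrets.compare_digest(expected, candidate)

    def delete(self, user: str, scope: str) -> bool:
        """Revoke the token for this user and scope; return True if one existed."""
        return self._hashes.pop((user, scope), None) is not None


# Usage: a user creates a token for one workflow system and later revokes it.
store = TokenStore()
token = store.create("alice", "workflow-a")
valid_before = store.verify("alice", "workflow-a", token)
wrong_scope = store.verify("alice", "workflow-b", token)
revoked = store.delete("alice", "workflow-a")
valid_after = store.verify("alice", "workflow-a", token)
```

Scoping tokens this way means a leaked or retired token can be revoked for one workflow system without affecting the user's other tokens. How such a store would connect to an IdM system is left open here, as it is part of the thesis question.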

Please contact Sven Bingert for further details.

Implementation of an API specification to enhance the functionality of a text and data mining system

The MINE project provides an interface to search for text from different sources. It also runs text and data mining in the back-end to improve the metadata available for each text resource. In the past, other research groups developed several application programming interface (API) specifications for searching and retrieving textual data. In this thesis, two of these specifications should be investigated and prototypically implemented, using the MINE system as the data source. This would, for the first time, combine many different text resources under one standardised API.

Please contact Sven Bingert for further details.

MSc/BSc topics on Knowledge Graphs and NLP techniques

We have three MSc and BSc topics on applications of Natural Language Processing (NLP) in different scientific fields, to be filled soon. The topics are in biodiversity, the natural sciences, and other fields. Depending on the topic, you will work with Neo4j and RDF-based graphs (query languages including Cypher and SPARQL) and implement NLP subtasks such as Information Extraction, Reading Comprehension, and Entity Typing. For each topic, there is a predefined set of data sources, which may include Wikidata, Wikipedia, scientific texts, and others.

Applicants need Python programming skills as well as good analytical skills and an interest in text processing, NLP techniques, and graph databases.

Please contact Alireza Zarei for further details.

SSO Keycloak integration and self-services for a community portal

The PID Consortium for eResearch (ePIC) provides a WebUI for mining Persistent Identifiers as a self-service. In previous work, the self-service was upgraded to a modern web framework (vue.js). Although the foundations are in place, the authentication and authorisation method is still missing. The thesis should investigate the requirements, propose a solution for the self-service authentication, and eventually implement it as a prototype. Additionally, in the course of the requirements analysis, other self-service features might be identified and added to the WebUI.

Please contact Sven Bingert for further details.

Metadata quality dashboard for the Deutsche Digitale Bibliothek

The dashboard should display several, partly interactive, data visualization elements (charts and tables) that show different quality dimensions (completeness, accuracy, validity, etc.). The data sources are CSV and JSON files. There are two PHP- and d3.js-based dashboards [1, 2] on which this new service could be built, but it could also be built on a different technology stack. The student will have the chance to present and discuss the results with an external party: the experts of the Deutsche Digitale Bibliothek and KIT Karlsruhe (the organisation that will host the service in the long run).
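As an illustration of one quality dimension, the following Python sketch computes field completeness, i.e. the share of records in which a field has a non-empty value. It is an assumption-laden example, not part of the existing dashboards: the `completeness` function, the field names `title` and `creator`, and the sample records are all hypothetical, and the actual input files may be structured differently.

```python
import json


def completeness(records, fields):
    """Return, per field, the share of records with a non-empty value."""
    total = len(records)
    result = {}
    for name in fields:
        # A value counts as missing if the key is absent, null, an empty
        # string, or an empty list.
        filled = sum(1 for r in records if r.get(name) not in (None, "", []))
        result[name] = filled / total if total else 0.0
    return result


# Hypothetical sample input in the shape of a JSON array of metadata records.
sample = json.loads(
    '[{"title": "A", "creator": "X"},'
    ' {"title": "B", "creator": ""},'
    ' {"title": "", "creator": null},'
    ' {"title": "D"}]'
)

scores = completeness(sample, ["title", "creator"])
print(scores)  # {'title': 0.75, 'creator': 0.25}
```

A per-field score like this maps directly onto a bar chart or table column in a dashboard. Accuracy and validity would need additional rules or reference data and are not covered by this sketch.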


Please contact Péter Király for further details.