Thesis Topics

You can find open thesis topics here.

Porting Hadoop and Spark to OSv

The goal of this thesis is to port components of Hadoop and Spark to the unikernel OSv.

OSv is a novel operating system made for the cloud. Its advantages include a very small image size of a few MB, very fast boot times of under 1 s, and a streamlined architecture. Hadoop is the most popular big data platform with a vibrant ecosystem. Spark is a processing engine that integrates with Hadoop and is particularly well suited to near-real-time and stream processing of large data sets. Combining OSv with Hadoop makes it possible to operate big data clusters dynamically in the cloud. First, however, components of Hadoop and Spark such as HDFS, YARN, and Spark executors need to be ported to OSv. The challenges of porting lie in OSv's design trade-offs: there is no shell, there is only one address space, and only one process can run at a time.

Applicants need solid Linux administration skills, knowledge of operating systems, and analytical skills. Problems that arise during porting will need to be resolved together with OSv's inventors. Once the components have been ported, their performance must be measured and compared against a standard Ubuntu installation.

This thesis will be carried out in the context of the Horizon 2020 EU project Mikelangelo (http://www.mikelangelo-project.eu/).

Please contact peter.chronz@gwdg.de for details.