Nevertheless, while for years high-performance computing (HPC) systems were the predominant means of meeting the requirements of large-scale scientific workflows, some workflow components have since moved from supercomputers to Cloud infrastructures [5]. This migration has been driven mainly by the Cloud's ability to perform data analysis tasks efficiently. From an I/O and storage perspective, Cloud computing is very different from on-premise supercomputers: direct access to hardware resources is extremely limited because of a very high level of abstraction. Instead, users access various storage services, potentially geographically distributed, that are built on top of these resources. Another major difference is that, unlike on HPC systems, Cloud storage, network and computing resources are elastic and can be allocated and released on demand [6]. Finally, whereas the cost of using a supercomputer is, from the user's point of view, essentially expressed in node-hours deducted from a grant, the Cloud follows a pay-as-you-go model that must be taken into account, since data movements in particular are costly.
Thus, dealing with such a high degree of heterogeneity, spread across two worlds with very different philosophies, is a real challenge for scientific workflows and applications. This PhD thesis aims to address this issue from the perspective of resource provisioning. Through intelligent scheduling algorithms, we want to enable workflows to seamlessly use elastic storage systems [7] on hybrid infrastructures combining HPC systems and the Cloud. Beyond performance alone, multiple criteria such as financial cost or energy consumption can be taken into account. These algorithms will need to rely on a resource abstraction model, which also needs to be devised. Collaborations (e.g., with Argonne National Laboratory, USA) could bring artificial intelligence techniques, such as reinforcement learning, into the envisioned scheduling algorithms. More generally, there will be a strong emphasis on international collaborations during this PhD thesis.
The PhD position is mainly based in Rennes, at IRISA/Inria, within the KerData research team. The selected candidate will have the opportunity to join a very dynamic group in a stimulating work environment, with numerous active national, European and international collaborations as part of cutting-edge international projects in the areas of Exascale Computing, Cloud Computing, Big Data and Artificial Intelligence. The candidate is also expected to spend 3- to 6-month internships abroad to strengthen the international visibility of their work and benefit from the expertise of other researchers in the field.
Requirements of the candidate
– An excellent Master's degree in computer science or equivalent
– Strong knowledge of distributed systems
– Knowledge of storage and (distributed) file systems
– Ability and motivation to conduct high-quality research, including publishing the results in relevant venues
– Strong programming skills (Python, C/C++)
– Working experience in Big Data management, Cloud computing or HPC is an advantage
– Very good oral and written communication skills in English
– Open-mindedness, strong integration skills and team spirit