Seminar Bruno Raffin (Large Scale Data Assimilation)
– March 17, 2022
Title: Large Scale Data Assimilation
Authors: Sebastian Friedemann and Bruno Raffin
Abstract:
How can data, which may be available through different sensors, be combined
with traditional numerical solvers designed to compute solutions of
PDEs modeling complex phenomena? Bridging both is a timely
question given the multiplication of data sources (IoT). But
augmenting solvers working in high-dimensional spaces with external
data is far from trivial. Both are usually subject to uncertainties.
Data Assimilation (DA) is a well-known approach to address this issue. DA is
routinely used in production, for instance in weather forecasting. In this talk, I will first motivate and
introduce the principles of Data Assimilation, including the
different families of techniques (statistical, variational) with a
specific focus on statistical ones (EnKF and particle filter).
From there, I will explain the work performed with the
Melissa framework to support very large scale statistical data
assimilation with the EnKF method. We will dive into the details
of the software infrastructure that has been designed to go to very large scale,
supporting features like elasticity, fault tolerance and dynamic
load balancing, and show some experimental performance results.
Polaris-datamove seminar: Georges Da Costa (Multi-objective resources optimization Performance- and Energy-aware HPC and Clouds)
– March 24, 2022
Title: "Multi-objective resources optimization Performance- and Energy-aware HPC and Clouds"
This talk will address the challenge "How to efficiently manage a datacenter?" by providing theoretical and practical tools.
Datacenters are central to the lives of a growing share of the world's population while remaining mostly unknown: they support everything from online services to weather forecasting. Their role, while transparent, is of the utmost importance. Their electricity consumption is therefore a key challenge, and will become even more so in the near future. Managing them efficiently improves quality of service as well as energy efficiency.
This talk will present the required tools, starting with those for measurement and monitoring, both from a technical point of view and through the definition of metrics. It will then explore methods to model such problems, along with direct or approximate resolution techniques. The next step will present several heuristics to quickly reach approximate solutions. Several types of validation, from experimental to improved simulation-based ones, will support the talk.
Seminar Bertrand Simon (An Exact Algorithm for the Linear Tape Scheduling Problem)
– March 31, 2022
Title: An Exact Algorithm for the Linear Tape Scheduling Problem
Abstract: Magnetic tapes are often considered an outdated storage technology, yet they are still used to store huge amounts of data. Their main advantages are a large capacity and a low price per gigabyte, which come at the cost of a much larger file access time than on disks. With tapes, finding the right ordering of multiple file accesses is thus key to performance. Moving the reading head back and forth along a kilometer-long tape has a non-negligible cost, and unnecessary movements thus have to be avoided. However, the optimization of tape request ordering has rarely been studied in the scheduling literature, much less than I/O scheduling on disks. For instance, minimizing the average service time for several read requests on a linear tape remains an open question. Therefore, in this paper, we aim at improving the quality of service experienced by users of tape storage systems, and not only the peak performance of such systems. To this end, we propose a reasonable polynomial-time exact algorithm, whereas this problem and simpler variants have been conjectured NP-hard. We also refine the proposed model by considering U-turn penalty costs accounting for inherent mechanical accelerations. Then, we propose a low-cost variant of our optimal algorithm by restricting the solution space, yet still yielding an accurate suboptimal solution. Finally, we compare our algorithms to existing solutions from the literature on logs of the mass storage management system of a major datacenter. This allows us to assess the quality of previous solutions and the improvement achieved by our low-cost algorithm. Aiming for reproducibility, we make available the complete implementation of the algorithms used in our evaluation, alongside the dataset of tape requests that is, to the best of our knowledge, the first of its kind to be publicly released.
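To make the objective concrete, the following minimal sketch evaluates the average service time of a given request ordering under a deliberately simplified cost model. Everything in it is an assumption made for illustration and not the model or algorithm of the talk: the head starts at position 0 and moves at unit speed, each file is a hypothetical (left, right) interval read from left to right, a request is served when its read completes, and U-turn penalties are ignored. The brute-force search is exponential and only serves as a reference on tiny inputs; it is not the polynomial-time exact algorithm mentioned in the abstract.

```python
from itertools import permutations


def average_service_time(requests, order, start=0):
    """Average completion time of the read requests served in `order`.

    Assumed toy model: unit head speed, files are (left, right) intervals
    read left to right, no U-turn penalty.
    """
    position = start
    clock = 0.0
    total = 0.0
    for index in order:
        left, right = requests[index]
        clock += abs(position - left)  # seek to the start of the file
        clock += right - left          # read the file
        position = right               # the head ends at the file's right end
        total += clock                 # this request is served at time `clock`
    return total / len(order)


def best_order_bruteforce(requests, start=0):
    """Try every ordering (exponential); usable only for a handful of requests."""
    return min(
        permutations(range(len(requests))),
        key=lambda order: average_service_time(requests, order, start),
    )


# Hypothetical requests: three files at different tape positions.
requests = [(10, 12), (1, 2), (5, 6)]
arrival_order = (0, 1, 2)
best = best_order_bruteforce(requests)
print(arrival_order, average_service_time(requests, arrival_order))
print(best, average_service_time(requests, best))
```

On this toy instance, serving the requests in arrival order forces the head to travel to the far file first and then come back, whereas the best ordering sweeps the tape once, which illustrates why request ordering matters for the average service time.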