Towards a control-theory approach for minimizing unused grid resources, by Agustín Yabo (Master 2, Datamove + Ctrl-A)


June 14, 2018

HPC systems exhibit increasing variability in their behavior, for example in performance and power consumption, and this reduced predictability calls for more runtime management. Such management can be carried out in an Autonomic Management feedback loop: the system is monitored, the collected data is analyzed, and the results are used to activate appropriate system-level or application-level feedback mechanisms (e.g., informing schedulers, down-clocking CPUs).
This problem arises in the context of CiGri, a simple, lightweight, scalable and fault-tolerant grid system that exploits the unused resources of a set of computing clusters. Computing power left over by the scheduling of the main HPC applications is used to execute smaller jobs, which are injected to the extent that the overall system allows.
The seminar will present results on automated resource management in an HPC infrastructure, using techniques from Control Theory to design a controller that maximizes cluster utilization while accounting for cluster and fileserver overload. We build on previous work in which a proportional-integral (PI) feedback control loop was implemented: it adjusts the maximum number of jobs sent to the cluster in response to system information about the number of jobs currently being processed. We then propose a dynamical model of the system queue, cluster load and fileserver load with time-varying parameters, an EKF (Extended Kalman Filter) for online parameter estimation, and an MPC (Model-Predictive Control) approach for regulating the state of the infrastructure to the desired values.
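
As a rough illustration of the PI feedback idea mentioned above, the following minimal Python sketch regulates a job-submission limit toward a target number of running jobs. It is not CiGri's implementation: the class name `PIController`, the gains, the bounds, and the toy first-order cluster model in `simulate` are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller (hypothetical gains and bounds)."""
    kp: float                # proportional gain (assumed value)
    ki: float                # integral gain (assumed value)
    setpoint: float          # desired number of jobs being processed
    u_min: float = 0.0       # lower bound on the submission limit
    u_max: float = 100.0     # upper bound on the submission limit
    integral: float = 0.0    # accumulated error

    def update(self, measured: float, dt: float = 1.0) -> float:
        """Return the new maximum number of jobs to send to the cluster."""
        error = self.setpoint - measured
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate
        clamped = min(max(raw, self.u_min), self.u_max)
        # Simple anti-windup: accumulate the error only when the
        # output is not saturated.
        if raw == clamped:
            self.integral = candidate
        return clamped


def simulate(steps: int = 80, setpoint: float = 40.0) -> list[float]:
    """Run the controller against a toy first-order cluster model.

    The model y[k+1] = 0.5 * y[k] + 0.5 * u[k] is an assumption made
    for this sketch, not a model of a real cluster.
    """
    controller = PIController(kp=0.5, ki=0.3, setpoint=setpoint)
    running = 0.0            # jobs currently being processed
    history = []
    for _ in range(steps):
        limit = controller.update(running)
        running = 0.5 * running + 0.5 * limit
        history.append(running)
    return history


if __name__ == "__main__":
    trace = simulate()
    print(f"running jobs after {len(trace)} steps: {trace[-1]:.2f}")
```

In this sketch the integral term removes the steady-state error that a purely proportional controller would leave, and clamping the output keeps the submission limit within fixed bounds. A real deployment would round the limit to an integer and tune the gains against measured system behavior.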

Bat. IMAG, 206
