- Title: Optimizing Performance and Energy Consumption for Machine Learning Inference
- When: November 3, 2025 — 10:00
- Where: I3S, room 007, Les Algorithmes
- Committee:
  - Patricia STOLF (referee), Professor, Université Paul Sabatier, IRIT, France
  - Romain ROUVOY (referee), Professor, Université de Lille, France
  - Anne-Cécile ORGERIE, Research Director, CNRS, IRISA, France
  - Guillaume URVOY-KELLER, Professor, Université Côte d’Azur, I3S, France
  - Frédéric GIROIRE (supervisor), Research Director, CNRS, I3S, France
  - Ramon APARICIO-PARDO (co-supervisor), Associate Professor, Université Côte d’Azur, I3S, France
Abstract: The rapid growth of AI usage has led to a significant increase in computing demand. The emergence of new deployment paradigms such as Machine Learning as a Service (MLaaS), together with the growing popularity of Large Language Models (LLMs) such as GPT, Gemini, and DeepSeek, has further intensified this demand. Consequently, the energy consumption associated with AI models has increased considerably, raising concerns about their environmental footprint.
In this thesis, we address the increasing resource demands of AI and propose optimization strategies to improve energy efficiency during inference. We tackle two problems. In the first, we consider a system in which several inference tasks with hard deadlines must be scheduled onto a set of machines. Model compression techniques applied at inference time allow each task to be executed at a chosen compression level, yielding a trade-off between its processing time and the utility it obtains: the more a model is compressed, the faster the task runs, but the lower its utility. The objective is to maximize the total utility of the scheduled tasks. We propose an approximation algorithm with proven guarantees for this problem and extend the model to incorporate energy budget constraints.
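To fix ideas, the first problem can be read as the following utility-maximization program; the notation ($J$ for the task set, $C_j$ for the compression levels of task $j$, $u_{j,c}$ for utility, $p_{j,c}$ for processing time, $d_j$ for deadline) is introduced here only for illustration and is not taken from the announcement:

```latex
\begin{align*}
\max\quad & \sum_{j \in J} \sum_{c \in C_j} u_{j,c}\, x_{j,c} \\
\text{s.t.}\quad & \sum_{c \in C_j} x_{j,c} \le 1 \quad \forall j \in J, \\
& \text{the selected (task, level) pairs admit a schedule on the machines} \\
& \text{in which each chosen task $j$ runs for $p_{j,c}$ time and finishes by $d_j$,} \\
& x_{j,c} \in \{0,1\} \quad \forall j \in J,\ c \in C_j.
\end{align*}
```

The energy-budget extension would then add a constraint of the form $\sum_{j} \sum_{c} e_{j,c}\, x_{j,c} \le E$, with $e_{j,c}$ the energy consumed when task $j$ runs at compression level $c$.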
In the second problem, we investigate the growing environmental impact of AI usage. New paradigms in green AI have recently emerged, shifting from “bigger is better”, which prioritizes large models, to “small is sufficient”, which emphasizes energy sobriety through smaller, more efficient models. We study how the AI community can adopt energy sobriety today by focusing on model selection at inference time, and we estimate the energy savings achievable by choosing which model executes each inference request. Finally, to analyze the environmental implications of AI more broadly, we conduct a Life Cycle Assessment (LCA) of Pl@ntNet, an AI-based application, using a multi-criteria, multi-stage evaluation to capture a comprehensive picture of its ecological footprint.
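As a minimal sketch of energy-aware model selection (the model pool, accuracy figures, and per-request energy costs below are hypothetical and not taken from the thesis), a request can be routed to the least energy-hungry model whose expected quality still meets the request’s requirement:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    accuracy: float    # expected task accuracy (hypothetical figure)
    energy_wh: float   # estimated energy per inference, in watt-hours (hypothetical)

# Hypothetical pool of interchangeable models for the same task.
POOL = [
    Model("small",  accuracy=0.85, energy_wh=0.2),
    Model("medium", accuracy=0.91, energy_wh=0.9),
    Model("large",  accuracy=0.95, energy_wh=3.5),
]

def select_model(required_accuracy: float) -> Model:
    """Return the cheapest model (in energy) that meets the accuracy requirement."""
    for model in sorted(POOL, key=lambda m: m.energy_wh):
        if model.accuracy >= required_accuracy:
            return model
    # No model is accurate enough: fall back to the most accurate one.
    return max(POOL, key=lambda m: m.accuracy)

if __name__ == "__main__":
    chosen = select_model(required_accuracy=0.90)
    largest = max(POOL, key=lambda m: m.energy_wh)
    saving = 1 - chosen.energy_wh / largest.energy_wh
    print(f"Chosen: {chosen.name}; saving vs. always using the largest: {saving:.0%}")
```

Under these made-up numbers, serving a request that tolerates 90% accuracy with the medium model instead of the large one would save roughly 74% of the per-inference energy; the thesis quantifies such savings with measured figures rather than toy values.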