Caelin Kaplan, PhD student in the NEO team supervised by Giovanni NEGLIA, defended his thesis on Friday, November 22nd, at 2:00 pm in the Morgenstern Amphitheatre. Congratulations Caelin!
Thesis title: Inherent Trade-offs in Privacy-Preserving Machine Learning
Abstract:
Privacy-preserving ML techniques often result in reduced task-specific utility and may negatively impact other essential factors such as fairness, robustness, and interpretability. These challenges have limited the widespread adoption of privacy-preserving methods. This thesis addresses these challenges through two primary goals: (1) to deepen the understanding of key trade-offs in three privacy-preserving ML techniques, namely differential privacy, empirical privacy defenses, and federated learning (FL); (2) to propose novel methods and algorithms that improve utility and effectiveness while maintaining privacy protections.
The first study in this thesis investigates how differential privacy impacts fairness across groups defined by sensitive attributes. Using standard ML fairness datasets, we show that group disparities in metrics like demographic parity, equalized odds, and predictive parity are often reduced or remain negligible when compared to non-private baselines, challenging the prevailing notion that differential privacy worsens fairness for underrepresented groups.
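As an illustration of what is being measured, here is a minimal sketch of how such group disparities can be computed from model predictions; the variable names and two-group setup are illustrative, not taken from the thesis:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (0/1)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in true-positive or false-positive rate between groups."""
    gaps = []
    for y in (0, 1):  # y == 1 gives the TPR gap, y == 0 the FPR gap
        mask = y_true == y
        gaps.append(abs(y_pred[mask & (group == 0)].mean()
                        - y_pred[mask & (group == 1)].mean()))
    return max(gaps)

# Comparing a non-private baseline with a DP-trained model on one test split:
# gap_baseline = demographic_parity_gap(preds_baseline, sensitive_attr)
# gap_dp       = demographic_parity_gap(preds_dp, sensitive_attr)
```

The comparison in the study is between such gaps measured on a differentially private model and on its non-private counterpart.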
The second study focuses on empirical privacy defenses, which aim to protect training data privacy while minimizing utility loss. We propose a baseline defense method, Weighted Empirical Risk Minimization (WERM), which allows for a clearer understanding of the trade-offs between model utility, training data privacy, and reference data privacy. Our approach offers theoretical guarantees on model utility and the relative privacy of training and reference data.
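To make the trade-off concrete, a weighted ERM objective of this general form interpolates between the empirical risks on the training set and a reference set; this is a sketch of the idea, and the exact formulation and notation in the thesis may differ:

```latex
\min_{\theta} \;\; (1 - w)\, \frac{1}{|D_{\mathrm{tr}}|} \sum_{z \in D_{\mathrm{tr}}} \ell(\theta; z)
\;+\; w\, \frac{1}{|D_{\mathrm{ref}}|} \sum_{z \in D_{\mathrm{ref}}} \ell(\theta; z),
\qquad w \in [0, 1]
```

At $w = 0$ the model is trained only on the training data (standard ERM, maximal training-data exposure); increasing $w$ shifts the empirical risk, and with it the leakage risk, toward the reference data, so sweeping $w$ traces out the trade-off between utility and the relative privacy of the two datasets.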
The third study addresses the convergence-related trade-offs in Collaborative Inference Systems (CISs), which are increasingly used in the Internet of Things (IoT) to enable smaller nodes in a network to offload part of their inference tasks to more powerful nodes. We propose a novel FL approach explicitly designed for CISs, which accounts for varying serving rates and uneven data availability. Our framework provides theoretical guarantees and consistently outperforms state-of-the-art algorithms, particularly in scenarios where end devices handle high inference request rates.
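Purely as an illustration of how serving rates could enter such a design (the thesis' actual algorithm and weighting scheme are not reproduced here), one might weight each node's contribution to the federated aggregation step by the inference traffic it serves rather than by its local data size alone:

```python
import numpy as np

def aggregate_by_serving_rate(updates, serving_rates):
    """Hypothetical aggregation step: average model updates with weights
    proportional to each node's inference serving rate (illustrative only)."""
    weights = np.asarray(serving_rates, dtype=float)
    weights = weights / weights.sum()  # normalize to a convex combination
    return sum(w * u for w, u in zip(weights, updates))

# e.g., three nodes' parameter updates, where node 0 serves most requests:
updates = [np.array([0.1, 0.2]), np.array([0.0, 0.1]), np.array([0.3, 0.0])]
print(aggregate_by_serving_rate(updates, serving_rates=[100, 10, 5]))
```

Under such a scheme, nodes facing high inference request rates pull the shared model toward the distribution of the requests they actually serve.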
These contributions aim to support the responsible and ethical deployment of AI technologies that prioritize data privacy and protection.