Research

Scientific program

Context

Machine Learning / Federated Learning

Collected, analyzed, and exploited, users’ data offer unprecedented opportunities for innovation, but they also raise real concerns about data privacy. The emergence of AI-enabled applications accentuates these privacy issues. Federated learning is a promising on-device machine learning scheme and a new research topic in privacy-preserving machine learning. It represents a paradigm shift in privacy-preserving AI and offers an attractive framework for training large-scale distributed learning models on sensitive data. However, federated learning still faces many challenges before it can fully preserve data privacy. To achieve this goal, we will extend different federated learning approaches to address their limitations in terms of accuracy, confidentiality, robustness, explainability, and fairness.
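
To make the scheme concrete, the sketch below shows federated averaging (FedAvg) with a toy linear model in Python: each client runs gradient steps on its own private data, and only model parameters are sent to the server for weighted aggregation. Data, hyperparameters, and function names are illustrative assumptions, not the project’s actual framework.

    # Minimal FedAvg sketch: clients train locally, the server only averages
    # model parameters; raw data never leaves the devices.
    import numpy as np

    def local_update(weights, X, y, lr=0.1, epochs=5):
        """One client's on-device training: gradient steps on its private data."""
        w = weights.copy()
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
            w -= lr * grad
        return w

    def fedavg_round(weights, clients):
        """Server step: average client models, weighted by local dataset size."""
        updates = [local_update(weights, X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        return np.average(updates, axis=0, weights=sizes / sizes.sum())

    rng = np.random.default_rng(0)
    clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
    w = np.zeros(3)
    for _ in range(10):  # ten communication rounds
        w = fedavg_round(w, clients)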

Synthetic Data

Like federated learning, which shares models instead of personal data, the generation of synthetic data from generative models has recently been proposed as a way to publish data while preserving privacy. However, this technique has limitations in terms of guarantees against membership inference attacks and of utility preservation, especially for under-represented data.
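
As a minimal illustration of the idea, the sketch below fits a multivariate Gaussian to a private table and publishes samples drawn from it, a deliberately simple stand-in for the deep generative models (GANs, VAEs) used in practice; rare records in the tails of the distribution are exactly those such a model covers poorly. All data here are synthetic placeholders.

    # Toy synthetic data release: fit a generative model (here, a Gaussian)
    # to the private records and publish samples instead of the records.
    import numpy as np

    def fit_gaussian(private_data):
        return private_data.mean(axis=0), np.cov(private_data, rowvar=False)

    def sample_synthetic(mean, cov, n, seed=0):
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(mean, cov, size=n)

    rng = np.random.default_rng(1)
    private = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(200, 2))
    synthetic = sample_synthetic(*fit_gaussian(private), n=200)  # published instead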

Privacy and personal data protection

There is a tension between the risks and the benefits of using personal data. From a technological point of view, there exist two main avenues to reduce these privacy risks: first, the development of Privacy Enhancing Technologies (PETs) such as anonymisation techniques; and second, the proactive identification and analysis of privacy issues in existing systems. Leveraging the opportunities offered by machine learning, this project will develop new privacy protection techniques as well as tools for the automated detection of privacy issues in systems processing personal data.
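
As a concrete example of such a technique, k-anonymity requires that every combination of quasi-identifiers be shared by at least k records, so that no individual stands out. The check below is a minimal sketch in Python; the field names are hypothetical.

    # Minimal k-anonymity check over a toy table with hypothetical fields.
    from collections import Counter

    def is_k_anonymous(records, quasi_identifiers, k):
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return all(count >= k for count in groups.values())

    records = [
        {"zip": "75001", "age_band": "30-39", "diagnosis": "flu"},
        {"zip": "75001", "age_band": "30-39", "diagnosis": "asthma"},
        {"zip": "69002", "age_band": "40-49", "diagnosis": "flu"},
    ]
    print(is_k_anonymous(records, ["zip", "age_band"], k=2))  # False: one lone group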

Objectives

O1. Federated Learning

This objective aims at developing a secure machine learning framework and tools that preserve the confidentiality of personal data. To tackle the scientific challenges, we will quantify the privacy risk of different machine learning algorithms through the development of new inference attacks, and propose associated countermeasures. We will also extend different federated learning approaches. Our vision is to jointly address their limitations in terms of accuracy, confidentiality, and robustness. In addition, we will propose mechanisms to better understand the distributed AI learning process and to mitigate the unfairness and biases that may arise from users’ data, in particular with respect to generative (rather than discriminative) models.
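
As an example of the kind of inference attack involved, the sketch below implements a simple loss-threshold membership inference attack against a scikit-learn classifier: records on which the model’s loss is unusually low are guessed to belong to the training set. Data, model choice, and threshold calibration are illustrative assumptions.

    # Loss-threshold membership inference attack (illustrative sketch).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)
    X_in, y_in, X_out, y_out = X[:200], y[:200], X[200:], y[200:]
    model = LogisticRegression().fit(X_in, y_in)  # members = first half only

    def nll(model, X, y):
        """Per-example negative log-likelihood under the target model."""
        probs = model.predict_proba(X)
        return -np.log(probs[np.arange(len(y)), y] + 1e-12)

    # Attack: guess "member" when the loss falls below a calibrated threshold.
    losses = np.concatenate([nll(model, X_in, y_in), nll(model, X_out, y_out)])
    threshold = np.median(losses)
    tpr = np.mean(nll(model, X_in, y_in) < threshold)    # members caught
    fpr = np.mean(nll(model, X_out, y_out) < threshold)  # non-members misflagged
    print(f"attack TPR={tpr:.2f} FPR={fpr:.2f}")  # TPR above FPR signals leakage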

O2. Synthetic data generation

This objective aims at investigating synthetic data generation schemes to improve the privacy of data publication. To achieve this, we will first quantify the privacy leakage of existing synthetic data generation approaches before proposing new schemes that limit the risks. Different use cases will be considered, with a particular focus on health data.
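
One simple way to quantify such leakage, shown in the sketch below, is the distance to the closest record (DCR): synthetic points that sit almost on top of real records suggest memorisation rather than generalisation. The metric choice and the data are assumptions made for the example.

    # Distance to closest record (DCR): a basic privacy-leakage indicator
    # for synthetic data; near-zero values are red flags for memorisation.
    import numpy as np

    def distance_to_closest_record(synthetic, real):
        diffs = synthetic[:, None, :] - real[None, :, :]
        return np.linalg.norm(diffs, axis=-1).min(axis=1)

    rng = np.random.default_rng(2)
    real = rng.normal(size=(300, 4))
    synthetic = rng.normal(size=(100, 4))
    print(f"median DCR = {np.median(distance_to_closest_record(synthetic, real)):.3f}")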

O3. Leveraging AI to detect privacy issues

This objective aims at using AI to build systems capable of detecting and solving privacy issues. Within systems processing personal data (smartphones, connected devices, the Web, etc.), privacy issues are generally detected via manual examination. This approach is costly and does not scale to the large number of systems to be considered. ML can help tackle this issue by offering a way to automatically identify problematic behaviors in systems using personal data.
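
The sketch below illustrates this workflow on placeholder data: a classifier trained on labelled examples of system behaviour is used to triage new systems automatically, leaving only the flagged cases for manual review. Features and labels are synthetic assumptions made for illustration.

    # ML-assisted privacy auditing sketch: flag likely personal-data leaks.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(3)
    # Placeholder features, e.g. tracker requests, payload entropy, domain counts.
    X = rng.normal(size=(500, 3))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)  # 1 = behaviour labelled as a leak

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:400], y[:400])
    flagged = clf.predict(X[400:])  # automatic triage of new, unlabelled systems
    print(f"{flagged.sum()} of {len(flagged)} held-out systems flagged for review")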

Summary of activities

DNS-based fingerprinting of IoT devices

  • Evaluation of DNS traffic encryption as a privacy-preserving tool: analysis of the impact of DNS encryption on the performance of ML-based fingerprinting methods (a toy illustration of such fingerprinting follows this list).
  • Visit of Inria PhD student at UCL (July 2023)
  • Visit of Inria research team at UCL (December 2022)
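
As a toy illustration of such fingerprinting, the sketch below identifies an IoT device from the plaintext domains it queries, using a bag-of-domains representation and a linear classifier. Domain names and labels are made up; encrypting DNS traffic removes exactly these features, which is the effect under evaluation above.

    # Toy DNS-based fingerprinting of IoT devices from queried domain names.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    traces = [
        "time.nest.example cloud.nest.example",  # thermostat DNS queries
        "cloud.nest.example time.nest.example",
        "stream.cam.example fw.cam.example",     # camera DNS queries
        "fw.cam.example stream.cam.example",
    ]
    labels = ["thermostat", "thermostat", "camera", "camera"]

    vec = CountVectorizer(token_pattern=r"[^ ]+")  # one token per queried domain
    X = vec.fit_transform(traces)
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(vec.transform(["stream.cam.example"])))  # expected: ['camera']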
