Workshop on Privacy, Machine Learning and Decentralized Systems

The workshop will be held in Rennes from March 20th, 2pm to March 21st, 4pm, in Room Minquiers (B025).

Preliminary program

Tuesday 20th, afternoon

  • Welcome lunch (Room Sein, B001) at 12:00
  • Snips: concrete use cases for decentralised machine learning
    Joseph Dureau.
    Snips proposes a private-by-design solution to power voice assistants. We run Wakeword detection, Automatic Speech Recognition, and Natural Language Understanding on the edge, on devices as small as a Raspberry Pi 3. Our platform is offered both as a B2B solution and as a free solution for non-commercial applications. In the current approach, algorithms are trained on Snips servers without any user data. However, the community version has thousands of monthly active users and is growing rapidly, which represents an opportunity to experiment with decentralised machine learning algorithms. We will briefly present each of the algorithms underlying Snips Wakeword detection, Automatic Speech Recognition, and Natural Language Understanding, as concrete use cases for decentralised machine learning applications.
  • Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks

    We propose to focus on the problem of discovering neural network architectures that are efficient both in terms of prediction quality and cost. For instance, our approach is able to solve the following tasks: ‘learn a neural network able to predict well in less than 100 milliseconds’ or ‘learn an efficient model that fits in a 50 Mb memory’. Our contribution is a novel family of models called Budgeted Super Networks. They are learned using gradient descent techniques applied to a budgeted learning objective function which integrates a maximum authorized cost, where this cost can be of different natures. We present a set of experiments on computer vision problems and analyze the ability of our technique to deal with three different costs: the computation cost, the memory consumption cost, and also a distributed computation cost. In particular, we show that our model can discover neural network architectures that have better accuracy than the ResNet and CNF architectures on CIFAR-10 and CIFAR-100, at a lower cost.

  • Break
  • Privacy preserving, personalized and decentralized machine learning
    Aurélien Bellet, Rachid Guerraoui, Masha Taziki, Marc Tommasi.

    The rise of connected personal devices together with privacy concerns call for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable convergence rate. We show how to make the algorithm differentially private to protect against the disclosure of information about the personal datasets, and formally analyze the trade-off between utility and privacy. Our experiments show that our approach dramatically outperforms previous work in the non-private case, and that under privacy constraints, we can significantly improve over models learned in isolation.

  • Discussion on use cases and collaborations with Snips

Wednesday 21st

  • 9:30 Tutorial on the Blockchain
    Emmanuelle Anceaume
  • Break
  • The GOPA protocol
    Jan Ramon, Aurélien Bellet
    The amount of personal data collected in our everyday interactions with connected devices offers great opportunities for innovative services fueled by machine learning, but also raises serious concerns for the privacy of individuals.
    We propose a massively distributed protocol for a large set of users to privately compute averages over their joint data, which can then be used to learn predictive models. Our protocol can find a solution of arbitrary accuracy, does not rely on a third party and preserves the privacy of users throughout the execution in both the honest-but-curious and malicious adversary models. Specifically, we prove that the information observed by the adversary (the set of malicious users) does not significantly reduce the uncertainty in its prediction of private values compared to its prior belief. The level of privacy protection depends on a quantity related to the Laplacian matrix of the network graph and generally improves with the size of the graph.
    Furthermore, we design a verification procedure which offers protection against malicious users joining the service with the goal of manipulating the outcome of the algorithm.
    The presentation may also contain some more general theoretical considerations.
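
    As a rough illustration of private decentralized averaging, the sketch below masks each user's value with pairwise noise that cancels over the network, then runs randomized gossip averaging. This is a generic toy construction, not the GOPA protocol itself: the abstract does not give the protocol's steps, and the function names, the ring graph, the Gaussian noise and its scale are all assumptions made for the example. It offers no formal privacy guarantee and omits the verification procedure.

```python
import random

def mask_values(values, edges, rng, scale=10.0):
    """Hypothetical masking step: for each edge (u, v), u adds r and v subtracts r.

    The noise terms cancel in the sum, so the network-wide average is unchanged.
    """
    masked = list(values)
    for u, v in edges:
        r = rng.gauss(0.0, scale)
        masked[u] += r
        masked[v] -= r
    return masked

def gossip_average(values, edges, rounds, rng):
    """Randomized gossip: the endpoints of a random edge replace their values by their mean."""
    x = list(values)
    for _ in range(rounds):
        u, v = rng.choice(edges)
        x[u] = x[v] = (x[u] + x[v]) / 2
    return x

rng = random.Random(1)
values = [3.0, 7.0, 1.0, 9.0, 5.0]                 # private values, one per user
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # assumed ring network
masked = mask_values(values, edges, rng)
estimates = gossip_average(masked, edges, rounds=2000, rng=rng)
true_mean = sum(values) / len(values)
print(true_mean, estimates)
```

    Each user only ever exchanges a masked value with its neighbors, yet every local estimate approaches the true average because the masks sum to zero.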
  • Robustness and Identity Management in Private Decentralized Computations
    Jan Ramon, Cesar Sabater

    The GOPA algorithm provides strong protocols for doing a decentralized computation while maintaining privacy and enforcing a correct behavior of the users. However, some other types of malicious activity or unexpected events may arise and endanger the accuracy or execution of the computation. This presentation will enumerate and explain some research directions to improve GOPA in these aspects.

  • Discussion on privacy aspects
  • Break
  • Pleiades: Distributed Structural Invariants at Scale
    François Taiani

    In order to meet rising expectations in terms of scalability, robustness, and flexibility, large-scale distributed systems increasingly espouse sophisticated distributed architectures that require enforcing complex distributed structural invariants. Unfortunately, maintaining these structural invariants at scale is particularly time-consuming and error-prone, as developers must take into account asynchronous failures, loosely coordinated sub-systems and network delays. To address this problem, we propose PLEIADES, a new platform to construct and enforce large-scale distributed structural invariants under aggressive conditions. PLEIADES combines the resilience of self-organizing overlays with the expressiveness of an assembly-based design strategy. The result is a highly survivable framework that is able to dynamically maintain arbitrarily complex distributed structures under aggressive crash failures. Our evaluation shows in particular that PLEIADES is able to restore the overall structure of a 25,600-node system in 11 asynchronous rounds after half of the nodes have crashed.

  • Speed-up K-NN graph construction
    Olivier Ruas

    K-Nearest-Neighbors (KNN) graphs play a key role in a large range of applications. A KNN graph typically connects entities characterized by a set of features so that each entity becomes linked to its k most similar counterparts according to some similarity function. As datasets grow, KNN graphs are unfortunately becoming increasingly costly to construct, and the general approach, which consists in reducing the number of comparisons between entities, seems to have reached its full potential. In this talk I will present two of our recent contributions that speed up the comparisons between users. They consist in limiting the set of features of each user by (1) sampling and (2) fingerprinting. Our evaluation shows that they deliver substantial speed-ups while providing KNN graphs close to the exact ones.
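
    To make the sampling idea concrete, here is a minimal sketch that builds a brute-force KNN graph on full feature sets and on sampled ones, then measures how many neighbors the two graphs share. The Jaccard similarity, the uniform sampling policy, the synthetic profiles and all function names are assumptions for illustration; the abstract does not specify the similarity function or sampling strategy used in the talk, and fingerprinting is not covered here.

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def sample_profile(features, max_size, rng):
    """Keep at most max_size features, chosen uniformly at random (assumed policy)."""
    if len(features) <= max_size:
        return set(features)
    return set(rng.sample(sorted(features), max_size))

def knn_graph(profiles, k):
    """Brute-force KNN graph: map each entity to its k most similar other entities."""
    graph = {}
    for u, pu in profiles.items():
        scored = [(jaccard(pu, pv), v) for v, pv in profiles.items() if v != u]
        scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by id
        graph[u] = [v for _, v in scored[:k]]
    return graph

rng = random.Random(0)
# Synthetic data: 30 users, each with up to 60 features drawn from 200 possible ones.
profiles = {u: {rng.randrange(200) for _ in range(60)} for u in range(30)}
sampled = {u: sample_profile(p, 20, rng) for u, p in profiles.items()}

exact = knn_graph(profiles, k=5)
approx = knn_graph(sampled, k=5)
# Fraction of exact neighbors that the sampled graph recovers.
overlap = sum(len(set(exact[u]) & set(approx[u])) for u in profiles) / (5 * len(profiles))
print(f"neighbor overlap: {overlap:.2f}")
```

    Each similarity computation on sampled profiles touches at most 20 features per user instead of up to 60, which is where the speed-up would come from; the overlap value indicates how much graph quality is traded for it.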

  • Break
  • Random Binary Search Trees with Concurrent Insertions
    George Giakkoupis

    Consider the following simple random experiment to determine the impact of concurrency on the performance of binary search trees: A number n of randomly permuted keys arrive one at a time. When a new key arrives, it is first placed into a buffer of size c. Whenever the buffer is full, or when all keys have arrived, an adversary chooses one key from the buffer and inserts it into the binary search tree. The ability of the adversary to choose the next key to insert among c buffered keys models a distributed system, where up to c processes try to insert keys concurrently. In this talk I will present recent results on the expected height and average node depth of the resulting tree.
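
    The experiment described above can be simulated in a few lines. The sketch below is only an illustration: the function names are hypothetical, height is measured as the maximum node depth in edges, and the "adversary" is a simple heuristic that always inserts the smallest buffered key, which is an assumption and not necessarily the worst-case strategy analyzed in the talk.

```python
import random

def insert(tree, root, key):
    """Insert key into a BST stored as {key: [left, right]}; return the new node's depth."""
    tree[key] = [None, None]
    if root is None:
        return 0
    node, depth = root, 0
    while True:
        side = 0 if key < node else 1
        depth += 1
        if tree[node][side] is None:
            tree[node][side] = key
            return depth
        node = tree[node][side]

def simulate(n, c, choose, rng):
    """Run the buffered-insertion experiment; return (height, average node depth)."""
    keys = list(range(n))
    rng.shuffle(keys)
    tree, root, buffer, depths = {}, None, [], []

    def flush_one():
        nonlocal root
        key = choose(buffer)          # the adversary picks a buffered key
        buffer.remove(key)
        depths.append(insert(tree, root, key))
        if root is None:
            root = key

    for key in keys:
        buffer.append(key)
        if len(buffer) == c:          # buffer full: one key must be inserted
            flush_one()
    while buffer:                     # all keys have arrived: drain the buffer
        flush_one()
    return max(depths), sum(depths) / n

rng = random.Random(42)
for c in (1, 4, 16):
    height, avg_depth = simulate(1000, c, min, rng)
    print(f"c={c}: height={height}, average depth={avg_depth:.2f}")
```

    With c = 1 the adversary has no choice and the result is an ordinary random binary search tree; with c = n the smallest-key heuristic inserts the keys in sorted order and the tree degenerates into a path.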

  • Discussion on collaboration graph constructions
  • Break
  • General discussions
    • on the Mediego use cases
    • future collaborations between teams
    • recruitment strategies

Workshop supported by ANR (ANR PAMELA project) and CPER Data (Hauts-de-France, Project MyLocalInfo)