Magnet au vert 2019, September 4th and 5th

The internal seminar of the Magnet team will be held at the Hôtel Louvre Lens.

Departure: September 4th at 9:00 from the bus park at Gare Lille Europe (on the left when leaving the railway station, after the Crowne Plaza). Return: September 5th at 18:30 at the Inria parking lot (leaving Lens at 17:00).


  • 4th 9:45-10:15 Welcome
  • 4th 10:15-11:15 Fast Clustering Methods for Networked Data, Fabio
  • 4th 11:15-12:00 Personalized Predictive Models and Random Graphs in Decentralized Communication Networks, Arijus
  • 4th 12:00-14:00 Lunch
  • 4th 14:00-15:00 Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs, Aurélien
  • 4th 15:00-15:45 Joint Learning of the Data and the Representation for graph-based Semi-supervised Learning, Mariana
  • 4th 16:00-17:45 Visit of the Museum
  • 4th 19:00-21:00 Dinner
  • 4th 21:00-22:00 A Very Gentle Introduction to Graph Neural Networks, Mikaela & Mariana & Rémi
  • 4th 22:00- Open discussion led by students on topics of their choice
  • 5th 8:30-9:30 Privacy-preserving Aggregation over Solutions of Conjunctive Queries, Jan
  • 5th 9:30-10:15 Privacy-preserving Adversarial Representation Learning in Automatic Speech Recognition: Reality or Illusion, Brij
  • 5th 10:15-10:45 coffee break
  • 5th 10:45-11:15 Multilingual Grapheme to Phoneme Synthesis, Pradipta
  • 5th 11:15-11:45 Computing Word Embeddings with Sparse Methods, William
  • 5th 11:45-12:30 Finding Good Peers Online in Decentralized Machine Learning Network, Mahsa
  • 5th 12:30-14:00 lunch
  • 5th 14:00-14:45 A Language for specifying Information Processes and their Privacy Properties, Moitree & Jan
  • 5th 14:45-15:30 Semi-supervised Imitation Learning for Coreference Resolution, Onkar
  • 5th 15:30-16:00 coffee break
  • 5th 16:00-16:45 Secure Protocols for Privacy Preserving Machine Learning, Cesar
  • 5th 17:00 Departure

PAD-ML Kick-Off: Workshop on Privacy-Aware Distributed Machine Learning

We are organizing a Workshop on Privacy-Aware Distributed Machine Learning on October 25, 2018 at INRIA Lille. This workshop serves as the kick-off of PAD-ML (Privacy-Aware Distributed Machine Learning), an Inria North-European Associate Team between the Magnet project-team (Inria) and the Privacy-preserving data analysis group (Alan Turing Institute), which officially started in September 2018.

Pauline Wauquier thesis defense

Pauline defended her PhD thesis, Task driven representation learning, on May 29th, 2017.


Machine learning offers numerous algorithms to solve the various tasks that arise in real-world prediction problems. To solve these tasks, most machine learning algorithms rely in some way on relationships between instances. Pairwise relationships between instances can be obtained by computing a distance between their vectorial representations. Given the available vectorial representation of the data, however, none of the commonly used distances is guaranteed to be representative of the task to be solved. In this work, we investigate the gain obtained by adapting the vectorial representation of the data to the distance in order to solve the task more effectively. We focus in particular on an existing graph-based algorithm for classification. We first introduce an algorithm that learns a mapping of the data into a representation space which allows an optimal graph-based classification. By projecting the data into a representation space in which the predefined distance is representative of the task, we aim to outperform the initial vectorial representation of the data when solving the task. A theoretical analysis of the introduced algorithm defines the conditions ensuring an optimal classification, and a set of empirical experiments allows us to evaluate the gain of the approach and to temper the theoretical analysis.
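The abstract's central observation — that a distance computed on the raw representation need not reflect the task — can be illustrated with a toy example (my own illustration, not the thesis's algorithm): when one feature is pure large-scale noise, Euclidean nearest neighbors cross class boundaries, while a reweighted representation (a hand-picked linear map standing in for a learned one) makes the same distance task-representative.

```python
import numpy as np

# Toy illustration: only the first feature is relevant to the class; the
# second is large-scale noise. The Euclidean distance on the raw
# representation is dominated by the noise, so nearest neighbors mix the
# classes. Rescaling the representation fixes this.

rng = np.random.default_rng(0)
n = 40
y = np.repeat([0, 1], n // 2)
X = np.column_stack([y + 0.1 * rng.standard_normal(n),   # relevant feature
                     10.0 * rng.standard_normal(n)])     # irrelevant noise

def nn_error(X, y):
    """Fraction of points whose nearest neighbor has a different class."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return np.mean(y[D.argmin(axis=1)] != y)

M = np.diag([1.0, 0.01])   # down-weight the noisy dimension
print(nn_error(X, y), nn_error(X @ M, y))
```

On this data the error of the nearest-neighbor rule drops sharply after the reweighting, which is exactly the kind of gain the thesis seeks by learning the mapping instead of hand-picking it.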

Workshop on Decentralized Machine Learning, Optimization and Privacy

We are organizing a Workshop on Decentralized Machine Learning, Optimization and Privacy on September 11-12, 2017 at INRIA Lille.


New ANR Project in Magnet: Grasp (GRAph-based machine learning for linguistic Structure Prediction).


New ANR Project in Magnet: Pamela (Personalized and decentrAlized MachinE Learning under constrAints).

Argumentation Mining Workshop

About computational approaches to argumentation.

RSS Workshop

Workshop on Graph-based Learning of the RSS North-European Team: UCL-Aalto-Lille

June 16-17, 2016

Master's Project: Structured Label Propagation for Natural Language Processing

Title: Structured label propagation for natural language processing
Team: MAGNET
HDR supervisor: Marc Tommasi
Advisor: Pascal Denis

Problem statement:

Natural language processing (NLP) poses two central challenges for machine learning algorithms: on the one hand, the very high dimensionality of the output space (the number of possible outputs is most often exponential in the size of the input), and on the other hand, the small volume of available annotated data (due to the high cost of collecting linguistic annotations). These two problems have led to the development of learning algorithms that handle such complex outputs, as well as approaches that exploit, in addition to annotated data, unlabeled data (which is much more widely available). Among the latter, graph-based approaches (notably those based on label propagation and manifold regularization) have proven extremely promising. Unfortunately, these graph-based approaches have so far not been generalized to the case of structured outputs. The question is how to propagate structured labels (e.g., label sequences or dependency trees) within a graph.

Work to be carried out:

The student will first become familiar, through readings and the writing of a survey, with the literature on structured prediction, graph-based semi-supervised learning, and one or more NLP problems (such as part-of-speech tagging, syntactic parsing, or discourse analysis), together with the state-of-the-art approaches to these problems. In a second phase, the student will reimplement a few state-of-the-art algorithms and test them on benchmarks. Finally, the student will attempt to combine structured prediction approaches with graph-based propagation.
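As background, the unstructured baseline the project starts from — the label propagation algorithm of Zhu & Ghahramani (2002), cited below — can be sketched in a few lines (a toy example of mine, not the project's code): iterate Y ← P·Y with P the row-normalized adjacency matrix, clamping the labeled rows after each step.

```python
import numpy as np

def label_propagation(W, Y_init, labeled, n_iter=200):
    """W: (n, n) symmetric edge weights; Y_init: (n, k) one-hot rows for
    labeled vertices, zeros elsewhere; labeled: indices of labeled rows."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    Y = Y_init.copy()
    for _ in range(n_iter):
        Y = P @ Y                          # propagate labels one step
        Y[labeled] = Y_init[labeled]       # clamp the known labels
    return Y

# Path graph 0-1-2-3-4; vertex 0 has class 0, vertex 4 has class 1.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
Y0 = np.zeros((5, 2))
Y0[0, 0] = 1.0
Y0[4, 1] = 1.0

Y = label_propagation(W, Y0, labeled=[0, 4])
print(Y.argmax(axis=1))   # [0 0 0 1 1]
```

The open question of the project is precisely what replaces the rows of Y when the labels are not one-hot class indicators but structured objects such as tag sequences or dependency trees.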

Bibliography:

Zhu, X., & Ghahramani, Z. (2002). Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University.

Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.

Subramanya, A., Petrov, S., & Pereira, F. C. N. (2010). Efficient graph-based semi-supervised learning of structured tagging models. In Proceedings of EMNLP, pp. 167–176.

Das, D., & Petrov, S. (2011). Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of ACL, pp. 600–609.

A Markov random field approach to online graph label prediction

when: July 2, 2014

where: salle plénière

speaker: Mark Herbster

title: A Markov random field approach to online graph label prediction

abstract: In graph label prediction, each vertex is labeled with a 0 or a 1. Given a partial labeling of the graph, the aim is to extend it to a complete labeling. A common approach to this prediction problem is to relax the labels to [0,1], minimize a quadratic *energy* function with respect to the partial labeling, and predict with the minimizer. This approach is at the core of the very successful semi-supervised learning techniques of label propagation [ZLG03] and manifold regularization [BN04]. We instead use the unrelaxed labels with an energy function to define a probability distribution over labelings (a Markov random field) and then predict by marginalization. The relative drawback of this approach is its computational complexity. We mitigate the problem with an efficient deterministic approximation. For our approximation, we prove worst-case online mistake bounds and also show that, sequentially, our approximate prediction cannot differ from the true marginal prediction very often on "easy" problems.
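The relaxed baseline the abstract contrasts with can be made concrete (a toy example of mine, not from the talk): relax the labels to [0,1], minimize the quadratic energy E(f) = sum over edges of w_ij (f_i - f_j)^2 subject to the known labels, and predict with the minimizer. The minimizer is the harmonic solution f_u = -L_uu^{-1} L_ul f_l, where L = D - W is the graph Laplacian.

```python
import numpy as np

# Harmonic (energy-minimizing) label prediction on a 5-vertex path graph:
# vertices 0 and 4 are labeled 0 and 1; the relaxed labels of the
# unlabeled vertices interpolate linearly between them.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W

labeled = [0, 4]
unlabeled = [1, 2, 3]
f_l = np.array([0.0, 1.0])              # known labels

# Solve L_uu f_u = -L_ul f_l for the unlabeled vertices.
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, -L_ul @ f_l)

print(f_u)   # ≈ [0.25, 0.5, 0.75]
```

The talk's point of departure is to keep the labels in {0,1} instead, turn the energy into a Markov random field over labelings, and predict from the marginals rather than from this relaxed minimizer.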