Simon Leglaive – RobotLearn

From February 2018 to August 2019, I was a postdoctoral researcher in the PERCEPTION team at Inria Grenoble Rhône-Alpes, where I mainly worked on audio source separation and speech enhancement.

You can visit my web page at this address: https://sleglaive.github.io/

Publications

Leglaive's publications on HAL

2025

Journal articles

title: Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
author: Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
article: Computer Speech and Language, 2025, 89, ⟨10.1016/j.csl.2024.101685⟩
Access to full text and BibTeX

Conference papers

title: AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
author: Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda
article: ICASSP 2025 – IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2025, Hyderabad, India. pp.1-5
Access to full text and BibTeX

title: MEGA: Masked Generative Autoencoder for Human Mesh Recovery
author: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer
article: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, Nashville (Tennessee), United States
Access to BibTeX

2024

Journal articles

title: A Multimodal Dynamical Variational Autoencoder for Audiovisual Speech Representation Learning
author: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
article: Neural Networks, 2024, 172, pp.106120. ⟨10.1016/j.neunet.2024.106120⟩
Access to BibTeX

Conference papers

title: VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
author: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
article: ECCV 2024 – 18th European Conference on Computer Vision, Sep 2024, Milan, Italy. pp.471-490, ⟨10.1007/978-3-031-72943-0_27⟩
Access to BibTeX

title: Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion
author: Karim M Ibrahim, Antony Perzo, Simon Leglaive
article: International Conference on Acoustics, Speech, and Signal Processing, 2024, Seoul, South Korea. ⟨10.1109/icassp48485.2024.10445740⟩
Access to full text and BibTeX

2023

Journal articles

title: Learning and controlling the source-filter representation of speech with a variational autoencoder
author: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
article: Speech Communication, 2023, 148, pp.53-65. ⟨10.1016/j.specom.2023.02.005⟩
Access to full text and BibTeX

Conference papers

title: Motion-DVAE: Unsupervised learning for fast human motion denoising
author: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
article: ACM SIGGRAPH Conference on Motion, Interaction and Games (ACM MIG), Nov 2023, Rennes, France. ⟨10.1145/3623264.3624454⟩
Access to BibTeX

title: SwimXYZ: A large-scale dataset of synthetic swimming motions and videos
author: Guénolé Fiche, Vincent Sevestre, Camila Gonzalez-Barral, Simon Leglaive, Renaud Séguier
article: ACM SIGGRAPH Conference on Motion, Interaction and Games (ACM MIG), Nov 2023, Rennes, France. ⟨10.1145/3623264.3624440⟩
Access to BibTeX

title: Étude sur l’inversion de StyleGAN dans un contexte de détection d’hypertrucages
author: Matthieu Delmas, Amine Kacete, Stephane Paquelet, Simon Leglaive, Renaud Séguier
article: XXIXe Colloque GRETSI, GRETSI – Groupe de Recherche en Traitement du Signal et des Images, Aug 2023, Grenoble, France
Access to full text and BibTeX

title: The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
author: Simon Leglaive, Léonie Borne, Efthymios Tzinis, Mostafa Sadeghi, Matthieu Fraticelli, Scott Wisdom, Manuel Pariente, Daniel Pressnitzer, John R. Hershey
article: 7th International Workshop on Speech Processing in Everyday Environments (CHiME), Aug 2023, Dublin, Ireland. ⟨10.21437/CHiME.2023-2⟩
Access to full text and BibTeX

title: Unsupervised speech enhancement with deep dynamical generative speech and noise models
author: Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
article: Interspeech 2023 – 24th Annual Conference of the International Speech Communication Association, ISCA, Aug 2023, Dublin, Ireland. pp.1-5
Access to BibTeX

title: Speech Modeling with a Hierarchical Transformer Dynamical VAE
author: Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
article: ICASSP 2023 – IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2023, Rhodes, Greece. pp.1-5, ⟨10.1109/ICASSP49357.2023.10096751⟩
Access to BibTeX

title: A vector quantized masked autoencoder for speech emotion recognition
author: Samir Sadok, Simon Leglaive, Renaud Séguier
article: IEEE ICASSP 2023 Workshop on Self-Supervision in Audio, Speech and Beyond (SASB), Jun 2023, Rhodes, Greece
Access to full text and BibTeX

Master thesis

title: Renault Flins : usine symbole des transformations d’une industrie, 1980-2010
author: Vincent Leglaive
article: Humanities and Social Sciences. 2023
Access to full text and BibTeX

Preprints, Working Papers, …

title: The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement
author: Simon Leglaive, Léonie Borne, Efthymios Tzinis, Mostafa Sadeghi, Matthieu Fraticelli, Scott Wisdom, Manuel Pariente, Daniel Pressnitzer, John Hershey
article: 2023
Access to full text and BibTeX

2022

Journal articles

title: Unsupervised Speech Enhancement using Dynamical Variational Autoencoders
author: Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
article: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2022, 30, pp.2993-3007. ⟨10.1109/TASLP.2022.3207349⟩
Access to full text and BibTeX

Conference papers

title: Expectation-Maximization Based Defense Mechanism for Distributed Model Predictive Control
author: Rafael Accácio Nogueira, Romain Bourdais, Simon Leglaive, Hervé Guéguen
article: 9th IFAC Conference on Networked Systems (NecSys22), Jul 2022, Zürich, Switzerland
Access to full text and BibTeX

title: Dynamical variational autoencoders and their application to speech spectrogram modeling
author: Laurent Girin, Xiaoyu Bie, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda
article: JEP 2022 – 34e Journées d’Études sur la Parole, Université de Nantes, Jun 2022, Noirmoutier, France. pp.655-663, ⟨10.21437/JEP.2022-69⟩
Access to full text and BibTeX

title: Learning and controlling the source-filter representation of speech with a variational autoencoder
author: Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
article: CFA 2022 – 16ème Congrès Français d’Acoustique, Société Française d’Acoustique (SFA), Apr 2022, Marseille, France
Access to BibTeX

Poster communications

title: Expectation-Maximization Based Defense Mechanism for Distributed Model Predictive Control
author: Rafael Accacio Nogueira, Romain Bourdais, Simon Leglaive, Hervé Guéguen
article: 9th IFAC Conference on Networked Systems (NecSys22), Jul 2022, Zürich, Switzerland
Access to full text and BibTeX

2021

Conference papers

title: On Speech Sparsity for Computational Efficiency and Noise Reduction in Hearing Aids
author: Adrien Llave, Simon Leglaive
article: 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec 2021, Tokyo, Japan
Access to full text and BibTeX

title: A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling
author: Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda
article: Interspeech 2021 – 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. pp.46-50, ⟨10.21437/Interspeech.2021-256⟩
Access to full text and BibTeX

Books

title: Dynamical Variational Autoencoders: A Comprehensive Review
author: Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
article: Foundations and Trends® in Machine Learning, 15 (1-2), pp.197, 2021, ISBN 978-1-68083-913-5. ⟨10.1561/2200000089⟩
Access to full text and BibTeX

2020

Journal articles

title: Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders
author: Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
article: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28, pp.1788-1800. ⟨10.1109/TASLP.2020.3000593⟩
Access to full text and BibTeX

Conference papers

title: Localization cues preservation in hearing aids by combining noise reduction and dynamic range compression
author: Adrien Llave, Simon Leglaive, Renaud Séguier
article: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec 2020, Auckland, New Zealand
Access to full text and BibTeX

title: A Recurrent Variational Autoencoder for Speech Enhancement
author: Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
article: ICASSP 2020 – IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, May 2020, Barcelona (virtual), Spain. pp.371-375, ⟨10.1109/ICASSP40776.2020.9053164⟩
Access to full text and BibTeX

2019

Journal articles

title: Audio-noise Power Spectral Density Estimation Using Long Short-term Memory
author: Xiaofei Li, Simon Leglaive, Laurent Girin, Radu Horaud
article: IEEE Signal Processing Letters, 2019, 26 (6), pp.918-922. ⟨10.1109/LSP.2019.2911879⟩
Access to full text and BibTeX

Conference papers

title: Notes on the use of variational autoencoders for speech and audio spectrogram modeling
author: Laurent Girin, Fanny Roche, Thomas Hueber, Simon Leglaive
article: DAFx 2019 – 22nd International Conference on Digital Audio Effects, Sep 2019, Birmingham, United Kingdom. pp.1-8
Access to full text and BibTeX

title: Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization
author: Simon Leglaive, Laurent Girin, Radu Horaud
article: ICASSP 2019 – IEEE International Conference on Acoustics, Speech and Signal Processing, May 2019, Brighton, United Kingdom. pp.101-105, ⟨10.1109/ICASSP.2019.8683704⟩
Access to full text and BibTeX

title: Speech enhancement with variational autoencoders and alpha-stable distributions
author: Simon Leglaive, Umut Şimşekli, Antoine Liutkus, Laurent Girin, Radu Horaud
article: ICASSP 2019 – IEEE International Conference on Acoustics, Speech and Signal Processing, May 2019, Brighton, United Kingdom. pp.541-545, ⟨10.1109/ICASSP.2019.8682546⟩
Access to full text and BibTeX

2018

Journal articles

title: Student’s t Source and Mixing Models for Multichannel Audio Source Separation
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26 (6), pp.1150-1164
Access to full text and BibTeX

Conference papers

title: A variance modeling framework based on variational autoencoders for speech enhancement
author: Simon Leglaive, Laurent Girin, Radu Horaud
article: MLSP 2018 – IEEE 28th International Workshop on Machine Learning for Signal Processing, Sep 2018, Aalborg, Denmark. pp.1-6, ⟨10.1109/MLSP.2018.8516711⟩
Access to full text and BibTeX

title: Alpha-stable low-rank plus residual decomposition for speech enhancement
author: Umut Şimşekli, Halil Erdogan, Simon Leglaive, Antoine Liutkus, Roland Badeau, Gaël Richard
article: ICASSP: International Conference on Acoustics, Speech, and Signal Processing, Apr 2018, Calgary, Canada. pp.651-655, ⟨10.1109/ICASSP.2018.8461539⟩
Access to full text and BibTeX

2017

Conference papers

title: Separating Time-Frequency Sources from Time-Domain Convolutive Mixtures Using Non-negative Matrix Factorization
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2017, New Paltz, New York, United States
Access to full text and BibTeX

title: Séparation de sources audio en milieu réverbérant : Factorisation en matrices non-négatives et représentation temporelle du mélange convolutif
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: Colloque GRETSI, Sep 2017, Juan-Les-Pins, France
Access to full text and BibTeX

title: Semi-Blind Student’s t Source Separation for Multichannel Audio Convolutive Mixtures
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: 25th European Signal Processing Conference (EUSIPCO), Aug 2017, Kos, Greece. pp.2323-2327
Access to full text and BibTeX

title: Multichannel audio source separation: variational inference of time-frequency sources from time-domain observations
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Mar 2017, New Orleans, LA, United States
Access to full text and BibTeX

title: Alpha-Stable Multichannel Audio Source Separation
author: Simon Leglaive, Umut Şimşekli, Antoine Liutkus, Roland Badeau, Gaël Richard
article: 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Mar 2017, New Orleans, United States
Access to full text and BibTeX

Theses

title: Modeling reverberant mixtures for multichannel audio source separation
author: Simon Leglaive
article: Signal and Image Processing [eess.SP]. Télécom ParisTech, 2017. French. ⟨NNT : 2017ENST0068⟩
Access to full text and BibTeX

2016

Journal articles

title: Multichannel Audio Source Separation with Probabilistic Reverberation Priors
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016, 24 (12), pp.2453-2465
Access to full text and BibTeX

Conference papers

title: Autoregressive Moving Average Modeling of Late Reverberation in the Frequency Domain
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: European Signal Processing Conference (EUSIPCO), Aug 2016, Budapest, Hungary. pp.1478-1482
Access to full text and BibTeX

2015

Conference papers

title: Multichannel Audio Source Separation with Probabilistic Reverberation Modeling
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2015, New Paltz, NY, United States. pp.5
Access to full text and BibTeX

title: A priori probabiliste anéchoïque pour la séparation sous-déterminée de sources sonores en milieu réverbérant
author: Simon Leglaive, Roland Badeau, Gaël Richard
article: Colloque GRETSI, Sep 2015, Lyon, France
Access to full text and BibTeX

title: Singing voice detection with deep recurrent neural networks
author: Simon Leglaive, Romain Hennequin, Roland Badeau
article: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2015, Brisbane, Australia. pp.121-125
Access to full text and BibTeX