Category: Seminars

Adversarial Neural Networks for Language Identification

Speaker: Raphaël Duroselle Date: July 12, 2018 at 10:30 – C103 Abstract: Language identification systems are very common in speech processing and are used to classify the spoken language given a recorded audio sample. They are often used as a front-end for subsequent processing tasks such as automatic speech recognition or speaker identification. Standard methodologies such …
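The abstract is truncated here, but adversarial training of speech classifiers is commonly implemented with a gradient reversal layer between the shared feature extractor and an auxiliary adversarial branch. A minimal numpy sketch of that layer (the class name and `lam` parameter are illustrative choices, not taken from the talk):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; flips the gradient sign in the
    backward pass, so the feature extractor is trained to *confuse*
    the adversarial branch (e.g. a channel or domain classifier)."""
    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversal

    def forward(self, x):
        return x

    def backward(self, grad_out):
        return -self.lam * grad_out

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.2, -0.3])
assert np.allclose(grl.forward(x), x)          # identity forward
assert np.allclose(grl.backward(g), -0.5 * g)  # reversed gradient
```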


Semi-supervised learning with deep neural networks for relative transfer function inverse regression

Speaker: Emmanuel Vincent Date: June 07, 2018 Abstract: Prior knowledge of the relative transfer function (RTF) is useful in many applications but remains little studied. In this work, we propose a semi-supervised learning algorithm based on deep neural networks (DNNs) for RTF inverse regression, that is, to generate the full-band RTF vector directly from the source-receiver …
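As a rough illustration of what "RTF inverse regression" means, here is a toy forward pass mapping a source-receiver position vector to a full-band complex RTF vector. All dimensions and weights below are hypothetical; in the semi-supervised setting the abstract describes, the weights would be learned from labelled (position, RTF) pairs together with unlabelled RTFs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3-D source-receiver position in, N_FREQ-bin RTF out.
POS_DIM, HIDDEN, N_FREQ = 3, 16, 8

# One-hidden-layer regressor with random (untrained) weights.
W1 = rng.standard_normal((POS_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 2 * N_FREQ)) * 0.1  # real + imaginary parts
b2 = np.zeros(2 * N_FREQ)

def rtf_from_position(pos):
    """Map a source-receiver position vector to a full-band RTF vector."""
    h = np.tanh(pos @ W1 + b1)
    out = h @ W2 + b2
    return out[:N_FREQ] + 1j * out[N_FREQ:]  # complex RTF per frequency bin

rtf = rtf_from_position(np.array([1.0, 0.5, 2.0]))
assert rtf.shape == (N_FREQ,) and rtf.dtype == complex
```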


Leveraging Word Contexts in Wikipedia for OOV Proper Nouns Recovery in Speech Recognition

Speaker: Badr Abdullah Date: May 31, 2018 Abstract: Automatic Speech Recognition (ASR) systems are usually trained on static data and a finite vocabulary. When a spoken utterance contains Out-Of-Vocabulary (OOV) words, ASR systems misrecognize these words as in-vocabulary words with similar acoustic properties, but with entirely different meaning. The majority of OOV words are information-rich proper …
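The talk's method mines candidate proper nouns from Wikipedia word contexts; as a crude stand-in for the recovery step, one can rescore such context-derived candidates by their similarity to what the ASR system actually emitted. The example words and the use of plain edit distance (rather than an acoustic or phonetic measure) are illustrative assumptions:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def recover_oov(asr_output: str, candidates: list[str]) -> str:
    """Pick the context-derived proper noun closest to the ASR output
    (string distance here stands in for acoustic similarity)."""
    return min(candidates,
               key=lambda c: edit_distance(asr_output.lower(), c.lower()))

# Hypothetical example: candidates mined from the Wikipedia context.
candidates = ["Macron", "Merkel", "Juncker"]
assert recover_oov("macaron", candidates) == "Macron"
```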


Speech/non-speech segmentation for speech recognition

Speaker: Odile Mella and Dominique Fohr Date: May 24, 2018

Multiple-input neural network-based residual echo suppression

Speaker: Guillaume Carbajal Date: April 12, 2018 Abstract: A residual echo suppressor (RES) aims to suppress the residual echo in the output of an acoustic echo canceler (AEC). Spectral-based RES approaches typically estimate the magnitude spectra of the near-end speech and the residual echo from a single input, that is either the far-end speech or the …
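The suppression step that follows such magnitude estimates can be sketched as a Wiener-like gain applied per time-frequency bin. This is a generic spectral-suppression sketch under assumed magnitude inputs, not the specific network described in the talk:

```python
import numpy as np

def spectral_suppression(aec_out_mag, resid_echo_mag, floor=1e-8):
    """Wiener-like gain: attenuate each bin by the estimated share of
    residual echo in the AEC output's magnitude spectrum."""
    gain = (np.maximum(aec_out_mag - resid_echo_mag, 0.0)
            / np.maximum(aec_out_mag, floor))
    return gain * aec_out_mag

mag = np.array([1.0, 0.5, 0.2])    # AEC output magnitudes (3 bins)
echo = np.array([0.2, 0.5, 0.1])   # estimated residual echo magnitudes
out = spectral_suppression(mag, echo)
assert np.all(out <= mag)          # suppression never amplifies
assert out[1] == 0.0               # bin dominated by echo is zeroed
```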


Multichannel speech separation with RNN from high-order ambisonics recordings

Speaker: Lauréline Pérotin Date: March 29, 2018 Abstract: We present a source separation system for high-order ambisonics (HOA) contents. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the …
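A mask-driven spatial filter of the kind described can be sketched for a single frequency bin: mask-weighted spatial covariance matrices, then an MVDR-style filter. Here a random mask stands in for the LSTM output, and the eigenvector steering is one common choice, not necessarily the talk's:

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 4, 100                      # channels, time frames (one frequency bin)

X = rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T))
mask = rng.uniform(size=T)         # stand-in for the LSTM's speech mask

# Mask-weighted spatial covariance matrices.
Phi_s = (mask * X) @ X.conj().T / mask.sum()
Phi_n = ((1 - mask) * X) @ X.conj().T / (1 - mask).sum()

# MVDR-style filter steered by the dominant speech eigenvector.
steer = np.linalg.eigh(Phi_s)[1][:, -1]
w = np.linalg.solve(Phi_n, steer)
w /= steer.conj() @ w
y = w.conj() @ X                   # beamformed single-channel output

assert y.shape == (T,)
assert np.isclose(w.conj() @ steer, 1.0)   # distortionless toward steer
```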


VisArtico: multimodal visualization software – Present & future

Speakers: Slim Ouni and Sara Dahmani Date and place: March 19, 2018 – C005 Abstract: VisArtico is a multimodal visualization software (acoustic, articulatory, visual, gestural) that has been developed within the team and has evolved considerably over the years. In this seminar, we present the software: user interface, functionalities, capabilities, etc. As this software …


Feedback on text analysis and emotion recognition in voice using deep learning

Speaker: Nicolas Turpault Date: February 15, 2018 Abstract: During my internship at a startup in London, I developed a system to recognise emotion in voice. In this work, we extracted speech features (MFCCs) and applied an RNN (LSTM) to predict the emotion. We used the SEMAINE and AVEC databases to …
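The MFCC-plus-LSTM pipeline mentioned can be illustrated by running a single LSTM cell over a sequence of acoustic feature frames. The dimensions and random weights below are placeholders; in practice the weights are trained and the final hidden state feeds a softmax over emotion classes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a frame of acoustic features (e.g. MFCCs).
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f, g, o = (sigmoid(z[:H]), sigmoid(z[H:2*H]),
                  np.tanh(z[2*H:3*H]), sigmoid(z[3*H:]))
    c_new = f * c + i * g            # gated cell-state update
    h_new = o * np.tanh(c_new)       # gated output
    return h_new, c_new

rng = np.random.default_rng(2)
D, H, T = 13, 8, 20                  # 13 MFCCs per frame, 20 frames
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

h = c = np.zeros(H)
for frame in rng.standard_normal((T, D)):   # stand-in MFCC sequence
    h, c = lstm_step(frame, h, c, W, U, b)

# h would feed a softmax over emotion classes in a full system.
assert h.shape == (H,) and np.all(np.abs(h) < 1.0)
```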


Biomechanical models of speech articulators to understand speech motor control

Speaker: Pascal Perrier (Gipsa-lab Grenoble) Date: January 18, 2018 Abstract: We have been working for the last 20 years on the development of 2D and 3D biomechanical models of speech articulators, with the aim of better understanding (1) how speech movements are constrained, (2) which degrees of freedom speakers have to deal with the goals …


Arabic speech synthesis

Speaker: Amal Houidhek Date: November 30, 2017 Abstract: The first part of the presentation investigates statistical parametric speech synthesis (SPSS) of Modern Standard Arabic (MSA): a hidden Markov model (HMM)-based speech synthesis system relies on a description of speech segments corresponding to phonemes, with a large set of features representing phonetic, phonological, linguistic and contextual aspects. …
