Anastasiia TSUKANOVA


SING: Symbol-to-Instrument Neural Generator

Speaker: Alexandre Défossez Date: January 10, 2019 at 13:00 – B011 Abstract: Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers …


Deep learning-based speaker localization and speech separation from Ambisonics recordings

Speaker: Lauréline Pérotin Date: November 22, 2018 at 10:30 – C005 Abstract: Personal assistants are flourishing, but it is still hard to achieve voice control in adverse conditions, whenever noise, other speakers, reverberation or furniture reflections are present. Preprocessing steps such as speaker localization and speech enhancement have been shown to help automatic speech recognition. I will present …


Analysis and development of speech enhancement features in cochlear implants

Speaker: Nicolas Furnon Date: October 18, 2018 at 10:30 – C005 Abstract: Cochlear implants (CIs) are complex systems developed to restore hearing to people with profound hearing loss. These systems are effective in quiet environments, but adverse situations remain very challenging for CI users. A range of algorithms are implemented in the processors by …


Alpha-stable process for signal processing

Speaker: Mathieu Fontaine Date: September 27, 2018 at 10:30 – C005 Abstract: The scientific topic of sound source separation (SSS) aims at decomposing audio signals into their constitutive components, e.g. separating the main singer's voice from the background music or from the background noise. In the case of very old and degraded historical recordings, SSS strongly …


Adversarial Neural Networks for Language Identification

Speaker: Raphaël Duroselle Date: July 12, 2018 at 10:30 – C103 Abstract: Language identification systems are very common in speech processing and are used to classify the spoken language given a recorded audio sample. They are often used as a front-end for subsequent processing tasks such as automatic speech recognition or speaker identification. Standard methodologies such …


Semi-supervised learning with deep neural networks for relative transfer function inverse regression

Speaker: Emmanuel Vincent Date: June 07, 2018 Abstract: Prior knowledge of the relative transfer function (RTF) is useful in many applications but remains little studied. In this work, we propose a semi-supervised learning algorithm based on deep neural networks (DNNs) for RTF inverse regression, that is, generating the full-band RTF vector directly from the source-receiver …

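As background for this talk's topic, the RTF between two microphones is the ratio of their acoustic transfer functions with respect to a reference channel. The toy sketch below (entirely synthetic, and not the DNN-based method of the talk) shows how, in the noiseless case, the RTF can be recovered from cross- and auto-spectra of the microphone signals:

```python
import numpy as np

# Toy RTF illustration with synthetic signals (not the talk's DNN method).
rng = np.random.default_rng(1)
F, T = 16, 200                       # frequency bins, time frames
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))  # source STFT
H1 = rng.standard_normal(F) + 1j * rng.standard_normal(F)  # transfer function to mic 1
H2 = rng.standard_normal(F) + 1j * rng.standard_normal(F)  # transfer function to mic 2
X1, X2 = H1[:, None] * S, H2[:, None] * S  # noiseless microphone STFTs

# RTF estimate per frequency: cross-spectrum over auto-spectrum of the reference mic.
rtf = (X2 * X1.conj()).mean(axis=1) / (np.abs(X1) ** 2).mean(axis=1)
print(np.allclose(rtf, H2 / H1))  # True in this noiseless case
```

In real conditions the spectra are corrupted by noise and reverberation, which is precisely why learned regression of the full-band RTF vector is of interest.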

Leveraging Word Contexts in Wikipedia for OOV Proper Nouns Recovery in Speech Recognition

Speaker: Badr Abdullah Date: May 31, 2018 Abstract: Automatic Speech Recognition (ASR) systems are usually trained on static data and a finite vocabulary. When a spoken utterance contains Out-Of-Vocabulary (OOV) words, ASR systems misrecognize these words as in-vocabulary words with similar acoustic properties, but with entirely different meaning. The majority of OOV words are information-rich proper …


Speech/non-speech segmentation for speech recognition

Speaker: Odile Mella and Dominique Fohr Date: May 24, 2018

Multiple-input neural network-based residual echo suppression

Speaker: Guillaume Carbajal Date: April 12, 2018 Abstract: A residual echo suppressor (RES) aims to suppress the residual echo in the output of an acoustic echo canceler (AEC). Spectral-based RES approaches typically estimate the magnitude spectra of the near-end speech and the residual echo from a single input, that is either the far-end speech or the …

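To make the spectral-based RES idea concrete, here is a minimal toy sketch (not Carbajal's multiple-input model; the function name and numbers are illustrative): given an estimated residual-echo magnitude spectrum, a gain below one is applied to the AEC output, floored to avoid negative magnitudes:

```python
import numpy as np

def suppress(aec_out_mag, residual_echo_mag, floor=0.05):
    """Toy spectral gain: attenuate bins where residual echo dominates.

    Illustrative only; real RES systems estimate residual_echo_mag with a model.
    """
    gain = np.maximum(1.0 - residual_echo_mag / np.maximum(aec_out_mag, 1e-8), floor)
    return gain * aec_out_mag

mix = np.array([1.0, 0.5, 0.2])   # AEC output magnitudes (toy values)
echo = np.array([0.2, 0.4, 0.3])  # estimated residual echo magnitudes
print(suppress(mix, echo))        # [0.8, 0.1, 0.01]
```

The third bin shows the role of the floor: the echo estimate exceeds the mixture magnitude, so the gain is clipped rather than driven negative.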

Multichannel speech separation with RNN from high-order ambisonics recordings

Speaker: Lauréline Pérotin Date: March 29, 2018 Abstract: We present a source separation system for high-order ambisonics (HOA) contents. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the …

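The pipeline described in this abstract (a neural mask driving a multichannel spatial filter) can be sketched in a few lines. This is a generic rank-1 mask-based filter on synthetic data, not the talk's LSTM/HOA system; the random mask stands in for a network's output:

```python
import numpy as np

# Generic mask-based multichannel filtering sketch (synthetic data,
# not the HOA/LSTM system presented in the talk).
rng = np.random.default_rng(0)
C, F, T = 4, 8, 16  # channels, frequency bins, time frames
X = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
mask = rng.uniform(0, 1, (F, T))  # stand-in for a network-estimated soft mask

# Mask-weighted spatial covariance of the target, per frequency bin.
Phi = np.einsum('cft,dft,ft->fcd', X, X.conj(), mask) / mask.sum(axis=1)[:, None, None]

# Rank-1 filter: principal eigenvector of each covariance matrix.
w = np.linalg.eigh(Phi)[1][:, :, -1]  # shape (F, C)

# Apply the filter: y[f, t] = w[f]^H x[f, t].
Y = np.einsum('fc,cft->ft', w.conj(), X)
print(Y.shape)  # (8, 16)
```

In practice the covariance would be estimated separately for target and interference, and the mask would come from the trained recurrent network rather than random values.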