Category: Seminars

A summary of the CHiME-4 Speech Separation and Recognition Challenge

Speaker: Emmanuel Vincent
Date: September 22, 2016
Outline:
1. From CHiME-1 to CHiME-3
2. Environment, simulation, and microphone mismatches in CHiME-3
3. CHiME-4 tracks and baselines
4. Discussion

City-identification of Flickr videos using semantic acoustic features

Speaker: Benjamin Elizalde (Carnegie Mellon University)
Date: July 7, 2016
Abstract: City-identification of videos aims to determine the likelihood of a video belonging to each of a set of cities. In this paper, we present an approach that uses only audio, without any additional modality such as images, user tags, or geotags. In this manner, we show …

Multimodal acquisition platform

Speaker: Valerian Girard (Engineer)
Date: June 23, 2016
Abstract: In this talk, I will present my work over the past year, during which I contributed to developing a multimodal acquisition platform that records multimodal data in a speech communication context. The platform can record motion capture data of the face, arms, and hands, with and without markers, using …

Modelling Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Speaker: Imran Sheikh (PhD student)
Date: June 16, 2016
Abstract: The diachronic nature of broadcast news content causes frequent variations in linguistic content and vocabulary, leading to Out-Of-Vocabulary (OOV) words, especially OOV proper names. OOVs missed by the speech recognition system can be recovered by a dynamic vocabulary multi-pass recognition approach in which relevant proper …

Formant shifting for speech intelligibility improvement in car noise environment

Speaker: Karan Nathwani (post-doctoral fellow)
Date: June 9, 2016
Abstract: In this work, we propose a novel approach for improving the intelligibility of speech in in-car applications. Speech produced in noisy environments is subject to the Lombard effect, which covers a number of voice transformations compared to speech produced in calm …

A step towards multidimensional automatic improvisation

Speaker: Ken Déguernel (PhD student)
Date: June 2, 2016
Abstract: Automatic music improvisation systems based on the OMax paradigm train on a one-dimensional sequence to generate original improvisations. First, we propose a system that creates improvisations in a way closer to a human improviser, where the intuition of a context is enriched with knowledge. This system combines …

A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions

Speaker: Sunit Sivasankaran (Engineer)
Date: May 12, 2016
Abstract: Robustness to reverberation is a key concern for distant-microphone ASR. Various approaches have been proposed, including single-channel or multichannel dereverberation, robust feature extraction, alternative acoustic models, and acoustic model adaptation. We conduct a series of experiments to assess the impact of various dereverberation and acoustic model adaptation approaches on the ASR …

Optimal transport for domain adaptation

Speaker: Alain Rakotomamonjy (Université de Rouen)
Date: May 11, 2016
Abstract: Domain adaptation addresses one of the most challenging tasks in machine learning: coping with mismatch between the training and testing probability distributions. If adaptation is done correctly, models learned on a specific data representation become more robust when confronted with data depicting the same problems, but described through another …

Compact Multiview Representation of Documents Based on the Total Variability Space

Speaker: Mohamed Bouallegue (post-doctoral fellow)
Date: April 21, 2016
Abstract: In this talk, I present the research I carried out during my thesis at Laboratoire Informatique d’Avignon and my postdoctoral research at Laboratoire d’Informatique de l’Université du Maine. This work explores the factor analysis/i-vector paradigm for identifying topics in spoken documents. We identify themes from dialogues of …

Introduction to Sum Product Networks for noisy speech recognition

Speaker: Juan Andrés Morales Cordovilla (post-doctoral fellow)
Date: March 3, 2016
Abstract: Sum Product Networks (SPNs) are a new kind of probabilistic model that combines the advantages of the deep learning of Deep Neural Networks (DNNs) and the exact marginalization of Gaussian Mixture Models (GMMs). These two properties are very useful for Missing Data or Uncertainty Decoding on the …
