The research agenda of the PERCEPTION group is the investigation and implementation of computational models for mapping images and sounds onto meaning and actions. PERCEPTION team members address these challenging topics with an interdisciplinary approach that spans the following disciplines: computer vision, auditory signal processing and scene analysis, machine learning, and robotics. In particular, we develop methods for the representation and recognition of visual and auditory objects and events, audio-visual fusion, recognition of human actions, gestures and speech, spatial hearing, and human-robot interaction.
- Computer vision: spatio-temporal representation of 2D and 3D visual information, action and gesture recognition, analysis of human faces, 3D sensors, binocular vision, multiple-camera systems, person and object tracking in video sequences.
- Auditory scene analysis: binaural hearing, multiple sound source localization, tracking and separation, speech communication, sound-event classification, speaker diarization, acoustic signal enhancement.
- Machine learning: probabilistic mixture models, linear and non-linear dimension reduction, manifold learning, graphical models, Bayesian inference, neural networks and deep learning.
- Robotics: robot vision, robot hearing, human-robot interaction, data fusion, software architectures.
- The audiovisual head POPEYE
- Three NAO robots using NAOLab, equipped with stereoscopic camera heads and a spherical microphone array
- One Pepper robot
- The MIXCAM (multiple mixed cameras) laboratory
- Digital Media and Communications R&D Center, Samsung Electronics (2016-2017)
- Samsung Advanced Institute of Technology, Seoul, Korea (2010-2012)
- 4D View Solutions, Grenoble, France
- SoftBank Robotics Europe (formerly Aldebaran), Paris, France (2010-2016)
- Xerox Research Center India (XRCI), Bangalore (2014-2017)