Yutong Ban



INRIA Grenoble Rhone-Alpes
655, avenue de l’Europe
38330 Montbonnot Saint-Martin, France

Yutong Ban has been a PhD student in the PERCEPTION team at INRIA since October 2015, supervised by Dr. Radu Horaud. He received his engineering degree in computer vision from Télécom Saint-Étienne (France) in 2015 and his Bachelor's degree in telecommunication engineering from Xidian University, China, in 2013. He is currently conducting research on audio-visual speaker tracking and diarization.

His research interests include audio-visual tracking and diarization, visual servoing, variational inference, and stereo-depth fusion.


 Publications (Google Scholar)

 Journal papers

  • Y. Ban, X. Alameda-Pineda, C. Evers, and R. Horaud. “Tracking Multiple Audio Sources with the Von Mises Distribution and Variational EM”. Submitted to IEEE Signal Processing Letters (December 2018). [page] [pdf]
  • Y. Ban, X. Alameda-Pineda, L. Girin, and R. Horaud. “Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers”. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (September 2018). [page] [pdf]
  • X. Li*, Y. Ban*, L. Girin, X. Alameda-Pineda, and R. Horaud. “Online Localization and Tracking of Multiple Speakers in Reverberant Environments”. Submitted to IEEE Journal of Selected Topics in Signal Processing (August 2018) (* indicates equal contribution). [pdf]


 Conference and workshop papers

  • X. Li, Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud. “A Cascaded Multiple-Speaker Localization and Tracking System”. International Workshop on Acoustic Signal Enhancement (IWAENC), LOCATA Satellite Workshop, Sep 2018, Tokyo, Japan. [pdf]
  • Y. Ban, X. Li, X. Alameda-Pineda, L. Girin, and R. Horaud. “Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking”. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada. [pdf]
  • Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud. “Exploiting the Complementarity of Audio and Visual Data in Multi-Speaker Tracking”. ICCV Workshop on Computer Vision for Audio-Visual Media, Oct 2017, Venice, Italy. [pdf]
  • Y. Ban, X. Alameda-Pineda, F. Badeig, S. Ba, and R. Horaud. “Tracking a Varying Number of People with a Visually-Controlled Robotic Head”. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep 2017, Vancouver, Canada. (IROS’17: JTCF Novel Technology Paper Award Finalist) [page] [pdf]
  • Y. Ban, S. Ba, X. Alameda-Pineda, and R. Horaud. “Tracking Multiple Persons Based on a Variational Bayesian Model”. ECCV Workshops, Oct 2016, Amsterdam, Netherlands. [page] [pdf]