*IEEE Journal of Selected Topics in Signal Processing*, 13 (1), pp. 88 – 103, 2019.

*IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 25 (10), pp. 1997 – 2012, 2017.

Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot. *Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization.* IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 24, number 11, 2016.

[pdf] [arXiv] [HAL] [IEEEXplore] [bibtex] [matlab code]

We address the problem of localizing single and multiple speech sources in reverberant and noisy rooms. For a pair of microphones, the interchannel response corresponding to the direct-path propagation of an audio source is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions of the two channels. We propose several methods to estimate the DP-RTF from the noisy and reverberant microphone signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function approximation is adopted to accurately represent the impulse response of the sensors in the STFT domain. Second, the DP-RTF is estimated by using the auto- and cross-power spectral densities at each frequency and over multiple frames. In the presence of stationary noise, an inter-frame spectral subtraction algorithm is proposed, which enables the estimation of noise-free auto- and cross-power spectral densities. Third, a consistency test is proposed to check whether a set of consecutive frames is associated with the same source or not. Finally, a complex-valued Gaussian mixture model (CGMM), whose components correspond to all the possible candidate source locations, is adopted to assign the DP-RTF observations to the speaker locations. After optimizing the CGMM-based objective function, both the number of sources and their locations are estimated by selecting the CGMM components with the largest weights. In addition, an entropy-based penalty term is added to the likelihood to impose sparsity over the set of CGMM component weights. This favors a small number of detected speakers relative to the large number of initial candidate source locations.
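The last step, assigning DP-RTF observations to candidate locations and keeping the components with the largest weights, can be illustrated with a small toy example. The sketch below is not the released Matlab code and makes several simplifying assumptions that do not come from the papers: the component means are far-field, free-field DP-RTFs (a pure phase term for a hypothetical microphone spacing of 0.15 m), the variance is fixed, only the weights are updated by EM, and the entropy-based sparsity penalty is omitted in favor of a plain weight threshold. All function names and parameter values are illustrative.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s


def free_field_dp_rtf(azimuth_deg, freq_hz, mic_distance_m=0.15):
    """DP-RTF of a microphone pair under an assumed far-field, free-field model."""
    # Time difference of arrival between the two microphones.
    tdoa = mic_distance_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND
    # Equal attenuation on both channels is assumed, so only a phase term remains.
    return cmath.exp(-2j * math.pi * freq_hz * tdoa)


def simulate_observations(true_azimuths_deg, freqs_hz, n_obs=400, noise_std=0.1, seed=0):
    """Toy data: noisy DP-RTF values, alternating between the true sources."""
    rng = random.Random(seed)
    observations = []
    for i in range(n_obs):
        azimuth = true_azimuths_deg[i % len(true_azimuths_deg)]
        k = rng.randrange(len(freqs_hz))
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        observations.append((k, free_field_dp_rtf(azimuth, freqs_hz[k]) + noise))
    return observations


def fit_cgmm_weights(observations, candidates_deg, freqs_hz, variance=0.05, n_iter=50):
    """EM on the mixture weights only; means and variance are held fixed.

    observations: list of (frequency_index, complex DP-RTF estimate) pairs.
    """
    means = [[free_field_dp_rtf(az, f) for f in freqs_hz] for az in candidates_deg]
    n_comp = len(candidates_deg)
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        totals = [0.0] * n_comp
        used = 0
        for k, x in observations:
            # E-step: unnormalized posteriors of a circular complex Gaussian;
            # the common 1 / (pi * variance) factor cancels in the normalization.
            post = [w * math.exp(-abs(x - means[d][k]) ** 2 / variance)
                    for d, w in enumerate(weights)]
            norm = sum(post)
            if norm == 0.0:
                continue  # outlier far from every candidate: skip it
            used += 1
            for d in range(n_comp):
                totals[d] += post[d] / norm
        if used == 0:
            break
        # M-step: each weight is the average responsibility of its component.
        weights = [t / used for t in totals]
    return weights


def detect_sources(weights, candidates_deg, threshold=0.2):
    """Keep the candidate directions whose weight exceeds a fixed threshold."""
    return [az for az, w in zip(candidates_deg, weights) if w > threshold]


if __name__ == "__main__":
    freqs = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
    candidates = list(range(-90, 91, 10))
    obs = simulate_observations([-30, 40], freqs)
    weights = fit_cgmm_weights(obs, candidates, freqs)
    print(detect_sources(weights, candidates))
```

Because the weights are shared across frequencies, a phase ambiguity at one frequency is resolved by the others. The fixed threshold stands in for the sparsity-promoting penalty only loosely, so the sketch should be read as an illustration of the weight-selection idea rather than of the proposed objective.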

**Video: Sound-source localization with the direct-path relative transfer function**

**An example of online multiple-speaker localization.** **Top:** The CGMM weights over time. **Bottom:** The black circles represent the speakers detected by selecting the peaks of the CGMM weights. The gray curves represent the ground-truth trajectories of the active speakers.

Xiaofei Li, Laurent Girin, Fabien Badeig, Radu Horaud. *Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function*. International Conference on Intelligent Robots and Systems (IROS) 2016. [pdf] [Slides] [bibtex] [matlab code]

Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot. *Estimation of Relative Transfer Function in the Presence of Stationary Noise Based on Segmental Power Spectral Density Matrix Subtraction.* IEEE ICASSP 2015. [pdf] [poster] [dataset] [bibtex] [matlab code]

Xiaofei Li, Radu Horaud, Laurent Girin, Sharon Gannot. *Local Relative Transfer Function for Sound Source Localization*. EUSIPCO 2015. [pdf] [Slides] [dataset] [bibtex]