By Guillaume Delorme*, Yihong Xu*, Luis G. Camara, Elisa Ricci, Radu Horaud, Xavier Alameda Pineda
[arXiv] [paper] [code]
Existing works on multiple object tracking (MOT) are developed under the traditional supervised learning setting, where the training and test data are drawn from the same distribution. This hinders the development of MOT in real-world applications, since collecting and annotating a tracking dataset for each deployment scenario is often very time-consuming or even unrealistic. Motivated by this limitation, we investigate MOT in unsupervised settings and introduce DAUMOT, a general MOT training framework designed to adapt an existing pre-trained MOT method to a target dataset without annotations. DAUMOT alternates between tracking and adaptation. During tracking, a model pre-trained on source data is used to track on the target dataset and to generate pseudo-labels. During adaptation, both the source labels and the target pseudo-labels are used to update the model. In addition, we propose a novel adversarial sequence alignment and identity-detection disentanglement method to bridge the source-target domain gap. Extensive ablation studies demonstrate the effectiveness of each component of the framework and its improved performance over the baselines. In particular, we show the benefits of DAUMOT on two state-of-the-art MOT methods in two unsupervised transfer settings: MOT17 → MOT20 and MOT20 → MOT17.
- Empirical evidence is reported on the severity of the domain shift problem in MOT by measuring the tracking performance on a target set of a model trained on a source set.
- We introduce DAUMOT, a novel unsupervised domain adaptation (DA) training strategy for MOT that alleviates domain shift, together with two baselines (TA and TADA). Unlike DA for object detection, DAUMOT uses two adversarial losses designed for MOT. The first, adversarial sequence alignment, tackles the inter-sequence domain shift with multi-class discriminators at the detection and image levels. The second, identity-detection disentanglement, limits the coupling between the detection and re-ID branches.
- Experimental results demonstrate the effectiveness of DAUMOT with two different trackers in two unsupervised DA settings built on standard MOT datasets, namely MOT17 → MOT20 and MOT20 → MOT17.
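To make the idea of a multi-class adversarial loss more concrete, here is a minimal sketch of one possible formulation. It assumes that the discriminator predicts which sequence a feature comes from and that the feature extractor is trained on the negated discriminator loss; the exact losses used by DAUMOT are not given on this page. The function names `discriminator_loss` and `alignment_loss` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the sequence classes."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def discriminator_loss(logits: List[float], sequence_index: int) -> float:
    """Cross-entropy that the (assumed) sequence discriminator minimizes."""
    return -math.log(softmax(logits)[sequence_index])


def alignment_loss(logits: List[float], sequence_index: int) -> float:
    """Assumed adversarial term for the feature extractor: the negated
    discriminator loss, so features that fool the discriminator score lower."""
    return -discriminator_loss(logits, sequence_index)


# A discriminator that cannot tell three sequences apart outputs uniform
# probabilities, which gives a cross-entropy of log(3).
uniform_loss = discriminator_loss([0.0, 0.0, 0.0], sequence_index=1)
confident_loss = discriminator_loss([0.0, 5.0, 0.0], sequence_index=1)
```

In this sketch, `confident_loss` is smaller than `uniform_loss`: the discriminator is rewarded for recognizing the sequence, while the feature extractor, trained on `alignment_loss`, is rewarded for making sequences indistinguishable.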
Pipeline: alternating between tracking and adaptation. First, the tracking step runs a generic MOT method on the target sequences to obtain bounding-box and identity pseudo-labels. Second, the adaptation step updates the tracker with standard ID and detection losses, using ground-truth labels for the source dataset and pseudo-labels for the target dataset. In addition, we propose two adversarial strategies, adversarial sequence alignment and identity-detection disentanglement, which train the generic MOT method to be invariant to inter-sequence domain shifts and to disentangle the identity and detection branches. The adversarial sequence alignment is implemented at the image and detection levels with two multi-class discriminators. The adversarial disentanglement is implemented with an ID multi-class discriminator. All discriminators operate on features sampled from the respective feature maps.
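The alternation described above can be sketched as a simple loop. This is only an illustration of the control flow, not the released implementation: the `track` and `adapt` callables, the `Label` and `Model` types, and the toy functions are all hypothetical stand-ins, and the adversarial terms are omitted.

```python
from typing import Callable, Dict, List, Tuple

Label = Tuple[int, int]  # hypothetical stand-in: (frame index, track identity)
Model = Dict[str, int]   # hypothetical stand-in for the tracker parameters


def alternate_tracking_and_adaptation(
    model: Model,
    source_labels: List[Label],
    target_frames: List[int],
    track: Callable[[Model, List[int]], List[Label]],
    adapt: Callable[[Model, List[Label], List[Label]], Model],
    num_rounds: int,
) -> Model:
    """Alternate between pseudo-labeling the target data and updating the model."""
    for _ in range(num_rounds):
        # Tracking step: the current model produces target pseudo-labels.
        pseudo_labels = track(model, target_frames)
        # Adaptation step: update with source labels and target pseudo-labels.
        model = adapt(model, source_labels, pseudo_labels)
    return model


def toy_track(model: Model, frames: List[int]) -> List[Label]:
    """Toy tracker: assigns one of two identities to each frame."""
    return [(frame, frame % 2) for frame in frames]


def toy_adapt(model: Model, source: List[Label], pseudo: List[Label]) -> Model:
    """Toy update: counts rounds and the labels consumed so far."""
    return {
        "rounds": model["rounds"] + 1,
        "labels_seen": model["labels_seen"] + len(source) + len(pseudo),
    }


adapted = alternate_tracking_and_adaptation(
    model={"rounds": 0, "labels_seen": 0},
    source_labels=[(0, 7), (1, 7)],
    target_frames=[0, 1, 2],
    track=toy_track,
    adapt=toy_adapt,
    num_rounds=3,
)
```

Passing `track` and `adapt` as callables mirrors the claim that the framework is generic with respect to the underlying MOT method: any tracker that can produce pseudo-labels and be fine-tuned on them fits this loop.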
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K. & Leal-Taixé, L. MOT20: A Benchmark for Multi Object Tracking in Crowded Scenes. arXiv:2003.09003 [cs], 2020.
Milan, A., Leal-Taixé, L., Reid, I., Roth, S. & Schindler, K. MOT16: A Benchmark for Multi-Object Tracking. arXiv:1603.00831 [cs], 2016.