Speaker: Nicolas Turpault
Date: May 05, 2019 at 10:30 – C005
Deep neural networks are particularly useful for learning relevant representations from data.
Recent studies have demonstrated the potential of unsupervised representation learning for ambient sound analysis using various flavors of the triplet loss, and have compared this approach to supervised learning.
However, in real situations, it is common to have a small labeled dataset and a large unlabeled one.
In this work, we combine unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach.
We propose two flavors of this approach, whereby the positive samples for those triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor, or by selecting the nearest sample in the training set.
We compare our approach to supervised and unsupervised representation learning, and we vary the ratio between the amounts of labeled and unlabeled data.
We evaluate all the above approaches on an audio tagging task using the DCASE 2018 Task 4 dataset, and we show the impact of this ratio on the tagging performance.
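
As a rough illustration of the two ways of choosing a positive sample for an unlabeled anchor described above, here is a minimal NumPy sketch. It is not the speaker's implementation: the function names, the hinge form of the triplet loss with squared Euclidean distances, the margin value, the use of additive Gaussian noise as the transformation, and the nearest-neighbor search in the input feature space are all assumptions made for this example.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over the batch.

    Assumed form: max(0, d(a, p) - d(a, n) + margin). Inputs have shape
    (batch, dim) and would normally be embeddings produced by the network.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def positive_by_transformation(anchor, rng, noise_std=0.01):
    """Flavor 1: build the positive by transforming the anchor.

    Additive Gaussian noise is a stand-in chosen for this sketch; the
    abstract does not say which transformation is used.
    """
    return anchor + rng.normal(0.0, noise_std, size=anchor.shape)


def positive_by_nearest_neighbor(anchor_idx, data):
    """Flavor 2: return the index of the training sample nearest to the anchor.

    The anchor itself is excluded. Distances are computed in the input
    feature space here, which is an assumption of this sketch.
    """
    dists = np.sum((data - data[anchor_idx]) ** 2, axis=1)
    dists[anchor_idx] = np.inf  # never pick the anchor as its own positive
    return int(np.argmin(dists))
```

For labeled anchors, the positive would presumably be drawn from samples sharing the anchor's label, so a semi-supervised batch could mix both kinds of triplets under the same loss.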