Transfer Learning for Abusive Language Detection

Speaker: Tulika Bose

Date and time: Jan 19, 2023, at 10:30

Abstract: The proliferation of social media, despite its multitude of benefits, has led to the increased spread of abusive language. Deep learning models for detecting abusive language achieve strong in-corpus performance but under-perform substantially outside the training distribution. Moreover, they require a considerable amount of expensive labeled data for training. This thesis studies the problem of transfer learning for abusive language detection and explores various solutions to improve knowledge transfer in cross-corpus scenarios. First, we investigate whether combining topic model representations with contextual representations can improve cross-corpus generalizability. Associating unseen target comments with abusive language topics from the training corpus is shown to provide complementary information for better cross-corpus transfer. Second, we explore popular Unsupervised Domain Adaptation (UDA) approaches from sentiment classification for cross-corpus abusive language detection. Our analysis reveals their limitations and emphasizes the need for effective adaptation methods suited to this task. As our third contribution, we propose two domain adaptation approaches with a dynamic refinement mechanism using feature attributions, which are post-hoc model explanations. In particular, we study the problem of spurious corpus-specific correlations that restrict the generalizability of classifiers for detecting hate speech, a sub-category of abusive language. Finally, we propose a novel training strategy for transferring knowledge from a resource-rich source corpus to a low-resource target corpus for hate speech. We incorporate neighborhood information into Optimal Transport, which permits exploiting the geometry of the embedding space. By aligning the joint embedding and label distributions of neighbors, we obtain substantial improvements on low-resource hate speech corpora.
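
To make the first idea concrete, here is a minimal sketch of fusing topic-model features with contextual embeddings. The specific choices (LDA topics, mean-pooled BERT embeddings, fusion by concatenation) are assumptions for illustration, not the exact architecture of the thesis.

    # Minimal sketch (assumed: LDA topics, mean-pooled BERT embeddings,
    # fusion by concatenation); illustrative, not the thesis's exact model.
    import numpy as np
    import torch
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer
    from transformers import AutoModel, AutoTokenizer

    texts = ["example comment one", "example comment two"]  # toy corpus

    # Topic features: per-document topic proportions from LDA.
    bow = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_feats = lda.fit_transform(bow)              # (n_docs, n_topics)

    # Contextual features: mean-pooled transformer token embeddings.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        enc = tok(texts, padding=True, return_tensors="pt")
        ctx_feats = bert(**enc).last_hidden_state.mean(dim=1).numpy()

    # Fused representation for a downstream abuse classifier: unseen
    # target comments inherit topic associations learned on the source.
    fused = np.concatenate([ctx_feats, topic_feats], axis=1)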
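
The attribution-based refinement of the third contribution could, under simplifying assumptions, look like the sketch below: per-token attribution scores (e.g., from Integrated Gradients) are thresholded to mask tokens the classifier over-relies on. The scores and the threshold here are made up for illustration; the thesis's dynamic refinement mechanism may differ.

    # Illustrative only: attribution scores and threshold are assumed.
    tokens = ["<identity-term>", "are", "nice"]
    attributions = [0.92, 0.04, 0.11]   # e.g., from Integrated Gradients

    THRESHOLD = 0.8  # assumed cut-off for an over-attributed token
    refined = [t for t, a in zip(tokens, attributions) if a < THRESHOLD]
    # refined == ["are", "nice"]: the spuriously over-attributed term is
    # masked before the example is used again in training.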
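
Finally, a toy sketch of aligning joint embedding-and-label distributions with entropic Optimal Transport via the POT library. The cost design (squared embedding distance plus a label-mismatch penalty on target pseudo-labels) and all weights are assumptions in the spirit of joint-distribution OT, not the thesis's exact formulation.

    # Toy sketch with the POT library; cost design and weights are assumed.
    import numpy as np
    import ot  # POT: Python Optimal Transport (pip install pot)

    rng = np.random.default_rng(0)
    Xs, ys = rng.normal(size=(8, 4)), rng.integers(0, 2, size=8)  # source
    Xt, yt = rng.normal(size=(6, 4)), rng.integers(0, 2, size=6)  # target pseudo-labels

    # Joint cost: squared embedding distance plus a label-mismatch penalty.
    C = ot.dist(Xs, Xt) + 1.0 * (ys[:, None] != yt[None, :])

    a, b = np.full(8, 1 / 8), np.full(6, 1 / 6)  # uniform marginals
    plan = ot.sinkhorn(a, b, C, reg=0.1)         # entropic transport plan
    # 'plan' couples source and target points; neighborhood information can
    # enter through the cost matrix to exploit the embedding geometry.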