Expanding the training data for neural network-based hate speech classification

Speaker:  Ashwin Geet D’Sa

Date and place: April 28, 2022, at 10:30 – Hybrid

Abstract: The phenomenal increase in internet usage, which facilitates the dissemination of knowledge and expression, has also led to an increase in online hate speech. Online hate speech is anti-social communicative behavior that leads to threats and violence toward an individual or a group. Deep learning-based models have become the state-of-the-art solution for classifying hate speech. However, the performance of these models depends on the amount of labeled training data. In this thesis, we explore several solutions for expanding the training data in order to train a reliable hate speech classification model.

As the first approach, we use semi-supervised learning to combine the large amount of unlabeled data easily available on the internet with a limited amount of labeled data to train the classifier. For this, we use the label propagation algorithm. The performance of this method depends on the representation space of the labeled and unlabeled data. We show that pre-trained sentence embeddings are label-agnostic and yield poor results. We propose a simple and effective neural network-based approach for transforming these pre-trained representations into task-aware ones. This method achieves significant performance improvements in low-resource scenarios.
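
Below is a minimal sketch of this semi-supervised pipeline, assuming sentence embeddings have already been computed with a pre-trained encoder and stored as arrays. The toy data, the size of the projection network, and the use of scikit-learn's LabelPropagation are illustrative choices, not the exact setup from the thesis.

```python
# Semi-supervised sketch: project pre-trained embeddings into a task-aware
# space with a small network trained on labeled data, then run label propagation.
import numpy as np
import torch
import torch.nn as nn
from sklearn.semi_supervised import LabelPropagation

# Toy data: 768-d "pre-trained" embeddings, 100 labeled + 900 unlabeled samples.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(100, 768)).astype(np.float32)
y_labeled = rng.integers(0, 2, size=100)            # 0 = non-hate, 1 = hate
X_unlabeled = rng.normal(size=(900, 768)).astype(np.float32)

class Projector(nn.Module):
    """Small feed-forward network mapping label-agnostic embeddings to a task-aware space."""
    def __init__(self, dim_in=768, dim_hidden=128, n_classes=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
        self.out = nn.Linear(dim_hidden, n_classes)

    def forward(self, x):
        return self.out(self.hidden(x))

model = Projector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
xb = torch.from_numpy(X_labeled)
yb = torch.from_numpy(y_labeled).long()
for _ in range(50):                                  # brief supervised training on labeled data only
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()

# Project labeled and unlabeled data into the task-aware space.
with torch.no_grad():
    Z_labeled = model.hidden(xb).numpy()
    Z_unlabeled = model.hidden(torch.from_numpy(X_unlabeled)).numpy()

# Label propagation: unlabeled samples are marked with -1.
X_all = np.vstack([Z_labeled, Z_unlabeled])
y_all = np.concatenate([y_labeled, -np.ones(len(Z_unlabeled), dtype=int)])
lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X_all, y_all)
pseudo_labels = lp.transduction_[len(y_labeled):]    # labels inferred for the unlabeled data
```

The pseudo-labels obtained this way can then be added to the original labeled data when training the final classifier.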
In our second approach, we explore data augmentation, a technique for obtaining synthetic samples from the original training data. Our data augmentation technique is based on a single conditional GPT-2 language model fine-tuned on the original training data. Our approach uses a fine-tuned BERT model to select high-quality synthetic data. We study the effect of the quantity of augmented data and show that using a few thousand synthetic samples yields significant performance improvements in hate speech classification. Our qualitative evaluation shows the effectiveness of using BERT for filtering the generated samples.
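
The sketch below illustrates this generate-then-filter loop, assuming a GPT-2 model already fine-tuned on the training data with a class control token prepended to each sample, and a BERT classifier fine-tuned on the same data. The model paths, the "<hate>" control token, and the confidence threshold are illustrative assumptions.

```python
# Data augmentation sketch: sample synthetic texts from a class-conditional GPT-2,
# then keep only the samples that a fine-tuned BERT classifier labels confidently.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

gen_tok = AutoTokenizer.from_pretrained("path/to/finetuned-gpt2")        # assumed fine-tuned generator
gen_model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-gpt2")
clf_tok = AutoTokenizer.from_pretrained("path/to/finetuned-bert")        # assumed fine-tuned classifier
clf_model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-bert")

def generate_candidates(class_token="<hate>", n=32, max_len=64):
    """Sample synthetic texts conditioned on a class control token."""
    inputs = gen_tok(class_token, return_tensors="pt")
    outputs = gen_model.generate(**inputs, do_sample=True, top_p=0.95,
                                 max_length=max_len, num_return_sequences=n,
                                 pad_token_id=gen_tok.eos_token_id)
    return [gen_tok.decode(o, skip_special_tokens=True) for o in outputs]

def filter_with_bert(texts, target_label=1, threshold=0.9):
    """Keep only samples that BERT assigns to the target class with high confidence."""
    enc = clf_tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(clf_model(**enc).logits, dim=-1)
    keep = probs[:, target_label] >= threshold
    return [t for t, k in zip(texts, keep.tolist()) if k]

synthetic_samples = filter_with_bert(generate_candidates())
```

The confidence threshold trades off the quantity of retained synthetic samples against their quality.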
For our final approach, we use multi-task learning as a method to combine several available hate speech datasets and jointly train a single classification model. Our approach leverages a pre-trained language model (BERT) as the shared layers of our multi-task architecture. We treat each hate speech corpus as one task, thus adapting the multi-task learning paradigm to multi-corpus learning. We show that training a multi-task model on several corpora achieves performance similar to training several corpus-specific models. Nevertheless, fine-tuning the multi-task model on a specific corpus further improves the results. We also demonstrate the effectiveness of our multi-task learning approach for domain adaptation on hate speech corpora.
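
A minimal sketch of such a multi-task architecture is shown below: a shared BERT encoder with one classification head per corpus. The corpus names, label counts, and pooling choice are illustrative assumptions rather than the exact configuration used in the thesis.

```python
# Multi-corpus (multi-task) sketch: shared BERT layers, one output head per corpus.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiCorpusClassifier(nn.Module):
    """Shared BERT encoder with one classification head per corpus (task)."""
    def __init__(self, corpus_classes, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)     # shared layers
        hidden = self.encoder.config.hidden_size
        # Task-specific output layers, one per corpus.
        self.heads = nn.ModuleDict({name: nn.Linear(hidden, n)
                                    for name, n in corpus_classes.items()})

    def forward(self, input_ids, attention_mask, corpus):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                           # [CLS] representation
        return self.heads[corpus](cls)

# Hypothetical corpora with different label sets (e.g. binary vs. three-way).
model = MultiCorpusClassifier({"corpus_a": 2, "corpus_b": 3})
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["an example message"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"], corpus="corpus_a")
```

Joint training would typically alternate batches from the different corpora so that the shared encoder sees all tasks, while each head is updated only on its own corpus; corpus-specific fine-tuning then continues training on a single corpus.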
We evaluate the three proposed approaches in low-resource scenarios and show that they achieve significant performance improvements, especially in very low-resource setups.