DATA

FALCON is a multi-label, graph-based dataset of COVID-19-related tweets. It includes expert annotations for six fallacy types (loaded language, appeal to fear, appeal to ridicule, hasty generalization, ad hominem, and false dilemma), and a single tweet can carry several fallacy labels at once. The dataset's graph structure enables analysis of the relationships between fallacies and of how they progress through conversations.
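To make the multi-label setup concrete, here is a minimal Python sketch of how one tweet's fallacy annotations could be encoded as a multi-hot vector. The six label names come from the description above; the function name `encode_labels` and the example annotation are hypothetical and do not reflect FALCON's actual file format.

```python
# The six fallacy types named in the dataset description.
FALLACY_TYPES = [
    "loaded language",
    "appeal to fear",
    "appeal to ridicule",
    "hasty generalization",
    "ad hominem",
    "false dilemma",
]


def encode_labels(labels):
    """Return a multi-hot vector with one slot per fallacy type (hypothetical helper)."""
    unknown = set(labels) - set(FALLACY_TYPES)
    if unknown:
        raise ValueError(f"unknown fallacy labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in FALLACY_TYPES]


# Hypothetical annotation: one tweet tagged with two fallacies at once.
vector = encode_labels({"appeal to fear", "false dilemma"})
print(vector)
```

A vector of this shape is the usual target for a multi-label classifier, where each position is predicted independently rather than forcing a single class per tweet.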


CyberAgressionAdo is an open-access French dataset created to support research on online hate detection in multiparty conversations. It includes 36 conversations gathered through role-playing simulations conducted in schools. The dataset is annotated using a multi-label, fine-grained tagset across six distinct layers, capturing information such as participant roles, the presence and type of hate speech, and various forms of verbal abuse.

In addition, the annotations follow a detailed hierarchical structure designed to reflect the communicative intentions behind each message, as well as the contextual factors that influence how messages are produced and interpreted. To support perspectivist approaches, the dataset also includes individual annotations from multiple annotators, enabling the exploration of divergent interpretations and how they can be integrated into the development of machine learning models.
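As an illustration of the perspectivist idea, the sketch below contrasts two ways of using per-annotator labels for a single message: collapsing them to a majority vote, or keeping the disagreement as a soft label distribution. The label names (`hate`, `no_hate`) and both function names are assumptions made for this example, not the dataset's actual tagset or tooling.

```python
from collections import Counter


def majority_label(annotations):
    """Collapse individual annotations into the single most frequent label."""
    if not annotations:
        raise ValueError("need at least one annotation")
    return Counter(annotations).most_common(1)[0][0]


def soft_label(annotations):
    """Keep annotator disagreement as a probability distribution over labels."""
    if not annotations:
        raise ValueError("need at least one annotation")
    counts = Counter(annotations)
    total = len(annotations)
    return {label: count / total for label, count in counts.items()}


# Hypothetical votes from four annotators on one message.
votes = ["hate", "hate", "no_hate", "hate"]
print(majority_label(votes))
print(soft_label(votes))
```

The soft distribution can be used directly as a training target, so a model sees that one annotator in four read the message differently instead of having that signal discarded.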


ISHate is the first benchmark focused on implicit and subtle hate speech detection. It contains carefully annotated examples in which hate is expressed through irony, sarcasm, metaphor, and other linguistic devices. The dataset aims to improve the detection of non-explicit hate speech, which is often harder to identify than explicit hate speech.


ElecDeb60to20 is the most comprehensive dataset of political debates annotated with argument components (claims and premises), argument relations (support and attack), and fallacies. It includes all US presidential debates from 1960 to 2020.