Published on July 20, 2025

Online discourse is increasingly trapped in a vicious cycle where polarizing language fuels toxicity and vice versa. Identity, one of the most divisive issues in modern politics, often increases polarization. Yet, prior NLP research has mostly treated toxicity and polarization as separate problems. In Indonesia, the world’s third-largest democracy, this dynamic threatens democratic discourse, particularly in online spaces. We argue that polarization and toxicity must be studied in relation to each other. To this end, we present a novel multi-label Indonesian dataset annotated for toxicity and polarization, along with annotator demographic information. Benchmarking with BERT-base models and large language models (LLMs) reveals that polarization cues improve toxicity classification and vice versa. Including demographic context further enhances polarization classification performance.

Publication: Association for Computational Linguistics

Co-authors: Derry Wijaya, Monash University; Lucky Susanto, Monash University; Zilu Tang, Boston University; Fariz Akyask, Monash University; Ika Idris, Monash University; Alham Aji, MBZUAI