Events Calendar | College of Engineering

Starts: 10:00 am on Wednesday, September 17, 2025
Ends: 12:00 pm on Wednesday, September 17, 2025

Title: Data-driven approaches for improving the identification of misleading content online

Presenter: Pujan Paudel

Advisor: Professor Gianluca Stringhini

Chair: Professor Yigong Hu

Committee: Professor Gianluca Stringhini, Professor Manuel Egele, Professor Mark Crovella, Professor Engin Kirda

Google Scholar Link: https://scholar.google.com/citations?user=8K4IiBwAAAAJ&hl=en&oi=ao

Abstract: Misleading content online appears in many forms, ranging from false claims that spread rapidly on social networks to craftily designed e-commerce websites defrauding users of money and trust. This thesis aims to build data-driven systems that can improve the automated identification of misleading content online while supporting human-in-the-loop content moderation systems and downstream security systems. I achieve this goal by developing and evaluating four complementary systems that together strengthen platform soft-moderation practices and enable proactive discovery of scam websites on the broader Web.

First, I introduce a claim-comprehensive soft-moderation pipeline that uses learning-to-rank and information retrieval techniques to identify posts discussing misleading claims, increasing coverage and consistency of warning labels. Second, I propose an unsupervised, context-aware stance detection framework to distinguish the propagation of a falsehood from its critique or correction, reducing contextual false positives of warning labels. Third, I extend soft moderation beyond text with an efficient reverse image retrieval system that finds visually similar instances of misleading images at scale, enabling multi-modal moderation. Finally, I present a data-driven query-mining and scoring system that allows systematic issuing of search engine queries with a higher likelihood of returning scam websites, accelerating existing security pipelines to discover scam websites earlier, and improving the resource efficiency of downstream detection systems.

Across large-scale, heterogeneous datasets capturing real-world events on social media and diverse search engine results on the web, these systems (i) expand the coverage, context, and accuracy of soft moderation of misleading content online and (ii) improve the timeliness and yield of discovering online scam websites promoting misleading content. Collectively, these contributions develop a practical toolbox of systems for human-in-the-loop moderation and security systems, demonstrating that targeted, claim-centric, context-aware, and multi-modal pipelines can help make information ecosystems more trustworthy and safe.

Location: PHO 339
