
Textual anomaly detection: A systematic evaluation of representations and algorithms

Panagiotis Soustas*, Matthew Edwards

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

This study evaluates scalable, unsupervised methods for detecting malicious online content by benchmarking transformer-based architectures across diverse linguistic contexts. We demonstrate that the semantic resolution of the embedding layer is the critical determinant of performance. Our results show that large language models coupled with manifold learning achieve superior anomaly separation in knowledge-intensive domains, significantly outperforming traditional BERT-based pipelines. Additionally, we reveal a topological dichotomy in detection strategies: contrastive autoencoders offer robust stability in structured environments, whereas few-shot deviation learning (FATE) is essential for high-entropy, dynamic contexts such as politics and sports. These findings motivate a shift toward context-aware architectures capable of adapting to the complex semantic landscape of modern web content.
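The pipeline described above — embed texts, apply manifold learning, then separate anomalies in the reduced space — can be sketched minimally. This is an illustrative toy, not the paper's method: the embeddings are synthetic stand-ins for LLM/BERT vectors, the reduction is a linear SVD projection standing in for a (typically nonlinear) manifold-learning step, and the k-NN distance score is one common unsupervised anomaly criterion. All names and parameters here are assumptions for demonstration.

```python
# Toy unsupervised textual-anomaly pipeline: synthetic embeddings ->
# dimensionality reduction -> k-NN distance anomaly score.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "document embeddings": a tight cluster of normal texts
# plus one planted outlier standing in for anomalous content.
normal = rng.normal(loc=0.0, scale=0.5, size=(50, 32))
outlier = np.full((1, 32), 5.0)
X = np.vstack([normal, outlier])

# Linear reduction via SVD (stand-in for the nonlinear manifold
# learning step a real system would use).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # project onto the top-2 components

def knn_score(Z, k=5):
    """Anomaly score: mean distance to the k nearest neighbours
    in the reduced space. Larger means more anomalous."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)[:, 1:k + 1]  # drop self-distance
    return d_sorted.mean(axis=1)

scores = knn_score(Z)
print(int(np.argmax(scores)))  # the planted outlier (index 50) scores highest
```

Swapping the synthetic embeddings for real sentence encodings and the SVD for a nonlinear reducer would turn this skeleton into a working baseline of the kind the study benchmarks.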
Original language: English
Title of host publication: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
Publisher: Springer Nature
Number of pages: 13
Publication status: Accepted/In press - 16 Mar 2026
Event: Research and Applications of Foundation Models for Data Mining and Affective Computing - Hong Kong, China
Duration: 9 Jun 2026 – 12 Jun 2026
https://rafda-pakdd.github.io/RAFDA2026/

Workshop

Workshop: Research and Applications of Foundation Models for Data Mining and Affective Computing
Abbreviated title: RAFDA
Country/Territory: China
City: Hong Kong
Period: 9/06/26 – 12/06/26
Internet address: https://rafda-pakdd.github.io/RAFDA2026/

Research Groups and Themes

  • Cyber Security
