Abstract
This study evaluates scalable, unsupervised methods for detecting malicious online content by benchmarking transformer-based architectures across diverse linguistic contexts. We demonstrate that the semantic resolution of the embedding layer is the critical determinant of performance. Our results show that large language models coupled with manifold learning achieve superior anomaly separation in knowledge-intensive domains, significantly outperforming traditional BERT-based pipelines. Additionally, we reveal a topological dichotomy in detection strategies: contrastive autoencoders offer robust stability in structured environments, whereas few-shot deviation learning (FATE) is essential for high-entropy, dynamic contexts such as politics and sports. These findings motivate a shift toward context-aware architectures capable of adapting to the complex semantic landscape of modern web content.
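The core idea of the abstract, embedding documents and separating anomalies by how isolated they are in the embedding space, can be illustrated with a minimal sketch. The data here is synthetic (random vectors stand in for LLM sentence embeddings), and a simple k-nearest-neighbour distance score is used as a stand-in for the paper's manifold-learning step; it is not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for LLM document embeddings:
# 200 "normal" documents form a tight cluster; 5 anomalies lie far away.
normal = rng.normal(0.0, 0.1, size=(200, 32))
anomalies = rng.normal(2.0, 0.1, size=(5, 32))
X = np.vstack([normal, anomalies])

def knn_anomaly_scores(X, k=10):
    """Score each row by its mean distance to its k nearest neighbours.

    A simple density-based proxy for anomaly separation in embedding
    space; larger scores indicate more anomalous documents.
    """
    # Pairwise Euclidean distances between all embeddings.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distance
    nearest = np.sort(d, axis=1)[:, :k]    # k smallest distances per row
    return nearest.mean(axis=1)

scores = knn_anomaly_scores(X)
# The five injected anomalies receive the highest scores.
top5 = sorted(np.argsort(scores)[-5:].tolist())
```

In practice the embedding quality ("semantic resolution") dominates: the same scoring rule separates anomalies cleanly only when the embedding model places semantically deviant documents far from the inlier manifold.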
| Original language | English |
|---|---|
| Title of host publication | Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) |
| Publisher | Springer Nature |
| Number of pages | 13 |
| Publication status | Accepted/In press - 16 Mar 2026 |
| Event | Research and Applications of Foundation Models for Data Mining and Affective Computing, Hong Kong, China, 9 Jun 2026 → 12 Jun 2026. https://rafda-pakdd.github.io/RAFDA2026/ |
Workshop
| Workshop | Research and Applications of Foundation Models for Data Mining and Affective Computing |
|---|---|
| Abbreviated title | RAFDA |
| Country/Territory | China |
| City | Hong Kong |
| Period | 9/06/26 → 12/06/26 |
| Internet address | https://rafda-pakdd.github.io/RAFDA2026/ |
Research Groups and Themes
- Cyber Security