Abstract
Identifying disturbing online content targeted at children is an important content moderation problem. However, previous approaches to this problem have focused on features of the content itself and have neglected potentially helpful insights from the reactions expressed by its online audience. To help remedy this, we present the Elsagate Corpus, a collection of over 22 million comments on more than 18,000 videos that have been associated with disturbing content. We describe how we collected this corpus and present some insights from our initial explorations, including the surprisingly positive reactions from audiences to this content, challenges in identifying averse comments, and some unusual non-linguistic commenting behaviour of uncertain purpose.
Original language | English |
---|---|
Title of host publication | The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS’2024) Proceedings |
Editors | Ruslan Mitkov, Saad Ezzini, Cengiz Acarturk, Tharindu Ranasinghe, Paul Rayson, Mo El-Haj, Ignatius Ezeani, Matthew Bradbury, Nouran Khallaf |
Pages | 147-152 |
Number of pages | 6 |
Publication status | Published - 30 Jul 2024 |
Event | First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security - Lancaster University, Lancaster, United Kingdom. Duration: 29 Jul 2024 → 30 Jul 2024. https://nlpaics.com/ |
Conference
Conference | First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security |
---|---|
Abbreviated title | NLPAICS'2024 |
Country/Territory | United Kingdom |
City | Lancaster |
Period | 29/07/24 → 30/07/24 |
Internet address | https://nlpaics.com/ |
Research Groups and Themes
- Cyber Security