The Elsagate corpus: Characterising commentary on alarming video content

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

Identifying disturbing online content targeted at children is an important content moderation problem. However, previous approaches to this problem have focused on features of the content itself, and neglected potentially helpful insights from the reactions expressed by its online audience. To help remedy this, we present the Elsagate Corpus, a collection of over 22 million comments on more than 18,000 videos that have been associated with disturbing content. We describe how we collected this corpus and present some insights from our initial explorations, including the surprisingly positive reactions from audiences to this content, challenges in identifying averse comments, and some unusual non-linguistic commenting behaviour of uncertain purpose.
Original language: English
Title of host publication: The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS’2024) Proceedings
Editors: Ruslan Mitkov, Saad Ezzini, Cengiz Acarturk, Tharindu Ranasinghe, Paul Rayson, Mo El-Haj, Ignatius Ezeani, Matthew Bradbury, Nouran Khallaf
Pages: 147-152
Number of pages: 6
Publication status: Published - 30 Jul 2024
Event: First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security - Lancaster University, Lancaster, United Kingdom
Duration: 29 Jul 2024 to 30 Jul 2024
https://nlpaics.com/

Conference

Conference: First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Abbreviated title: NLPAICS'2024
Country/Territory: United Kingdom
City: Lancaster
Period: 29/07/24 to 30/07/24

Research Groups and Themes

  • Cyber Security
