Abstract
Fan fiction gives genre enthusiasts the opportunity to create their own story lines from existing print fiction. It has also raised concerns for traditional print publishers, including intellectual property issues. An interesting and difficult problem is determining whether a given segment of text is fan fiction or print fiction. Classifying unstructured text remains a critical step for many intelligent systems. In this paper, we detail how a significant volume of print and fan fiction was obtained. The data is processed using a proposed pipeline and then analysed using various supervised machine learning classifiers. Our results show that, given 5 to 10 sentences, traditional approaches can achieve an accuracy of 80-90%. To our knowledge, this is the first study to explore this type of fiction classification problem.
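As a rough illustration of the task described in the abstract, the sketch below splits text into fixed-size sentence segments and trains a small bag-of-words naive Bayes classifier to label a segment as print or fan fiction. The abstract does not specify the paper's actual pipeline, features, or classifiers, so everything here is an assumption made for illustration: the helper names (`split_segments`, `tokenize`, `NaiveBayes`), the choice of naive Bayes as one "traditional" approach, and the four invented toy training segments.

```python
import math
import re
from collections import Counter


def split_segments(text, sentences_per_segment=5):
    """Group sentences into fixed-size segments (naive punctuation-based split)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_segment])
        for i in range(0, len(sentences), sentences_per_segment)
    ]


def tokenize(segment):
    """Lowercase the segment and extract alphabetic word tokens."""
    return re.findall(r"[a-z']+", segment.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, segments, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for segment, label in zip(segments, labels):
            self.word_counts[label].update(tokenize(segment))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, segment):
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)
            for token in tokenize(segment):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][token] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Invented toy data (hypothetical, not taken from the paper's corpus).
train = [
    ("The colonel folded the letter. The estate lay silent beneath the snow.", "print"),
    ("The solicitor arrived at dusk. The estate ledger remained unopened.", "print"),
    ("Kai grinned at Mira. Author note: thanks for the reviews!", "fan"),
    ("Mira smirked at Kai. Author note: next chapter soon!", "fan"),
]
model = NaiveBayes().fit([text for text, _ in train], [label for _, label in train])

print(model.predict("Author note: thanks for reading!"))
print(len(split_segments("A. B. C. D. E. F. G.", sentences_per_segment=5)))
```

With a real corpus, `split_segments` would be applied to each source text with `sentences_per_segment` set between 5 and 10, and accuracy would be measured on held-out segments rather than on hand-written examples.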
Original language | English |
---|---|
Title of host publication | Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods |
Editors | Maria De Marsico, Gabriella Sanniti di Baja, Ana Fred |
Publisher | SciTePress |
Pages | 511-517 |
Number of pages | 8 |
Volume | 1 |
ISBN (Electronic) | 9789897585494 |
Publication status | Published - 5 Feb 2022 |
Event | 11th International Conference on Pattern Recognition Applications and Methods - Duration: 3 May 2022 → 5 May 2022 https://icpram.scitevents.org/?y=2022 |
Publication series
Name | ICPRAM |
---|---|
Publisher | SciTePress |
ISSN (Electronic) | 2184-4313 |
Conference
Conference | 11th International Conference on Pattern Recognition Applications and Methods |
---|---|
Abbreviated title | ICPRAM |
Period | 3/05/22 → 5/05/22 |
Keywords
- Natural Language Processing
- Text Classification