Multi-domain evaluation framework for named entity recognition tools

Zahraa S. Abdallah*, Mark Carman, Gholamreza Haffari

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review

13 Citations (Scopus)

Abstract

Extracting structured information from unstructured text is important for qualitative data analysis. Leveraging NLP techniques for qualitative data analysis can effectively accelerate the annotation process, enable large-scale analysis and provide deeper insights into the text. The first step in gaining insights from text is Named Entity Recognition (NER). A significant challenge that directly impacts the performance of the NER process is the domain diversity of qualitative data: text varies with its domain in many respects, including taxonomies, length, formality and format. In this paper we discuss and analyse the performance of state-of-the-art tools across domains to assess their robustness and reliability. To do so, we developed a standard, expandable and flexible framework for analysing and testing tool performance on corpora representing text from various domains. We performed an extensive analysis and comparison of tools across domains and from multiple perspectives. The resulting comparison and analysis provide a holistic picture of the state-of-the-art tools.
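To make the kind of evaluation described above concrete, below is a minimal sketch of per-domain, entity-level scoring of a single NER tool. It is illustrative only, not the framework presented in the paper: the Span representation, the predict callback, the evaluate_tool function and the corpus layout are all assumptions, and strict span matching is just one of several common matching criteria.

    from typing import Callable, Dict, List, Set, Tuple

    # An annotated entity as a (char_start, char_end, label) span.
    Span = Tuple[int, int, str]

    # Each domain corpus is a list of (text, gold_spans) pairs.
    Corpus = List[Tuple[str, Set[Span]]]


    def evaluate_tool(
        predict: Callable[[str], Set[Span]],
        corpora: Dict[str, Corpus],
    ) -> Dict[str, Dict[str, float]]:
        """Score one NER tool per domain with strict entity-level matching."""
        scores: Dict[str, Dict[str, float]] = {}
        for domain, documents in corpora.items():
            tp = fp = fn = 0
            for text, gold in documents:
                pred = predict(text)
                tp += len(pred & gold)   # exact (start, end, label) matches
                fp += len(pred - gold)   # spurious predictions
                fn += len(gold - pred)   # missed gold entities
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            scores[domain] = {"precision": precision, "recall": recall, "f1": f1}
        return scores


    # Example: wrapping spaCy (if installed) as the tool under test:
    #   import spacy
    #   nlp = spacy.load("en_core_web_sm")
    #   def spacy_predict(text: str) -> Set[Span]:
    #       return {(e.start_char, e.end_char, e.label_) for e in nlp(text).ents}
    #   results = evaluate_tool(spacy_predict, {"news": news_docs, "tweets": tweet_docs})

Because the tool under test is passed in as a plain callback, the same harness can compare any number of NER tools and domain corpora side by side, which mirrors the expandability the abstract emphasises.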

Original language: English
Pages (from-to): 34-55
Number of pages: 22
Journal: Computer Speech and Language
Volume: 43
Early online date: 12 Nov 2016
DOIs
Publication status: Published - 1 May 2017

Keywords

  • Benchmark evaluation
  • Multi-domain evaluation
  • Named entity recognition
  • Qualitative data analysis
