FindMyPast Yearly N-grams and Entities Dataset

  • The FindMyPast Newspaper Team (Creator)
  • Thomas Lansdall-Welfare (Creator)
  • Nello Cristianini (Creator)
  • Saatviga Sudhahar (Creator)
  • James Thompson (Contributor)
  • Justin Lewis (Contributor)

Dataset

Description

This dataset, the FindMyPast Yearly N-grams and Entities dataset, contains the secondary data for the paper "Content Analysis of 150 Years of British Periodicals". It comprises three parts: the yearly time series for the 1,000,000 most frequent 1-, 2-, and 3-grams in the corpus described in the paper, the yearly time series for the 100,000 most frequent named entities linked to Wikipedia, and the list of FindMyPast articles and newspapers used in the study.
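The following Python sketch illustrates what a "yearly time series for the most frequent n-grams" means. It counts 1-, 2-, and 3-grams per year over a toy corpus and keeps the n-grams that are most frequent across all years. It is only an illustration. It assumes naive lowercase whitespace tokenization and raw (unnormalized) counts, it is not the authors' pipeline, and it does not reflect the file format of this dataset. The function names `ngrams` and `yearly_ngram_series` are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_ngram_series(docs, max_n=3, top_k=10):
    """Build yearly count series for the top_k most frequent 1..max_n-grams.

    docs: iterable of (year, text) pairs.
    Tokenization is a naive lowercase whitespace split (an assumption
    for illustration only, not the method used in the paper).
    Returns (sorted list of years, {ngram: [count in each year]}).
    """
    per_year = defaultdict(Counter)  # year -> n-gram counts for that year
    total = Counter()                # corpus-wide counts, used for ranking
    for year, text in docs:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            grams = ngrams(tokens, n)
            per_year[year].update(grams)
            total.update(grams)
    years = sorted(per_year)
    top = [gram for gram, _ in total.most_common(top_k)]
    # One count per year for each retained n-gram (0 if absent that year).
    series = {gram: [per_year[y][gram] for y in years] for gram in top}
    return years, series


docs = [
    (1850, "The queen visited the city"),
    (1850, "The railway opened"),
    (1851, "The queen opened the exhibition"),
]
years, series = yearly_ngram_series(docs, top_k=5)
print(years)          # [1850, 1851]
print(series["the"])  # [3, 2]
```

A real corpus at this scale would need streaming counts and, typically, normalization by the total number of tokens per year before comparing years.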

When using this data, please cite:

Lansdall-Welfare, T. et al. (2016). Content Analysis of 150 Years of British Periodicals. In: Proceedings of the National Academy of Sciences of the United States of America.
Date made available: 16 Dec 2016
Publisher: University of Bristol

Keywords

  • Data Science
  • Computer Science
  • Digital Humanities
  • Social Science
