Detecting events in a million New York Times articles

TM Snowsill, I Flaounas, Tijl De Bie, Nello Cristianini

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

4 Citations (Scopus)

Abstract

We present a demonstration of a newly developed text stream event detection method on over a million articles from the New York Times corpus. The method is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. Events are detected as significant changes in the statistical properties of the text, where those properties are efficiently stored and updated in a suffix tree. This particular demonstration shows how our method is effective at discovering both short- and long-term events (which are often denoted topics), and how it automatically copes with topic drift on a corpus of 1,035,263 articles.
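
The idea of flagging significant changes in text statistics can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain dictionary of word n-gram counts in place of the suffix tree, compares two fixed batches of documents rather than a live stream, and uses a two-proportion z-score as a stand-in significance test. All function names, the bigram choice, and the threshold are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(docs, n=2):
    """Count word n-grams across a list of documents (stand-in for a suffix tree)."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def surge_score(k_new, n_new, k_old, n_old):
    """Two-proportion z-score; positive when the n-gram became relatively more frequent."""
    pooled = (k_new + k_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        return 0.0
    return (k_new / n_new - k_old / n_old) / se


def detect_events(reference_docs, current_docs, n=2, threshold=3.0):
    """Return (score, n-gram) pairs whose frequency rose significantly, highest first."""
    old = ngram_counts(reference_docs, n)
    new = ngram_counts(current_docs, n)
    n_old, n_new = sum(old.values()), sum(new.values())
    if n_old == 0 or n_new == 0:
        return []
    scored = [(surge_score(k, n_new, old.get(g, 0), n_old), g)
              for g, k in new.items()]
    return sorted(((s, g) for s, g in scored if s >= threshold), reverse=True)


# Toy data, invented purely for illustration.
reference = ["the market was quiet today"] * 50
current = (["the market was quiet today"] * 25
           + ["volcano eruption grounds flights"] * 25)

events = detect_events(reference, current)
flagged = {gram for _, gram in events}
```

In this toy run only the n-grams from the newly appearing story pass the threshold, while phrases that were already common in the reference batch do not. A streaming variant would update the counts incrementally as articles arrive instead of recounting each batch.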
Translated title of the contribution: Detecting events in a million New York Times articles
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases (ECML/PKDD)
Publisher: Springer
Publication status: Published - Oct 2010

Bibliographical note

Other page information: 615-618
Conference Proceedings/Title of Journal: Machine Learning and Knowledge Discovery in Databases (ECML/PKDD)
Other identifier: 2001246
