
Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov test

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher or commissioning body: Association for Computing Machinery (ACM)
Pages: 1545-1554
Number of pages: 10
ISBN (Print): 9781450342322
DOI: 10.1145/2939672.2939836
Date accepted/In press: 13 Aug 2016
Date published (current): 13 Aug 2016
Event: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: 13 Aug 2016 – 17 Aug 2016

Conference

Conference: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country: United States
City: San Francisco
Period: 13/08/16 – 17/08/16

Abstract

Data stream research has grown rapidly over the last decade. Two major features distinguish data stream learning from batch learning: stream data are generated on the fly, possibly at a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most of the research on data stream classification focuses on proposing efficient models that can adapt to concept drifts and maintain a stable performance over time. However, specifically for the classification task, the majority of such methods rely on the instantaneous availability of true labels for all already classified instances. This is a strong assumption that is rarely fulfilled in practical applications. Hence there is a clear need for efficient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples differ. This work has two main contributions. The first is the Incremental Kolmogorov-Smirnov algorithm, which allows performing the Kolmogorov-Smirnov hypothesis test instantly on two samples that change over time, where a change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to perform the insertion and removal operations in O(log N) with high probability and to compute the Kolmogorov-Smirnov statistic in O(1), where N is the number of sample observations. This is a significant speed-up compared to the O(N log N) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels: classification algorithms adapted to use the test rely on a limited portion of those labels only to update the classification model after a concept drift is detected.
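The abstract contrasts the paper's O(1) incremental statistic against the O(N log N) non-incremental baseline. As a hedged illustration (this is the sort-and-merge baseline, not the paper's treap-based incremental algorithm), the two-sample KS statistic and an unsupervised window-vs-window drift check can be sketched as follows; the function name, sample sizes, and the decision thresholds are illustrative choices, not values from the paper:

```python
import random

def ks_statistic(a, b):
    # Non-incremental two-sample KS statistic: the maximum gap between the
    # empirical CDFs of a and b. Sorting dominates, giving O(N log N) cost.
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        # After each pointer advance, the empirical CDFs are i/len(a) and j/len(b).
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

random.seed(0)
# Reference window vs. a window from the same distribution: small D expected.
same = ks_statistic([random.gauss(0, 1) for _ in range(1000)],
                    [random.gauss(0, 1) for _ in range(1000)])
# Reference window vs. a shifted distribution (simulated drift): large D expected.
drift = ks_statistic([random.gauss(0, 1) for _ in range(1000)],
                     [random.gauss(3, 1) for _ in range(1000)])
print(f"no-drift D = {same:.3f}, drift D = {drift:.3f}")
```

Comparing D (or the associated p-value) between a reference window and the current window requires no class labels, which is the unsupervised detection setting the abstract describes; the incremental algorithm's contribution is keeping this statistic up to date as observations enter and leave the windows.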

    Structured keywords

  • Jean Golding

    Research areas

  • Cartesian tree, Concept drift, Data stream, Kolmogorov-Smirnov, Lazy propagation, Treap

Event

22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016

Duration: 13 Aug 2016 – 17 Aug 2016
City: San Francisco
Country: United States
Sponsors: ACM SIGKDD (External organisation), ACM SIGMOD (External organisation)

Event: Conference


Documents


  • Full-text PDF (accepted author manuscript)

    Rights statement: This is the accepted author manuscript (AAM). The final published version (version of record) is available online via ACM at https://doi.org/10.1145/2939672.2939836. Please refer to any applicable terms of use of the publisher.

    Accepted author manuscript, 537 KB, PDF document

    Licence: Unspecified

DOI: https://doi.org/10.1145/2939672.2939836
