Projects per year

### Abstract

Data stream research has grown rapidly over the last decade. Two major features distinguish data stream from batch learning: stream data are generated on the fly, possibly in a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most of the research on data stream classification focuses on proposing effcient models that can adapt to concept drifts and maintain a stable performance over time. However, specifically for the classification task, the majority of such methods rely on the instantaneous availability of true labels for **all** already classified instances. This is a strong assumption that is rarely fulfilled in practical applications. Hence there is a clear need for effcient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples differ. This work has two main contributions. The first one is the Incremental Kolmogorov-Smirnov algorithm that allows performing the Kolmogorov-Smirnov hypothesis test instantly using two samples that change over time, where the change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to perform the insertion and removal operations in *O*(log *N*) with high probability and calculate the Kolmogorov-Smirnov test in *O*(1), where *N* is the number of sample observations. This is a significant speed-up compared to the *O*(*N* log *N*) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels. Classification algorithms adapted to use the test rely on a limited portion of those labels just to update the classification model after a concept drift is detected.

Original language | English |
---|---|

Title of host publication | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |

Publisher | Association for Computing Machinery (ACM) |

Pages | 1545-1554 |

Number of pages | 10 |

ISBN (Print) | 9781450342322 |

DOIs | |

Publication status | Published - 13 Aug 2016 |

Event | 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States Duration: 13 Aug 2016 → 17 Aug 2016 |

### Conference

Conference | 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 |
---|---|

Country | United States |

City | San Francisco |

Period | 13/08/16 → 17/08/16 |

### Structured keywords

- Jean Golding

### Keywords

- Cartesian tree
- Concept drift
- Data stream
- Kolmogorov-Smirnov
- Lazy propagation
- Treap

## Fingerprint Dive into the research topics of 'Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov test'. Together they form a unique fingerprint.

## Projects

- 1 Finished

## Cite this

*Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*(pp. 1545-1554). Association for Computing Machinery (ACM). https://doi.org/10.1145/2939672.2939836