The analysis of data streams requires methods that can cope with a very high volume of data points. Under the requirement that algorithms have constant computational complexity and use a fixed amount of memory, we develop a framework for detecting changes in data streams when the distributional form of the stream variables is unknown. We consider the general problem of detecting a change in the location and/or scale parameter of a stream of random variables, and adapt several nonparametric hypothesis tests to create a streaming change detection algorithm. This algorithm uses a test statistic whose null distribution does not depend on the distribution of the data, which allows a desired rate of false alarms to be maintained for any stream, even when its distribution is unknown. Because our method is based on hypothesis tests that rank the data points, we also propose a method for calculating these ranks online in a manner that respects the constraints of data stream analysis.
Keywords: Change detection, Nonparametric tests, Streaming data
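
As a minimal illustration of why a rank-based statistic can have a null distribution that does not depend on the data, the sketch below computes the classical Mann-Whitney U statistic for a candidate split of a short sequence. The choice of Mann-Whitney is an assumption made for illustration, since the text above does not name the tests that are adapted. The function names `mann_whitney_u` and `standardized_u` are likewise hypothetical. This is a batch computation and is not the constant-memory online ranking scheme proposed in the paper.

```python
import math
from bisect import bisect_left, bisect_right


def mann_whitney_u(before, after):
    """Count pairs (x, y), x in `before`, y in `after`, with x < y.

    Ties contribute 1/2. The result depends only on the relative
    ordering (ranks) of the observations, not on their values.
    """
    ordered = sorted(before)
    u = 0.0
    for y in after:
        less = bisect_left(ordered, y)            # values strictly below y
        equal = bisect_right(ordered, y) - less   # values tied with y
        u += less + 0.5 * equal
    return u


def standardized_u(before, after):
    """Centre and scale U by its null mean and variance (no-ties formula)."""
    m, n = len(before), len(after)
    mean = m * n / 2.0
    var = m * n * (m + n + 1) / 12.0
    return (mann_whitney_u(before, after) - mean) / math.sqrt(var)


# Illustrative data (made up): a location shift after the fifth point.
stream = [0.1, 0.5, 0.3, 0.2, 0.4, 2.1, 2.5, 2.3]
k = 5
u_raw = mann_whitney_u(stream[:k], stream[k:])

# A strictly increasing transformation changes the distribution of the
# values but leaves every rank, and therefore U, unchanged.
warped = [math.exp(x) for x in stream]
u_warped = mann_whitney_u(warped[:k], warped[k:])
```

Here `u_raw` and `u_warped` coincide because only the ordering of the points enters the statistic. This is the property that lets a single threshold control the false alarm rate across streams with different, unknown distributions.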