Projects per year
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output “No”.
We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:
Our first result is a deterministic O(nk2 log k/m + n polylog m) time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 [9, 10].
We then give a randomised and online algorithm which runs in the same time complexity but requires only O(k2 polylog m) space in total.
Next we give a randomised (1 + ∊)-approximation algorithm for the streaming k-mismatch problem which uses O(k2 polylog m/∊2) space and runs in O(polylog m/∊2) worst-case time per arriving symbol.
Finally we combine our new results to derive a randomised O(k2 polylog m) space algorithm for the streaming k-mismatch problem which runs in O(√k log k + polylog m) worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009  by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
|Title of host publication||Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms|
|Publisher||Society for Industrial and Applied Mathematics|
|Number of pages||14|
|Publication status||Published - 2016|