Abstract
We consider the well known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability c mismatches where possible from every alignment of p and t in O((c + logn)(n + mlogm)logm) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
Translated title of the contribution | String Processing and Information Retrieval |
---|---|
Original language | English |
Pages (from-to) | 99 - 108 |
Number of pages | 10 |
Journal | Mismatch Sampling |
Volume | 280/2009 |
DOIs | |
Publication status | Published - 2009 |