Abstract
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimester and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature, and several algorithms have been suggested. The running time of these algorithms is n2-o(1), and unfortunately, conditional lower bounds have been shown which imply that there is little hope to improve this bound. In this paper we study a different but closely related problem of the longest common substring with approximately k mismatches and use computational geometry techniques to show that it admits a randomised solution with strongly subquadratic running time.
Original language | English |
---|---|
Title of host publication | 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 |
Publisher | Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing |
Pages | 21.1-21.11 |
Number of pages | 11 |
Volume | 54 |
ISBN (Electronic) | 9783959770125 |
DOIs | |
Publication status | Published - 1 Jun 2016 |
Event | 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 - Tel Aviv, Israel Duration: 27 Jun 2016 → 29 Jun 2016 |
Conference
Conference | 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 |
---|---|
Country/Territory | Israel |
City | Tel Aviv |
Period | 27/06/16 → 29/06/16 |
Keywords
- Radomised algorithms
- string similarity measures
- longest common substring
- sketching
- locality-sensitive hashing