Longest common substring with approximately k mismatches

Tatiana Starikovskaia*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference Contribution (Conference Proceeding)

9 Citations (Scopus)
253 Downloads (Pure)

Abstract

In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimester and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature, and several algorithms have been suggested. The running time of these algorithms is n2-o(1), and unfortunately, conditional lower bounds have been shown which imply that there is little hope to improve this bound. In this paper we study a different but closely related problem of the longest common substring with approximately k mismatches and use computational geometry techniques to show that it admits a randomised solution with strongly subquadratic running time.

Original languageEnglish
Title of host publication27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Pages21.1-21.11
Number of pages11
Volume54
ISBN (Electronic)9783959770125
DOIs
Publication statusPublished - 1 Jun 2016
Event27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 - Tel Aviv, Israel
Duration: 27 Jun 201629 Jun 2016

Conference

Conference27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016
Country/TerritoryIsrael
CityTel Aviv
Period27/06/1629/06/16

Keywords

  • Radomised algorithms
  • string similarity measures
  • longest common substring
  • sketching
  • locality-sensitive hashing

Fingerprint

Dive into the research topics of 'Longest common substring with approximately k mismatches'. Together they form a unique fingerprint.

Cite this