A hard-disk based suffix tree implementation

TM Snowsill, F Nicart

Research output: Working paper

Abstract

Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved. We have designed a hard-disk based hybrid suffix tree, residing on hard-disk and in RAM, which takes advantage of memory mapping, a method for treating data on a hard-disk transparently as though it was in memory. Memory mapping is provided by many modern operating systems. Through the use of memory mapping the implementation only loads a small amount of the suffix tree into working memory, which allows it to load faster and maintains a fairly efficient query speed. The implementation is based on Ukkonen's suffix tree construction algorithm.
Translated title of the contributionA hard-disk based suffix tree implementation
Original languageEnglish
PublisherUniversity of Bristol
Number of pages14
Publication statusPublished - 2011

Fingerprint

Dive into the research topics of 'A hard-disk based suffix tree implementation'. Together they form a unique fingerprint.

Cite this