A hard-disk based suffix tree implementation

TM Snowsill, F Nicart

Research output: Working paperWorking paper and Preprints

Abstract

Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved. We have designed a hard-disk based hybrid suffix tree, residing on hard-disk and in RAM, which takes advantage of memory mapping, a method for treating data on a hard-disk transparently as though it was in memory. Memory mapping is provided by many modern operating systems. Through the use of memory mapping the implementation only loads a small amount of the suffix tree into working memory, which allows it to load faster and maintains a fairly efficient query speed. The implementation is based on Ukkonen's suffix tree construction algorithm.
Translated title of the contributionA hard-disk based suffix tree implementation
Original languageEnglish
PublisherUniversity of Bristol
Number of pages14
Publication statusPublished - 2011

Fingerprint Dive into the research topics of 'A hard-disk based suffix tree implementation'. Together they form a unique fingerprint.

Cite this