Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved. We have designed a hard-disk based hybrid suffix tree, residing on hard-disk and in RAM, which takes advantage of memory mapping, a method for treating data on a hard-disk transparently as though it was in memory. Memory mapping is provided by many modern operating systems. Through the use of memory mapping the implementation only loads a small amount of the suffix tree into working memory, which allows it to load faster and maintains a fairly efficient query speed. The implementation is based on Ukkonen's suffix tree construction algorithm.
|Translated title of the contribution||A hard-disk based suffix tree implementation|
|Publisher||University of Bristol|
|Number of pages||14|
|Publication status||Published - 2011|