Abstract
Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved.
We have designed a hard-disk based hybrid suffix tree, residing on hard-disk and in RAM, which takes advantage of memory mapping, a method for treating data on a hard-disk transparently as though it was in memory. Memory mapping is provided by many modern operating systems. Through the use of memory mapping the implementation only loads a small amount of the suffix tree into working memory, which allows it to load faster and maintains a fairly efficient query speed.
The implementation is based on Ukkonen's suffix tree construction algorithm.
Translated title of the contribution | A hard-disk based suffix tree implementation |
---|---|
Original language | English |
Publisher | University of Bristol |
Number of pages | 14 |
Publication status | Published - 2011 |