We present a new variant of the suffix tree called a distributed suffix tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times.
Bibliographical notePublisher: Elsevier