haccepted-0.1.0.0: Data structures and algorithms
Safe HaskellNone
LanguageHaskell2010

SuffixTree

Description

Suffix tree

A moderately simple and flexible suffix tree. A suffix tree of a string is a compressed trie of all the suffixes of the string, useful for fast matching on substrings of the string or to calculate certain properties of the string. Other suffix structures such as suffix arrays and suffix automata may serve as alternates to suffix trees.

The implementation here constructs a SuffixTree a from a given string index function, with string elements of type Int. Ukkonen's algorithm is used for construction. This is not lazy, the entire tree is constructed right away. Only the implicit suffix tree is constructed, which is a suffix tree where suffixes that are prefixes of other suffixes do not end in leaves. To make them end in leaves, set the last element of the string to a unique value. A SuffixTree a also stores accumulated values of type a at every node. These values are calculated using user supplied functions, which can be chosen based on what the tree will be used for.

Sources:

String indexing is assumed to take O(1). Let k be the alphabet size. Let the complexity of IntMap operations be f(n), where n is the size of the map. f(n) is O(min(n, word size)), see IntMap documentation for details.

Synopsis

Documentation

data SufTNode a Source #

Constructors

SufTNode !a !(IntMap (SufTEdge a)) 

Instances

Instances details
NFData a => NFData (SufTNode a) Source # 
Instance details

Defined in SuffixTree

Methods

rnf :: SufTNode a -> () #

data SufTEdge a Source #

Constructors

SufTEdge !Int !Int !(SufTNode a) 

Instances

Instances details
NFData a => NFData (SufTEdge a) Source # 
Instance details

Defined in SuffixTree

Methods

rnf :: SufTEdge a -> () #

type Chr = Int Source #

buildSufT :: (Int -> a) -> (a -> Int -> a) -> (a -> a -> a) -> Int -> (Int -> Chr) -> SufTNode a Source #

Builds a suffix tree. n is the length of the string. at is a 0-based indexing function into the string. fromLeaf constructs a value from a leaf index. updEdge constructs a value from an existing value and a length of an edge leading to it. merge combines two values. Examples: To query the number of times a pattern occurs in the string, use const 1, const, and (+). To calculate the number of distinct substrings, use const 0, (+), and (+). O(n * f(k)), plus the above functions are each called O(n) times.

matchSufT :: (a -> Int -> a) -> (Int -> Chr) -> SufTNode a -> Int -> (Int -> Chr) -> Maybe a Source #

Matches a pattern on the suffix tree. updEdge and at are as defined for buildSufT. m is the length of the pattern. at' is a 0-based indexing function into the pattern. If the pattern is not present, returns Nothing. Otherwise returns the accumulated value for the subtree where the pattern ends. O(m * f(k)) plus at most one call of updEdge.

buildMatchSufT :: (Int -> a) -> (a -> Int -> a) -> (a -> a -> a) -> Int -> (Int -> Chr) -> Int -> (Int -> Chr) -> Maybe a Source #

buildSufT together with matchSufT to avoid having to repeat arguments. Apply partially for multiple queries.

drawSufT :: Show a => SufTNode a -> String Source #

Draws a suffix tree. Can be used for debugging.