haccepted-0.1.0.0: Data structures and algorithms
Safe HaskellSafe-Inferred
LanguageHaskell2010

AhoCorasick

Description

Aho-Corasick algorithm

The Aho-Corasick algorithm builds an automaton from a set of pattern strings, and then uses it to find positions in a search string where each of the pattern strings occur.

This implementation only works on ByteStrings, to keep things fast. If required it can be adapted to work on other sequence types.

A TrieAC a can be constructed from pattern strings with associated values a, which can be then be turned into an ACRoot a. An ACRoot a can then be run on a search string to find matches. Construction and matching are both lazy.

Sources:

For complexities below, k is the alphabet range (max 256).

Synopsis

Documentation

data TrieAC a Source #

Instances

Instances details
Show a => Show (TrieAC a) Source # 
Instance details

Defined in AhoCorasick

Methods

showsPrec :: Int -> TrieAC a -> ShowS #

show :: TrieAC a -> String #

showList :: [TrieAC a] -> ShowS #

emptyTAC :: TrieAC a Source #

An empty trie.

insertTAC :: ByteString -> a -> TrieAC a -> TrieAC a Source #

Inserts a string with an associated value into a trie. O(n log k) where n is the length of the string.

fromListTAC :: [(ByteString, a)] -> TrieAC a Source #

Builds a trie from a list of strings and associated values. O(n log k) where n is total length of the strings.

data ACRoot a Source #

Instances

Instances details
NFData a => NFData (ACRoot a) Source # 
Instance details

Defined in AhoCorasick

Methods

rnf :: ACRoot a -> () #

fromTrieAC :: TrieAC a -> ACRoot a Source #

Builds an Aho-Corasick automaton from a trie. O(n), where n is the number of nodes in the trie. This is not more than the total length of strings the trie was constructed with.

matchAC :: ACRoot a -> ByteString -> [[a]] Source #

Returns a list of length (m + 1) where m is the length of the search string. This list contains a list of pattern matches for every position in the string, including before the first character. A match at a position is present as the associated value of the pattern string found to be ending at that position. O(m log k + z), where m is the length of the string and z is the total number of matches.