Safe Haskell | None |
---|---|
Language | Haskell2010 |
Suffix array and LCP array
A suffix array is the sorted array of all suffixes of a string, each suffix represented by its start index. An LCP (longest common prefix) array is an array of the lengths of the longest common prefixes of consecutive suffixes in the suffix array. A suffix array, sometimes with its LCP array, can be used to perform tasks like finding patterns in the string or calculating certain properties of the string. Other suffix structures, such as suffix trees and suffix automata may serve as alternates to suffix arrays.
This implementation constructs the suffix array and LCP array from a given string indexing function, with elements of type Int. It takes O(n log n), which is usually fast enough. O(n) algorithms to construct suffix arrays exist.
Sources:
- Udi Manber and Gene Myers, "Suffix Arrays: A New Method for On-Line String Searches", 1990 https://dl.acm.org/doi/10.5555/320176.320218
- Kasai et al., "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", 2001 https://link.springer.com/chapter/10.1007/3-540-48194-X_17
- https://sites.google.com/site/indy256/algo/suffix_array
Documentation
buildSufA :: Chr -> Int -> (Int -> Chr) -> (UArray Int SuffixId, UArray Int Int) Source #
Builds a suffix array and LCP array. n is the length of the string. at is a 0-based indexing function into the string. Characters must be in [0..b-1]. Faster than buildSufAL unless b is too large. O(b + n log n).