Suffix Array | Set 2 (nLogn Algorithm) - GeeksforGeeks
A suffix array is a sorted array of all suffixes of a given string.
O(nLogn) algorithm
O(nLognLogn) algorithm
Let us first discuss a O(n * Logn * Logn) algorithm for simplicity. The idea is to use the fact that strings that are to be sorted are suffixes of a single string.
We first sort all suffixes according to first character, then according to first 2 characters, then first 4 characters and so on while the number of characters to be considered is smaller than 2n. The important point is, if we have sorted suffixes according to first 2i characters, then we can sort suffixes according to first 2i+1 characters in O(nLogn) time using a nLogn sorting algorithm like Merge Sort. This is possible as two suffixes can be compared in O(1) time (we need to compare only two values).
The sort function is called O(Logn) times (Note that we increase number of characters to be considered in powers of 2). Therefore overall time complexity becomes O(nLognLogn).
nocow 后缀数组
http://www.aiuxian.com/article/p-1831057.html
Video: https://www.youtube.com/watch?v=HKPrVm5FWvg
https://courses.cs.ut.ee/MTAT.03.190/2012_fall/uploads/Main/karkkainen_sanders.pdf
http://algs4.cs.princeton.edu/63suffix/
http://web.stanford.edu/class/cs97si/suffix-array.pdf
Read full article from Suffix Array | Set 2 (nLogn Algorithm) - GeeksforGeeks
A suffix array is a sorted array of all suffixes of a given string.
O(nLogn) algorithm
O(nLognLogn) algorithm
Let us first discuss a O(n * Logn * Logn) algorithm for simplicity. The idea is to use the fact that strings that are to be sorted are suffixes of a single string.
We first sort all suffixes according to first character, then according to first 2 characters, then first 4 characters and so on while the number of characters to be considered is smaller than 2n. The important point is, if we have sorted suffixes according to first 2i characters, then we can sort suffixes according to first 2i+1 characters in O(nLogn) time using a nLogn sorting algorithm like Merge Sort. This is possible as two suffixes can be compared in O(1) time (we need to compare only two values).
The sort function is called O(Logn) times (Note that we increase number of characters to be considered in powers of 2). Therefore overall time complexity becomes O(nLognLogn).
Suffix arrays can be constructed in O(n) time also.
http://blog.csdn.net/itegel84/article/details/6019842// Structure to store information of a suffix
struct
suffix
{
int
index;
// To store original index
int
rank[2];
// To store ranks and next rank pair
};
// A comparison function used by sort() to compare two suffixes
// Compares two pairs, returns 1 if first pair is smaller
int
cmp(
struct
suffix a,
struct
suffix b)
{
return
(a.rank[0] == b.rank[0])? (a.rank[1] < b.rank[1] ?1: 0):
(a.rank[0] < b.rank[0] ?1: 0);
}
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
int
*buildSuffixArray(
char
*txt,
int
n)
{
// A structure to store suffixes and their indexes
struct
suffix suffixes[n];
// Store suffixes and their indexes in an array of structures.
// The structure is needed to sort the suffixes alphabatically
// and maintain their old indexes while sorting
for
(
int
i = 0; i < n; i++)
{
suffixes[i].index = i;
suffixes[i].rank[0] = txt[i] -
'a'
;
suffixes[i].rank[1] = ((i+1) < n)? (txt[i + 1] -
'a'
): -1;
}
// Sort the suffixes using the comparison function
// defined above.
sort(suffixes, suffixes+n, cmp);
// At his point, all suffixes are sorted according to first
// 2 characters. Let us sort suffixes according to first 4
// characters, then first 8 and so on
int
ind[n];
// This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for
(
int
k = 4; k < 2*n; k = k*2)
{
// Assigning rank and index values to first suffix
int
rank = 0;
int
prev_rank = suffixes[0].rank[0];
suffixes[0].rank[0] = rank;
ind[suffixes[0].index] = 0;
// Assigning rank to suffixes
for
(
int
i = 1; i < n; i++)
{
// If first rank and next ranks are same as that of previous
// suffix in array, assign the same new rank to this suffix
if
(suffixes[i].rank[0] == prev_rank &&
suffixes[i].rank[1] == suffixes[i-1].rank[1])
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = rank;
}
else
// Otherwise increment rank and assign
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = ++rank;
}
ind[suffixes[i].index] = i;
}
// Assign next rank to every suffix
for
(
int
i = 0; i < n; i++)
{
int
nextindex = suffixes[i].index + k/2;
suffixes[i].rank[1] = (nextindex < n)?
suffixes[ind[nextindex]].rank[0]: -1;
}
// Sort the suffixes according to first k characters
sort(suffixes, suffixes+n, cmp);
}
// Store indexes of all sorted suffixes in the suffix array
int
*suffixArr =
new
int
[n];
for
(
int
i = 0; i < n; i++)
suffixArr[i] = suffixes[i].index;
// Return the suffix array
return
suffixArr;
}
Prefix Doubling算法(倍增法)是构造后缀数组一个比较实用的算法。其基本思想是先计算出每个后缀的k-前缀的rank值,然后在此基础上计算每个后缀的2k-前缀rank值,k从1开始。直到每个后缀都排出先后顺序为止(任何两个后缀都不会相等,换句话说,每个后缀终究都能排出先后顺序)。在处理2k-前缀时,只需要使用基数排序(radix sort)算法,先后对两位数字排序(可以采用计数排序算法(counting sort)对每一位数字排序)。在最坏情况下,需要做lgn次基数排序,每一次基数排序的操作次数为2*O(n),因此它的时间复杂度是O(nlgn)。
nocow 后缀数组
http://www.aiuxian.com/article/p-1831057.html
Video: https://www.youtube.com/watch?v=HKPrVm5FWvg
https://courses.cs.ut.ee/MTAT.03.190/2012_fall/uploads/Main/karkkainen_sanders.pdf
http://algs4.cs.princeton.edu/63suffix/
http://web.stanford.edu/class/cs97si/suffix-array.pdf
Read full article from Suffix Array | Set 2 (nLogn Algorithm) - GeeksforGeeks