Related: Longest Repeating Subsequence
https://leetcode.com/problems/longest-duplicate-substring/
Given a string
S
, consider all duplicated substrings: (contiguous) substrings of S that occur 2 or more times. (The occurrences may overlap.)
Return any duplicated substring that has the longest possible length. (If
S
does not have a duplicated substring, the answer is ""
.)
Example 1:
Input: "banana" Output: "ana"
Example 2:
Input: "abcd" Output: ""
Note:
2 <= S.length <= 10^5
S
consists of lowercase English letters.
X. Approach 1: Binary Search + Rabin-Karp
https://leetcode.com/articles/longest-duplicate-substring/
String Searching Algorithms
The problem is a follow-up of Longest Repeating Substring, and typically used to check if you're comfortable with string searching algortihms.
Best algorithms have a linear execution time in average. The most popular ones are Aho-Corasick, KMPand Rabin-Karp: Aho-Corasick is used by fgrep, KMP is used for chinese string searching, and Rabin-Karp is used for plagiarism detection and in bioinformatics to look for similarities in two or more proteins.
The first two are optimised for a single pattern search, and Rabin-Karp for a multiple pattern search, that is exactly the case here.
public int search(int L, int a, long modulus, int n, int[] nums) { // compute the hash of string S[:L] long h = 0; for(int i = 0; i < L; ++i) h = (h * a + nums[i]) % modulus; // already seen hashes of strings of length L HashSet<Long> seen = new HashSet(); seen.add(h); // const value to be used often : a**L % modulus long aL = 1; for (int i = 1; i <= L; ++i) aL = (aL * a) % modulus; for(int start = 1; start < n - L + 1; ++start) { // compute rolling hash in O(1) time h = (h * a - nums[start - 1] * aL % modulus + modulus) % modulus; h = (h + nums[start + L - 1]) % modulus; if (seen.contains(h)) return start; seen.add(h); } return -1; } public String longestDupSubstring(String S) { int n = S.length(); // convert string to array of integers // to implement constant time slice int[] nums = new int[n]; for(int i = 0; i < n; ++i) nums[i] = (int)S.charAt(i) - (int)'a'; // base value for the rolling hash function int a = 26; // modulus value for the rolling hash function to avoid overflow long modulus = (long)Math.pow(2, 32); // binary search, L = repeating string length int left = 1, right = n; int L; while (left != right) { L = left + (right - left) / 2; if (search(L, a, modulus, n, nums) != -1) left = L + 1; else right = L; } int start = search(left - 1, a, modulus, n, nums); return start != -1 ? S.substring(start, start + left - 1) : ""; }https://leetcode.com/problems/longest-duplicate-substring/discuss/290871/Python-Binary-Search
Suffix array is typical solution for this problem.
The fastest way is to copy a template form the Internet.
The code will be quite long.
Here I want to share a binary search solution.
The code will be quite long.
Here I want to share a binary search solution.
Explanation
Binary search the length of longest duplicate substring and call the help function
rolling hash the string in this window,
record the
and try to find duplicated string.
test(L)
.test(L)
slide a window of length L
,rolling hash the string in this window,
record the
seen
string in a hashset,and try to find duplicated string.
I give it a big
Actually there could be hash collision.
One solution is to have two different mod for hash.
Or we can use a hashmap to record the index of string.
mod
for rolling hash and it should be enough for this problem.Actually there could be hash collision.
One solution is to have two different mod for hash.
Or we can use a hashmap to record the index of string.
Complexity
Binary Search in range 1 and N, so it's
Rolling hash
Overall
Space
O(logN)
Rolling hash
O(N)
Overall
O(NlogN)
Space
O(N)
Python:
def longestDupSubstring(self, S):
A = [ord(c) - ord('a') for c in S]
mod = 2**63 - 1
def test(L):
p = pow(26, L, mod)
cur = reduce(lambda x, y: (x * 26 + y) % mod, A[:L], 0)
seen = {cur}
for i in xrange(L, len(S)):
cur = (cur * 26 + A[i] - A[i - L] * p) % mod
if cur in seen: return i - L + 1
seen.add(cur)
res, lo, hi = 0, 0, len(S)
while lo < hi:
mi = (lo + hi + 1) / 2
pos = test(mi)
if pos:
lo = mi
res = pos
else:
hi = mi - 1
return S[res:res + lo]
思路:二分。为了节省内存,将string hash成整数(用了mod,能AC是因为数据量不是很大)
为了降低hash的时间,利用前面substring的hash结果
X. Suffix Array
https://leetcode.com/problems/longest-duplicate-substring/discuss/290852/Suffix-array-clear-solution