https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/
This is suffix tree for string “ABABABA$”.
In this string, following substrings are repeated:
A, B, AB, BA, ABA, BAB, ABAB, BABA, ABABA
And Longest Repeated Substring is ABABA.
In a suffix tree, one node can’t have more than one outgoing edge starting with same character, and so if there are repeated substring in the text, they will share on same path and that path in suffix tree will go through one or more internal node(s) down the tree (below the point where substring ends on that path).
In above figure, we can see that
https://blog.csdn.net/lin_tuer/article/details/82778962
https://sandipanweb.wordpress.com/2017/05/10/suffix-tree-construction-and-the-longest-repeated-substring-problem-in-python/
Given a text string, find Longest Repeated Substring in the text. If there are more than one Longest Repeated Substrings, get any one of them.
Longest Repeated Substring in GEEKSFORGEEKS is: GEEKS Longest Repeated Substring in AAAAAAAAAA is: AAAAAAAAA
This is suffix tree for string “ABABABA$”.
In this string, following substrings are repeated:
A, B, AB, BA, ABA, BAB, ABAB, BABA, ABABA
And Longest Repeated Substring is ABABA.
In a suffix tree, one node can’t have more than one outgoing edge starting with same character, and so if there are repeated substring in the text, they will share on same path and that path in suffix tree will go through one or more internal node(s) down the tree (below the point where substring ends on that path).
In above figure, we can see that
- Path with Substring “A” has three internal nodes down the tree
- Path with Substring “AB” has two internal nodes down the tree
- Path with Substring “ABA” has two internal nodes down the tree
- Path with Substring “ABAB” has one internal node down the tree
- Path with Substring “ABABA” has one internal node down the tree
- Path with Substring “B” has two internal nodes down the tree
- Path with Substring “BA” has two internal nodes down the tree
- Path with Substring “BAB” has one internal node down the tree
- Path with Substring “BABA” has one internal node down the tree
All above substrings are repeated.
Substrings ABABAB, ABABABA, BABAB, BABABA have no internal node down the tree (after the point where substring end on the path), and so these are not repeated.
Can you see how to find longest repeated substring ??
We can see in figure that, longest repeated substring will end at the internal node which is farthest from the root (i.e. deepest node in the tree), because length of substring is the path label length from root to that internal node.
We can see in figure that, longest repeated substring will end at the internal node which is farthest from the root (i.e. deepest node in the tree), because length of substring is the path label length from root to that internal node.
So finding longest repeated substring boils down to finding the deepest node in suffix tree and then get the path label from root to that deepest internal node.
public String LCS(String str) {
if (str == null || str.length() == 0)
return "";
TrieNode root = new TrieNode();
buildSuffixTrie(str, root);
return longestCommonSuffix(root, "");
}
public void buildSuffixTrie(String str, TrieNode root) {
if (str == null || str.length() == 0)
return;
for (int i = 0; i < str.length(); i++) {
String tmp = str.substring(i);
TrieNode node = root;
for (int j = 0; j < tmp.length(); j++) {
int index = tmp.charAt(j) - 'a';
if (node.next[index] == null) {
node.next[index] = new TrieNode();
} else {
node.next[index].count++;
}
node = node.next[index];
}
node.isWord = true;
}
}
public String longestCommonSuffix(TrieNode root, String cur_prefix) {
if (root == null)
return "";
String res = cur_prefix;
for (int i = 0; i < 26; i++) {
String tmp = cur_prefix;
if (root.next[i] != null) {
if (root.next[i].count <= 0) {
break;
}
tmp = longestCommonSuffix(root.next[i], cur_prefix + String.valueOf(((char) ('a' + i))));
}
if (tmp.length() > res.length())
res = tmp;
}
return res;
}
class TrieNode {
boolean isWord;
TrieNode[] next;
int count; // 重要,用来保存有几个 word 的前缀共用当前的字符串
public TrieNode() {
isWord = false;
next = new TrieNode[26];
count = 0;
}
}
https://sandipanweb.wordpress.com/2017/05/10/suffix-tree-construction-and-the-longest-repeated-substring-problem-in-python/