Suffix Tree Application 1 - Substring Check - GeeksforGeeks
http://www.ideserve.co.in/learn/pattern-matching-using-trie
The time taken by this algorithm is O(n^2) where 'n' is the length of the text. This time is essentially taken to build the trie. Note that this is one time activity and subsequent searches of another pattern in this text would take O(m) time where m is the length of the pattern.
Read full article from Suffix Tree Application 1 - Substring Check - GeeksforGeeks
As a prerequisite, we must know how to build a suffix tree in one or the other way.
Once we have a suffix tree built for given text, we need to traverse the tree from root to leaf against the characters in pattern. If we do not fall off the tree (i.e. there is a path from root to leaf or somewhere in middle) while traversal, then pattern exists in text as a substring.
Once we have a suffix tree built for given text, we need to traverse the tree from root to leaf against the characters in pattern. If we do not fall off the tree (i.e. there is a path from root to leaf or somewhere in middle) while traversal, then pattern exists in text as a substring.
Here we will build suffix tree using Ukkonen’s Algorithm, discussed already as below:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Ukkonen’s Suffix Tree Construction takes O(N) time and space to build suffix tree for a string of length N and after that, traversal for substring check takes O(M) for a pattern of length M.
With slight modification in traversal algorithm discussed here, we can answer following:
- Find all occurrences of a given pattern P present in text T.
- How to check if a pattern is prefix of a text?
- How to check if a pattern is suffix of a text?
int
doTraversal(Node *n,
char
* str,
int
idx)
{
if
(n == NULL)
{
return
-1;
// no match
}
int
res = -1;
//If node n is not root node, then traverse edge
//from node n's parent to node n.
if
(n->start != -1)
{
res = traverseEdge(str, idx, n->start, *(n->end));
if
(res != 0)
return
res;
// match (res = 1) or no match (res = -1)
}
//Get the character index to search
idx = idx + edgeLength(n);
//If there is an edge from node n going out
//with current character str[idx], travrse that edge
if
(n->children[str[idx]] != NULL)
return
doTraversal(n->children[str[idx]], str, idx);
else
return
-1;
// no match
}
void
checkForSubString(
char
* str)
{
int
res = doTraversal(root, str, 0);
if
(res == 1)
printf
(
"Pattern <%s> is a Substring\n"
, str);
else
printf
(
"Pattern <%s> is NOT a Substring\n"
, str);
}
The first step of this algorithm is to construct a trie from given text. To construct this trie, all the suffixes of text are inserted into the trie.
At the time of insertion, each node also stores the indices where the character corresponding to that node occurs in the text and that character is the last character of the sub-string starting from root node.
The time taken by this algorithm is O(n^2) where 'n' is the length of the text. This time is essentially taken to build the trie. Note that this is one time activity and subsequent searches of another pattern in this text would take O(m) time where m is the length of the pattern.
The worst case space complexity of this algorithm is O(n^2) where 'n' is the length of the text. This worst case occurs for text like "aaaaa".
7 | final static int ALPHABET_SIZE = 26; |
8 | final static int NON_VALUE = -1; |
9 | |
10 | class TrieNode |
11 | { |
12 | |
13 | ArrayList<Integer> endingIndices; |
14 | TrieNode[] children; |
15 | |
16 | TrieNode(boolean isLeafNode, int value) |
17 | { |
18 | children = new TrieNode[ALPHABET_SIZE]; |
19 | this.endingIndices = new ArrayList(); |
20 | } |
21 | } |
22 | |
23 | TrieNode root; |
24 | TrieForPatternMatching() |
25 | { |
26 | this.root = new TrieNode(false, NON_VALUE); |
27 | } |
28 | |
29 | private int getIndex(char ch) |
30 | { |
31 | return ch - 'a'; |
32 | } |
33 | |
34 | public void insert(String key, int offset) |
35 | { |
36 | |
37 | if (key == null) return; |
38 | |
39 | key = key.toLowerCase(); |
40 | |
41 | TrieNode currentNode = this.root; |
42 | int charIndex = 0; |
43 | |
44 | while (charIndex < key.length()) |
45 | { |
46 | int childIndex = getIndex(key.charAt(charIndex)); |
47 | |
48 | if (childIndex < 0 || childIndex >= ALPHABET_SIZE) |
49 | { |
50 | System.out.println("Invalid Key" ); |
51 | return; |
52 | } |
53 | |
54 | if (currentNode.children[childIndex] == null) |
55 | { |
56 | currentNode.children[childIndex] = new TrieNode(false, NON_VALUE); |
57 | } |
58 | |
59 | currentNode = currentNode.children[childIndex]; |
60 | |
61 | |
62 | currentNode.endingIndices.add(charIndex + offset); |
63 | |
64 | charIndex += 1; |
65 | } |
66 | |
67 | } |
68 | |
69 | |
70 | |
71 | private void matchAndPrintPattern(String str) |
72 | { |
73 | if (str == null) return; |
74 | |
75 | TrieNode currentNode = root; |
76 | int charIndex = 0; |
77 | |
78 | while ((charIndex < str.length())) |
79 | { |
80 | if (currentNode == null) break; |
81 | |
82 | currentNode = currentNode.children[getIndex(str.charAt(charIndex))]; |
83 | |
84 | charIndex += 1; |
85 | } |
86 | |
87 | if (charIndex < str.length()) |
88 | { |
89 | System.out.println("Pattern does not exist." ); |
90 | } |
91 | else |
92 | { |
93 | System.out.println("Pattern found in text starting at following indices.." ); |
94 | for (int i = 0; i < currentNode.endingIndices.size(); i++) |
95 | { |
96 | System.out.print(currentNode.endingIndices.get(i) - (str.length() - 1)); |
97 | System.out.print("," ); |
98 | } |
99 | } |
100 | return; |
101 | } |
102 | |
103 | private void generateAndInsertSuffixes(String text) |
104 | { |
105 | for (int i = 0; i < text.length(); i++) |
106 | { |
107 | this.insert(text.substring(i), i); |
108 | } |
109 | } |
110 | |
111 | public void printPatternMatches(String text, String pattern) |
112 | { |
113 | generateAndInsertSuffixes(text); |
114 | matchAndPrintPattern(pattern); |
115 | } |