Pattern Searching | Set 6 (Efficient Construction of Finite Automata) - GeeksforGeeks
FA can be constructed in O(m*NO_OF_CHARS) time. The idea is similar to lps (longest prefix suffix) array construction discussed in the KMP algorithm. We use previously filled rows to fill a new row.
Algorithm:
1) Fill the first row. All entries in first row are always 0 except the entry for pat[0] character. For pat[0] character, we always need to go to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do following for rows at index i = 1 to M. (M is the length of the pattern)
…..a) Copy the entries from the row at index equal to lps.
…..b) Update the entry for pat[i] character to i+1.
…..c) Update lps “lps = TF[lps][pat[i]]” where TF is the 2D array which is being constructed.
}
FA can be constructed in O(m*NO_OF_CHARS) time. The idea is similar to lps (longest prefix suffix) array construction discussed in the KMP algorithm. We use previously filled rows to fill a new row.
Algorithm:
1) Fill the first row. All entries in first row are always 0 except the entry for pat[0] character. For pat[0] character, we always need to go to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do following for rows at index i = 1 to M. (M is the length of the pattern)
…..a) Copy the entries from the row at index equal to lps.
…..b) Update the entry for pat[i] character to i+1.
…..c) Update lps “lps = TF[lps][pat[i]]” where TF is the 2D array which is being constructed.
Time Complexity for FA construction is O(M*NO_OF_CHARS).
/* This function builds the TF table which represents Finite Automata for a
given pattern */
void
computeTransFun(
char
*pat,
int
M,
int
TF[][NO_OF_CHARS])
{
int
i, lps = 0, x;
// Fill entries in first row
for
(x =0; x < NO_OF_CHARS; x++)
TF[0][x] = 0;
TF[0][pat[0]] = 1;
// Fill entries in other rows
for
(i = 1; i<= M; i++)
{
// Copy values from row at index lps
for
(x = 0; x < NO_OF_CHARS; x++)
TF[i][x] = TF[lps][x];
// Update the entry corresponding to this character
TF[i][pat[i]] = i + 1;
// Update lps for next row to be filled
if
(i < M)
lps = TF[lps][pat[i]];
}
}
void
search(
char
*pat,
char
*txt)
{
int
M =
strlen
(pat);
int
N =
strlen
(txt);
int
TF[M+1][NO_OF_CHARS];
computeTransFun(pat, M, TF);
// process text over FA.
int
i, j=0;
for
(i = 0; i < N; i++)
{
j = TF[j][txt[i]];
if
(j == M)
{
printf
(
"\n pattern found at index %d"
, i-M+1);
}
}
Consider this example
Pattern – A C A C
At state-0, we have only “A”, so lps = 0
Transition from state-0 to state-1, probable cases may be
Case-1 a new ‘A’ comes, then we go back to our longest prefix suffix till now which is “A” which is state-0 and see what if a ‘A’ comes, in this case it is 1
Case-2 a new ‘G’ comes, then we go back to our longest prefix suffix till now and see what if ‘G’ comes in this case it will be 0
Case-3 a new ‘C’ comes, then also value will be 0
That’s why we are first copying the lps row values into the current ith row.
Then we update the state transition for patt[i] in this case for ‘C’ state will be 2.
Then we calculate the current lps value, that is “AC” but still lps =0 as there is no longest prefix suffix.
Pattern – A C A C
At state-0, we have only “A”, so lps = 0
Transition from state-0 to state-1, probable cases may be
Case-1 a new ‘A’ comes, then we go back to our longest prefix suffix till now which is “A” which is state-0 and see what if a ‘A’ comes, in this case it is 1
Case-2 a new ‘G’ comes, then we go back to our longest prefix suffix till now and see what if ‘G’ comes in this case it will be 0
Case-3 a new ‘C’ comes, then also value will be 0
That’s why we are first copying the lps row values into the current ith row.
Then we update the state transition for patt[i] in this case for ‘C’ state will be 2.
Then we calculate the current lps value, that is “AC” but still lps =0 as there is no longest prefix suffix.
Calculation of lps can be clear from state transition-2 to 3.
Current lps is 0, now ‘A’ comes so that new lps is 1 for “ACA” which can be found out in row [lps][‘A’]
Read full article from Pattern Searching | Set 6 (Efficient Construction of Finite Automata) - GeeksforGeeksCurrent lps is 0, now ‘A’ comes so that new lps is 1 for “ACA” which can be found out in row [lps][‘A’]