Pattern Searching | Set 6 (Efficient Construction of Finite Automata) - GeeksforGeeks
FA can be constructed in O(m*NO_OF_CHARS) time. The idea is similar to lps (longest prefix suffix) array construction discussed in the KMP algorithm. We use previously filled rows to fill a new row.

Algorithm:
1) Fill the first row. All entries in first row are always 0 except the entry for pat[0] character. For pat[0] character, we always need to go to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do following for rows at index i = 1 to M. (M is the length of the pattern)
…..a) Copy the entries from the row at index equal to lps.
…..b) Update the entry for pat[i] character to i+1.
…..c) Update lps “lps = TF[lps][pat[i]]” where TF is the 2D array which is being constructed.
}
FA can be constructed in O(m*NO_OF_CHARS) time. The idea is similar to lps (longest prefix suffix) array construction discussed in the KMP algorithm. We use previously filled rows to fill a new row.
1) Fill the first row. All entries in first row are always 0 except the entry for pat[0] character. For pat[0] character, we always need to go to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do following for rows at index i = 1 to M. (M is the length of the pattern)
…..a) Copy the entries from the row at index equal to lps.
…..b) Update the entry for pat[i] character to i+1.
…..c) Update lps “lps = TF[lps][pat[i]]” where TF is the 2D array which is being constructed.
Time Complexity for FA construction is O(M*NO_OF_CHARS).
/* This function builds the TF table which represents Finite Automata for a given pattern */void computeTransFun(char *pat, int M, int TF[][NO_OF_CHARS]){ int i, lps = 0, x; // Fill entries in first row for (x =0; x < NO_OF_CHARS; x++) TF[0][x] = 0; TF[0][pat[0]] = 1; // Fill entries in other rows for (i = 1; i<= M; i++) { // Copy values from row at index lps for (x = 0; x < NO_OF_CHARS; x++) TF[i][x] = TF[lps][x]; // Update the entry corresponding to this character TF[i][pat[i]] = i + 1; // Update lps for next row to be filled if (i < M) lps = TF[lps][pat[i]]; }}void search(char *pat, char *txt){ int M = strlen(pat); int N = strlen(txt); int TF[M+1][NO_OF_CHARS]; computeTransFun(pat, M, TF); // process text over FA. int i, j=0; for (i = 0; i < N; i++) { j = TF[j][txt[i]]; if (j == M) { printf ("\n pattern found at index %d", i-M+1); } }
Consider this example
Pattern – A C A C
At state-0, we have only “A”, so lps = 0
Transition from state-0 to state-1, probable cases may be
Case-1 a new ‘A’ comes, then we go back to our longest prefix suffix till now which is “A” which is state-0 and see what if a ‘A’ comes, in this case it is 1
Case-2 a new ‘G’ comes, then we go back to our longest prefix suffix till now and see what if ‘G’ comes in this case it will be 0
Case-3 a new ‘C’ comes, then also value will be 0
That’s why we are first copying the lps row values into the current ith row.
Then we update the state transition for patt[i] in this case for ‘C’ state will be 2.
Then we calculate the current lps value, that is “AC” but still lps =0 as there is no longest prefix suffix.
Pattern – A C A C
At state-0, we have only “A”, so lps = 0
Transition from state-0 to state-1, probable cases may be
Case-1 a new ‘A’ comes, then we go back to our longest prefix suffix till now which is “A” which is state-0 and see what if a ‘A’ comes, in this case it is 1
Case-2 a new ‘G’ comes, then we go back to our longest prefix suffix till now and see what if ‘G’ comes in this case it will be 0
Case-3 a new ‘C’ comes, then also value will be 0
That’s why we are first copying the lps row values into the current ith row.
Then we update the state transition for patt[i] in this case for ‘C’ state will be 2.
Then we calculate the current lps value, that is “AC” but still lps =0 as there is no longest prefix suffix.
Calculation of lps can be clear from state transition-2 to 3.
Current lps is 0, now ‘A’ comes so that new lps is 1 for “ACA” which can be found out in row [lps][‘A’]
Read full article from Pattern Searching | Set 6 (Efficient Construction of Finite Automata) - GeeksforGeeksCurrent lps is 0, now ‘A’ comes so that new lps is 1 for “ACA” which can be found out in row [lps][‘A’]