Searching for Patterns | Set 5 (Finite Automata) - GeeksforGeeks
http://www.slideshare.net/8neutron8/string-matching-with-finite-automataaho-corasick
Whenever you use a search engine, or a “find” function like sed or grep, you are utilizing a string matching program. Many of these programs create finite automata in order to effectively search for your string.
A finite state machine (FSM, also known as a deterministic finite automaton or DFA) is a way of representing a language: we represent the language as the set of those strings accepted by some program. So, once we have found the right machine, we can test whether a given string matches just by running it.
One particularly useful representation is a transition table: we make a table with rows indexed by states and columns indexed by possible input characters. Each entry gives the state to move to next.
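As a rough illustration of such a table, here is a minimal sketch for a machine that accepts strings ending in "ab". The class name TransitionTableDemo, the two-letter alphabet, and the pattern are all assumptions made for this example; none of them come from the sources quoted here.

```java
class TransitionTableDemo {
    // Hypothetical two-letter alphabet: column 0 is 'a', column 1 is 'b'.
    static final String ALPHABET = "ab";

    // Rows are states, columns are input characters, entries are next states.
    // State k means "the last k characters read equal the first k of "ab"".
    static final int[][] TABLE = {
        {1, 0}, // state 0: 'a' -> 1, 'b' -> 0
        {1, 2}, // state 1: 'a' -> 1, 'b' -> 2
        {1, 0}, // state 2 (accepting): 'a' -> 1, 'b' -> 0
    };
    static final int ACCEPTING = 2;

    // Runs the machine over the input and reports whether it stops in the accepting state.
    static boolean accepts(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            int column = ALPHABET.indexOf(c);
            if (column < 0) {
                throw new IllegalArgumentException("character outside the alphabet: " + c);
            }
            state = TABLE[state][column]; // one table lookup per input character
        }
        return state == ACCEPTING;
    }

    public static void main(String[] args) {
        System.out.println(accepts("aab")); // true
        System.out.println(accepts("aba")); // false
    }
}
```

Testing a string is then just one table lookup per character, which is where the linear running time of the simulation comes from.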
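Matching with such a table takes something like O(m^3 + n) time, where m is the length of the pattern and n is the length of the input:
O(m^3) to build the state table described above (treating the alphabet size as a constant),
O(n) to simulate it on the input file.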
http://web.cs.mun.ca/~wang/courses/cs6783-13f/n2-string-1.pdf
The string-matching automaton is very efficient: it examines each character in the text exactly once and reports all the valid shifts in O(n) time.
The basic idea is to build an automaton in which
• Each character in the pattern has a state.
• Each match sends the automaton into a new state.
• If all the characters in the pattern have been matched, the automaton enters the accepting state.
• Otherwise, the automaton returns to a suitable state, chosen from the current state and the input character, so that the returned state keeps as much of the previous matching as possible.
• The matching takes O(n) time since each character is examined once.
The construction of the string-matching automaton is based on the given pattern. The time of this construction may be O(m^3 |Σ|), where m is the pattern length and Σ is the input alphabet.
The construction of the string-matching automaton
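The construction itself is not reproduced in these notes, so the following is only a sketch of the textbook brute-force approach, assuming a non-empty pattern whose characters all belong to the given alphabet. For every state and every character it looks for the longest pattern prefix that is a suffix of what has been read, which is where the cubic factor comes from. The names NaiveAutomatonBuilder, buildTable, and findShifts are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveAutomatonBuilder {
    // table[q][i] is the state reached from state q on alphabet.charAt(i).
    // State q means "the last q characters read equal the first q pattern characters".
    static int[][] buildTable(String pattern, String alphabet) {
        int m = pattern.length();
        int[][] table = new int[m + 1][alphabet.length()];
        for (int q = 0; q <= m; q++) {
            for (int i = 0; i < alphabet.length(); i++) {
                String seen = pattern.substring(0, q) + alphabet.charAt(i);
                // Find the longest prefix of the pattern that is a suffix of `seen`.
                int k = Math.min(m, q + 1);
                while (k > 0 && !seen.endsWith(pattern.substring(0, k))) {
                    k--;
                }
                table[q][i] = k;
            }
        }
        return table;
    }

    // Scans the text once and collects every shift at which the pattern occurs.
    static List<Integer> findShifts(String pattern, String alphabet, String text) {
        int[][] table = buildTable(pattern, alphabet);
        List<Integer> shifts = new ArrayList<>();
        int state = 0;
        for (int pos = 0; pos < text.length(); pos++) {
            int column = alphabet.indexOf(text.charAt(pos));
            // Assumption: a character outside the alphabet cannot extend any match.
            state = (column < 0) ? 0 : table[state][column];
            if (state == pattern.length()) {
                shifts.add(pos - pattern.length() + 1);
            }
        }
        return shifts;
    }
}
```

For example, findShifts("aba", "ab", "ababa") reports the overlapping occurrences at shifts 0 and 2, because after a full match the automaton falls back to the state for the longest reusable prefix rather than to the start.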
http://en.literateprograms.org/Finite_automaton_string_search_algorithm_(Java)
Each state will have a transitions map which determines, given the current state and the next character in the input, which state we go to next.
To perform this in code, we visit states in the order we discover them. As transition maps get filled in, more states may be added; we also iterate through those new states as states.size() increases. When the loop is done, all states will have their transition maps filled. We could not use an Iterator here because adding states while iterating would make it complain about concurrent modifications.
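The original listing is not included in these notes, so here is a minimal sketch of the same looping idea under some assumptions: a state is identified by the pattern prefix it has matched, and the class names WorklistBuilder and State are hypothetical. The point to notice is the index-based loop, which re-reads states.size() on every pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WorklistBuilder {
    // Hypothetical state representation: the pattern prefix matched so far.
    static class State {
        final String matched;
        final Map<Character, State> transitions = new HashMap<>();
        State(String matched) { this.matched = matched; }
    }

    static List<State> build(String pattern, String alphabet) {
        List<State> states = new ArrayList<>();
        Map<String, State> byPrefix = new HashMap<>();
        State start = new State("");
        states.add(start);
        byPrefix.put("", start);

        // states.size() is evaluated again on each pass, so states appended
        // inside the body are visited later by this same loop.
        for (int i = 0; i < states.size(); i++) {
            State current = states.get(i);
            for (char c : alphabet.toCharArray()) {
                String seen = current.matched + c;
                // Longest pattern prefix that is a suffix of `seen`;
                // k = 0 always succeeds because every string ends with "".
                int k = Math.min(pattern.length(), seen.length());
                while (!seen.endsWith(pattern.substring(0, k))) {
                    k--;
                }
                String target = pattern.substring(0, k);
                State next = byPrefix.get(target);
                if (next == null) {
                    next = new State(target);
                    byPrefix.put(target, next);
                    states.add(next); // grows the list we are looping over
                }
                current.transitions.put(c, next);
            }
        }
        return states;
    }
}
```

Replacing the loop with a for-each over states would fail on an ArrayList, since its iterator throws ConcurrentModificationException once the list has been structurally modified.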
Performing the search
Once we have our machine built, the actual string matching step is very simple. We feed the string being searched into the machine, following the transition maps to move between states as instructed, until we reach a final state or use up the string.
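The search step described above might look roughly like the sketch below. It is not the code from the linked page: the class MachineSearch, the hand-built machine for the pattern "ab", and the convention that a missing map entry means "go back to the start state" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MachineSearch {
    // Returns the index just past the first match, or -1 if the text runs out first.
    static int search(List<Map<Character, Integer>> transitions, int finalState, String text) {
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            // Assumption: a missing entry sends the machine back to the start state.
            state = transitions.get(state).getOrDefault(text.charAt(i), 0);
            if (state == finalState) {
                return i + 1; // reached the final state: stop early
            }
        }
        return -1; // used up the string without reaching the final state
    }

    // Hand-built machine for the hypothetical pattern "ab"; state 2 is final.
    static List<Map<Character, Integer>> machineForAb() {
        Map<Character, Integer> s0 = new HashMap<>();
        s0.put('a', 1);
        Map<Character, Integer> s1 = new HashMap<>();
        s1.put('a', 1);
        s1.put('b', 2);
        Map<Character, Integer> s2 = new HashMap<>();
        s2.put('a', 1);
        List<Map<Character, Integer>> machine = new ArrayList<>();
        machine.add(s0);
        machine.add(s1);
        machine.add(s2);
        return machine;
    }
}
```

With this machine, search(machineForAb(), 2, "xxaab") stops at index 5, so the match starts at 5 minus the pattern length, that is, at index 3.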