http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/bmen.htm
Boyer and Moore compares the pattern with the text from right to left. If the text symbol that is compared with the rightmost pattern symbol does not occur in the pattern at all, then the pattern can be shifted by m positions behind this text symbol.
Good-suffix and bad-character shiftsBoyer and Moore compares the pattern with the text from right to left. If the text symbol that is compared with the rightmost pattern symbol does not occur in the pattern at all, then the pattern can be shifted by m positions behind this text symbol.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... |
---|---|---|---|---|---|---|---|---|---|---|
a | b | b | a | d | a | b | a | c | b | a |
b | a | b | a | c | ||||||
b | a | b | a | c |
Bad character heuristics
If a mismatch happens, the pattern can be shifted so that it is aligned to this text symbol.
Good suffix heuristics
An alignment of the rightmost occurence of the pattern symbol a with the text symbol a would produce a negative shift. Instead, a shift by 1 would be possible. However, in this case it is better to derive the maximum possible shift distance from the structure of the pattern.
The pattern is shifted by the longest of the two distances that are given by the bad character and the good suffix heuristics.
Preprocessing for the bad character heuristics
For the bad character heuristics a function occ is required which yields, for each symbol of the alphabet, the position of its rightmost occurrence in the pattern, or -1 if the symbol does not occur in the pattern.
occurrence = new int[BASE];
for (int c = 0; c < BASE; c++)
occurrence[c] = -1;
for (int j = 0; j < pattern.length(); j++)
occurrence[pattern.charAt(j)] = j;
Preprocessing for the good-suffix heuristics
For the good-suffix heuristics an array s is used. Each entry s[i] contains the shift distance of the pattern if a mismatch at positioni – 1 occurs
In order to determine the shift distance, two cases have to be considered.
Case 1: The matching suffix occurs somewhere else in the pattern (Figure 1).
Case 2: Only a part of the matching suffix occurs at the beginning of the pattern (Figure 2).
Java code http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/stringmatchingclasses/BmStringMatcher.java
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.htmlhttp://algs4.cs.princeton.edu/53substring/BoyerMoore.java.html
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html
Also refer to http://java.dzone.com/articles/algorithm-week-boyer-moore
https://weblogs.java.net/blog/potty/archive/2012/05/21/string-searching-algorithms-part-iii
http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Text/Matching-Boyer-Moore1.html
Read full article from Boyer-Moore algorithm