http://coursera.cs.princeton.edu/algs4/assignments/boggle.html
The Boggle game. Boggle is a word game designed by Allan Turoff and distributed by Hasbro. It involves a board made up of 16 cubic dice, where each die has a letter printed on each of its sides. At the beginning of the game, the 16 dice are shaken and randomly distributed into a 4-by-4 tray, with only the top sides of the dice visible. The players compete to accumulate points by building valid words out of the dice according to the following rules:
- A valid word must be composed by following a sequence of adjacent dice—two dice are adjacent if they are horizontal, vertical, or diagonal neighbors.
- A valid word can use each die at most once.
- A valid word must contain at least 3 letters.
- A valid word must be in the dictionary (which typically does not contain proper nouns).
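Scoring. Valid words are scored according to their length, using this table:
| word length | points |
| --- | --- |
| 0–2 | 0 |
| 3–4 | 1 |
| 5 | 2 |
| 6 | 3 |
| 7 | 5 |
| 8+ | 11 |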
The Qu special case. In the English language, the letter Q is almost always followed by the letter U. Consequently, the side of one die is printed with the two-letter sequence Qu instead of Q (and this two-letter sequence must be used together when forming words). When scoring, Qu counts as two letters; for example, the word QuEUE scores as a 5-letter word even though it is formed by following a sequence of 4 dice.
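The scoring table and the Qu rule can be combined into a small helper. The following is a minimal sketch, not part of the assignment API: the class name `BoggleScore` and both method names are hypothetical. It assumes the word is given as a plain uppercase string, so the Q and the U are already separate characters and `length()` counts Qu as two letters. Checking that the word is in the dictionary is left to the caller.

```java
// Hypothetical helper (not part of the assignment API): maps a word length
// to Boggle points according to the scoring table.
final class BoggleScore {
    private BoggleScore() { }

    // Points for a word with the given number of letters (Qu counted as two).
    static int pointsForLength(int length) {
        if (length <= 2) return 0;
        if (length <= 4) return 1;
        if (length == 5) return 2;
        if (length == 6) return 3;
        if (length == 7) return 5;
        return 11;  // 8 letters or more
    }

    // Assumes the word is spelled out in full, e.g. "QUEUE" has length 5
    // even though it is traced over only 4 dice.
    static int pointsFor(String word) {
        return pointsForLength(word.length());
    }
}
```

A `scoreOf` method in the solver could call `pointsFor` after confirming that the word is in the dictionary, and return zero otherwise.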
Your task. Your challenge is to write a Boggle solver that finds all valid words in a given Boggle board, using a given dictionary. Implement an immutable data type BoggleSolver with the following API:
public class BoggleSolver
{
    // Initializes the data structure using the given array of strings as the dictionary.
    // (You can assume each word in the dictionary contains only the uppercase letters A through Z.)
    public BoggleSolver(String[] dictionary)

    // Returns the set of all valid words in the given Boggle board, as an Iterable.
    public Iterable<String> getAllValidWords(BoggleBoard board)

    // Returns the score of the given word if it is in the dictionary, zero otherwise.
    // (You can assume the word contains only the uppercase letters A through Z.)
    public int scoreOf(String word)
}
https://github.com/nastra/AlgorithmsPartII-Princeton/tree/master/src
https://github.com/nastra/AlgorithmsPartII-Princeton/blob/master/src/BoggleBoard.java
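// Excerpt from the BoggleSolver class body. It needs java.util.Set and java.util.TreeSet;
// Queue (with enqueue()) and BoggleBoard come from the course's algs4 library.
private BoggleTrieST<Integer> dict = new BoggleTrieST<>();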
public BoggleSolver(String[] dictionary) {
for (String s : dictionary) {
dict.put(s, 1);
}
}
private static class BoggleTrieST<Value> {
private static final int R = 26; // A-Z letters
private static final int OFFSET = 65; // Offset of letter A in ASCII table
private Node root = new Node();
private static class Node {
private Object val;
private Node[] next = new Node[R];
}
public enum NodeType {
PREFIX, MATCH, NON_MATCH
}
/****************************************************
* Is the key in the symbol table?
****************************************************/
public boolean contains(String key) {
return get(key) != null;
}
public Value get(String key) {
Node x = get(root, key, 0);
if (x == null)
return null;
return (Value) x.val;
}
private Node get(Node x, String key, int d) {
if (x == null)
return null;
if (d == key.length())
return x;
char c = key.charAt(d);
return get(x.next[c - OFFSET], key, d + 1);
}
/****************************************************
* Insert key-value pair into the symbol table.
****************************************************/
public void put(String key, Value val) {
root = put(root, key, val, 0);
}
private Node put(Node x, String key, Value val, int d) {
if (x == null)
x = new Node();
if (d == key.length()) {
x.val = val;
return x;
}
char c = key.charAt(d);
x.next[c - OFFSET] = put(x.next[c - OFFSET], key, val, d + 1);
return x;
}
// find the key that is the longest prefix of s
public String longestPrefixOf(String query) {
int length = longestPrefixOf(root, query, 0, 0);
return query.substring(0, length);
}
// find the key in the subtrie rooted at x that is the longest
// prefix of the query string, starting at the dth character
private int longestPrefixOf(Node x, String query, int d, int length) {
if (x == null)
return length;
if (x.val != null)
length = d;
if (d == query.length())
return length;
char c = query.charAt(d);
return longestPrefixOf(x.next[c - OFFSET], query, d + 1, length);
}
public Iterable<String> keys() {
return keysWithPrefix("");
}
public Iterable<String> keysWithPrefix(String prefix) {
Queue<String> queue = new Queue<String>();
Node x = get(root, prefix, 0);
collect(x, prefix, queue);
return queue;
}
public boolean isPrefix(String prefix) {
return get(root, prefix, 0) != null;
}
public NodeType getNodeType(String key) {
Node x = get(root, key, 0);
if (x == null)
return NodeType.NON_MATCH;
else if (x.val == null)
return NodeType.PREFIX;
else
return NodeType.MATCH;
}
private void collect(Node x, String key, Queue<String> queue) {
if (x == null)
return;
if (x.val != null)
queue.enqueue(key);
// c is an index into next[] (0..R-1); add OFFSET to recover the letter
for (int c = 0; c < R; c++)
collect(x.next[c], key + (char) (c + OFFSET), queue);
}
public Iterable<String> keysThatMatch(String pat) {
Queue<String> q = new Queue<String>();
collect(root, "", pat, q);
return q;
}
public void collect(Node x, String prefix, String pat, Queue<String> q) {
if (x == null)
return;
if (prefix.length() == pat.length() && x.val != null)
q.enqueue(prefix);
if (prefix.length() == pat.length())
return;
char next = pat.charAt(prefix.length());
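// c is an index into next[]; compare the pattern character against the letter c + OFFSET
for (int c = 0; c < R; c++)
if (next == '.' || next == c + OFFSET)
collect(x.next[c], prefix + (char) (c + OFFSET), pat, q);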
}
public void delete(String key) {
root = delete(root, key, 0);
}
private Node delete(Node x, String key, int d) {
if (x == null)
return null;
if (d == key.length())
x.val = null;
else {
char c = key.charAt(d);
x.next[c - OFFSET] = delete(x.next[c - OFFSET], key, d + 1);
}
if (x.val != null)
return x;
// keep this node if it still has any child (c is already an array index)
for (int c = 0; c < R; c++)
if (x.next[c] != null)
return x;
return null;
}
}
public Iterable<String> getAllValidWords(BoggleBoard board) {
TreeSet<String> words = new TreeSet<>();
for (int i = 0; i < board.rows(); i++) {
for (int j = 0; j < board.cols(); j++) {
searchWords(board, i, j, words);
}
}
return words;
}
private void searchWords(BoggleBoard board, int i, int j, TreeSet<String> words) {
boolean[][] visited = new boolean[board.rows()][board.cols()];
dfs(board, i, j, words, visited, "");
}
private void dfs(BoggleBoard board, int i, int j, Set<String> words, boolean[][] visited, String prefix) {
if (visited[i][j]) {
return;
}
char letter = board.getLetter(i, j);
prefix = prefix + (letter == 'Q' ? "QU" : letter);
if (prefix.length() > 2 && dict.contains(prefix)) {
words.add(prefix);
}
if (!dict.isPrefix(prefix)) {
return;
}
visited[i][j] = true;
// do a DFS for all adjacent cells
if (i > 0) {
dfs(board, i - 1, j, words, visited, prefix);
if (j > 0) {
dfs(board, i - 1, j - 1, words, visited, prefix);
}
if (j < board.cols() - 1) {
dfs(board, i - 1, j + 1, words, visited, prefix);
}
}
if (j > 0) {
dfs(board, i, j - 1, words, visited, prefix);
}
if (j < board.cols() - 1) {
dfs(board, i, j + 1, words, visited, prefix);
}
if (i < board.rows() - 1) {
if (j > 0) {
dfs(board, i + 1, j - 1, words, visited, prefix);
}
if (j < board.cols() - 1) {
dfs(board, i + 1, j + 1, words, visited, prefix);
}
dfs(board, i + 1, j, words, visited, prefix);
}
visited[i][j] = false;
}
https://segmentfault.com/a/1190000005345079
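This assignment amounts to writing the AI part of the Boggle game, the BoggleSolver. To earn full marks, and even the bonus, it has to solve thousands of Boggle boards per second.
Below is a brief account of the iterations I went through. (Only a few representative versions are listed. The number in parentheses is the autograder's timing result for getAllValidWords(): the reference / student ratio of the number of calls completed in 5 seconds, where smaller is faster. Full marks require a ratio below 2 and the bonus requires a ratio below 0.5. I did not keep the source material in full, so the figures may be slightly off.)
- Version 1 (~2.5): lightly modified the library's ready-made TST (ternary search trie) class by adding a one-line PrefixQuery helper, preprocessed the board into a char[W*H] array, and traversed the board with a non-recursive DFS (mainly by maintaining two stacks).
- Version 2 (~1.6): building on version 1, replaced the TST with a hand-written 26-way Trie class, preprocessed the board into a Bag<Integer>[W*H], and implemented every function non-recursively (as the checklist requires).
- Version 3 (~0.8): building on version 2, switched the DFS from non-recursive to recursive, moved the Trie code directly into BoggleSolver, and passed the Node reference along during the DFS instead of calling the PrefixQuery function.
- Final version (0.55): building on version 3, tidied up the details of every step, cached intermediate variables, stored the results in a Bag rather than a HashSet, and applied other small optimization tricks picked up from the forum discussions.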
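- At first I was reluctant to implement utility classes such as TST or Trie myself, and avoided changing them beyond the necessary features. Once the early versions were done and I started looking for performance gains, I came to understand how these classes really work, and found that the biggest obstacle was the performance they sacrifice for the sake of generality. So I wrote my own version that meets only the minimum requirements. Because every detail was under my control, the most direct and effective approach, passing the Node reference through the DFS, became possible, and it paid off accordingly.
- The checklist is not scripture; you find better solutions only by thinking for yourself. Implementing the non-recursive DFS was clearly a struggle, with many variables to track and maintain, and the logic of DFS is a more natural fit for recursion anyway. I treat the non-recursive DFS as practice; going back to the recursive version was the result of daring to think independently.
- Some classmates used multithreading to reach very fast results (0.24). For an algorithms course assignment, though, multithreading is a grey area that borders on cheating, and it takes little technical skill, so it is not worth worrying about.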
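The board preprocessing mentioned in the version notes above (flattening the board and storing each cell's neighbours once, up front) could be sketched as follows. This is a minimal illustration and not the blog author's code: the class name `BoardNeighbors` and the method `neighbors` are hypothetical, and a plain `int[][]` stands in for the `Bag<Integer>[]` that the notes mention.

```java
import java.util.Arrays;

// Hypothetical sketch: for a rows-by-cols board stored in row-major order,
// precompute the flat indices of each cell's (up to 8) adjacent cells.
final class BoardNeighbors {
    private BoardNeighbors() { }

    static int[][] neighbors(int rows, int cols) {
        int[][] adj = new int[rows * cols][];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                int[] buf = new int[8];
                int n = 0;
                for (int di = -1; di <= 1; di++) {
                    for (int dj = -1; dj <= 1; dj++) {
                        if (di == 0 && dj == 0) continue;  // skip the cell itself
                        int r = i + di;
                        int c = j + dj;
                        if (r >= 0 && r < rows && c >= 0 && c < cols) {
                            buf[n++] = r * cols + c;
                        }
                    }
                }
                adj[i * cols + j] = Arrays.copyOf(buf, n);  // trim to actual count
            }
        }
        return adj;
    }
}
```

With the neighbour lists built once per board, the DFS inner loop only iterates over `adj[cell]` and no longer repeats the bounds checks at every step.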
By searching neighbours of neighbours we end up with an exponential algorithm, and we don’t stop trying new letters on the end of the word until we run out of neighbours to try. So, the worst case word candidates will use every letter on the board, in every possible arrangement. It’s very unlikely that any of these candidates will be real words. It’s not even likely that any word longer than 10 or so letters will be valid. Appending a new letter to “AAPYNU” isn’t likely to make anything better.

The real problem here is that the exponential search space through the board is immensely larger than the number of valid words in the dictionary. The brute force approach on a 4×4 grid of letters evaluates 12 million distinct paths, but there are only ~264,000 words in the test dictionary. The vast majority of all those distinct paths are going to be a complete waste of time.

Earlier I mentioned that we end up tracing through words that start with a prefix that no real word starts with. If we can prune the evaluation tree as soon as we hit such a prefix, we can save a lot of time. But having our set of dictionary words in a hash table doesn’t let us do that kind of a prefix search. We need a data structure better suited to this task – a trie.

The basic trie approach I implemented searches the dictionary for each possible word from the root of the trie each time. So in the above example, if the solver is evaluating the letters “TE”, it searches from the root, through each letter. Then we check “TEA”, and it searches again starting from the first letter. We already know “TE” is a valid prefix, we only need to search the next letter we’re evaluating.

Approach #3 – stepping through the board and trie in sync

Instead of checking if the whole current word we’re looking at is a valid prefix each time we add a new letter, we can keep track of where we’re up to in the trie, and check if the next letter on the board is a valid next letter in the dictionary from where we are.

No amount of micro optimisations on the wrong implementation are going to get anywhere near a better algorithm here. A profiler can show you where your code is taking lots of time, but it can’t look at your algorithm and tell you that you’re approaching the problem the wrong way.
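Stepping through the board and the trie in sync might look like the sketch below. It is an illustration under stated assumptions, not the code of any author quoted here: the class `SyncBoggleSketch` and its members are hypothetical, the board is a plain `char[][]` rather than the course's `BoggleBoard`, every letter is assumed to be uppercase A through Z, and a `Q` cell is treated as the Qu die. Each DFS call receives the trie node reached so far, so extending the path by one die costs a single array lookup instead of a fresh search from the root.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: DFS over the board that carries the current trie node along.
final class SyncBoggleSketch {
    private static final int R = 26;  // assumes letters A-Z only

    private static final class Node {
        String word;                  // non-null if a dictionary word ends here
        final Node[] next = new Node[R];
    }

    private final Node root = new Node();

    SyncBoggleSketch(String[] dictionary) {
        for (String w : dictionary) {
            Node x = root;
            for (int d = 0; d < w.length(); d++) {
                int c = w.charAt(d) - 'A';
                if (x.next[c] == null) x.next[c] = new Node();
                x = x.next[c];
            }
            x.word = w;
        }
    }

    // Assumes a non-empty rectangular board.
    Set<String> solve(char[][] board) {
        Set<String> found = new TreeSet<>();
        boolean[][] visited = new boolean[board.length][board[0].length];
        for (int i = 0; i < board.length; i++)
            for (int j = 0; j < board[0].length; j++)
                dfs(board, i, j, root, visited, found);
        return found;
    }

    private void dfs(char[][] board, int i, int j, Node parent,
                     boolean[][] visited, Set<String> found) {
        if (i < 0 || i >= board.length || j < 0 || j >= board[0].length) return;
        if (visited[i][j]) return;
        char letter = board[i][j];
        Node x = parent.next[letter - 'A'];                     // one step in the trie
        if (letter == 'Q' && x != null) x = x.next['U' - 'A'];  // Q die stands for QU
        if (x == null) return;        // no dictionary word continues this way: prune
        if (x.word != null && x.word.length() >= 3) found.add(x.word);
        visited[i][j] = true;
        for (int di = -1; di <= 1; di++)
            for (int dj = -1; dj <= 1; dj++)
                if (di != 0 || dj != 0)
                    dfs(board, i + di, j + dj, x, visited, found);
        visited[i][j] = false;        // backtrack so other paths may reuse this die
    }
}
```

Storing the complete word at its terminal node also avoids building a prefix string at every step, which is one of the per-call costs of a prefix-string-based DFS.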