## Friday, July 8, 2016

### Re-Space - Cracking Coding Interview

Oh, no! You have accidentally removed all spaces, punctuation, and capitalization in a
lengthy document. A sentence like "I reset the c omputer. It still didn't boot!"
became "iresetthec omputeri tstilldidntboot''. You'll deal with the punctuation and capitalization
later; right now you need to re-insert the spaces. Most of the words are in a dictionary but
a few are not. Given a dictionary (a list of strings) and the document (a string), design an algorithm
to unconcatenate the document in a way that minimizes the number of unrecognized characters.
EXAMPLE
Input jesslookedjustliketimherbrother

Output: jess looked just like tim her brother (7 unrecognized characters)

http://www.cnblogs.com/EdwardLiu/p/4356967.html

int unrecogNum = dict.contains(s.substring(j, i))? 0 : i-j; //看index从j到i-1的substring在不在dictionary里，如果不在，unrecogNum=j到i-1的char数
res[i] = Math.min(res[i], res[j]+unrecogNum);
``` 9         if (s==null || s.length()==0) return 0;
10         int len = s.length();
11         if (dict.isEmpty()) return len;
12         int[] res = new int[len+1]; // res[i] refers to minimized # of unrecognized chars in first i chars
13         Arrays.fill(res, Integer.MAX_VALUE);
14         res[0] = 0;
15         for (int i=1; i<=len; i++) {
16             for (int j=0; j<i; j++) {
17                 String str = s.substring(j, i);
18                 int unrecogNum = dict.contains(str)? 0 : i-j;
19                 res[i] = Math.min(res[i], res[j]+unrecogNum);
20             }
21         }
22         return res[len];
23     }```
http://www.shuati123.com/blog/2014/10/01/optimal-unconcatenate-doc/
``````public static int parse(String doc, Trie dict) {
int len = doc.length();
int[] dp = new int[len + 1];
// dp[i] denotes the number of special chars in first i chars of docs
for (int i = 1; i <= len; i++) {
dp[i] =  Integer.MAX_VALUE;
for (int j = 0; j < i; j++) {
String str = doc.substring(j, i);
if (dict.contains(str, true)) {
// consider (i to j) a valid word
dp[i] = Math.min(dp[i], dp[j]);
} else {
// consider (i to j) special chars
dp[i] = Math.min(dp[i], dp[j] + i - j);
}
}
}
return dp[len];
}
``````

http://www.cnblogs.com/grandyang/p/5448539.html
1. 一些递归重复计算了，我们可以使用哈希表来保存之前计算好的最优拆分了，那么在之后的递归中遇到了直接取出来用就行了。
2. 在有些情况下我们可以预测一种拆分方式是否会生成无效单词，比如当我们拆分单词xten，没有单词是以xt开头的，但是当前的算法会计算xt+p(en)，xten+p(n)和xten。每一次我们都会发现这样的单词不存在，这样我们直接在x后面加一个空格就可以了，那么我们怎么样知道没有单词是以xt开始的呢，用前缀树trie，参见parseOptimized()函数。

O(N^2)
public static String bestSplit(HashSet<String> dictionary, String sentence) {
ParseResult[] memo = new ParseResult[sentence.length()];
ParseResult r = split(dictionary, sentence, 0, memo);
return r == null ? null : r.parsed;
}

public static ParseResult split(HashSet<String> dictionary, String sentence, int start, ParseResult[] memo) {
if (start >= sentence.length()) {
return new ParseResult(0, "");
} if (memo[start] != null) {
return memo[start];
}

int bestInvalid = Integer.MAX_VALUE;
String bestParsing = null;

String partial = "";
int index = start;
while (index < sentence.length()) {
char c = sentence.charAt(index);
partial += c;
int invalid = dictionary.contains(partial) ? 0 :
partial.length();
if (invalid < bestInvalid) { // Short circuit
/* Recurse, putting a space after this character. If this
* is better than the current best option, replace the best
* option. */
ParseResult result = split(dictionary, sentence, index + 1, memo);
if (invalid + result.invalid < bestInvalid) {
bestInvalid = invalid + result.invalid;
bestParsing = partial + " " + result.parsed;
if (bestInvalid == 0) break; // Short circuit
}
}

index++;
}
memo[start] = new ParseResult(bestInvalid, bestParsing);
return memo[start];
}

public static String clean(String str) {
char[] punctuation = {',', '"', '!', '.', '\'', '?', ','};
for (char c : punctuation) {
str = str.replace(c, ' ');
}
return str.replace(" ", "").toLowerCase();
}

Brute Force:
public static String bestSplit(HashSet<String> dictionary, String sentence) {
ParseResult r = split(dictionary, sentence, 0);
return r == null ? null : r.parsed;
}

public static ParseResult split(HashSet<String> dictionary, String sentence, int start) {
if (start >= sentence.length()) {
return new ParseResult(0, "");
}

int bestInvalid = Integer.MAX_VALUE;
String bestParsing = null;

String partial = "";
int index = start;
while (index < sentence.length()) {
char c = sentence.charAt(index);
partial += c;
int invalid = dictionary.contains(partial) ? 0 :
partial.length();
if (invalid < bestInvalid) { // Short circuit
/* Recurse, putting a space after this character. If this
* is better than the current best option, replace the best
* option. */
ParseResult result = split(dictionary, sentence, index + 1);
if (invalid + result.invalid < bestInvalid) {
bestInvalid = invalid + result.invalid;
bestParsing = partial + " " + result.parsed;
if (bestInvalid == 0) break;
}
}

index++;
}
return new ParseResult(bestInvalid, bestParsing);
}

public static String clean(String str) {
char[] punctuation = {',', '"', '!', '.', '\'', '?', ','};
for (char c : punctuation) {
str = str.replace(c, ' ');
}
return str.replace(" ", "").toLowerCase();
}