## Monday, August 8, 2016

### Google – Plate and Dictionary

http://chuansong.me/n/449596249789

(1) 如果dictionary里有上百万个字，该如何加速
(2) 如果dictionary有上百万个字，然后給你上千个车牌号码，要你回传相对应的最短字串，该如何optimize?

[Solution]
[Solution #1]
Brute Force很简单，拿plate和dictionary里的单词挨个比就好。
[Solution #2]

query time为O(len + LEN), where len is the length of plate. Worst case的情况出现于dictionary里面所有单词都包含车牌里的英语字母。
[Solution #3]

1. 把26个字母分别map到一个 prime number.
2. 把每个单词对应的字母的质数乘起来，得到一个product，然后对dictionary建String -> product的map.
3. Query的时候用同样方法计算plate的product，扫一遍dict, 谁能除得进plate的product，就说明包含Plate中的英文字母。

`  ``public` `String plateAndDict(String[] dict, String plate) {`
`    ``Map<Character, Integer> primeMap = ``new` `HashMap<>();`
`    ``initMap(primeMap);`
`    ``plate = plate.toLowerCase();`
`    ``long` `platePrime = 1l;`
`    ``for` `(``char` `c : plate.toCharArray()) {`
`      ``if` `(primeMap.containsKey(c)) {`
`        ``platePrime *= primeMap.get(c);`
`      ``}`
`    ``}`
`    ``String result = ``null``;`
`    ``for` `(String word : dict) {`
`      ``String lowerWord = word.toLowerCase();`
`      ``long` `wordPrime = 1l;`
`      ``for` `(``char` `c : lowerWord.toCharArray()) {`
`        ``if` `(primeMap.containsKey(c)) {`
`          ``wordPrime *= primeMap.get(c);`
`        ``}`
`      ``}`
`      ``if` `(wordPrime % platePrime == ``0` `&& (result == ``null` `|| word.length() < result.length())) {`
`        ``result = word;`
`      ``}`
`    ``}`
`    ``return` `result;`
`  ``}`
`  ``private` `void` `initMap(Map<Character, Integer> primeMap) {`
`    ``primeMap.put(``'a'``, ``2``);`
`    ``int` `prev = ``2``;`
`    ``for` `(``char` `i = ``'b'``; i <= ``'z'``; i++) {`
`      ``int` `curr = nextPrime(prev);`
`      ``primeMap.put(i, curr);`
`      ``prev = curr;`
`    ``}`
`  ``}`
`  ``private` `int` `nextPrime(``int` `n) {`
`    ``int` `result = n + ``1``;`
`    ``while` `(result % ``2` `== ``0` `|| !isPrime(result)) {`
`      ``result++;`
`    ``}`
`    ``return` `result;`
`  ``}`
`  ``private` `boolean` `isPrime(``int` `n) {`
`    ``for` `(``int` `i = ``2``; i * i <= n; i++) {`
`      ``if` `(n % i == ``0``) {`
`        ``return` `false``;`
`      ``}`
`    ``}`
`    ``return` `true``;`
`  ``}`

“AC1234R” => CAR, ARC | CART, CARED not the shortest
OR4567S” => SORT, SORE | SORTED valid, not the shortest | OR is not valid, missing S

1 <= letters <= 7
O(4M) words in the dictionary

d1 = abc   000....111
d2 = bcd   000…..110
s1 = ac1234r  00001...101
s1 & d1 == s1 d1
s1 & d2 != s1

`````` int convert2Num(string s) {
int res = 0;
for (int i = 0; i < s.size(); ++i) {
if (!isdigit(s[i]) {
res |= 1 << (s[i] - ‘a’);
}
}
return res;
}

string find_shortest_word(string license, vector<string> words) {
string res;
for (int i = 0; i < words.size(); ++i) {
int word_num = convert2Num(words[i]); // conversion;
if (lis_num & word_num == lis_num) {
if (res.empty() || res.size() > words[i].size())
res = words[i]; // brute force; 可以sorting来减少比较次数
}
}
return res;
}
``````
<key, conversion number》
<words, save shortest length of words>
abc, abccc,   <000...111, abc>

dictionary = { “BAZ”, “FIZZ”, “BUZZ” } | BAZ only has one Z

vector<int> map(26, 0);
a -> 0,   z -> 25;
map[s-’a’]++;
need[26];
need[i] < map[i]

`````` bool compare(const string& s1, const string& s2) {
return s1.size() < s2.size();
}

string find_shortest_string(string license, vector<string> words) {
sort(word.begin(), word.end(), compare);
vector<int> map(26, 0);
for (auto i : license) {
if (!isdigit(i)) map[i-’a’]++;
}

for (int i = 0; i < words.size(); ++i) {
int word_num = convert2Num(words[i]);
// First check that it has all the letters, then check the counts
if (lic_num & word_num != lic_num) continue;

vector<int> word_map(26, 0);
for (auto k : words[i]) {
if (!isdigit(k)) map[k-’a’]++;
}
int j = 0;
for (; j < 26; ++j) {
if (word_map[‘a’+j] < map[‘a’+j]) break;
}
if (j == 26) return words[i];
}
}
``````

find_shortest_word(“12345ZZ”, dictionary) 50% => “FIZZ” | 50% => “BUZZ”
vector<string> s = {};
s[rand() % 2];

reservoir sampling;
FIZZ: s = words;
count = 1;
BUZZ: count++; count = 2;
rand() % count == 0 : s = BUZZ;