Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character
'#'
). For each character they type except '#', you need to return the top 3 historical hot sentences that have prefix the same as the part of sentence already typed. Here are the specific rules:- The hot degree for a sentence is defined as the number of times a user typed the exactly same sentence before.
- The returned top 3 hot sentences should be sorted by hot degree (The first is the hottest one). If several sentences have the same degree of hot, you need to use ASCII-code order (smaller one appears first).
- If less than 3 hot sentences exist, then just return as many as you can.
- When the input is a special character, it means the sentence ends, and in this case, you need to return an empty list.
Your job is to implement the following functions:
The constructor function:
AutocompleteSystem(String[] sentences, int[] times):
This is the constructor. The input is historical data. Sentences
is a string array consists of previously typed sentences. Times
is the corresponding times a sentence has been typed. Your system should record these historical data.
Now, the user wants to input a new sentence. The following function will provide the next character the user types:
List<String> input(char c):
The input c
is the next character typed by the user. The character will only be lower-case letters ('a'
to 'z'
), blank space (' '
) or a special character ('#'
). Also, the previously typed sentence should be recorded in your system. The output will be the top 3 historical hot sentences that have prefix the same as the part of sentence already typed.
Example:
Operation: AutocompleteSystem(["i love you", "island","ironman", "i love leetcode"], [5,3,2,2])
The system have already tracked down the following sentences and their corresponding times:
Now, the user begins another search:
Operation: input('i')
Output: ["i love you", "island","i love leetcode"]
Explanation:
There are four sentences that have prefix
Operation: input(' ')
Output: ["i love you","i love leetcode"]
Explanation:
There are only two sentences that have prefix
Operation: input('a')
Output: []
Explanation:
There are no sentences that have prefix
Operation: input('#')
Output: []
Explanation:
The user finished the input, the sentence
Operation: AutocompleteSystem(["i love you", "island","ironman", "i love leetcode"], [5,3,2,2])
The system have already tracked down the following sentences and their corresponding times:
"i love you"
: 5
times"island"
: 3
times"ironman"
: 2
times"i love leetcode"
: 2
timesNow, the user begins another search:
Operation: input('i')
Output: ["i love you", "island","i love leetcode"]
Explanation:
There are four sentences that have prefix
"i"
. Among them, "ironman" and "i love leetcode" have same hot degree. Since ' '
has ASCII code 32 and 'r'
has ASCII code 114, "i love leetcode" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored.Operation: input(' ')
Output: ["i love you","i love leetcode"]
Explanation:
There are only two sentences that have prefix
"i "
.Operation: input('a')
Output: []
Explanation:
There are no sentences that have prefix
"i a"
.Operation: input('#')
Output: []
Explanation:
The user finished the input, the sentence
"i a"
should be saved as a historical sentence in system. And the following input will be counted as a new search.
Note:
- The input sentence will always start with a letter and end with '#', and only one blank space will exist between two words.
- The number of complete sentences that to be searched won't exceed 100. The length of each sentence including those in the historical data won't exceed 100.
- Please use double-quote instead of single-quote when you write test cases even for a character input.
- Please remember to RESET your class variables declared in class AutocompleteSystem, as static/class variables are persisted across multiple test cases. Please see here for more details.
题目大意:
设计搜索自动完成系统
包含如下两个方法:
构造方法:
AutocompleteSystem(String[] sentences, int[] times): 输入句子sentences,及其出现次数times
输入方法:
List<String> input(char c): 输入字符c可以是26个小写英文字母,也可以是空格,以'#'结尾。返回输入字符前缀对应频率最高的至多3个句子,频率相等时按字典序排列。
解题思路:
Trie(字典树)
利用字典树记录所有出现过的句子集合,利用字典保存每个句子出现的次数。
http://hehejun.blogspot.com/2018/12/leetcodedesign-search-autocomplete.html
https://www.jianshu.com/p/6a78f73a57be
题目很长,但不太难,用Trie即可解决,不过需要注意:
- 在input中,需要边读边写,但注意一定是先读再写,否则会查出刚插进去的sentence
- 遇到'#'时,要返回empty list
- 由于要返回Trie中存储的sentences,一种做法是给Trie添加一个成员变量str用来存储root到curr路径上形成的str,另外一种做法是在查询时将str作为参数传入child。本题目采用的是后面的思路。
- 在dfs trie时,一定要现将root添加到结果集,再遍历children!否则会漏掉栈底的root。这点很重要,Trie的查询都要这么写。
- 在getKHot时,可以用PriorityQueue来做,也可以用List来做然后排序。
class AutocompleteSystem {
private Trie root;
private Trie curr;
private String str; // store currently visiting str
public AutocompleteSystem(String[] sentences, int[] times) {
root = new Trie();
for (int i = 0; i < sentences.length; ++i) {
insert(root, sentences[i], times[i]);
}
this.curr = root;
this.str = "";
}
public List<String> input(char c) {
if (c == '#') {
insert(root, str, 1);
curr = root;
str = "";
return Collections.EMPTY_LIST; // return empty as designed
}
int i = getIndex(c);
if (curr.children[i] == null) {
curr.children[i] = new Trie();
}
str += c;
curr = curr.children[i];
return getKHot(curr, str, 3);
}
private void insert(Trie root, String s, int plusTimes) {
for (int i = 0; i < s.length(); ++i) {
int j = getIndex(s.charAt(i));
if (root.children[j] == null) {
root.children[j] = new Trie();
}
root = root.children[j];
}
root.times += plusTimes; // accumulate in case duplicate sentences
}
private List<String> getKHot(Trie root, String s, int k) {
List<Pair> list = new ArrayList<>();
dfs(root, s, list);
Collections.sort(list, (a, b)
-> (b.times != a.times
? b.times - a.times : a.str.compareTo(b.str)));
List<String> res = new ArrayList<>();
for (int i = 0; i < Math.min(k, list.size()); ++i) {
res.add(list.get(i).str);
}
return res;
}
private void dfs(Trie root, String s, List<Pair> list) {
if (root.times > 0) { // add root first
list.add(new Pair(s, root.times));
}
for (char c = 'a'; c <= 'z'; ++c) {
int i = getIndex(c);
if (root.children[i] != null) {
dfs(root.children[i], s + c, list);
}
}
if (root.children[26] != null) {
dfs(root.children[26], s + ' ', list);
}
}
private int getIndex(char c) {
return c == ' ' ? 26 : c - 'a';
}
class Pair {
String str;
int times;
public Pair(String s, int t) {
str = s;
times = t;
}
}
class Trie {
Trie[] children;
int times;
public Trie() {
children = new Trie[27];
}
}
}
可以用trie树来解决这个问题。
由于要返回前3个搜索次数最多的句子,我们可以用priority_queue来存储所返回的所有的句子和它的次数的键值对。
首先构造trie tree,主要为trieNode的结构以及insert 方法。
构造完trieNode类后, 这个系统实际上主要为一个巨大的trietree,我们需要一个树的根节点。
由于我们每次都要输入一个字符,我们可以用一个私有的Node:curNode来追踪当前我们节点。
curNode初始化为root,在每次输入完一个句子时,即输入的字符为‘#’时,我们需要将其置为root
同时需要一个string类型stn来表示当前的搜索的句子。
需要注意的是我们priority_queue中存储的为pair<string,int>我们需要给它重写比较器。
所以我们每输入一个字符,首先检查是不是结尾标识“#”,如果是的话,将当前句子加入trie树,重置相关变量,返回空数组。
如不是,检查当前TrieNode对应的child是否含有c的对应节点。如果没有,将curNode置为NULL并且返回空数组。
若存在,将curNode 更新为c对应的节点,并且对curNode进行dfs。
dfs时,我们首先检查当前是不是一个完整的句子,如果是,将句子与其次数同时加入priority_queue中,然后对其child中可能存在的子节点进行dfs。
进行完dfs后,我们需要取出前三个,需要注意的是,可能可选择的结果不满3个,所以要在while中多加入检测q为空的条件语句。
最后要将q中的所有元素都弹出。
tags: Design, Trie, Hash Table, MinHeap, PriorityQueue
time: input: O(x), where x = possible words, constructor: O(mn) m = max length, n = # of words
space: O(n^2), n = # of possible words, n = # of trie levels; mainlay saving the `Map<S, freq>`
Description is long, but in short: 做 search auto complete.
Best problem to review Trie (prefix search), Top K frequent elements (Hash Map), and MinHeap (PriorityQueue)
Easier to revisit https://leetcode.com/problems/design-search-autocomplete-system/description/
#### 思考方向
- 做text的search, 毋庸置疑要用Prefix tree, trie.
##### Find all possible word/leaf, 两种方案:
- Trie造好之后, 做prefix search, 然后DFS/BFS return all leaf items. [high runtime complexity]
- 在TrieNode里面存所有的possible words. [high space usage]
- in memory space 应该不是 大问题, 所以我们可以选择 store all possible words
##### Given k words, find top k frequent items. 肯定用MinHeap, 但也有两种方案:
- Store MinHeap with TrieNode: 因为会不断搜索新此条, 同样的prefix (尤其是在higher level), 会被多次搜索.
- [ complexity: need to update heaps across all visited TrieNodes once new sentence is completed]
- Compute MinHeap on the fly: 当然我们不能每次都来一个DFS 不然会很慢, 所以就必须要store list of possible candidates in TrieNode.
- 这里就用到了`Top K Frequent Words` 里面的 `Map<String, freq>`, 这样O(m) 构建 min-heap其实很方便.
##### Train the system
- 每次 `#` 后 标记一个词条被add进search history. 那么就要 `insert it into trie`.
- 这一条在最后遇到`#`再做就可以了, 非常简洁
#### Trie, PriorityQueue, HashMap
- Trie Prefix Search + maintain top k frequent items
class TrieNode {
public boolean isEnd;
public Map<String, Integer> freq;
public Map<Character, TrieNode> children; // Map is more applicable to all chars, not
// limited to 256 ASCII
public TrieNode() {
this.freq = new HashMap<>();
this.children = new HashMap<>();
}
}
class Pair {
String s;
int count;
public Pair(String s, int count) {
this.s = s;
this.count = count;
}
}
TrieNode root, curr;
StringBuffer sb;
public AutocompleteSystem(String[] sentences, int[] times) {
if (sentences == null || times == null || sentences.length != times.length)
return;
reset();
root = new TrieNode();
for (int i = 0; i < times.length; i++) {
insert(sentences[i], times[i]);
}
}
public List<String> input(char c) {
List<String> rst = new ArrayList<>();
if (curr == null)
curr = root;
if (c == '#') { // save sentence and reset state
insert(sb.toString(), 1);
reset();
return rst;
}
// Update global variable (curr TrieNode and string buffer); or append new character if not
// exist.
sb.append(c);
curr.children.putIfAbsent(c, new TrieNode());
curr = curr.children.get(c);
// MinHeap to find top 3.
rst.addAll(findTopK(curr, 3));
return rst;
}
private List<String> findTopK(TrieNode node, int k) {
List<String> rst = new ArrayList<>();
if (node.freq.isEmpty())
return rst;
PriorityQueue<Pair> queue = new PriorityQueue<>(
(a, b) -> a.count == b.count ? b.s.compareTo(a.s) : a.count - b.count);
for (Map.Entry<String, Integer> entry : node.freq.entrySet()) {
if (queue.size() < 3 || entry.getValue() >= queue.peek().count) {
queue.offer(new Pair(entry.getKey(), entry.getValue()));
}
if (queue.size() > 3)
queue.poll();
}
while (!queue.isEmpty()) {
rst.add(0, queue.poll().s);
}
return rst;
}
private void reset() {
curr = null;
sb = new StringBuffer();
}
private void insert(String sentence, int count) {
if (sentence == null || sentence.length() == 0)
return;
TrieNode node = root;
for (char c : sentence.toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
node.freq.put(sentence, node.freq.getOrDefault(sentence, 0) + count);
}
node.isEnd = true; // can set word to node as well, if needed
}
https://discuss.leetcode.com/topic/96150/java-solution-trie-and-priorityqueuehttps://flycode.co/archives/70567
class TrieNode {
Map<Character, TrieNode> children;
Map<String, Integer> counts;
boolean isWord;
public TrieNode() {
children = new HashMap<Character, TrieNode>();
counts = new HashMap<String, Integer>();
isWord = false;
}
}
class Pair {
String s;
int c;
public Pair(String s, int c) {
this.s = s; this.c = c;
}
}
TrieNode root;
String prefix;
public AutocompleteSystem(String[] sentences, int[] times) {
root = new TrieNode();
prefix = "";
for (int i = 0; i < sentences.length; i++) {
add(sentences[i], times[i]);
}
}
private void add(String s, int count) {
TrieNode curr = root;
for (char c : s.toCharArray()) {
TrieNode next = curr.children.get(c);
if (next == null) {
next = new TrieNode();
curr.children.put(c, next);
}
curr = next;
curr.counts.put(s, curr.counts.getOrDefault(s, 0) + count);
}
curr.isWord = true;
}
public List<String> input(char c) {
if (c == '#') {
add(prefix, 1);
prefix = "";
return new ArrayList<String>();
}
prefix = prefix + c;
TrieNode curr = root;
for (char cc : prefix.toCharArray()) {
TrieNode next = curr.children.get(cc);
if (next == null) {
return new ArrayList<String>();
}
curr = next;
}
PriorityQueue<Pair> pq = new PriorityQueue<>((a, b) -> (a.c == b.c ? a.s.compareTo(b.s) : b.c - a.c));
for (String s : curr.counts.keySet()) {
pq.add(new Pair(s, curr.counts.get(s)));
}
List<String> res = new ArrayList<String>();
for (int i = 0; i < 3 && !pq.isEmpty(); i++) {
res.add(pq.poll().s);
}
return res;
}
X. http://www.cnblogs.com/grandyang/p/7897166.html
AutocompleteSystem(vector<string> sentences, vector<int> times) { for (int i = 0; i < sentences.size(); ++i) { freq[sentences[i]] += times[i]; } data = ""; } vector<string> input(char c) { if (c == '#') { ++freq[data]; data = ""; return {}; } data.push_back(c); auto cmp = [](pair<string, int>& a, pair<string, int>& b) { return a.second > b.second || (a.second == b.second && a.first < b.first); }; priority_queue<pair<string, int>, vector<pair<string, int>>, decltype(cmp) > q(cmp); for (auto f : freq) { bool matched = true; for (int i = 0; i < data.size(); ++i) { if (data[i] != f.first[i]) { matched = false; break; } } if (matched) { q.push(f); if (q.size() > 3) q.pop(); } } vector<string> res(q.size()); for (int i = q.size() - 1; i >= 0; --i) { res[i] = q.top().first; q.pop(); } return res; } private: unordered_map<string, int> freq; string data;
思路:直接用能排序的TreeMap,因为可能出现计数值相等的情况(在这个情况下根据String的字典顺序),所以key用一个封装的对象Node,但是对象除非重写equals方法,不然在map里面找不到,所以额外用一个HashMap来存储String到Node的映射,而且因为这里String用的是搜索的内容,所以唯一
public class AutocompleteSystem {
class Node {
int cnt;
String content;
private Node(int cnt, String content) {
this.cnt = cnt;
this.content = content;
}
}
StringBuilder sb = new StringBuilder();
TreeMap<Node, String> map = new TreeMap<Node, String>(new Comparator<Node>() {
@Override
public int compare(Node o1, Node o2) {
if (o1.cnt == o2.cnt)
return o1.content.compareTo(o2.content);
return o2.cnt - o1.cnt;
}
});
Map<String, Node> m = new HashMap<String, Node>();
public AutocompleteSystem(String[] sentences, int[] times) {
for (int i = 0; i < sentences.length; i++) {
Node t = new Node(times[i], sentences[i]);
map.put(t, sentences[i]);
m.put(sentences[i], t);
}
}
public List<String> input(char c) {
if (c == '#') {
String s = sb.toString();
if (m.containsKey(s)) {
Node t = m.get(s);
map.remove(t);
t.cnt = t.cnt + 1;
map.put(t, t.content);
} else {
Node t = new Node(1, s);
m.put(s, t);
map.put(t, s);
}
sb = new StringBuilder();
return new ArrayList<String>();
}
if (c != '#')
sb.append(c);
List<String> ret = new ArrayList<String>();
String prefix = sb.toString();
for (Node k : map.keySet()) {
if (k.content.startsWith(prefix)) {
ret.add(k.content);
if (ret.size() == 3)
break;
}
}
return ret;
}
}