https://leetcode.com/problems/top-k-frequent-words/solution/
Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.
Example 1:
Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2 Output: ["i", "love"] Explanation: "i" and "love" are the two most frequent words. Note that "i" comes before "love" due to a lower alphabetical order.
Example 2:
Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4 Output: ["the", "is", "sunny", "day"] Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.
Note:
- You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
- Input words contain only lowercase letters.
Follow up:
- Try to solve it in O(n log k) time and O(n) extra space.
X. https://leetcode.com/problems/top-k-frequent-words/discuss/108399/Java-O(n)-solution-using-HashMap-BucketSort-and-Trie-22ms-Beat-81
X.
This problem is quite similar to Top K Frequent Elements. You can refer to that post for its solution.
We can solve this problem with a similar idea:
Firstly, we need to calculate the frequency of each word and store the result in a hashmap.
Secondly, we use bucket sort to group the words by frequency. Why does this work? Every frequency is at least 1 and at most the length of the input array, so an array of words.length + 1 buckets indexed by frequency covers every possible count. In Example 2, for instance, "the" lands in bucket 4, "is" in bucket 3, "sunny" in bucket 2 and "day" in bucket 1.
Thirdly, within each bucket we store all the words of that frequency in a trie. A pre-order traversal that visits the 26 children in a-to-z order meets the lexicographically smaller word first, which saves us the trouble of sorting the words within each bucket.
From the above analysis, we can see the time complexity is O(n), treating the maximum word length as a constant.
Here is my code:
public List<String> topKFrequent(String[] words, int k) {
    // calculate the frequency of each word
    Map<String, Integer> freqMap = new HashMap<>();
    for (String word : words) {
        freqMap.put(word, freqMap.getOrDefault(word, 0) + 1);
    }

    // build the buckets: index = frequency, value = trie of all words with that frequency
    TrieNode[] count = new TrieNode[words.length + 1];
    for (String word : freqMap.keySet()) {
        int freq = freqMap.get(word);
        if (count[freq] == null) {
            count[freq] = new TrieNode();
        }
        addWord(count[freq], word);
    }

    // collect the k most frequent words, scanning buckets from high to low frequency
    List<String> list = new LinkedList<>();
    for (int f = count.length - 1; f >= 1 && list.size() < k; f--) {
        if (count[f] == null) continue;
        getWords(count[f], list, k);
    }
    return list;
}

// pre-order traversal: visiting the children in a-to-z order yields the words
// of this bucket in alphabetical order
private void getWords(TrieNode node, List<String> list, int k) {
    if (node == null) return;
    // stop before adding once k words have been collected,
    // otherwise a bucket with many words could overfill the result
    if (list.size() == k) return;
    if (node.word != null) {
        list.add(node.word);
    }
    for (int i = 0; i < 26; i++) {
        if (node.next[i] != null) {
            getWords(node.next[i], list, k);
        }
    }
}

// insert a word into the trie rooted at root
private void addWord(TrieNode root, String word) {
    TrieNode curr = root;
    for (char c : word.toCharArray()) {
        if (curr.next[c - 'a'] == null) {
            curr.next[c - 'a'] = new TrieNode();
        }
        curr = curr.next[c - 'a'];
    }
    curr.word = word;
}

class TrieNode {
    TrieNode[] next;
    String word;   // non-null only at nodes that terminate a stored word

    TrieNode() {
        this.next = new TrieNode[26];
        this.word = null;
    }
}
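A quick usage sketch for the bucket + trie solution above, running Example 1. The Main wrapper and the Solution class name are my own assumptions (the original post does not show the enclosing class):

import java.util.*;

public class Main {
    public static void main(String[] args) {
        // Example 1 from the problem statement
        String[] words = {"i", "love", "leetcode", "i", "love", "coding"};
        // assumes the methods and TrieNode above live in a class named Solution
        List<String> top = new Solution().topKFrequent(words, 2);
        System.out.println(top); // expected output: [i, love]
    }
}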
X.
- Time Complexity: O(N log k), where N is the length of words. We count the frequency of each word in O(N) time, then we add N words to the heap, each in O(log k) time. Finally, we pop from the heap up to k times. As k ≤ N, this is O(N log k) in total.
  In Python, we improve this to O(N + k log N): our heapq.heapify operation and counting operations are O(N), and each of the k heapq.heappop operations is O(log N).
- Space Complexity: O(N), the space used to store our count.
public List<String> topKFrequent(String[] words, int k) {
    // count the frequency of each word
    Map<String, Integer> count = new HashMap<>();
    for (String word : words) {
        count.put(word, count.getOrDefault(word, 0) + 1);
    }

    // min-heap of size at most k: the "weakest" candidate (lowest frequency,
    // then reverse alphabetical order) sits on top and is evicted first
    PriorityQueue<String> heap = new PriorityQueue<>(
        (w1, w2) -> !count.get(w1).equals(count.get(w2))
            ? count.get(w1) - count.get(w2)   // lower frequency ranks lower
            : w2.compareTo(w1)                // among ties, alphabetically larger ranks lower
    );

    for (String word : count.keySet()) {
        heap.offer(word);
        if (heap.size() > k) heap.poll();
    }

    // the heap yields words from weakest to strongest, so reverse at the end
    List<String> ans = new ArrayList<>();
    while (!heap.isEmpty()) ans.add(heap.poll());
    Collections.reverse(ans);
    return ans;
}
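Why does the tie case compare w2 to w1 instead of w1 to w2? The heap keeps the k best candidates by always evicting its smallest element, so among equal-frequency words the one we want to keep must rank as the larger one. A minimal, self-contained sketch of that ordering (the class and variable names here are mine, not from the original post):

import java.util.*;

public class TieBreakDemo {
    public static void main(String[] args) {
        // both "i" and "love" occur twice in Example 1
        Map<String, Integer> count = Map.of("i", 2, "love", 2);
        Comparator<String> byWorst = (w1, w2) -> !count.get(w1).equals(count.get(w2))
                ? count.get(w1) - count.get(w2)
                : w2.compareTo(w1);
        // "love" ranks below "i", so a size-limited min-heap evicts "love" first
        // and keeps "i", matching the required alphabetical tie-break
        System.out.println(byWorst.compare("love", "i") < 0); // true
    }
}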
- Time Complexity: O(N log N), where N is the length of words. We count the frequency of each word in O(N) time, then we sort the given words in O(N log N) time.
- Space Complexity: O(N), the space used to store our candidates.
public List<String> topKFrequent(String[] words, int k) {
    // count the frequency of each word
    Map<String, Integer> count = new HashMap<>();
    for (String word : words) {
        count.put(word, count.getOrDefault(word, 0) + 1);
    }

    // sort the distinct words: higher frequency first, then alphabetical order
    List<String> candidates = new ArrayList<>(count.keySet());
    Collections.sort(candidates, (w1, w2) -> !count.get(w1).equals(count.get(w2))
            ? count.get(w2) - count.get(w1)
            : w1.compareTo(w2));

    return candidates.subList(0, k);
}
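One small note on the last line (my own observation, not from the original post): candidates.subList(0, k) returns a view backed by the full candidates list rather than an independent list. That is fine for the judge, but if the caller needs a standalone result, the return can be wrapped in a copy, for example:

    return new ArrayList<>(candidates.subList(0, k)); // defensive copy of the first k words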