## Monday, April 4, 2016

### LeetCode 340 - Longest Substring with At Most K Distinct Characters

http://www.cnblogs.com/grandyang/p/5351347.html
Given a string, find the length of the longest substring T that contains at most k distinct characters.
For example, Given s = `“eceba”` and k = 2,
T is "ece" which its length is 3.
https://leetcode.com/discuss/95558/15-lines-java-solution-using-slide-window
One optimization, update the res only when valid substring beginning at i reaches the longest:
• Put update of res, i.e. res = Math.max(res, j - i + 1), into the if statement, just above the while loop. Replace it by res = Math.max(res, j - i). This is the place where a valid substring grows to the greatest.
• Put another update for res before return statement, i.e. res = Math.max(s.length() - i, res).
This optimization is very general, it avoids unnecessary updates.

public int lengthOfLongestSubstringKDistinct(String s, int k) { int[] count = new int[256]; int num = 0, i = 0, res = 0; for (int j = 0; j < s.length(); j++) { if (count[s.charAt(j)]++ == 0) num++; if (num > k) { while (--count[s.charAt(i++)] > 0);//?? num--; } res = Math.max(res, j - i + 1); } return res; }
http://buttercola.blogspot.com/2016/06/leetcode-340-longest-substring-with-at.html
The problem is very similar to the Leetcode question 3 (Longest Substring Without Repeating Characters).

The key idea of the solution is to use two pointers to maintain a "sliding window". We also use a HashMap to store all the characters in the window. Each time we move forward the "start" pointer and increase the size of the sliding window until the size of the window is greater than K. Then we can easily calculate the size of the window which contains at most K distinct characters. The next step is to shrink the window by moving forward the "end" pointer. In order to get the maximum window size, we must move the minimum steps of the end pointer. So each step we move the end pointer, we need to update the map and remove the character out of the sliding window. The stop condition is that when the window size is again equal to the K, which means the window contains K distinct characters. That's the minimum steps we need to move forward the end pointer.

The only trick here is we need to check at the last that if the start pointer is out of boundary, we still need to check if the largest window size. That's a common trick for major string w/ two pointer problems.

https://leetcode.com/discuss/95698/generic-solution-in-java-that-can-be-used-for-unicode
This problem can be solved using two pointers. The important part is `while (map.size() > k)`, we move left pointer to make sure the map size is less or equal to `k`. This can be easily extended to any number of unique characters.
A more generic solution as follows, can be solution for Unicode string:

``````public int lengthOfLongestSubstringKDistinct(String s, int k) {
Map<Character, Integer> map = new HashMap<>();
int left = 0;
int best = 0;
for(int i = 0; i < s.length(); i++) {
// character at the right pointer
char c = s.charAt(i);
map.put(c, map.getOrDefault(c, 0) + 1);
// make sure map size is valid, no need to check left pointer less than s.length()
while (map.size() > k) {
char leftChar = s.charAt(left);
if (map.containsKey(leftChar)) {
map.put(leftChar, map.get(leftChar) - 1);
if (map.get(leftChar) == 0) {
map.remove(leftChar);
}
}
left++;
}
best = Math.max(best, i - left + 1);
}
return best;
}
``````
```    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
int[] count = new int[256];
int num = 0, i = 0, res = 0;
for (int j = 0; j < s.length(); j++) {
if (count[s.charAt(j)]++ == 0) num++;
if (num > k) {
while (--count[s.charAt(i++)] > 0);
num--;
}
res = Math.max(res, j - i + 1);
}
return res;
}```
```    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
Map<Character, Integer> map = new HashMap<>();
int left = 0;
int res = 0;
for(int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
map.put(c, map.getOrDefault(c, 0) + 1);
while (map.size() > k) {
char leftChar = s.charAt(left);
left+=map.get(leftChar);
map.remove(leftChar);
}
res = Math.max(res, i - left + 1);
}
return res;
}```
https://discuss.leetcode.com/topic/48827/java-o-nlogk-using-treemap-to-keep-last-occurrence-interview-follow-up-question/3
The interviewer may say that the string is given as a steam. In this situation, we can't maintain a "left pointer" as the classical O(n) hashmap solution.
Solving the problem with O(n) time is not enough, some interviewer may require this solution as a followup. Instead of recording each char's count, we keep track of char's last occurrence. If you consider k as constant, it is also a O(n) algorithm.
inWindow keeps track of each char in window and its last occurrence position
lastOccurrence is used to find the char in window with left most last occurrence. A better idea is to use a PriorityQueue, as it takes O(1) to getMin, However Java's PQ does not support O(logn) update a internal node, it takes O(n). TreeMap takes O(logn) to do both getMin and update.
Every time when the window is full of k distinct chars, we lookup TreeMap to find the one with leftmost last occurrence and set left bound j to be 1 + first to exclude the char to allow new char coming into window.
``````        public int lengthOfLongestSubstringKDistinct(String str, int k) {
if (str == null || str.isEmpty() || k == 0) {
return 0;
}
TreeMap<Integer, Character> lastOccurrence = new TreeMap<>();
Map<Character, Integer> inWindow = new HashMap<>();
int j = 0;
int max = 1;
for (int i = 0; i < str.length(); i++) {
char in = str.charAt(i);
while (inWindow.size() == k && !inWindow.containsKey(in)) {
int first = lastOccurrence.firstKey();
char out = lastOccurrence.get(first);
inWindow.remove(out);
lastOccurrence.remove(first);
j = first + 1;
}
//update or add in's position in both maps
if (inWindow.containsKey(in)) {
lastOccurrence.remove(inWindow.get(in));
}
inWindow.put(in, i);
lastOccurrence.put(i, in);
max = Math.max(max, i - j + 1);
}
return max;
}``````

1. int maxLength(StreamReader& streamreader, int k) {.1point3acres缃�
2.   unordered_map<char, int> lastInx; // last index of distinct char, space O(k)
3.   unordered_map<int, char> chars;   // distinct char of last index, space O(k)
4.   char c = '\0'; // current char
5.   int front = 0; // first char's index of valid substring
6.   int cur = -1;  // current char's index of valid substring
7.   int maxLen = 0; // max length of valid substring
8.   while ((c = streamreader.read()) != '\0') { // time: O(N)
9.     // update maps of last index and char
10.     if (lastInx.count(c)) chars.erase(lastInx[c]);
11.     lastInx[c] = ++cur; chars[cur] = c;
12.     // advance front index of substring if invalid
13.     while (chars.size() > k) {
14.       if (chars.count(front)) { lastInx.erase(chars[front]); chars.erase(front); }
15.       ++front;. 鐗涗汉浜戦泦,涓€浜╀笁鍒嗗湴
16.     }
17.     // update max length
18.     maxLen = max(maxLen, cur - front + 1);. more info on 1point3acres.com
19.   }
20.   return maxLen;
21. }

Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112 possible characters. At present, only about 10% of this space has been allocated.
http://www.programcreek.com/2013/02/longest-substring-which-contains-2-unique-characters/
In this solution, a hashmap is used to track the unique elements in the map. When a third character is added to the map, the left pointer needs to move right.
 ```public int lengthOfLongestSubstringTwoDistinct(String s) { int max=0; HashMap map = new HashMap(); int start=0;   for(int i=0; i2){ max = Math.max(max, i-start);   while(map.size()>2){ char t = s.charAt(start); int count = map.get(t); if(count>1){ map.put(t, count-1); }else{ map.remove(t); } start++; } } }   max = Math.max(max, s.length()-start);   return max; }```