LeetCode 340 - Longest Substring with At Most K Distinct Characters


http://www.cnblogs.com/grandyang/p/5351347.html
Given a string, find the length of the longest substring T that contains at most k distinct characters.
For example, Given s = “eceba” and k = 2,
T is "ece" which its length is 3.
https://leetcode.com/discuss/95558/15-lines-java-solution-using-slide-window
One optimization, update the res only when valid substring beginning at i reaches the longest:
  • Put update of res, i.e. res = Math.max(res, j - i + 1), into the if statement, just above the while loop. Replace it by res = Math.max(res, j - i). This is the place where a valid substring grows to the greatest.
  • Put another update for res before return statement, i.e. res = Math.max(s.length() - i, res).
This optimization is very general, it avoids unnecessary updates.

public int lengthOfLongestSubstringKDistinct(String s, int k) { int[] count = new int[256]; int num = 0, i = 0, res = 0; for (int j = 0; j < s.length(); j++) { if (count[s.charAt(j)]++ == 0) num++; if (num > k) { while (--count[s.charAt(i++)] > 0);//?? num--; } res = Math.max(res, j - i + 1); } return res; }
http://buttercola.blogspot.com/2016/06/leetcode-340-longest-substring-with-at.html
The problem is very similar to the Leetcode question 3 (Longest Substring Without Repeating Characters). 

The key idea of the solution is to use two pointers to maintain a "sliding window". We also use a HashMap to store all the characters in the window. Each time we move forward the "start" pointer and increase the size of the sliding window until the size of the window is greater than K. Then we can easily calculate the size of the window which contains at most K distinct characters. The next step is to shrink the window by moving forward the "end" pointer. In order to get the maximum window size, we must move the minimum steps of the end pointer. So each step we move the end pointer, we need to update the map and remove the character out of the sliding window. The stop condition is that when the window size is again equal to the K, which means the window contains K distinct characters. That's the minimum steps we need to move forward the end pointer. 

The only trick here is we need to check at the last that if the start pointer is out of boundary, we still need to check if the largest window size. That's a common trick for major string w/ two pointer problems.

https://leetcode.com/discuss/95698/generic-solution-in-java-that-can-be-used-for-unicode
This problem can be solved using two pointers. The important part is while (map.size() > k), we move left pointer to make sure the map size is less or equal to k. This can be easily extended to any number of unique characters.
A more generic solution as follows, can be solution for Unicode string:

public int lengthOfLongestSubstringKDistinct(String s, int k) {
    Map<Character, Integer> map = new HashMap<>();
    int left = 0;
    int best = 0;
    for(int i = 0; i < s.length(); i++) {
        // character at the right pointer
        char c = s.charAt(i);
        map.put(c, map.getOrDefault(c, 0) + 1);
        // make sure map size is valid, no need to check left pointer less than s.length()
        while (map.size() > k) {
            char leftChar = s.charAt(left);
            if (map.containsKey(leftChar)) {
                map.put(leftChar, map.get(leftChar) - 1);                     
                if (map.get(leftChar) == 0) { 
                    map.remove(leftChar);
                }
            }
            left++;
        }
        best = Math.max(best, i - left + 1);
    }
    return best;
}
    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
        int[] count = new int[256];
        int num = 0, i = 0, res = 0;
        for (int j = 0; j < s.length(); j++) {
            if (count[s.charAt(j)]++ == 0) num++;
            if (num > k) {
                while (--count[s.charAt(i++)] > 0);
                num--;
            }
            res = Math.max(res, j - i + 1);
        }
        return res;
    }
    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
        Map<Character, Integer> map = new HashMap<>();
        int left = 0;
        int res = 0;
        for(int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            map.put(c, map.getOrDefault(c, 0) + 1);
            while (map.size() > k) {
                char leftChar = s.charAt(left);
                left+=map.get(leftChar);
                map.remove(leftChar);
            }
            res = Math.max(res, i - left + 1);
        }
        return res;
    }
Follow up
https://discuss.leetcode.com/topic/48827/java-o-nlogk-using-treemap-to-keep-last-occurrence-interview-follow-up-question/3
The interviewer may say that the string is given as a steam. In this situation, we can't maintain a "left pointer" as the classical O(n) hashmap solution.
Solving the problem with O(n) time is not enough, some interviewer may require this solution as a followup. Instead of recording each char's count, we keep track of char's last occurrence. If you consider k as constant, it is also a O(n) algorithm.
inWindow keeps track of each char in window and its last occurrence position
lastOccurrence is used to find the char in window with left most last occurrence. A better idea is to use a PriorityQueue, as it takes O(1) to getMin, However Java's PQ does not support O(logn) update a internal node, it takes O(n). TreeMap takes O(logn) to do both getMin and update.
Every time when the window is full of k distinct chars, we lookup TreeMap to find the one with leftmost last occurrence and set left bound j to be 1 + first to exclude the char to allow new char coming into window.
        public int lengthOfLongestSubstringKDistinct(String str, int k) {
            if (str == null || str.isEmpty() || k == 0) {
                return 0;
            }
            TreeMap<Integer, Character> lastOccurrence = new TreeMap<>();
            Map<Character, Integer> inWindow = new HashMap<>();
            int j = 0;
            int max = 1;
            for (int i = 0; i < str.length(); i++) {
                char in = str.charAt(i);
                while (inWindow.size() == k && !inWindow.containsKey(in)) {
                    int first = lastOccurrence.firstKey();
                    char out = lastOccurrence.get(first);
                    inWindow.remove(out);
                    lastOccurrence.remove(first);
                    j = first + 1;
                }
                //update or add in's position in both maps
                if (inWindow.containsKey(in)) {
                    lastOccurrence.remove(inWindow.get(in));
                }
                inWindow.put(in, i);
                lastOccurrence.put(i, in);
                max = Math.max(max, i - j + 1);
            }
            return max;
        }
http://www.1point3acres.com/bbs/thread-209202-1-1.html
第一轮,Longest Substring with At Most K Distinct Characters变形题,输入是stream而不是string,所以无法知道输入的长度,而且内存不够,无法存入整个stream,所以维持sliding window来更新最大长度的算法无法解决该问题。最后是通过一个map纪录每个字符上次出现的index来实现。提供了函数streamreader.reader()来返回下一个字符

第一轮,Longest Substring with At Most K Distinct Characters变形题: 假设API streamreader.read()返回'\0' 若stream已读完(具体的要求不影响实现)。因为内存有限制,那么这个题还有一个细节就是所给的k大小能不能认为是有界的,一般若假设是26或256个字符的话那么用一个unordered_map<char, int> lastInx记录当前最后k个distinct char出现的最后index 就可以了。但是这样实际会造成O(Nk)时间复杂度,因为在前进substring front的时候还会涉及判断front index是不是某个char的最后index。我是再定义一个reverse mapping: unordered_map<int, char> chars来记录每个最后index所对应的char. 这样用O(2k)的空间就可以实现O(N)的时间。
当然假设内存虽然有限制,但至少是O(k)的 (即使不是O(N)的)。
  1. int maxLength(StreamReader& streamreader, int k) {.1point3acres缃�
  2.   unordered_map<char, int> lastInx; // last index of distinct char, space O(k)
  3.   unordered_map<int, char> chars;   // distinct char of last index, space O(k)
  4.   char c = '\0'; // current char
  5.   int front = 0; // first char's index of valid substring
  6.   int cur = -1;  // current char's index of valid substring
  7.   int maxLen = 0; // max length of valid substring
  8.   while ((c = streamreader.read()) != '\0') { // time: O(N)
  9.     // update maps of last index and char
  10.     if (lastInx.count(c)) chars.erase(lastInx[c]);
  11.     lastInx[c] = ++cur; chars[cur] = c;
  12.     // advance front index of substring if invalid
  13.     while (chars.size() > k) {
  14.       if (chars.count(front)) { lastInx.erase(chars[front]); chars.erase(front); }
  15.       ++front;. 鐗涗汉浜戦泦,涓€浜╀笁鍒嗗湴
  16.     }
  17.     // update max length
  18.     maxLen = max(maxLen, cur - front + 1);. more info on 1point3acres.com
  19.   }
  20.   return maxLen;
  21. }
关键看你的sliding window在advance front index的时候需要记录什么。
在无内存限制的时候,advance front index需要判断删除char s[front_index]后是否减少了某种字符的出现,若是这样实现必然要存储substring从front_index一直到cur_index的字符,这需要O(N)空间,也就是这道题不允许的。所以按照Leetcode原题只记录char frequency是不行的。.鏈枃鍘熷垱鑷�1point3acres璁哄潧
实际上每次front_index++时并不需要知道s[front_index]本身是什么,只需要知道该字符是否是最后出现的instance。这个改进应该就是这个题的考点吧。
当然,前提是内存虽然没有O(N),但至少有O(k). 注意O(k)可以远远小于O(charset).

Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112 possible characters. At present, only about 10% of this space has been allocated.
http://www.programcreek.com/2013/02/longest-substring-which-contains-2-unique-characters/
In this solution, a hashmap is used to track the unique elements in the map. When a third character is added to the map, the left pointer needs to move right.
public int lengthOfLongestSubstringTwoDistinct(String s) {
    int max=0;
    HashMap<Character,Integer> map = new HashMap<Character, Integer>();
    int start=0;
 
    for(int i=0; i<s.length(); i++){
        char c = s.charAt(i);
        if(map.containsKey(c)){
            map.put(c, map.get(c)+1);
        }else{
            map.put(c,1);
        }
 
        if(map.size()>2){
            max = Math.max(max, i-start);
 
            while(map.size()>2){
                char t = s.charAt(start);
                int count = map.get(t);
                if(count>1){
                    map.put(t, count-1);
                }else{
                    map.remove(t);
                }
                start++;
            }
        }
    }
 
    max = Math.max(max, s.length()-start);
 
    return max;
}

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts