Monday, April 4, 2016

LeetCode 340 - Longest Substring with At Most K Distinct Characters


http://www.cnblogs.com/grandyang/p/5351347.html
Given a string, find the length of the longest substring T that contains at most k distinct characters.
For example, Given s = “eceba” and k = 2,
T is "ece" which its length is 3.
https://leetcode.com/discuss/95558/15-lines-java-solution-using-slide-window
One optimization, update the res only when valid substring beginning at i reaches the longest:
  • Put update of res, i.e. res = Math.max(res, j - i + 1), into the if statement, just above the while loop. Replace it by res = Math.max(res, j - i). This is the place where a valid substring grows to the greatest.
  • Put another update for res before return statement, i.e. res = Math.max(s.length() - i, res).
This optimization is very general, it avoids unnecessary updates.

public int lengthOfLongestSubstringKDistinct(String s, int k) { int[] count = new int[256]; int num = 0, i = 0, res = 0; for (int j = 0; j < s.length(); j++) { if (count[s.charAt(j)]++ == 0) num++; if (num > k) { while (--count[s.charAt(i++)] > 0);//?? num--; } res = Math.max(res, j - i + 1); } return res; }
This problem can be solved using two pointers. The important part is while (map.size() > k), we move left pointer to make sure the map size is less or equal to k. This can be easily extended to any number of unique characters.
A more generic solution as follows, can be solution for Unicode string:

public int lengthOfLongestSubstringKDistinct(String s, int k) {
    Map<Character, Integer> map = new HashMap<>();
    int left = 0;
    int best = 0;
    for(int i = 0; i < s.length(); i++) {
        // character at the right pointer
        char c = s.charAt(i);
        map.put(c, map.getOrDefault(c, 0) + 1);
        // make sure map size is valid, no need to check left pointer less than s.length()
        while (map.size() > k) {
            char leftChar = s.charAt(left);
            if (map.containsKey(leftChar)) {
                map.put(leftChar, map.get(leftChar) - 1);                     
                if (map.get(leftChar) == 0) { 
                    map.remove(leftChar);
                }
            }
            left++;
        }
        best = Math.max(best, i - left + 1);
    }
    return best;
}
    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
        int[] count = new int[256];
        int num = 0, i = 0, res = 0;
        for (int j = 0; j < s.length(); j++) {
            if (count[s.charAt(j)]++ == 0) num++;
            if (num > k) {
                while (--count[s.charAt(i++)] > 0);
                num--;
            }
            res = Math.max(res, j - i + 1);
        }
        return res;
    }
    public static int lengthOfLongestSubstringKDistinct(String s, int k) {
        Map<Character, Integer> map = new HashMap<>();
        int left = 0;
        int res = 0;
        for(int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            map.put(c, map.getOrDefault(c, 0) + 1);
            while (map.size() > k) {
                char leftChar = s.charAt(left);
                left+=map.get(leftChar);
                map.remove(leftChar);
            }
            res = Math.max(res, i - left + 1);
        }
        return res;
    }
Follow up
https://discuss.leetcode.com/topic/48827/java-o-nlogk-using-treemap-to-keep-last-occurrence-interview-follow-up-question/3
The interviewer may say that the string is given as a steam. In this situation, we can't maintain a "left pointer" as the classical O(n) hashmap solution.
Solving the problem with O(n) time is not enough, some interviewer may require this solution as a followup. Instead of recording each char's count, we keep track of char's last occurrence. If you consider k as constant, it is also a O(n) algorithm.
inWindow keeps track of each char in window and its last occurrence position
lastOccurrence is used to find the char in window with left most last occurrence. A better idea is to use a PriorityQueue, as it takes O(1) to getMin, However Java's PQ does not support O(logn) update a internal node, it takes O(n). TreeMap takes O(logn) to do both getMin and update.
Every time when the window is full of k distinct chars, we lookup TreeMap to find the one with leftmost last occurrence and set left bound j to be 1 + first to exclude the char to allow new char coming into window.
        public int lengthOfLongestSubstringKDistinct(String str, int k) {
            if (str == null || str.isEmpty() || k == 0) {
                return 0;
            }
            TreeMap<Integer, Character> lastOccurrence = new TreeMap<>();
            Map<Character, Integer> inWindow = new HashMap<>();
            int j = 0;
            int max = 1;
            for (int i = 0; i < str.length(); i++) {
                char in = str.charAt(i);
                while (inWindow.size() == k && !inWindow.containsKey(in)) {
                    int first = lastOccurrence.firstKey();
                    char out = lastOccurrence.get(first);
                    inWindow.remove(out);
                    lastOccurrence.remove(first);
                    j = first + 1;
                }
                //update or add in's position in both maps
                if (inWindow.containsKey(in)) {
                    lastOccurrence.remove(inWindow.get(in));
                }
                inWindow.put(in, i);
                lastOccurrence.put(i, in);
                max = Math.max(max, i - j + 1);
            }
            return max;
        }
http://www.1point3acres.com/bbs/thread-209202-1-1.html
第一轮,Longest Substring with At Most K Distinct Characters变形题,输入是stream而不是string,所以无法知道输入的长度,而且内存不够,无法存入整个stream,所以维持sliding window来更新最大长度的算法无法解决该问题。最后是通过一个map纪录每个字符上次出现的index来实现。提供了函数streamreader.reader()来返回下一个字符

第一轮,Longest Substring with At Most K Distinct Characters变形题: 假设API streamreader.read()返回'\0' 若stream已读完(具体的要求不影响实现)。因为内存有限制,那么这个题还有一个细节就是所给的k大小能不能认为是有界的,一般若假设是26或256个字符的话那么用一个unordered_map<char, int> lastInx记录当前最后k个distinct char出现的最后index 就可以了。但是这样实际会造成O(Nk)时间复杂度,因为在前进substring front的时候还会涉及判断front index是不是某个char的最后index。我是再定义一个reverse mapping: unordered_map<int, char> chars来记录每个最后index所对应的char. 这样用O(2k)的空间就可以实现O(N)的时间。
当然假设内存虽然有限制,但至少是O(k)的 (即使不是O(N)的)。
  1. int maxLength(StreamReader& streamreader, int k) {.1point3acres缃�
  2.   unordered_map<char, int> lastInx; // last index of distinct char, space O(k)
  3.   unordered_map<int, char> chars;   // distinct char of last index, space O(k)
  4.   char c = '\0'; // current char
  5.   int front = 0; // first char's index of valid substring
  6.   int cur = -1;  // current char's index of valid substring
  7.   int maxLen = 0; // max length of valid substring
  8.   while ((c = streamreader.read()) != '\0') { // time: O(N)
  9.     // update maps of last index and char
  10.     if (lastInx.count(c)) chars.erase(lastInx[c]);
  11.     lastInx[c] = ++cur; chars[cur] = c;
  12.     // advance front index of substring if invalid
  13.     while (chars.size() > k) {
  14.       if (chars.count(front)) { lastInx.erase(chars[front]); chars.erase(front); }
  15.       ++front;. 鐗涗汉浜戦泦,涓€浜╀笁鍒嗗湴
  16.     }
  17.     // update max length
  18.     maxLen = max(maxLen, cur - front + 1);. more info on 1point3acres.com
  19.   }
  20.   return maxLen;
  21. }
关键看你的sliding window在advance front index的时候需要记录什么。
在无内存限制的时候,advance front index需要判断删除char s[front_index]后是否减少了某种字符的出现,若是这样实现必然要存储substring从front_index一直到cur_index的字符,这需要O(N)空间,也就是这道题不允许的。所以按照Leetcode原题只记录char frequency是不行的。.鏈枃鍘熷垱鑷�1point3acres璁哄潧
实际上每次front_index++时并不需要知道s[front_index]本身是什么,只需要知道该字符是否是最后出现的instance。这个改进应该就是这个题的考点吧。
当然,前提是内存虽然没有O(N),但至少有O(k). 注意O(k)可以远远小于O(charset).

Unicode allows for 17 planes, each of 65,536 possible characters (or 'code points'). This gives a total of 1,114,112 possible characters. At present, only about 10% of this space has been allocated.


No comments:

Post a Comment

Labels

GeeksforGeeks (976) Algorithm (811) LeetCode (652) to-do (599) Review (360) Classic Algorithm (334) Classic Interview (298) Dynamic Programming (263) Google Interview (233) LeetCode - Review (233) Tree (146) POJ (137) Difficult Algorithm (136) EPI (127) Different Solutions (119) Bit Algorithms (110) Cracking Coding Interview (110) Smart Algorithm (109) Math (91) HackerRank (85) Lintcode (83) Binary Search (73) Graph Algorithm (73) Greedy Algorithm (61) Interview Corner (61) Binary Tree (58) List (58) DFS (56) Algorithm Interview (53) Advanced Data Structure (52) Codility (52) ComProGuide (52) LeetCode - Extended (47) USACO (46) Geometry Algorithm (45) BFS (43) Data Structure (42) Mathematical Algorithm (42) ACM-ICPC (41) Jobdu (39) Interval (38) Recursive Algorithm (38) Stack (38) String Algorithm (38) Binary Search Tree (37) Knapsack (37) Codeforces (36) Introduction to Algorithms (36) Matrix (36) Must Known (36) Beauty of Programming (35) Sort (35) Space Optimization (34) Array (33) Trie (33) prismoskills (33) Segment Tree (32) Union-Find (32) Backtracking (31) HDU (31) Google Code Jam (30) Permutation (30) Puzzles (30) Array O(N) (29) Data Structure Design (29) Company-Zenefits (28) Microsoft 100 - July (28) to-do-must (28) Random (27) Sliding Window (27) GeeksQuiz (25) Logic Thinking (25) hihocoder (25) High Frequency (23) Palindrome (23) Algorithm Game (22) Company - LinkedIn (22) Graph (22) Hash (22) Queue (22) DFS + Review (21) TopCoder (21) Binary Indexed Trees (20) Brain Teaser (20) CareerCup (20) Company - Twitter (20) Pre-Sort (20) Company-Facebook (19) UVA (19) Probabilities (18) Follow Up (17) Codercareer (16) Company-Uber (16) Game Theory (16) Heap (16) Shortest Path (16) String Search (16) Topological Sort (16) Tree Traversal (16) itint5 (16) Iterator (15) Merge Sort (15) O(N) (15) Bisection Method (14) Difficult (14) Number (14) Number Theory (14) Post-Order Traverse (14) Priority Quieue (14) Amazon Interview (13) BST (13) Basic Algorithm (13) Codechef (13) Majority (13) mitbbs (13) Combination (12) Computational Geometry (12) KMP (12) Long Increasing Sequence(LIS) (12) Modify Tree (12) Reconstruct Tree (12) Reservoir Sampling (12) 尺取法 (12) AOJ (11) DFS+Backtracking (11) Fast Power Algorithm (11) Graph DFS (11) LCA (11) LeetCode - DFS (11) Ordered Stack (11) Princeton (11) Tree DP (11) 挑战程序设计竞赛 (11) Binary Search - Bisection (10) Company - Microsoft (10) Company-Airbnb (10) Euclidean GCD (10) Facebook Hacker Cup (10) HackerRank Easy (10) Reverse Thinking (10) Rolling Hash (10) SPOJ (10) Theory (10) Tutorialhorizon (10) X Sum (10) Coin Change (9) Divide and Conquer (9) Lintcode - Review (9) Mathblog (9) Max-Min Flow (9) Stack Overflow (9) Stock (9) Two Pointers (9) Book Notes (8) Bottom-Up (8) DP-Space Optimization (8) Graph BFS (8) LeetCode - DP (8) LeetCode Hard (8) Prefix Sum (8) Prime (8) Suffix Tree (8) System Design (8) Tech-Queries (8) Time Complexity (8) Use XOR (8) 穷竭搜索 (8) Algorithm Problem List (7) DFS+BFS (7) Facebook Interview (7) Fibonacci Numbers (7) Game Nim (7) HackerRank Difficult (7) Hackerearth (7) Interval Tree (7) Linked List (7) Longest Common Subsequence(LCS) (7) Math-Divisible (7) Miscs (7) O(1) Space (7) Probability DP (7) Radix Sort (7) Simulation (7) Xpost (7) n00tc0d3r (7) 蓝桥杯 (7) Bucket Sort (6) Catalan Number (6) Classic Data Structure Impl (6) DFS+DP (6) DP - Tree (6) How To (6) Interviewstreet (6) Kadane’s Algorithm (6) Knapsack - MultiplePack (6) Level Order Traversal (6) Manacher (6) Minimum Spanning Tree (6) One Pass (6) Programming Pearls (6) Quick Select (6) Rabin-Karp (6) Randomized Algorithms (6) Sampling (6) Schedule (6) Suffix Array (6) Threaded (6) reddit (6) AI (5) Art Of Programming-July (5) Big Data (5) Brute Force (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Crazyforcode (5) DFS+Cache (5) DP-Multiple Relation (5) DP-Print Solution (5) Dutch Flag (5) Fast Slow Pointers (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Inversion (5) Java (5) Kadane - Extended (5) Matrix Chain Multiplication (5) Microsoft Interview (5) Morris Traversal (5) Pruning (5) Quadtrees (5) Quick Partition (5) Quora (5) SPFA(Shortest Path Faster Algorithm) (5) Subarray Sum (5) Sweep Line (5) Traversal Once (5) TreeMap (5) jiuzhang (5) to-do-2 (5) 单调栈 (5) 树形DP (5) 1point3acres (4) Anagram (4) Approximate Algorithm (4) Backtracking-Include vs Exclude (4) Brute Force - Enumeration (4) Chess Game (4) Company-Amazon (4) Consistent Hash (4) Convex Hull (4) Cycle (4) DP-Include vs Exclude (4) Dijkstra (4) Distributed (4) Eulerian Cycle (4) Flood fill (4) Graph-Classic (4) HackerRank AI (4) Histogram (4) Kadane Max Sum (4) Knapsack - Mixed (4) Knapsack - Unbounded (4) Left and Right Array (4) MinMax (4) Multiple Data Structures (4) N Queens (4) Nerd Paradise (4) Parallel Algorithm (4) Practical Algorithm (4) Pre-Sum (4) Probability (4) Programcreek (4) Quick Sort (4) Spell Checker (4) Stock Maximize (4) Subsets (4) Sudoku (4) Symbol Table (4) TreeSet (4) Triangle (4) Water Jug (4) Word Ladder (4) algnotes (4) fgdsb (4) 最大化最小值 (4) A Star (3) Abbreviation (3) Algorithm - Brain Teaser (3) Algorithm Design (3) Anagrams (3) B Tree (3) Big Data Algorithm (3) Binary Search - Smart (3) Caterpillar Method (3) Coins (3) Company - Groupon (3) Company - Indeed (3) Cumulative Sum (3) DP-Fill by Length (3) DP-Two Variables (3) Dedup (3) Dequeue (3) Dropbox (3) Easy (3) Edit Distance (3) Expression (3) Finite Automata (3) Forward && Backward Scan (3) Github (3) GoLang (3) Include vs Exclude (3) Joseph (3) Jump Game (3) Knapsack-多重背包 (3) LeetCode - Bit (3) LeetCode - TODO (3) Linked List Merge Sort (3) LogN (3) Master Theorem (3) Maze (3) Min Cost Flow (3) Minesweeper (3) Missing Numbers (3) NP Hard (3) Online Algorithm (3) Pascal's Triangle (3) Pattern Match (3) Project Euler (3) Rectangle (3) Scala (3) SegmentFault (3) Stack - Smart (3) State Machine (3) Streaming Algorithm (3) Subset Sum (3) Subtree (3) Transform Tree (3) Two Pointers Window (3) Warshall Floyd (3) With Random Pointer (3) Word Search (3) bookkeeping (3) codebytes (3) Activity Selection Problem (2) Advanced Algorithm (2) AnAlgorithmADay (2) Application of Algorithm (2) Array Merge (2) BOJ (2) BT - Path Sum (2) Balanced Binary Search Tree (2) Bellman Ford (2) Binomial Coefficient (2) Bit Mask (2) Bit-Difficult (2) Bloom Filter (2) Book Coding Interview (2) Branch and Bound Method (2) Clock (2) Codesays (2) Company - Baidu (2) Complete Binary Tree (2) DFS+BFS, Flood Fill (2) DP - DFS (2) DP-3D Table (2) DP-Classical (2) DP-Output Solution (2) DP-Slide Window Gap (2) DP-i-k-j (2) DP-树形 (2) Distributed Algorithms (2) Divide and Conqure (2) Doubly Linked List (2) GoHired (2) Graham Scan (2) Graph - Bipartite (2) Graph BFS+DFS (2) Graph Coloring (2) Graph-Cut Vertices (2) Hamiltonian Cycle (2) Huffman Tree (2) In-order Traverse (2) Include or Exclude Last Element (2) Information Retrieval (2) Interview - Linkedin (2) Invariant (2) Islands (2) Knuth Shuffle (2) LeetCode - Recursive (2) Linked Interview (2) Linked List Sort (2) Longest SubArray (2) Lucene-Solr (2) MST (2) MST-Kruskal (2) Math-Remainder Queue (2) Matrix Power (2) Minimum Vertex Cover (2) Negative All Values (2) Number Each Digit (2) Numerical Method (2) Object Design (2) Order Statistic Tree (2) Palindromic (2) Parentheses (2) Parser (2) Peak (2) Programming (2) Range Minimum Query (2) Reuse Forward Backward (2) Robot (2) Rosettacode (2) Scan from right (2) Search (2) Shuffle (2) Sieve of Eratosthenes (2) SimHash (2) Simple Algorithm (2) Skyline (2) Spatial Index (2) Stream (2) Strongly Connected Components (2) Summary (2) TV (2) Tile (2) Traversal From End (2) Tree Sum (2) Tree Traversal Return Multiple Values (2) Word Break (2) Word Graph (2) Word Trie (2) Young Tableau (2) 剑指Offer (2) 数位DP (2) 1-X (1) 51Nod (1) Akka (1) Algorithm - How To (1) Algorithm - New (1) Algorithm Series (1) Algorithms Part I (1) Analysis of Algorithm (1) Array-Element Index Negative (1) Array-Rearrange (1) Auxiliary Array (1) Auxiliary Array: Inc&Dec (1) BACK (1) BK-Tree (1) BZOJ (1) Basic (1) Bayes (1) Beauty of Math (1) Big Integer (1) Big Number (1) Binary (1) Binary Tree Variant (1) Bipartite (1) Bit-Missing Number (1) BitMap (1) BitMap index (1) BitSet (1) Bug Free Code (1) BuildIt (1) C/C++ (1) CC Interview (1) Cache (1) Calculate Height at Same Recusrion (1) Cartesian tree (1) Check Tree Property (1) Chinese (1) Circular Buffer (1) Code Quality (1) Codesolutiony (1) Company - Alibaba (1) Company - Palantir (1) Company - WalmartLabs (1) Company-Apple (1) Company-Epic (1) Company-Salesforce (1) Company-Snapchat (1) Company-Yelp (1) Compression Algorithm (1) Concurrency (1) Convert BST to DLL (1) Convert DLL to BST (1) Custom Sort (1) Cyclic Replacement (1) DFS-Matrix (1) DP - Probability (1) DP Fill Diagonal First (1) DP-Difficult (1) DP-End with 0 or 1 (1) DP-Fill Diagonal First (1) DP-Graph (1) DP-Left and Right Array (1) DP-MaxMin (1) DP-Memoization (1) DP-Node All Possibilities (1) DP-Optimization (1) DP-Preserve Previous Value (1) DP-Print All Solution (1) Database (1) Detect Negative Cycle (1) Directed Graph (1) Do Two Things at Same Recusrion (1) Domino (1) Dr Dobb's (1) Duplicate (1) Equal probability (1) External Sort (1) FST (1) Failure Function (1) Fraction (1) Front End Pointers (1) Funny (1) Fuzzy String Search (1) Game (1) Generating Function (1) Generation (1) Genetic algorithm (1) GeoHash (1) Geometry - Orientation (1) Google APAC (1) Graph But No Graph (1) Graph Transpose (1) Graph Traversal (1) Graph-Coloring (1) Graph-Longest Path (1) Gray Code (1) HOJ (1) Hanoi (1) Hard Algorithm (1) How Hash (1) How to Test (1) Improve It (1) In Place (1) Inorder-Reverse Inorder Traverse Simultaneously (1) Interpolation search (1) Interview (1) Interview - Easy (1) Interview - Facebook (1) Isomorphic (1) JDK8 (1) K Dimensional Tree (1) Knapsack - Fractional (1) Knapsack - ZeroOnePack (1) Knight (1) Kosaraju’s algorithm (1) Kruskal (1) Kruskal MST (1) Kth Element (1) Least Common Ancestor (1) LeetCode - Binary Tree (1) LeetCode - Coding (1) LeetCode - Detail (1) LeetCode - Related (1) LeetCode Diffcult (1) Linked List Reverse (1) Linkedin (1) Linkedin Interview (1) Local MinMax (1) Logic Pattern (1) Longest Common Subsequence (1) Longest Common Substring (1) Longest Prefix Suffix(LPS) (1) Manhattan Distance (1) Map && Reverse Map (1) Math - Induction (1) Math-Multiply (1) Math-Sum Of Digits (1) Matrix - O(N+M) (1) Matrix BFS (1) Matrix Graph (1) Matrix Search (1) Matrix+DP (1) Matrix-Rotate (1) Max Min So Far (1) Median (1) Memory-Efficient (1) MinHash (1) MinMax Heap (1) Monotone Queue (1) Monto Carlo (1) Multi-Reverse (1) Multiple DFS (1) Multiple Tasks (1) Next Successor (1) Offline Algorithm (1) PAT (1) Parent-Only Tree (1) Partition (1) Path Finding (1) Patience Sort (1) Persistent (1) Pigeon Hole Principle (1) Power Set (1) Pratical Algorithm (1) Probabilistic Data Structure (1) Proof (1) Python (1) Queue & Stack (1) RSA (1) Ranking (1) Rddles (1) ReHash (1) Realtime (1) Recurrence Relation (1) Recursive DFS (1) Recursive to Iterative (1) Red-Black Tree (1) Region (1) Regular Expression (1) Resources (1) Reverse Inorder Traversal (1) Robin (1) Selection (1) Self Balancing BST (1) Similarity (1) Sort && Binary Search (1) String Algorithm. Symbol Table (1) String DP (1) String Distance (1) SubMatrix (1) Subsequence (1) System of Difference Constraints(差分约束系统) (1) TSP (1) Ternary Search Tree (1) Test (1) Thread (1) TimSort (1) Top-Down (1) Tournament (1) Tournament Tree (1) Transform Tree in Place (1) Tree Diameter (1) Tree Rotate (1) Trie + DFS (1) Trie and Heap (1) Trie vs Hash (1) Trie vs HashMap (1) Triplet (1) Two Data Structures (1) Two Stacks (1) USACO - Classical (1) USACO - Problems (1) UyHiP (1) Valid Tree (1) Vector (1) Wiggle Sort (1) Wikipedia (1) Yahoo Interview (1) ZOJ (1) baozitraining (1) codevs (1) cos126 (1) javabeat (1) jum (1) namic Programming (1) sqrt(N) (1) 两次dijkstra (1) 九度 (1) 二进制枚举 (1) 夹逼法 (1) 归一化 (1) 折半枚举 (1) 枚举 (1) 状态压缩DP (1) 男人八题 (1) 英雄会 (1) 逆向思维 (1)

Popular Posts