Word Break Related


http://baozitraining.org/blog/calculate-hard-runtime-complexity-in-recursive-solution/
Problem: Given a string s and a dictionary of words dict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return one such possible sentences.
s = "catsanddog", dict = ["cat", "cats", "and", "dog"], return “cats and dog";
Clarification:
  • Q: What to return if s can’t be splitted into words in dict? A: Return NULL;
  • Q: What if there are multiple ways of splitting s? A: Return one split solution is fine;  
  • Q: What if S is null/empty string? A: Return NULL; - So on and so fort.
public String wordBreak(String s, Set<String> dict) {
    if (s == null || s.isEmpty()) {
      return s; 
    }
    List<String> solution = new ArrayList<String>(); 
    boolean ret = wordBreakHelper(s, dict, solution); 
    if (!ret) {
      return null;
    }
   
    //from apache commons lang
    return StringUtils.join(solution, " ");
  }
  
  private boolean wordBreakHelper(String s, Set<String> dict, List<String> solution) {
    if (dict.contains(s)) {
      solution.add(s);
      return true;
    }
    for(int index = 1; index < s.length(); ++index) {
        String left = s.substring(0, index); 
        if (dict.contains(left)) {
          solution.add(left);
          boolean ret = wordBreakHelper(s.substring(index), dict, solution); 
          if (ret) {
            return ret; 
          } else {
            solution.remove(solution.size() - 1);
          }
        }
     }
     return false;
  }

时间复杂度分析

这样的解法看起来一目了然,对于每一个字符串,分成左右子字符串,查左子串、递归右子串(想想为什么不需要递归左子串)。 那么这个算法的时间复杂度是多少呢? 这个复杂度的计算并非那么容易一眼看出,因为解法中的循环内有递归,而每一个递归中又存在循环。所以数嵌套循环数的方法并不好用。因此我们拿一个具体的例子来看:
假设字符串s是abcde; 那么算法变化如下:
当index = 1, 字符串分为左串a和右串bcde,如果a不在字典中,那么这个拆分不成立,情况简化;如果a恰好在字典中,那么算法有两种选择,一种选择是将a放入结果集,然后递归bcde,另一种是不选择a作为完整单词,而是作为单词的一部分去试探下一个index。
当index = 2, 由于index = 1时产生了两种情况:a bcde 和 abcde。 对于a bcde的情况,由于算法递归bcde,和index = 1的情况类似,如果b在字典中,那么会演化出两种情况:a b cde; a bcde; 对于abcde的情况,如果ab在字典中,那么会演化出两种情况:ab cde; abcde;
写到这里,大家应该看得出来,对于每一个index上的字母,是否选择该字母作为单词结束的决定会产生两种可能,而且是层层叠加。因此该算法的复杂度就是2的n次方。
上面的写法虽然思路简单,但是不可避免的进行了很多重复运算。比如cde这个单词至少会被a b cde 和ab cde这两种拆分方法共同计算,如果我们能利用一个额外的集合记录之前的计算结果,那么算法的复杂度会大大下降。
private boolean wordBreakHelper(String s, Set<String> dict, Set<String> unsplitableSet, List<String> solution) {
    if (dict.contains(s)) {
      solution.add(s);
      return true;
    }
    if (unsplitableSet.contains(s)) {
      return false;
    }
    for(int index = 1; index < s.length(); ++index) {
        String left = s.substring(0, index); 
        if (dict.contains(left)) {
          solution.add(left);
          boolean ret = wordBreakHelper(s.substring(index), dict, unsplitableSet, solution); 
          if (ret) {
            return ret; 
          } else {
            solution.remove(solution.size() - 1);
          }
        }
     }
     unsplitableSet.add(s);
     return false;
  }

既然上面的算法避免了相同子串的重复计算,那么以上这个算法的时间复杂度是多少呢? 考虑到所有重复计算已经被排除,那么我们不防穷举所有可能的计算尝试,同样以abcde为例:
以a开头的尝试:a, ab, abc, abcd, abcde,
以b开头的尝试:b, bc, bcd, bcde,
以c开头的尝试:c, cd, cde,
显然所有的计算尝试是n平方的复杂度(n位字符串长度),考虑到所有的尝试只计算一次(算法中的unsplitableSet起到的作用),因此上面的时间复杂度位n的2次方。

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts