LeetCode 115 - Distinct Subsequences


Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Here is an example:
S = "rabbbit", T = "rabbit"
Return 3.
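Given two strings S and T, count how many distinct subsequences of S are equal to T. A subsequence of S is any string formed by deleting zero or more characters from S.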

When you see a string problem about subsequences or matching, dynamic programming should come to mind naturally.
https://robinliu.gitbooks.io/algorithms/content/shuang_xu_lie_xing.html
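Two-sequence DP
State: f[i][j] describes the relationship between the first i elements of the first sequence and the first j elements of the second sequence
Transition: f[i][j] is derived from how the i-th and j-th elements match
Initialization: f[i][0] and f[0][j]
Answer: f[m][n]
state: f[i][j] is the number of ways to pick the first j characters of T from the first i characters of S
initialization: f[i][0] = 1 for every i, f[0][j] = 0 for j > 0
function:
f[i][j] = f[i-1][j] + f[i-1][j-1] for S[i-1] == T[j-1]
f[i][j] = f[i-1][j] for S[i-1] != T[j-1]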

http://blog.csdn.net/abcbc/article/details/8978146
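With a two-string problem like this one, DP comes to mind easily, but the recurrence is not obvious. Start by building a two-dimensional table int[][] dp that records the number of matching subsequences (using S = "rabbbit", T = "rabbit" as the example). Rows are prefixes of T, columns are prefixes of S, and the first row and first column stand for the empty prefix:
      r a b b b i t
    1 1 1 1 1 1 1 1
  r 0 1 1 1 1 1 1 1
  a 0 0 1 1 1 1 1 1
  b 0 0 0 1 2 3 3 3
  b 0 0 0 0 1 3 3 3
  i 0 0 0 0 0 0 3 3
  t 0 0 0 0 0 0 0 3
dp[i][j] = dp[i][j - 1] + (T[i - 1] == S[j - 1] ? dp[i - 1][j - 1] : 0), for 1 <= i <= T.length(), 1 <= j <= S.length()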
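To check the table by hand, it can help to regenerate it. The following is a minimal Java sketch of the recurrence above, with rows indexed by T and columns by S; the class name DistinctSubsequencesTable and the method buildTable are made up for this illustration and do not come from the linked post.

```java
import java.util.Arrays;

class DistinctSubsequencesTable {
    // Builds the whole table: dp[i][j] counts how often the first i
    // characters of T occur as a subsequence of the first j characters of S.
    static int[][] buildTable(String s, String t) {
        int[][] dp = new int[t.length() + 1][s.length() + 1];
        // The empty prefix of T occurs exactly once in every prefix of S.
        Arrays.fill(dp[0], 1);
        for (int i = 1; i <= t.length(); i++) {
            for (int j = 1; j <= s.length(); j++) {
                dp[i][j] = dp[i][j - 1];              // skip S[j-1]
                if (t.charAt(i - 1) == s.charAt(j - 1)) {
                    dp[i][j] += dp[i - 1][j - 1];     // match S[j-1] with T[i-1]
                }
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        // Prints one row per prefix of T, starting with the empty prefix.
        for (int[] row : buildTable("rabbbit", "rabbit")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The bottom-right cell of the returned table is the answer, 3 for this example.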
X. DP Solution:
https://discuss.leetcode.com/topic/9488/easy-to-understand-dp-in-java
  • We build an array mem where mem[i+1][j+1] is the number of times S[0..j] contains T[0..i] as a distinct subsequence. The result is therefore mem[T.length()][S.length()].
  • We can build this array row by row:
  • The first row must be filled with 1s, because the empty string is a subsequence of any string, but only once. So mem[0][j] = 1 for every j. This not only makes our lives easier, it also returns the correct value when T is the empty string.
  • The first column of every row except the first must be 0, because an empty string cannot contain a non-empty string as a subsequence. The very first item of the array is mem[0][0] = 1, because the empty string contains the empty string once.
So the matrix looks like this:
  S 0123....j
T +----------+
  |1111111111|
0 |0         |
1 |0         |
2 |0         |
. |0         |
. |0         |
i |0         |
From here we can easily fill the whole grid: for each (x, y), if S[x] == T[y] we add the previous item in the same row and the previous item in the previous row; otherwise we copy the previous item in the same row. The reason is simple:
  • if the current character in S does not equal the current character in T, we have the same number of distinct subsequences as we had without the new character.
  • if the current character in S equals the current character in T, the number of distinct subsequences is the number we had before plus the number we had with both T and S one character shorter.
An example:
S: [acdabefbc] and T: [ab]
first we check with a:
           *  *
      S = [acdabefbc]
mem[1] = [0111222222]
then we check with ab:
               *  *
      S = [acdabefbc]
mem[1] = [0111222222]
mem[2] = [0000022244]
And the result is 4, as the distinct subsequences are:
      S = [a   b    ]
      S = [a      b ]
      S = [   ab    ]
      S = [   a   b ]
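The four occurrences listed above can be confirmed by brute force. This is a small Java sketch, not part of the original answer, that enumerates every increasing tuple of positions in S spelling out T; the names SubsequenceEnumerator, occurrences and collect are illustrative only. It runs in exponential time and is meant only for checking small examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubsequenceEnumerator {
    // Returns every increasing index tuple idx with S[idx[k]] == T[k] for all k.
    static List<int[]> occurrences(String s, String t) {
        List<int[]> out = new ArrayList<>();
        collect(s, t, 0, 0, new int[t.length()], out);
        return out;
    }

    // Tries to match T[ti] at every position of S from si onwards.
    private static void collect(String s, String t, int si, int ti,
                                int[] picked, List<int[]> out) {
        if (ti == t.length()) {
            out.add(picked.clone());   // all of T matched: record a copy
            return;
        }
        for (int k = si; k < s.length(); k++) {
            if (s.charAt(k) == t.charAt(ti)) {
                picked[ti] = k;
                collect(s, t, k + 1, ti + 1, picked, out);
            }
        }
    }

    public static void main(String[] args) {
        // Prints [0, 4], [0, 7], [3, 4], [3, 7] (0-based positions in S).
        for (int[] idx : occurrences("acdabefbc", "ab")) {
            System.out.println(Arrays.toString(idx));
        }
    }
}
```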


The complexity is O(mn).
We can use num[i][j] to store the number of distinct subsequences of T(0, j) in S(0, i). For any i, num[i][0] = 1, because the empty prefix of T can be formed from any prefix of S in exactly one way (by deleting every character).

If the character at position i in S is equal to the character at position j in T, there are two options.

Delete the character at position i in S. The count is then the number of distinct subsequences of T(0, j) in S(0, i – 1).
Keep the character at position i in S and match it with the character at position j in T. The count is then the number of distinct subsequences of T(0, j – 1) in S(0, i – 1).
So num[i][j] = num[i – 1][j] + num[i – 1][j – 1].

If the character at position i in S is not equal to the character at position j in T, then we can only delete this character. So num[i][j] = num[i – 1][j].
http://www.cs.cmu.edu/~yandongl/distinctseq.html
public int numDistinct(String S, String T) {
    // array creation
    int[][] mem = new int[T.length()+1][S.length()+1];

    // filling the first row: with 1s
    for(int j=0; j<=S.length(); j++) {
        mem[0][j] = 1;
    }
    
    // the first column is 0 by default in every other rows but the first, which we need.
    
    for(int i=0; i<T.length(); i++) {
        for(int j=0; j<S.length(); j++) {
            if(T.charAt(i) == S.charAt(j)) {
                mem[i+1][j+1] = mem[i][j] + mem[i+1][j];
            } else {
                mem[i+1][j+1] = mem[i+1][j];
            }
        }
    }
    
    return mem[T.length()][S.length()];
}


We define the computation structure C[i][j] as the number of solutions for S[0...i-1] and T[0...j-1]; i and j in C are the numbers of characters in the prefixes considered. It is easier to include 0 in the structure to accommodate the case where no characters (the empty string) are considered. To extend this structure, when updating C[i][j] we have two options:


  • C[i][j] = C[i-1][j]. Whatever the current character of S is, we simply don't use it. We only use characters [0...i-2] of S, however many solutions there are that cover T[0...j-1].
  • But if the current character of S is the same as the current character of T (S[i-1]==T[j-1]), we have another choice: every solution counted in C[i-1][j-1] can be extended with this character. Therefore C[i][j] += C[i-1][j-1].

        public int numDistinct(String S, String T) {
            int M = S.length();
            int N = T.length();
            int[][] num = new int[M + 1][N + 1];
            for (int i = 0; i <= N; i++)
                num[0][i] = 0;
            for (int i = 0; i <= M; i++)
                num[i][0] = 1;
            for (int i = 1; i <= M; i++) {
                for (int j = 1; j <= N; j++) {
                    if (S.charAt(i - 1) != T.charAt(j - 1)) {
                        num[i][j] = num[i - 1][j];
                    } else {
                        num[i][j] = num[i - 1][j] + num[i - 1][j - 1];
                    }
                }
            }
            return num[M][N];
        }
        // Edge cases, from another solution, checked before running the DP:
        int lens = S.length(), lent = T.length();
        if(lent == 0)return 1;
        else if(lens == 0)return 0;

    X: Dimension DP, Space Optimization
    Use a rolling array to reduce memory usage.
    https://discuss.leetcode.com/topic/6465/a-dp-solution-with-clarification-and-explanation/
    As one can observe from the algorithm, in the inner loop, we only refer to the values in the previous iteration, i.e. the values in the previous row of the dp matrix. Therefore, to optimize the algorithm, we could reduce the space by keeping only two arrays, instead of the entire matrix. The algorithm can be optimized as follows. The running time is then reduced from 448ms to 424 ms.
    public int numDistinct_sdp(String S, String T) {
     int sl = S.length();
     int tl = T.length();
     
     int [] preComb = new int[sl+1];
     int [] comb = new int[sl+1];
     
     
     for(int i=0; i<=sl; i++)
      preComb[i] = 1;  
    
     for(int t=1; t<=tl; ++t){
      for(int s=1; s<=sl; ++s){
       if(T.charAt(t-1) != S.charAt(s-1)){
        comb[s] = comb[s-1];
       }else{
        comb[s] = comb[s-1] + preComb[s-1];
       }
      }
      
      for(int i=0; i<=sl; ++i){
       preComb[i] = comb[i];
      }
     }
     
     return preComb[sl];
    }
    According to your analysis, I think we could reduce the space to one array.
    You set preComb[i] = comb[i], and have not set the value again, so I think we don't need this preComb[] anyway.
    One tricky part is that in the inner iteration (with "t" as the iterator), we should do the reverse order (i.e. from the end index to the beginning). Otherwise, the result would not be correct.
    public int numDistinct(String S, String T) {
        int sl = S.length();
        int tl = T.length();
    
        int[] dp = new int[tl+1];
        dp[0] = 1;
    
        for(int s=1; s<=sl; s++)
            for(int t=tl; t>=1; t--){
                if(S.charAt(s-1)==T.charAt(t-1))
                    dp[t] += dp[t-1];
            }
    
        return dp[tl];
    }
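The remark about iterating in reverse can be checked on a tiny input. The sketch below, which is an illustration and not code from the thread, runs the same one-array update in both directions; UpdateOrderDemo and its reverse flag are hypothetical names. With S = "b" and T = "bb", the forward order reuses the single 'b' for both characters of T and reports 1, while the reverse order reports the correct 0.

```java
class UpdateOrderDemo {
    // One-array DP over T; 'reverse' selects the direction of the inner loop.
    static int count(String s, String t, boolean reverse) {
        int tl = t.length();
        int[] dp = new int[tl + 1];
        dp[0] = 1;
        for (int si = 1; si <= s.length(); si++) {
            for (int k = 1; k <= tl; k++) {
                // Reverse order visits tl, tl-1, ..., 1; forward visits 1, 2, ..., tl.
                int ti = reverse ? tl - k + 1 : k;
                if (s.charAt(si - 1) == t.charAt(ti - 1)) {
                    // In forward order dp[ti-1] may already include S[si-1],
                    // so the same character of S gets counted twice.
                    dp[ti] += dp[ti - 1];
                }
            }
        }
        return dp[tl];
    }

    public static void main(String[] args) {
        System.out.println(count("b", "bb", true));    // 0, correct
        System.out.println(count("b", "bb", false));   // 1, wrong
    }
}
```

In reverse order, dp[ti - 1] still holds the value from the previous character of S when dp[ti] is updated, which is exactly what the two-array version read from preComb.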



    When you see a question about two strings, DP should be considered first.
    In its i-th iteration, the DP only needs the results of the (i-1)-th iteration. More specifically, table[i][j] only needs table[i-1][j] and table[i-1][j-1].

    When we use a 1-dimensional array, at the beginning of the i-th iteration recurs[j] is the number of occurrences of T(0:j) in S(0:i-1), and it is updated as recurs[j] += recurs[j-1] if the current characters match.
    In the i-th iteration,
    at the beginning, recurs[j] = the number of distinct subsequences of T[0,j] in S[0,i-1];
    after being updated, recurs[j] = the number of distinct subsequences of T[0,j] in S[0,i].
    We have to run j from the end of T down to 0, since we must not overwrite recurs[j] before it has been used to update recurs[j+1].
     public int numDistinct(String S, String T) {  
       int si = S.length(), ti = T.length();   
       if (ti == 0) return 1;   // the empty string occurs exactly once as a subsequence
       if (si < ti) return 0;   // S is too short to contain T
       int[] recurs = new int[ti];   
       for (int i=0; i<si; ++i) {   
         for (int j=Math.min(i, ti-1); j>=0; --j) {   
           if (S.charAt(i)==T.charAt(j)) {   
             recurs[j] += (j==0) ? 1 : recurs[j-1];   
           }   
         }   
       }   
       return recurs[ti-1];       
     } 
    The only extra information needed to compute dp[i][j] is the old value of the cell to its left, so we can simply save it in a variable hold.
        public int numDistinct(String S, String T) {
            if (T.length() == 0) {
                return 1;
            }
            int[] dp = new int[T.length() + 1];
            dp[0] = 1;
            for (int i = 0; i < S.length(); i++) {
                int hold = 1;
                for (int j = 1; j < dp.length; j++) {
                    if (dp[j - 1] == 0)
                        break;
                    int h = dp[j];
                    if (S.charAt(i) == T.charAt(j - 1)) {
                        dp[j] = hold + dp[j];
                    }
                    hold = h;
                }
            }
            return dp[dp.length - 1];
        }
        int numDistinct(string S, string T) {
            int n = S.size(), m = T.size();
            vector<int> dp(n+1, 1);
            for(int i=1; i<=m; i++) {
                int upLeft = dp[0];
                dp[0] = 0;
                for(int j=1; j<=n; j++) {
                    int temp = dp[j];
                    dp[j] = dp[j-1];
                    if(S[j-1]==T[i-1]) dp[j] += upLeft;
                    upLeft = temp;
                }
            }
            return dp[n];
        }

    int numDistinct(string S, string T) {
        int n = S.size(), m = T.size();
        if (n < m) return 0; // exit early
        // match[j] = number of ways to form T[0..j-1] from the part of S seen so far.
        // A vector sized from T replaces the fixed int match[200], which would
        // overflow for longer T.
        vector<int> match(m + 1, 0);
        match[0] = 1;
        for (int i = 1; i <= n; i++)
            for (int j = m; j >= 1; j--)
                if (S[i-1] == T[j-1])
                    match[j] += match[j-1];
        return match[m];
    }
    http://blog.theliuy.com/distinct-subsequences/
        int numDistinct(string S, string T) {
            
            if (S.size() < T.size())
                return 0;
            
            int *prev = new int[S.size() + 1];
            int *curr = new int[S.size() + 1];
            int it, is;
            int num;
            
            for (is = 0; is <= S.size(); ++is) {
                prev[is] = 1;
            }
            
            for (it = 1; it < T.size() + 1; ++it) {
                
                curr[it] = (S[it - 1] == T[it - 1]) ? prev[it - 1] : 0;
                for (is = it + 1; is < S.size() + 1; ++is) {
                    curr[is] = (S[is - 1] == T[it - 1]) ? prev[is - 1] + curr[is - 1] : curr[is - 1];
                }
                
                swap(prev, curr);
            }
            
            num = prev[S.size()];
            delete[] prev;
            delete[] curr;
            
            return num;
        }
    Recursive Version: Inefficient.
    http://n00tc0d3r.blogspot.com/2013/02/find-number-of-distinct-subsequences.html
     private int numSubseq(String S, String T, int si, int ti) {  
       if (si<0 || ti<0 || si<ti) return 0;  
       
       if (S.charAt(si)==T.charAt(ti)) {  
         if (ti==0)  
           return numSubseq(S, T, si-1, ti) + 1;  
         else  
           return numSubseq(S, T, si-1, ti) + numSubseq(S, T, si-1, ti-1);  
       }  
       
       return numSubseq(S, T, si-1, ti);  
     }  
     public int numDistinct(String S, String T) {  
       return numSubseq(S, T, S.length()-1, T.length()-1);  
     }
      public int numDistinct(String S, String T) {
        if (S.length() == 0) {
          return T.length() == 0 ? 1 : 0;
        }
        if (T.length() == 0) {
          return 1;
        }
        int cnt = 0;
        for (int i = 0; i < S.length(); i++) {
          if (S.charAt(i) == T.charAt(0)) {
            cnt += numDistinct(S.substring(i + 1), T.substring(1));
          }
        }
        return cnt;
      }
    http://www.programcreek.com/2013/01/leetcode-distinct-subsequences-total-java/

    http://www.geeksforgeeks.org/count-distinct-occurrences-as-a-subsequence/
    int findSubsequenceCount(string S, string T)
    {
        int m = T.length(), n = S.length();
        // T can't appear as a subsequence in S
        if (m > n)
            return 0;
        // mat[i][j] stores the count of occurrences of
        // T(1..i) in S(1..j).
        vector<vector<int>> mat(m + 1, vector<int>(n + 1));
        // Initializing first column with all 0s. An empty
        // string can't have another string as subsequence
        for (int i = 1; i <= m; i++)
            mat[i][0] = 0;
        // Initializing first row with all 1s. An empty
        // string is subsequence of all.
        for (int j = 0; j <= n; j++)
            mat[0][j] = 1;
        // Fill mat[][] in bottom up manner
        for (int i = 1; i <= m; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                // If last characters don't match, then value
                // is same as the value without last character
                // in S.
                if (T[i - 1] != S[j - 1])
                    mat[i][j] = mat[i][j - 1];
                // Else value is obtained considering two cases.
                // a) All subsequences without last character in S
                // b) All subsequences without last characters in
                //    both.
                else
                    mat[i][j] = mat[i][j - 1] + mat[i - 1][j - 1];
            }
        }
        /* uncomment this to print matrix mat
        for (int i = 1; i <= m; i++, cout << endl)
            for (int j = 1; j <= n; j++)
                cout << mat[i][j] << " ";  */
        return mat[m][n] ;
    }
    X. DFS - brute force
    http://www.geeksforgeeks.org/find-number-times-string-occurs-given-string/
    int count(string a, string b, int m, int n)
    {
        // If both first and second string is empty,
        // or if second string is empty, return 1
        if ((m == 0 && n == 0) || n == 0)
            return 1;
        // If only first string is empty and second
        // string is not empty, return 0
        if (m == 0)
            return 0;
        // If last characters are same
        // Recur for remaining strings by
        // 1. considering last characters of both strings
        // 2. ignoring last character of first string
        if (a[m - 1] == b[n - 1])
            return count(a, b, m - 1, n - 1) +
                   count(a, b, m - 1, n);
        else
            // If last characters are different, ignore
            // last char of first string and recur for
            // remaining string
            return count(a, b, m - 1, n);
    }
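The recursion above revisits the same (m, n) pairs many times, which is what makes it exponential. One common remedy, sketched here in Java as an assumption and not taken from the linked article, is to cache each result so every pair is computed once, giving O(mn) time. The class name MemoizedCount and the memo array are made up for this example.

```java
import java.util.Arrays;

class MemoizedCount {
    // Counts occurrences of b as a subsequence of a, caching each (m, n) pair.
    static int count(String a, String b) {
        int[][] memo = new int[a.length() + 1][b.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);          // -1 marks "not computed yet"
        }
        return count(a, b, a.length(), b.length(), memo);
    }

    private static int count(String a, String b, int m, int n, int[][] memo) {
        if (n == 0) return 1;              // empty b occurs exactly once
        if (m == 0) return 0;              // non-empty b cannot occur in empty a
        if (memo[m][n] != -1) return memo[m][n];
        // Always allowed: ignore the last character of a.
        int result = count(a, b, m - 1, n, memo);
        // If the last characters match, also pair them up.
        if (a.charAt(m - 1) == b.charAt(n - 1)) {
            result += count(a, b, m - 1, n - 1, memo);
        }
        memo[m][n] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count("rabbbit", "rabbit"));   // 3
    }
}
```

The recursion depth still grows with the length of the first string, so for very long inputs the bottom-up tables shown earlier are the safer choice.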


