Massive Algorithms: LeetCode 115 - Distinct Subsequences

LeetCode 115 - Distinct Subsequences

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters (i.e., "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.
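Here "distinct" refers to distinct choices of positions in S, not distinct strings. All three answers spell "rabbit"; each one drops a different 'b'. The following Java is a brute-force sketch for checking small inputs. It runs in exponential time, and the class and method names are hypothetical. It lists every increasing index tuple of S whose characters spell T:

```java
import java.util.ArrayList;
import java.util.List;

final class SubsequenceWitnesses {
    // Collects every increasing index tuple of s whose characters spell t.
    static List<List<Integer>> witnesses(String s, String t) {
        List<List<Integer>> out = new ArrayList<>();
        collect(s, t, 0, 0, new ArrayList<>(), out);
        return out;
    }

    private static void collect(String s, String t, int si, int ti,
                                List<Integer> picked, List<List<Integer>> out) {
        if (ti == t.length()) {                 // every character of t is matched
            out.add(new ArrayList<>(picked));
            return;
        }
        for (int i = si; i < s.length(); i++) {
            if (s.charAt(i) == t.charAt(ti)) {
                picked.add(i);                  // use position i for t[ti]
                collect(s, t, i + 1, ti + 1, picked, out);
                picked.remove(picked.size() - 1); // backtrack (removes by index)
            }
        }
    }

    public static void main(String[] args) {
        for (List<Integer> w : witnesses("rabbbit", "rabbit")) {
            System.out.println(w);
        }
    }
}
```

For "rabbbit" and "rabbit" it prints three tuples, one for each way of skipping a 'b':

- [0, 1, 2, 3, 5, 6]
- [0, 1, 2, 4, 5, 6]
- [0, 1, 3, 4, 5, 6]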

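Given two strings S and T, find how many distinct subsequences of S are equal to T. A subsequence of S is a string formed by deleting any 0 or more characters from S.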

When you see a string problem about subsequences or matching, dynamic programming should come to mind naturally.
https://robinliu.gitbooks.io/algorithms/content/shuang_xu_lie_xing.html
Two-sequence DP

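State: f[i][j] describes the relationship between the first i items of the first sequence and the first j items of the second sequence
Function: f[i][j] depends on how the i-th item and the j-th item match
Initialization: f[i][0] and f[0][j]
Answer: f[m][n]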

state: f[i][j]: the number of ways to pick the first j characters of T from the first i characters of S
function:
f[i][j] = f[i-1][j] + f[i-1][j-1] for S[i-1] == T[j-1]
f[i][j] = f[i-1][j] for S[i-1] != T[j-1]
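Below is the recurrence written out in full, with the initialization the template asks for filled in. This is a sketch that takes m = |S| and n = |T|. The empty prefix of T occurs exactly once in any prefix of S, and a non-empty T cannot occur in the empty prefix of S:

```latex
\begin{aligned}
f[i][0] &= 1, && 0 \le i \le m \\
f[0][j] &= 0, && 1 \le j \le n \\
f[i][j] &= f[i-1][j] + \begin{cases} f[i-1][j-1], & S[i-1] = T[j-1] \\ 0, & \text{otherwise} \end{cases} && 1 \le i \le m,\ 1 \le j \le n \\
\text{answer} &= f[m][n]
\end{aligned}
```

For example, take S = "bb" and T = "b":

- f[1][1] = f[0][1] + f[0][0] = 0 + 1 = 1
- f[2][1] = f[1][1] + f[1][0] = 1 + 1 = 2

The result of 2 counts the two ways of picking one 'b'.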

http://blog.csdn.net/abcbc/article/details/8978146
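With a two-string problem like this, DP comes to mind easily, but the recurrence here is not obvious. Start by building a 2D table int[][] dp that records the number of matching subsequences. The example uses S = "rabbbit" and T = "rabbit"; rows follow T, columns follow S, and the first row and column stand for the empty prefix: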

      ""  r  a  b  b  b  i  t
  ""   1  1  1  1  1  1  1  1
  r    0  1  1  1  1  1  1  1
  a    0  0  1  1  1  1  1  1
  b    0  0  0  1  2  3  3  3
  b    0  0  0  0  1  3  3  3
  i    0  0  0  0  0  0  3  3
  t    0  0  0  0  0  0  0  3

dp[i][j] = dp[i][j - 1] + (T[i - 1] == S[j - 1] ? dp[i - 1][j - 1] : 0), for 1 <= i <= T.length(), 1 <= j <= S.length(), with dp[0][j] = 1 and dp[i][0] = 0 for i >= 1.
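Below is a minimal Java sketch of this T-by-S formulation (the class name is hypothetical) that rebuilds the table above. Row 0 is set to 1, and column 0 is left at 0 for the other rows:

```java
final class DistinctSubsequenceTable {
    // dp[i][j] = number of ways T[0..i-1] occurs as a subsequence of S[0..j-1].
    // Rows follow T and columns follow S, as in the table above.
    static int[][] build(String s, String t) {
        int[][] dp = new int[t.length() + 1][s.length() + 1];
        for (int j = 0; j <= s.length(); j++) {
            dp[0][j] = 1;                  // empty T occurs once in every prefix of S
        }                                  // dp[i][0] stays 0 for i >= 1
        for (int i = 1; i <= t.length(); i++) {
            for (int j = 1; j <= s.length(); j++) {
                dp[i][j] = dp[i][j - 1]
                        + (t.charAt(i - 1) == s.charAt(j - 1) ? dp[i - 1][j - 1] : 0);
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        for (int[] row : build("rabbbit", "rabbit")) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Running main prints the eight-column rows of the table, from [1, 1, 1, 1, 1, 1, 1, 1] down to [0, 0, 0, 0, 0, 0, 0, 3].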

X. DP Solution:
https://discuss.leetcode.com/topic/9488/easy-to-understand-dp-in-java

We will build an array mem where mem[i+1][j+1] is the number of times T[0..i] occurs in S[0..j] as a distinct subsequence. Therefore the result will be mem[T.length()][S.length()].
We can build this array row by row:
The first row must be filled with 1s. That's because the empty string is a subsequence of any string exactly once. So mem[0][j] = 1 for every j. This not only makes our lives easier, but also returns the correct value if T is an empty string.
The first column of every row except the first must be 0. This is because an empty string cannot contain a non-empty string as a subsequence. The very first item of the array is mem[0][0] = 1, because an empty string contains the empty string once.

So the matrix looks like this:

  S 0123....j
T +----------+
  |1111111111|
0 |0         |
1 |0         |
2 |0         |
. |0         |
. |0         |
i |0         |

From here we can easily fill the whole grid. For each cell mem[i+1][j+1], if S[j] == T[i], we add the previous item in the same row (mem[i+1][j]) and the previous item in the previous row (mem[i][j]). Otherwise we copy the previous item in the same row. The reason is simple:

If the current character in S doesn't equal the current character in T, then we have the same number of distinct subsequences as we had without the new character.
If the current character in S equals the current character in T, then the count is the number we had before plus the number of distinct subsequences we had with both T and S one character shorter.

An example:
S: [acdabefbc] and T: [ab]

First we check with a:

           *  *
      S = [acdabefbc]
mem[1] = [0111222222]

Then we check with ab:

               *  *
      S = [acdabefbc]
mem[1] = [0111222222]
mem[2] = [0000022244]

And the result is 4, as the distinct subsequences are:

      S = [a   b    ]
      S = [a      b ]
      S = [   ab    ]
      S = [   a   b ]
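To double-check the walk-through, this sketch fills the mem table with the same rule and prints the two non-trivial rows for S = "acdabefbc", T = "ab". The class name is hypothetical.

```java
final class MemRowsDemo {
    // mem[i+1][j+1] = occurrences of T[0..i] in S[0..j] as distinct subsequences.
    static int[][] buildMem(String s, String t) {
        int[][] mem = new int[t.length() + 1][s.length() + 1];
        java.util.Arrays.fill(mem[0], 1);        // empty T occurs once in any prefix
        for (int i = 0; i < t.length(); i++) {
            for (int j = 0; j < s.length(); j++) {
                // copy the previous item in the row, plus the diagonal item on a match
                mem[i + 1][j + 1] = mem[i + 1][j]
                        + (t.charAt(i) == s.charAt(j) ? mem[i][j] : 0);
            }
        }
        return mem;
    }

    public static void main(String[] args) {
        int[][] mem = buildMem("acdabefbc", "ab");
        for (int i = 1; i < mem.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int v : mem[i]) sb.append(v);   // all values are single digits here
            System.out.println("mem[" + i + "] = [" + sb + "]");
        }
    }
}
```

It prints:

- mem[1] = [0111222222]
- mem[2] = [0000022244]

The last entry of mem[2], 4, is the answer.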

The time and space complexity are both O(mn), where m = T.length() and n = S.length().
We can use num[i][j] to store the number of distinct subsequences of T(0, j) in S(0, i). We know that for any i, num[i][0] = 1.

If the character at position i in S is equal to the character at position j in T, there are two options.

Delete the character at position i in S. Then the number of distinct subsequences is the number of distinct subsequences of T(0, j) in S(0, i - 1).
Keep the character at position i in S. Then the number is the number of distinct subsequences of T(0, j - 1) in S(0, i - 1).
So num[i][j] = num[i - 1][j] + num[i - 1][j - 1].

If the character at position i in S is not equal to the character at position j in T, then we can only delete this character. So num[i][j] = num[i - 1][j].
http://www.cs.cmu.edu/~yandongl/distinctseq.html

public int numDistinct(String S, String T) {
    // array creation
    int[][] mem = new int[T.length()+1][S.length()+1];

    // filling the first row: with 1s
    for(int j=0; j<=S.length(); j++) {
        mem[0][j] = 1;
    }
    
    // Java zero-initializes int arrays, so the first column is already 0 in every row except the first, which is what we need.
    
    for(int i=0; i<T.length(); i++) {
        for(int j=0; j<S.length(); j++) {
            if(T.charAt(i) == S.charAt(j)) {
                mem[i+1][j+1] = mem[i][j] + mem[i+1][j];
            } else {
                mem[i+1][j+1] = mem[i+1][j];
            }
        }
    }
    
    return mem[T.length()][S.length()];
}

We define the computation structure C[i][j] as the number of solutions for S[0...i-1] and T[0...j-1]; i and j in C count the characters in each prefix. It's easier if we include 0 in the structure to accommodate the case when no characters (the empty string) are considered. When updating C[i][j] we have two options:

C[i][j] = C[i-1][j]. Whatever the current char of S is, we simply don't use it. We only use chars [0,...i-2] of S, with however many solutions they have to cover T[0...j-1].

But if the current char of S is the same as the current char of T (S[i-1]==T[j-1]), then we have another choice: we can use all the solutions of C[i-1][j-1] to increment C[i][j]. Therefore C[i][j] += C[i-1][j-1].

    public int numDistinct(String S, String T) {
        int M = S.length();
        int N = T.length();
        int[][] num = new int[M + 1][N + 1];
        for (int i = 0; i <= N; i++)
            num[0][i] = 0;
        for (int i = 0; i <= M; i++)
            num[i][0] = 1;
        for (int i = 1; i <= M; i++) {
            for (int j = 1; j <= N; j++) {
                if (S.charAt(i - 1) != T.charAt(j - 1)) {
                    num[i][j] = num[i - 1][j];
                } else {
                    num[i][j] = num[i - 1][j] + num[i - 1][j - 1];
                }
            }
        }
        return num[M][N];
    }

        // Optional early exits before building the table:
        int lens = S.length(), lent = T.length();
        if (lent == 0) return 1;
        else if (lens == 0) return 0;

X: Dimension DP, Space Optimization
Use a rolling array to reduce memory consumption.
https://discuss.leetcode.com/topic/6465/a-dp-solution-with-clarification-and-explanation/

As one can observe from the algorithm, the inner loop only refers to values from the previous iteration, i.e. the previous row of the dp matrix. Therefore, to optimize the algorithm, we can reduce the space by keeping only two arrays instead of the entire matrix. The algorithm can be optimized as follows. The running time is then reduced from 448 ms to 424 ms.

public int numDistinct_sdp(String S, String T) {
 int sl = S.length();
 int tl = T.length();
 
 int [] preComb = new int[sl+1];
 int [] comb = new int[sl+1];
 
 
 for(int i=0; i<=sl; i++)
  preComb[i] = 1;  

 for(int t=1; t<=tl; ++t){
  for(int s=1; s<=sl; ++s){
   if(T.charAt(t-1) != S.charAt(s-1)){
    comb[s] = comb[s-1];
   }else{
    comb[s] = comb[s-1] + preComb[s-1];
   }
  }
  
  for(int i=0; i<=sl; ++i){
   preComb[i] = comb[i];
  }
 }
 
 return preComb[sl];
}

Following your analysis, I think we could reduce the space to one array.
You set preComb[i] = comb[i] and never change it otherwise, so I think we don't need preComb[] at all.

One tricky part is that in the inner iteration (with "t" as the iterator), we should do the reverse order (i.e. from the end index to the beginning). Otherwise, the result would not be correct.
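The effect of the loop direction is easy to see on a tiny input. In this sketch, S = "bb", T = "bb" has exactly one match, but the forward loop lets a single 'b' of S match both 'b's of T. The class name and the reverseInner flag are hypothetical.

```java
final class LoopDirectionDemo {
    // One-array DP: dp[t] = occurrences of T[0..t-1] in the prefix of S processed so far.
    static int count(String s, String t, boolean reverseInner) {
        int[] dp = new int[t.length() + 1];
        dp[0] = 1;
        for (int si = 1; si <= s.length(); si++) {
            if (reverseInner) {
                // dp[ti - 1] still holds its value from the previous S prefix
                for (int ti = t.length(); ti >= 1; ti--) {
                    if (s.charAt(si - 1) == t.charAt(ti - 1)) dp[ti] += dp[ti - 1];
                }
            } else {
                // Wrong: dp[ti - 1] may already include the current S character,
                // so that one character gets reused for two positions of T
                for (int ti = 1; ti <= t.length(); ti++) {
                    if (s.charAt(si - 1) == t.charAt(ti - 1)) dp[ti] += dp[ti - 1];
                }
            }
        }
        return dp[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(count("bb", "bb", true));   // 1
        System.out.println(count("bb", "bb", false));  // 3
    }
}
```

With reverseInner set to true the result is the correct 1; with false it is 3.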

public int numDistinct(String S, String T) {
    int sl = S.length();
    int tl = T.length();

    int[] dp = new int[tl+1];
    dp[0] = 1;

    for(int s=1; s<=sl; s++)
        for(int t=tl; t>=1; t--){
            if(S.charAt(s-1)==T.charAt(t-1))
                dp[t] += dp[t-1];
        }

    return dp[tl];
}

When you see a question about two strings, DP should be considered first.
In its i-th iteration, it only needs results from the (i-1)-th iteration. More specifically, it only needs table[i-1][j] and table[i-1][j-1].

When we use a one-dimensional array, at the beginning of the i-th iteration recurs[j] holds the number of occurrences of T(0:j) in S(0:i-1), and it can be updated as recurs[j] += recurs[j-1] if the current character matches.
In the i-th iteration,
at the beginning, recurs[j] = the number of distinct subsequences of T[0,j] in S[0,i-1];
after being updated, recurs[j] = the number of distinct subsequences of T[0,j] in S[0,i].
We have to run j from T.length down to 0 so that we don't overwrite recurs[j] before it is used to update recurs[j+1].

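 public int numDistinct(String S, String T) {
   int si = S.length(), ti = T.length();
   if (ti == 0) return 1;   // the empty T occurs exactly once
   if (si < ti) return 0;
   // recurs[j] = occurrences of T[0..j] in the prefix of S processed so far
   int[] recurs = new int[ti];
   for (int i=0; i<si; ++i) {
     // T[0..j] needs j+1 characters, so only j <= i can change; go right to left
     for (int j=Math.min(i, ti-1); j>=0; --j) {
       if (S.charAt(i)==T.charAt(j)) {
         recurs[j] += (j==0) ? 1 : recurs[j-1];
       }
     }
   }
   return recurs[ti-1];
 }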

The only extra information needed to compute dp[i][j] is the old value of the cell to its left, so why not just save it in a variable, hold?

    public int numDistinct(String S, String T) {
        if (T.length() == 0) {
            return 1;
        }
        int[] dp = new int[T.length() + 1];
        dp[0] = 1;
        for (int i = 0; i < S.length(); i++) {
            int hold = 1;
            for (int j = 1; j < dp.length; j++) {
                if (dp[j - 1] == 0)
                    break;
                int h = dp[j];
                if (S.charAt(i) == T.charAt(j - 1)) {
                    dp[j] = hold + dp[j];
                }
                hold = h;
            }
        }
        return dp[dp.length - 1];
    }

    int numDistinct(string S, string T) {
        int n = S.size(), m = T.size();
        vector<int> dp(n+1, 1);
        for(int i=1; i<=m; i++) {
            int upLeft = dp[0];
            dp[0] = 0;
            for(int j=1; j<=n; j++) {
                int temp = dp[j];
                dp[j] = dp[j-1];
                if(S[j-1]==T[i-1]) dp[j] += upLeft;
                upLeft = temp;
            }
        }
        return dp[n];
    }

int numDistinct(string S, string T) {
    if (S.size() < T.size()) return 0; // exit early
    // match[j] = ways T[0..j-1] occurs in the prefix of S seen so far; sized to T
    vector<int> match(T.size() + 1, 0);
    match[0] = 1;
    for (int i = 1; i <= (int)S.size(); i++)
        for (int j = T.size(); j >= 1; j--)   // right to left, as in the Java version
            if (S[i-1] == T[j-1])
                match[j] += match[j-1];
    return match[T.size()];
}

http://blog.theliuy.com/distinct-subsequences/

    int numDistinct(string S, string T) {
        if (S.size() < T.size())
            return 0;
        // prev = previous row (shorter T prefix), curr = row being filled; both indexed by S prefix length
        int *prev = new int[S.size() + 1];
        int *curr = new int[S.size() + 1];
        int it, is;
        int num;
        for (is = 0; is <= S.size(); ++is) {
            prev[is] = 1;   // empty T occurs once in every prefix of S
        }
        for (it = 1; it < T.size() + 1; ++it) {
            // S[0..it-2] is too short to contain T[0..it-1], so only the "use S[it-1]" term remains
            curr[it] = (S[it - 1] == T[it - 1]) ? prev[it - 1] : 0;
            for (is = it + 1; is < S.size() + 1; ++is) {
                curr[is] = (S[is - 1] == T[it - 1]) ? prev[is - 1] + curr[is - 1] : curr[is - 1];
            }
            swap(prev, curr);
        }
        num = prev[S.size()];
        delete[] prev;   // memory from new[] must be released with delete[]
        delete[] curr;
        return num;
    }

Recursive version: inefficient (exponential time, because the same (si, ti) pairs are recomputed many times).
http://n00tc0d3r.blogspot.com/2013/02/find-number-of-distinct-subsequences.html

 private int numSubseq(String S, String T, int si, int ti) {  
   if (si<0 || ti<0 || si<ti) return 0;  
   
   if (S.charAt(si)==T.charAt(ti)) {  
     if (ti==0)  
       return numSubseq(S, T, si-1, ti) + 1;  
     else  
       return numSubseq(S, T, si-1, ti) + numSubseq(S, T, si-1, ti-1);  
   }  
   
   return numSubseq(S, T, si-1, ti);  
 }  
 public int numDistinct(String S, String T) {  
   return numSubseq(S, T, S.length()-1, T.length()-1);  
 }
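Caching each (si, ti) result turns the recursion above into top-down DP with O(|S|·|T|) time and space. The following is a sketch with a hypothetical class name. Unlike the plain recursion, it returns 1 for an empty T, matching the DP solutions:

```java
import java.util.Arrays;

final class MemoNumDistinct {
    static int numDistinct(String s, String t) {
        if (t.isEmpty()) return 1;                   // empty T occurs exactly once
        // memo[si][ti] caches the count for S[0..si] and T[0..ti]; -1 = not computed yet
        int[][] memo = new int[s.length()][t.length()];
        for (int[] row : memo) Arrays.fill(row, -1);
        return numSubseq(s, t, s.length() - 1, t.length() - 1, memo);
    }

    private static int numSubseq(String s, String t, int si, int ti, int[][] memo) {
        if (si < 0 || ti < 0 || si < ti) return 0;   // same base cases as the plain recursion
        if (memo[si][ti] != -1) return memo[si][ti];
        int result = numSubseq(s, t, si - 1, ti, memo);          // skip S[si]
        if (s.charAt(si) == t.charAt(ti)) {
            // use S[si] for T[ti]; when ti == 0 this choice completes one match
            result += (ti == 0) ? 1 : numSubseq(s, t, si - 1, ti - 1, memo);
        }
        memo[si][ti] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numDistinct("rabbbit", "rabbit"));   // 3
        System.out.println(numDistinct("acdabefbc", "ab"));     // 4
    }
}
```

The recursion depth still grows with S.length(), so very long inputs may need one of the iterative versions instead.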

  public int numDistinct(String S, String T) {
    if (S.length() == 0) {
      return T.length() == 0 ? 1 : 0;
    }
    if (T.length() == 0) {
      return 1;
    }
    int cnt = 0;
    for (int i = 0; i < S.length(); i++) {
      if (S.charAt(i) == T.charAt(0)) {
        cnt += numDistinct(S.substring(i + 1), T.substring(1));
      }
    }
    return cnt;
  }

http://www.programcreek.com/2013/01/leetcode-distinct-subsequences-total-java/

[LeetCode] Distinct Subsequences

http://www.geeksforgeeks.org/count-distinct-occurrences-as-a-subsequence/

int findSubsequenceCount(string S, string T)
{
    int m = T.length(), n = S.length();

    // T can't appear as a subsequence in S
    if (m > n)
        return 0;

    // mat[i][j] stores the count of occurrences of
    // T(1..i) in S(1..j).
    vector<vector<int>> mat(m + 1, vector<int>(n + 1));

    // Initializing first column with all 0s. An empty
    // string can't have another string as a subsequence.
    for (int i = 1; i <= m; i++)
        mat[i][0] = 0;

    // Initializing first row with all 1s. An empty
    // string is a subsequence of every string.
    for (int j = 0; j <= n; j++)
        mat[0][j] = 1;

    // Fill mat[][] in bottom-up manner
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            // If last characters don't match, then the value
            // is the same as the value without the last character in S.
            if (T[i - 1] != S[j - 1])
                mat[i][j] = mat[i][j - 1];

            // Else the value is obtained considering two cases:
            // a) all subsequences without the last character in S
            // b) all subsequences without the last characters in both
            else
                mat[i][j] = mat[i][j - 1] + mat[i - 1][j - 1];
        }
    }

    /* uncomment this to print matrix mat
    for (int i = 1; i <= m; i++, cout << endl)
        for (int j = 1; j <= n; j++)
            cout << mat[i][j] << " ";  */

    return mat[m][n];
}

X. DFS - brute force
http://www.geeksforgeeks.org/find-number-times-string-occurs-given-string/

int count(string a, string b, int m, int n)
{
    // If the second string is empty (whether or not the first
    // one is), it occurs exactly once: return 1
    if (n == 0)
        return 1;

    // If only the first string is empty and the second
    // string is not empty, return 0
    if (m == 0)
        return 0;

    // If the last characters are the same, recur for the remaining strings by
    // 1. considering the last characters of both strings
    // 2. ignoring the last character of the first string
    if (a[m - 1] == b[n - 1])
        return count(a, b, m - 1, n - 1) +
               count(a, b, m - 1, n);
    else
        // If the last characters are different, ignore the
        // last char of the first string and recur for the
        // remaining string
        return count(a, b, m - 1, n);
}
