Related: LeetCode 115 - Distinct Subsequences
LeetCode 940 - Distinct Subsequences II
https://leetcode.com/problems/distinct-subsequences-ii/
X.
Improve time complexity from
Furthermore, we can use a
And also a
Then
https://www.jianshu.com/p/02501f516437
X.
X. https://www.geeksforgeeks.org/count-distinct-subsequences/
https://leetcode.com/problems/distinct-subsequences-ii/discuss/192030/Java-DP-O(N2)-time-greater-O(N)-time-greater-O(1)-space
For each
https://www.cnblogs.com/seyjs/p/10412749.html
记dp[i]为以S[i]元素结尾可以组成的子串的个数,很显然dp[0] = 1。显然dp[i]的前一个元素可以是dp[0] ~ dp[i-1]中的任何一个,那么应该有dp[i] = dp[0] + dp[1] +...dp[i-1]。这是对于元素没有重复的情况。假设S[j]是S[0-i]中与S[i]下标最接近的元素并且有S[i] = S[j],那么在以S[i]结尾的子串中,前一个元素是在S[0]~S[j-1]中的任何一个,都会和以S[j]结尾的子串中并且前一个元素是在S[0]~S[j-1]中的任何一个重复,因此这种情况下dp[i] = dp[j]+dp[j+1] + ... dp[i-1]。最后,返回的结果应该为sum(dp)
LeetCode 940 - Distinct Subsequences II
https://leetcode.com/problems/distinct-subsequences-ii/
Given a string
S
, count the number of distinct, non-empty subsequences of S
.
Since the result may be large, return the answer modulo
10^9 + 7
.
Example 1:
Input: "abc" Output: 7 Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".
Example 2:
Input: "aba" Output: 6 Explanation: The 6 distinct subsequences are "a", "b", "ab", "ba", "aa" and "aba".
Example 3:
Input: "aaa" Output: 3 Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".
Note:
S
contains only lowercase letters.1 <= S.length <= 2000
X.
用一个数组
https://leetcode.com/problems/distinct-subsequences-ii/discuss/192017/C++JavaPython-4-lines-O%28N%29-Time-O%281%29-SpaceendsWith[26]
来表示(截止到当前位置的字符串)中以每个字母结尾的非空sequence的数量。假设当前这个统计是不重不漏的。考虑加入下一个字母S[i]
后增加的所有sequence。在不考虑重复的情况下,这个数量是sum(endsWith) + 1
(原来的所有sequence后面加上这个字母,还有这个字母自己)。如果要发生重复,显然那些被重复的序列的结尾只能是S[i]
;而那些被重复的以S[i]
结尾的sequence去掉S[i]
的结果必定以某种形式存在于endsWith
数组中,因此它们必定都被重复了一遍:因此重复序列的总数就是endsWith[S[i] - 'a']
。于是最后的效果是每次更新的时候endsWith[S[i] - 'a'] = sum(endsWith) + 1
。
Init an array
endswith[26]
endswith[i]
to count how many sub sequence that ends with i
th character.
Now we have
add a new character
then we have
N = sum(endswith)
different sub sequence,add a new character
c
to each of them,then we have
N
different sub sequence that ends with c
.
With this idea, we loop on the whole string
and we update
S
,and we update
end[c] = sum(end) + 1
for each character.
We need to plus one here, because
"c"
itself is also a sub sequence.Example
Input:
Current parsed:
"aba"
Current parsed:
"ab"
endswith
endswith
'a'
: ["a"]
endswith
'b'
: ["ab","b"]
"a"
-> "aa"
"ab"
-> "aba"
"b"
-> "ba"
""
-> "a"
endswith
endswith
result: 6
'a'
: ["aa","aba","ba","a"]
endswith
'b'
: ["ab","b"]
result: 6
public int distinctSubseqII(String S) {
long end[] = new long[26], mod = (long)1e9 + 7;
for (char c : S.toCharArray())
end[c - 'a'] = Arrays.stream(end).sum()%mod + 1;
return (int)(Arrays.stream(end).sum() % mod);
}
Use a variable to count the sum(end)
can avoid repeatly count it.Improve time complexity from
O(26N)
to O(N)
, if you want public int distinctSubseqII(String S) {
int end[] = new int[26], res = 0, added = 0, mod = (int)1e9 + 7;
for (char c : S.toCharArray()) {
added = (res + 1 - end[c - 'a']) % mod;
res = (res + added) % mod;
end[c - 'a'] = (end[c - 'a'] + added) % mod;
}
return (res + mod) % mod;
}
Furthermore, we can use a
sum
to represent sum(dp[0], ..., dp[i - 1])
.And also a
count
array, in which count[S.charAt(i) - 'a']
represents the count of presented subsequence ends with S.charAt(i)
.Then
dp[i] = sum - count[S.charAt(i) - 'a']
. public int distinctSubseqII(String S) {
int n = S.length(), M = (int)1e9 + 7;
int[] dp = new int[n];
Arrays.fill(dp, 1);
int[] count = new int[26];
int sum = 0;
for (int i = 0; i < n; i++) {
int index = S.charAt(i) - 'a';
dp[i] += sum - count[index];
dp[i] = (dp[i] + M) % M;
sum = (sum + dp[i]) % M;
count[index] = (count[index] + dp[i]) % M;
}
return sum;
}
最开始的思路是用
遍历
提交后果然答案对了,但是
转念一想,没必要存下所有节点,只需要知道当前节点的数量就好了。
假设当前有N个不同的seq,每个seq加上一个新的字母,又是新的N个不同sequence了。
但新的seq中,有一部分原来就有。
比如新的字母是
到这里,解题思路就比较清楚了。
我们需要用一个数组
最后修饰一下代码细节。
Trie
来表示所有可能subseq.遍历
string S
,对Trie
中每个节点新建叶节点。提交后果然答案对了,但是
Memory Limit Exceed
。转念一想,没必要存下所有节点,只需要知道当前节点的数量就好了。
Trie
中一个节点对应一个seq。假设当前有N个不同的seq,每个seq加上一个新的字母,又是新的N个不同sequence了。
但新的seq中,有一部分原来就有。
比如新的字母是
'a'
,那么原来以'a'
结尾的seq,每一个都会存在新的这N个seq中。到这里,解题思路就比较清楚了。
我们需要用一个数组
int endsWith[26]
,endsWith[i]
来表示以第i个字母结束的sequence数量。最后修饰一下代码细节。
- 数组全部初始化为0。
- 正好按照题目要求,空string不计作subseq。
- 每次更新的时候,end[i] = sum(end) + 1。
加一的原因是,比如我们遇到新字母'a',新的N个seq里不包含“a”,需要额外加上
这个方法比较简洁,时间O(26N),空间O(26)
可以看心情优化的地方是,用一个变量保存当前数组和,避免重复计算。
时间优化到O(N).
https://zxi.mytechroad.com/blog/dynamic-programming/leetcode-940-distinct-subsequences-ii/可以看心情优化的地方是,用一个变量保存当前数组和,避免重复计算。
时间优化到O(N).
counts[i][j] := # of distinct sub sequences of s[1->i] and ends with letter j. (‘a'<= j <= ‘z’)
Initialization:
counts[*][*] = 0
Transition:
counts[i][j] = sum(counts[i-1]) + 1 if s[i] == j else counts[i-1][j]
ans = sum(counts[n])
e.g. S = “abc”
counts[1] = {‘a’ : 1}
counts[2] = {‘a’ : 1, ‘b’ : 1 + 1 = 2}
counts[3] = {‘a’ : 1, ‘b’ : 2, ‘c’: 1 + 2 + 1 = 4}
ans = sum(counts[3]) = 1 + 2 + 4 = 7
counts[2] = {‘a’ : 1, ‘b’ : 1 + 1 = 2}
counts[3] = {‘a’ : 1, ‘b’ : 2, ‘c’: 1 + 2 + 1 = 4}
ans = sum(counts[3]) = 1 + 2 + 4 = 7
Time complexity: O(N*26)
Space complexity: O(N*26) -> O(26)
https://zhanghuimeng.github.io/post/leetcode-940-distinct-subsequences-ii/X.
这些做法使我感到,这种计数DP是在非常有限的信息下做到不重不漏地统计的过程,所以保留哪些信息是十分关键的。
https://leetcode.com/articles/distinct-subsequences-ii/
public int distinctSubseqII(String S) {
int MOD = 1_000_000_007;
int N = S.length();
int[] dp = new int[N + 1];
dp[0] = 1;
int[] last = new int[26];
Arrays.fill(last, -1);
for (int i = 0; i < N; ++i) {
int x = S.charAt(i) - 'a';
dp[i + 1] = dp[i] * 2 % MOD;
if (last[x] >= 0)
dp[i + 1] -= dp[last[x]];
dp[i + 1] %= MOD;
last[x] = i;
}
dp[N]--;
if (dp[N] < 0)
dp[N] += MOD;
return dp[N];
}
X. https://www.geeksforgeeks.org/count-distinct-subsequences/
The problem of counting distinct subsequences is easy if all characters of input string are distinct. The count is equal to nC0 + nC1 + nC2 + … nCn = 2n.
How to count distinct subsequences when there can be repetition in input string?
A Simple Solution to count distinct subsequences in a string with duplicates is to generate all subsequences. For every subsequence, store it in a hash table if it doesn’t exist already. Time complexity of this solution is exponential and it requires exponential extra space.
A Simple Solution to count distinct subsequences in a string with duplicates is to generate all subsequences. For every subsequence, store it in a hash table if it doesn’t exist already. Time complexity of this solution is exponential and it requires exponential extra space.
An Efficient Solution doesn’t require generation of subsequences.
Let countSub(n) be count of subsequences of first n characters in input string. We can recursively write it as below. countSub(n) = 2*Count(n-1) - Repetition If current character, i.e., str[n-1] of str has not appeared before, then Repetition = 0 Else: Repetition = Count(m) Here m is index of previous occurrence of current character. We basically remove all counts ending with previous occurrence of current character.
How does this work?
If there are no repetitions, then count becomes double of count for n-1 because we get count(n-1) more subsequences by adding current character at the end of all subsequences possible with n-1 length.
If there repetitions, then we find count of all distinct subsequences ending with previous occurrence. This count can be obtained be recursively calling for index of previous occurrence.
If there are no repetitions, then count becomes double of count for n-1 because we get count(n-1) more subsequences by adding current character at the end of all subsequences possible with n-1 length.
If there repetitions, then we find count of all distinct subsequences ending with previous occurrence. This count can be obtained be recursively calling for index of previous occurrence.
Since above recurrence has overlapping subproblems, we can solve it using Dynamic Programming.
https://leetcode.com/problems/distinct-subsequences-ii/discuss/192030/Java-DP-O(N2)-time-greater-O(N)-time-greater-O(1)-space
dp[i]
represents the count of unique subsequence ends with S[i].dp[i]
is initialized to 1
for S[0 ... i]
For each
dp[i]
, we define j
from 0
to i - 1
, we have:- if
s[j] != s[i]
,dp[i] += dp[j]
- if
s[j] == s[i]
, do nothing to avoid duplicates.
Then
Time complexity:
result = sum(dp[0], ... dp[n - 1])
Time complexity:
O(n^2)
public int distinctSubseqII(String S) {
int n = S.length(), M = (int)1e9 + 7, result = 0;
int[] dp = new int[n];
Arrays.fill(dp, 1);
for (int i = 0; i < n; i++) {
for (int j = 0; j < i; j++) {
if (S.charAt(j) != S.charAt(i)) {
dp[i] += dp[j];
dp[i] %= M;
}
}
result += dp[i];
result %= M;
}
return result;
}
X. http://www.noteanddata.com/leetcode-940-Distinct-Subsequences-II-java-solution-note.html- 因为是subsequence,所以数量是比较大的,暴力解已经不可能,通常这个时候需要考虑dp
- 题目要求是unique的,就是要去重。 我们可以先思考不去重应该怎么做
这里dp最多只有两个维度, dp[i], 或者dp[i][j], 我们先看一维的dp[i]能不能解决问题 - 假设dp[i]是以第i个字符串结尾的subsequence的个数
a: dp[0] = 1
ab: dp[0] = 1, dp[1] = 2
abc: dp[0] = 1, dp[1] = 2, dp[2] = 4
abcd: dp[0] = 1, dp[1] = 2, dp[2] = 4, dp[3] = 8
观察规律疑似, dp[i] = dp[0] + … dp[i-1] + 1
因为dp[i]一定是包括下面这些的和
* dp[0] + s.charAt(i)
* dp[1] + s.charAt(i)
* ….
* dp[i-1] + s.charAt(i)
* s.charAt(i)
因为dp[i]一定是包括下面这些的和
* dp[0] + s.charAt(i)
* dp[1] + s.charAt(i)
* ….
* dp[i-1] + s.charAt(i)
* s.charAt(i)
所以, 这个dp递推规则是符合逻辑的。
这时候, 我们要去重, 假设j < i, 并且i和j字符重复, 那么可以看到, 所有的重复都会来自于j之前的,
比如abcc, 所有的重复都来自于第4个c和前面的ab任意选择后组成的结果,以及加上c自己,
而这个结果就是dp[j], 也就是所有下面这些会重复, 把其中的s.charAt(j)换成s.charAt(i)以后是一样的:
* dp[0]+s.charAt(j)
* dp[1]+s.charAt(j)
* …
* dp[j-1]+s.charAt(j)
* s.charAt(j)
这时候, 我们要去重, 假设j < i, 并且i和j字符重复, 那么可以看到, 所有的重复都会来自于j之前的,
比如abcc, 所有的重复都来自于第4个c和前面的ab任意选择后组成的结果,以及加上c自己,
而这个结果就是dp[j], 也就是所有下面这些会重复, 把其中的s.charAt(j)换成s.charAt(i)以后是一样的:
* dp[0]+s.charAt(j)
* dp[1]+s.charAt(j)
* …
* dp[j-1]+s.charAt(j)
* s.charAt(j)
如果这样的话,去重就好办了,对于i和j相等的情况,递推的时候不加上就好了。
public int distinctSubseqII(String s) {
int[] dp = new int[s.length()];
for(int i = 0; i < s.length(); ++i) {
dp[i] = 1;
for(int j = 0; j < i; ++j) {
if(s.charAt(i) != s.charAt(j)) {
dp[i] += dp[j];
dp[i] %= 1_000_000_007;
}
}
}
int sum = 0;
for(int i = 0;i < dp.length; ++i) {
sum += dp[i];
sum %= 1_000_000_007;
}
return sum;
}
数组做了两层循环, 所以时间复杂度是O(N^2)https://www.cnblogs.com/seyjs/p/10412749.html
记dp[i]为以S[i]元素结尾可以组成的子串的个数,很显然dp[0] = 1。显然dp[i]的前一个元素可以是dp[0] ~ dp[i-1]中的任何一个,那么应该有dp[i] = dp[0] + dp[1] +...dp[i-1]。这是对于元素没有重复的情况。假设S[j]是S[0-i]中与S[i]下标最接近的元素并且有S[i] = S[j],那么在以S[i]结尾的子串中,前一个元素是在S[0]~S[j-1]中的任何一个,都会和以S[j]结尾的子串中并且前一个元素是在S[0]~S[j-1]中的任何一个重复,因此这种情况下dp[i] = dp[j]+dp[j+1] + ... dp[i-1]。最后,返回的结果应该为sum(dp)
def distinctSubseqII(self, S): """ :type S: str :rtype: int """ dp = [1] * len(S) for i in range(1,len(S)): for j in range(i-1,-1,-1): if S[i] != S[j]: dp[i] += dp[j] else: dp[i] += dp[j] dp[i] -= 1 break #print dp return sum(dp) % (pow(10,9) + 7)