Massive Algorithms: LeetCode 730 - Count Different Palindromic Subsequences

https://www.cnblogs.com/grandyang/p/7942040.html

Given a string S, find the number of different non-empty palindromic subsequences in S, and return that number modulo 10^9 + 7.

A subsequence of a string S is obtained by deleting 0 or more characters from S.

A sequence is palindromic if it is equal to the sequence reversed.

Two sequences A_1, A_2, ... and B_1, B_2, ... are different if there is some i for which A_i != B_i.

Example 1:

Input: 
S = 'bccb'
Output: 6
Explanation: 
The 6 different non-empty palindromic subsequences are 'b', 'c', 'bb', 'cc', 'bcb', 'bccb'.
Note that 'bcb' is counted only once, even though it occurs twice.

Example 2:

Input: 
S = 'abcdabcdabcdabcdabcdabcdabcdabcddcbadcbadcbadcbadcbadcbadcbadcba'
Output: 104860361
Explanation: 
There are 3104860382 different non-empty palindromic subsequences, which is 104860361 modulo 10^9 + 7.

Note:

The length of S will be in the range [1, 1000].
Each character S[i] will be in the set {'a', 'b', 'c', 'd'}.
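Before looking at the DP solutions, it can help to have a brute-force reference for tiny inputs. The sketch below is not from any of the linked posts; the class and method names are made up for illustration. It enumerates every non-empty subsequence with a bitmask and keeps the distinct palindromes in a set, so it is only usable for strings of roughly 20 characters or fewer.

```java
import java.util.HashSet;
import java.util.Set;

public class BruteForcePalindromes {
    // Hypothetical helper: collects every distinct non-empty palindromic subsequence of s.
    // Exponential in s.length(), so intended only for checking small cases.
    static Set<String> distinctPalindromicSubsequences(String s) {
        int n = s.length();
        Set<String> found = new HashSet<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sb.append(s.charAt(i)); // keep character i in this subsequence
                }
            }
            String candidate = sb.toString();
            if (isPalindrome(candidate)) {
                found.add(candidate); // the set removes duplicates such as the two "bcb"
            }
        }
        return found;
    }

    static boolean isPalindrome(String t) {
        for (int i = 0, j = t.length() - 1; i < j; i++, j--) {
            if (t.charAt(i) != t.charAt(j)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(distinctPalindromicSubsequences("bccb").size()); // 6, as in Example 1
    }
}
```

Comparing a DP implementation against this on short strings such as "a", "aa", "aaa", "aba" and "aabb" is a cheap way to catch the off-by-one mistakes that the equal-ends case tends to produce.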

https://leetcode.com/problems/count-different-palindromic-subsequences/discuss/109507/Java-96ms-DP-Solution-with-Detailed-Explanation

I could not solve this problem in one attempt and struggled with basic test cases such as "a", "aa", "aaa", "aba", and "aabb". Those cases helped me refine my rough early idea until it was correct. The basic DP idea is easy to understand: dp[i][j] records the number of distinct palindromic subsequences in the substring from i to j (inclusive). The DP equation then has two cases:

When s.charAt(i) != s.charAt(j):
dp[i][j] = dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1];

When s.charAt(i) == s.charAt(j):
the situation gets much more complex, and I had to fix many wrong answers along the way. I have commented each branch with the kind of test case it covers.

    public int countPalindromicSubsequences(String s) {
        int len = s.length();
        int[][] dp = new int[len][len];

        char[] chs = s.toCharArray();
        for(int i = 0; i < len; i++){
            dp[i][i] = 1;   // Consider the test case "a", "b" "c"...
        }

        for(int distance = 1; distance < len; distance++){
            for(int i = 0; i < len - distance; i++){
                int j = i + distance;
                if(chs[i] == chs[j]){
                    int low = i + 1;
                    int high = j - 1;

              /* low and high locate the innermost occurrences of chs[i] strictly between i and j; they are used to remove duplicates */

                    while(low <= high && chs[low] != chs[j]){
                        low++;
                    }
                    while(low <= high && chs[high] != chs[j]){
                        high--;
                    }
                    if(low > high){
                        // The substring from i to j looks like "a...a", with no other 'a' between the leftmost and rightmost 'a'.
                       /* e.g. "aba" with i = 0 and j = 2: dp[1][1] = 1 records the palindrome {"b"}.
                         dp[i + 1][j - 1] is multiplied by 2 because each inner palindrome is counted once as is ({"b"})
                         and once more wrapped in the outer pair ({"aba"}). The extra 2 accounts for {"a", "aa"}.
                         In total dp[i][j] records the palindromes {"a", "b", "aa", "aba"}.
                         */ 

                        dp[i][j] = dp[i + 1][j - 1] * 2 + 2;  
                    } 
                    else if(low == high){
                        // The substring from i to j looks like "a...a...a", with exactly one 'a' between the leftmost and rightmost 'a'.
                       /* e.g. "aaa" with i = 0 and j = 2: dp[i + 1][j - 1] records the palindrome {"a"}.
                         dp[i + 1][j - 1] is multiplied by 2 because each inner palindrome is counted once as is ({"a"})
                         and once more wrapped in the outer pair ({"aaa"}). Only 1 is added because the single "a" is already counted;
                         the new palindrome is {"aa"}, built from index i and index j. In total dp[i][j] records {"a", "aa", "aaa"}.
                        */
                        dp[i][j] = dp[i + 1][j - 1] * 2 + 1;  
                    }
                    else{
                        // The substring from i to j looks like "a...a...a...a", with at least two 'a' characters strictly between the leftmost and rightmost 'a'.
                       /* e.g. "aacaa" with i = 0 and j = 4: dp[i + 1][j - 1] records the palindromes {"a", "c", "aa", "aca"}.
                          dp[i + 1][j - 1] is multiplied by 2 because each inner palindrome is counted once as is ({"a", "c", "aa", "aca"})
                          and once more wrapped in the outer pair ({"aaa", "aca", "aaaa", "aacaa"}). This produces a duplicate, {"aca"},
                          which is removed by subtracting dp[low + 1][high - 1]. In total dp[i][j] records {"a", "c", "aa", "aca", "aaa", "aaaa", "aacaa"}.
                          */
                        dp[i][j] = dp[i + 1][j - 1] * 2 - dp[low + 1][high - 1]; 
                    }
                }
                else{
                    dp[i][j] = dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1];  //s.charAt(i) != s.charAt(j)
                }
                dp[i][j] = dp[i][j] < 0 ? dp[i][j] + 1000000007 : dp[i][j] % 1000000007;
            }
        }

        return dp[0][len - 1];
    }

https://leetcode.com/problems/count-different-palindromic-subsequences/discuss/112757/Java-solution-using-simple-DP.-O(n2)-run-time-and-O(n2)-space

For a substring abcba, the outermost characters ('a') contribute in the following ways:

Step 1: If 'bcb' is precomputed as 4 ('b', 'c', 'bb', 'bcb'), then wrapping each of these in the outer 'a's makes it 8 ('b', 'c', 'bb', 'bcb', 'aba', 'aca', 'abba', 'abcba'). This is taken care of by the following code:

dp[beg][end] = 2 * dp[beg + 1][end - 1];

Step 2: In addition, the outer pair contributes 'a' and 'aa'. However, we need to check whether 'a' already occurs inside the current substring. If there is no occurrence of 'a', both are new:

if(leftIdx > rightIdx){
      dp[beg][end] += 2;
}

Step 3: If there is exactly one occurrence of 'a' inside, we add just 1 to the count: the single 'a' is already counted in the inner substring, so only 'aa' is new.

else if (leftIdx == rightIdx){
    dp[beg][end] += 1;
}

Step 4: Lastly, if there are two or more occurrences of 'a' inside, we need to remove the duplicate palindromes added in step 1. For example, in 'aabaa' the inner substring 'aba' has the palindromes 'a', 'b', 'aa', 'aba'.
After step 1 we have 'a', 'b', 'aa', 'aba', 'aaa', 'aba', 'aaaa', 'aabaa'. To remove the additional 'aba', this code has been added:

else{
    dp[beg][end] -= dp[leftIdx + 1][rightIdx - 1];
}
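Steps 1 and 2 can be checked on actual strings rather than counts. The following is a small illustrative sketch, not part of the linked solution; `WrapDemo` and `extend` are hypothetical names. It assumes the outer character does not occur inside the inner substring (the "+2" case), keeps every inner palindrome, adds a wrapped copy of each, and then adds the single letter and the pair.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class WrapDemo {
    // Assumption: `outer` does not occur in the inner substring, so no duplicates can arise.
    static Set<String> extend(Set<String> inner, char outer) {
        Set<String> result = new TreeSet<>(inner);   // every inner palindrome on its own
        for (String p : inner) {
            result.add(outer + p + outer);           // step 1: the same palindrome wrapped in the outer pair
        }
        result.add(String.valueOf(outer));           // step 2: the single letter, e.g. "a"
        result.add("" + outer + outer);              // step 2: the pair, e.g. "aa"
        return result;
    }

    public static void main(String[] args) {
        Set<String> c = new TreeSet<>(Collections.singleton("c"));
        Set<String> bcb = extend(c, 'b');            // palindromes of "bcb"
        Set<String> abcba = extend(bcb, 'a');        // palindromes of "abcba"
        System.out.println(bcb);                     // [b, bb, bcb, c]
        System.out.println(abcba.size());            // 10
    }
}
```

The sizes follow the recurrence 2 * inner + 2: 1 becomes 4 for "bcb", and 4 becomes 10 for "abcba".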

The full code :

    private static long MOD = 1000000007;
    public int countPalindromicSubsequences(String S) {
        int len = S.length();
        int[][] dp = new int[len][len];

        // all single letters are a palindrome
        for(int i=0; i<len; i++){
            dp[i][i] = 1;
        }
        
        for(int i=1; i<len; i++){
            // iterating over each substring.
            for(int beg = 0, end = i; end < len; end++, beg++){
                // if the extreme character of current substring match then it contributes to palindrome count
                if(S.charAt(beg) == S.charAt(end)){
                    // e.g. "aba": the inner "b" has count 1. In "aba" it contributes "b" and "aba", so (count of the inner part) * 2
                    dp[beg][end] = 2 * dp[beg + 1][end - 1];
                    
                    // idea is to find occurrence of extreme characters with in the substring (excluding the extreme characters)
                    int leftIdx = beg + 1, rightIdx = end - 1;
                    char ch = S.charAt(beg);
                    while(leftIdx <= rightIdx && ch != S.charAt(leftIdx)){
                        leftIdx++;
                    }
                    
                    while(leftIdx <= rightIdx && ch != S.charAt(rightIdx)){
                        rightIdx--;
                    }
                    
                    // if there is no occurrence then we need to add max possible unique palindrome count of 2 characters, which is 2
                    // e,g 'a','a' can form 'a' and 'aa'
                    if(leftIdx > rightIdx){
                        dp[beg][end] += 2;
                    }
                    
                    // if there is one occurrence then we need to add max possible unique palindrome count of 1 character, which is 1
                    else if (leftIdx == rightIdx){
                        dp[beg][end] += 1;
                    }
                    
                    // if there are two or more occurrences then we need to remove the repeated palindromes.
                    else{
                        dp[beg][end] -= dp[leftIdx + 1][rightIdx - 1];
                    }
                }
                // otherwise combine the two shorter substrings and subtract their overlap, which was counted twice.
                else{
                    dp[beg][end] = dp[beg][end - 1] + dp[beg+1][end] - dp[beg + 1][end - 1];
                }
                
                // reduce modulo MOD; adding MOD first keeps the result non-negative after the subtractions above.
                dp[beg][end] = (int) ((dp[beg][end] + MOD) % MOD);
            }
        }
        
        return dp[0][len - 1];
    }

X. DFS+Cache
https://leetcode.com/problems/count-different-palindromic-subsequences/discuss/109509/Accepted-Java-Solution-using-memoization

    // requires: import java.util.TreeSet;
    int div = 1000000007;

    public int countPalindromicSubsequences(String S) {
        // characters[c] holds the sorted positions of the letter ('a' + c) in S
        @SuppressWarnings("unchecked")
        TreeSet<Integer>[] characters = new TreeSet[26];
        int len = S.length();

        for (int i = 0; i < 26; i++) characters[i] = new TreeSet<Integer>();

        for (int i = 0; i < len; ++i) {
            int c = S.charAt(i) - 'a';
            characters[c].add(i);
        }
        // dp[start][end] caches the answer for the half-open range [start, end); null means not yet computed
        Integer[][] dp = new Integer[len + 1][len + 1];
        return memo(S, characters, dp, 0, len);
    }

    public int memo(String S, TreeSet<Integer>[] characters, Integer[][] dp, int start, int end) {
        if (start >= end) return 0;
        if (dp[start][end] != null) return dp[start][end];

        long ans = 0;

        for (int i = 0; i < 26; i++) {
            Integer new_start = characters[i].ceiling(start); // first occurrence at or after start
            Integer new_end = characters[i].lower(end);       // last occurrence before end
            if (new_start == null || new_start >= end) continue;
            ans++;                                            // the single letter
            if (!new_start.equals(new_end)) ans++;            // the pair; compare values, not Integer references
            ans += memo(S, characters, dp, new_start + 1, new_end);
        }
        dp[start][end] = (int) (ans % div);
        return dp[start][end];
    }

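This problem gives us a string and asks for the number of distinct non-empty palindromic subsequences. Although the problem restricts the characters to four letters, we solve the general case and allow all 26 letters. The answer must be taken modulo a large number, which hints that the result can be very large. Problems of this kind are usually solved with DP or with recursion plus a memo array, and the two are essentially the same. We first look at the recursive solution with a memo array. The idea is to peel the string layer by layer, like an onion. Take "bccb" and peel by letter: first peel the letter b, which fixes the outermost layer "b _ _ b" and produces two palindromic subsequences, "b" and "bb". We then recurse into the middle part, count its palindromic subsequences, and add that count to the result res, since each of them wrapped in the outer b's is a new palindrome. Next we peel the letter c: the outermost layer is "cc", which produces the two palindromic subsequences "c" and "cc", and since nothing is left in the middle, the recursion returns 0. Proceeding this way counts every palindromic subsequence.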

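We create a two-dimensional array chars whose outer length is 26, with each entry holding an empty array, to record the positions at which each letter occurs in the original string. We then define a two-dimensional memo array, where memo[i][j] is the number of palindromic subsequences in the substring between the i-th and j-th characters, initialized to 0. Next we traverse S and append each character's position to the array of its letter. For "bccb" this gives: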

b -> {0, 3}

c -> {1, 2}

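Then we call the recursive function on the range [0, n]. In the recursive function, we first return 0 if start is greater than or equal to end. If the memo value for the current range is greater than 0, this case has already been computed and we return the stored value directly. Otherwise we loop over all letters. If a letter's array is empty, the letter never occurs in the string and we skip it. We then search the letter's array for the first position not less than start and for the last position less than end. In the current call start is 0 and end is 4; for the letter b, new_start points to 0 and new_end points to 3. If new_start points to end(), or the position it points to is not less than end, the current range contains no b and we skip it. Otherwise res is incremented by 1, because new_start exists and the single letter b is itself a palindromic subsequence. If new_start and new_end differ, they point to two different b's, so res is incremented by 1 again for the new palindromic subsequence "bb". We then call the recursive function on the middle part and add its return value to res. That finishes the letter b. For the letter c, start is still 0 and end is still 4, new_start points to 1 and new_end points to 2. By the same analysis, new_start is in range, so res is incremented by 1 for "c"; new_start and new_end differ, so res is incremented by 1 again for "cc"; nothing lies between them, so the recursive call returns 0. When the for loop ends, we store res modulo the large number in memo[start][end] and return that value.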
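The description above (per-letter position arrays plus a search for the first position not less than start and the last position less than end) can be sketched in Java with sorted lists and a hand-written binary search. This is a minimal sketch under stated assumptions, not the original author's code: the class name `MemoSketch`, the helper `lowerBound`, and the use of -1 as the "not computed" marker (instead of the "greater than 0" test described above) are all choices made here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MemoSketch {
    static final int MOD = 1_000_000_007;

    static int count(String s) {
        int n = s.length();
        List<List<Integer>> pos = new ArrayList<>();
        for (int c = 0; c < 26; c++) pos.add(new ArrayList<>());
        for (int i = 0; i < n; i++) pos.get(s.charAt(i) - 'a').add(i); // appended in increasing order
        int[][] memo = new int[n + 1][n + 1];
        for (int[] row : memo) Arrays.fill(row, -1);                   // -1 marks "not computed yet"
        return solve(pos, memo, 0, n);
    }

    // Number of distinct palindromic subsequences using only indices in the half-open range [start, end).
    static int solve(List<List<Integer>> pos, int[][] memo, int start, int end) {
        if (start >= end) return 0;
        if (memo[start][end] >= 0) return memo[start][end];
        long res = 0;
        for (List<Integer> p : pos) {
            int lo = lowerBound(p, start);      // index of the first occurrence >= start
            int hi = lowerBound(p, end) - 1;    // index of the last occurrence < end
            if (lo > hi) continue;              // this letter does not occur in [start, end)
            int newStart = p.get(lo);
            int newEnd = p.get(hi);
            res++;                              // the single letter
            if (newStart != newEnd) res++;      // the two-letter palindrome
            res += solve(pos, memo, newStart + 1, newEnd); // inner palindromes wrapped by this letter
        }
        memo[start][end] = (int) (res % MOD);
        return memo[start][end];
    }

    // Index of the first element >= target in a sorted list (list.size() if there is none).
    static int lowerBound(List<Integer> list, int target) {
        int lo = 0, hi = list.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list.get(mid) < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        System.out.println(count("bccb")); // 6
    }
}
```

Using a separate sentinel avoids recomputing a range whose true count happens to be a multiple of the modulus; the trade-off is one extra pass to initialize the memo array.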

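Next, an iterative version. We use a two-dimensional dp array in which dp[i][j] is the number of distinct palindromic subsequences in the substring [i, j]. We initialize dp[i][i] to 1, because every single character is a palindromic subsequence, and everything else to 0. The update order is neither forward nor backward but diagonal. For the "bccb" example, the original post shows the final dp array with the diagonals colored in update order: red, green, blue, orange.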

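The benefit of this order is that when a cell is updated, the dp values to its left, below it, and diagonally below-left already exist, and the current cell needs exactly those three values. Looking at the dp array, we find that when S[i] != S[j], dp[i][j] = dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1]; that is, the current value equals the left value plus the lower value minus the lower-left value. The left value includes everything in the lower-left cell, and so does the lower value, so the lower-left value is counted twice and must be subtracted once. When S[i] == S[j], things are more complicated and need a case analysis, because we do not know how many characters equal to S[i] lie in between. A simple example is "aba" versus "aaa". With i = 0 and j = 2, both strings have S[i] == S[j], and in both the outer pair gives the two subsequences "a" and "aa". In "aba" the middle "b" can be added to the result res, but in "aaa" the middle "a" cannot, because it duplicates the outer single "a". Our goal is to find the repeated "a" in the middle. So we set left = i + 1 and right = j - 1, then run a while loop on left: while left <= right and S[left] != S[i], move left one step to the right. Likewise, run a while loop on right: while left <= right and S[right] != S[i], move right one step to the left. In the end there are three cases for left and right: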

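1. When left > right, there is no letter equal to S[i] in the middle. This is the "aba" case, and dp[i][j] = dp[i + 1][j - 1] * 2 + 2, where dp[i + 1][j - 1] is the number of palindromic subsequences in the middle part. Why multiply by 2? Every subsequence in the middle can stand alone or be wrapped in the outer letter a, so they come in pairs. The 2 is added because the outer "a" and "aa" must also be counted.

2. When left = right, there is exactly one letter equal to S[i] in the middle. This is the "aaa" case, and dp[i][j] = dp[i + 1][j - 1] * 2 + 1. The multiplication by 2 has the same reason as above. Only 1 is added because the single letter "a" has already been counted in the middle part, so the outer layer can only add "aa".

3. When left < right, there are at least two letters equal to S[i] in the middle. This is the "aabaa" case, and dp[i][j] = dp[i + 1][j - 1] * 2 - dp[left + 1][right - 1]. The multiplication by 2 has the same reason as above. The number of subsequences strictly between left and right is subtracted because those were counted twice, and the extra copy must be removed.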
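To make the diagonal update order and the three cases concrete, here is a compact sketch that fills the whole table and prints it for "bccb". It restates the recurrence described above; the class name `DiagonalFill`, the method `fill`, and the use of a long intermediate value are illustrative choices made here, not taken from the original post.

```java
import java.util.Arrays;

public class DiagonalFill {
    static final int MOD = 1_000_000_007;

    // Returns the full dp table; dp[i][j] is the count for the substring s[i..j], 0 below the diagonal.
    static int[][] fill(String s) {
        int n = s.length();
        int[][] dp = new int[n][n];
        for (int i = 0; i < n; i++) dp[i][i] = 1;          // main diagonal: single characters
        for (int len = 1; len < n; len++) {                // one diagonal per pass
            for (int i = 0, j = len; j < n; i++, j++) {
                long v;
                if (s.charAt(i) != s.charAt(j)) {
                    // left + below - below-left (the overlap was counted twice)
                    v = (long) dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1];
                } else {
                    int left = i + 1, right = j - 1;
                    while (left <= right && s.charAt(left) != s.charAt(i)) left++;
                    while (left <= right && s.charAt(right) != s.charAt(i)) right--;
                    v = 2L * dp[i + 1][j - 1];
                    if (left > right) v += 2;              // case 1: no equal letter inside
                    else if (left == right) v += 1;        // case 2: exactly one equal letter inside
                    else v -= dp[left + 1][right - 1];     // case 3: remove the double-counted part
                }
                dp[i][j] = (int) (((v % MOD) + MOD) % MOD); // keep the value in [0, MOD)
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        for (int[] row : fill("bccb")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For "bccb" this prints the rows [1, 2, 3, 6], [0, 1, 2, 3], [0, 0, 1, 2] and [0, 0, 0, 1]. Each diagonal above the main one corresponds to one pass of the outer loop, and the answer 6 sits in the top-right cell.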

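Discussion: this is a genuinely hard problem, and there are several similar ones. Some of them also have non-DP solutions, but the DP solution is always the core approach and the one most worth mastering. First, we need to distinguish substring problems from subsequence problems; personally I find subsequence problems harder. In the earlier problem Longest Palindromic Subsequence, which asks for the longest palindromic subsequence, we traverse the dp array in reverse: when s[i] and s[j] are equal, the length is the dp value of the middle part plus 2; otherwise it is the larger of the left value and the lower value, because in a subsequence a mismatched character can simply be ignored. For palindromic substring problems, such as Longest Palindromic Substring and Palindromic Substrings, one asks for the longest palindromic substring and the other for the number of palindromic substrings. Their dp definition is whether the substring [i, j] is a palindrome. To find the longest palindromic substring, we maintain a maximum, keep updating it with the length of the current palindromic substring, and update the left and right boundaries of the maximum at the same time. To count all palindromic substrings, we increment a counter whenever dp[i][j] is a palindrome. The core test for whether dp[i][j] is a palindrome is that s[i] == s[j] and either there are no characters between i and j or the dp value of the middle part is true.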
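For comparison, the Longest Palindromic Subsequence recurrence mentioned in the discussion can be sketched as follows. This is a minimal illustration written for this comparison, not code from the referenced posts; the class and method names are hypothetical.

```java
public class LongestPalindromicSubsequence {
    // dp[i][j] is the length of the longest palindromic subsequence of s[i..j].
    static int longest(String s) {
        int n = s.length();
        if (n == 0) return 0;
        int[][] dp = new int[n][n];
        for (int i = n - 1; i >= 0; i--) {         // reverse traversal: row i needs row i + 1
            dp[i][i] = 1;
            for (int j = i + 1; j < n; j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    dp[i][j] = dp[i + 1][j - 1] + 2;                  // middle part plus the matching pair
                } else {
                    dp[i][j] = Math.max(dp[i + 1][j], dp[i][j - 1]);  // drop one of the two end characters
                }
            }
        }
        return dp[0][n - 1];
    }

    public static void main(String[] args) {
        System.out.println(longest("bbbab")); // 4 ("bbbb")
    }
}
```

The table shape is the same as in the counting problem, but taking a maximum never double-counts, which is why no duplicate-removal cases are needed here.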

X. Videos
花花酱 LeetCode 730. Count Different Palindromic Subsequences - 刷题找工作 EP114
