Massive Algorithms: LeetCode 828 - Unique Letter String

LeetCode 828 - Unique Letter String

https://leetcode.com/problems/unique-letter-string/

A character is unique in string S if it occurs exactly once in it.

For example, in string S = "LETTER", the only unique characters are "L" and "R".

Let's define UNIQ(S) as the number of unique characters in string S.

For example, UNIQ("LETTER") = 2.

Given a string S with only uppercases, calculate the sum of UNIQ(substring) over all non-empty substrings of S.

If there are two or more equal substrings at different positions in S, we consider them different.

Since the answer can be very large, return the answer modulo 10 ^ 9 + 7.

Example 1:

Input: "ABC"
Output: 10
Explanation: All possible substrings are: "A","B","C","AB","BC" and "ABC".
Evey substring is composed with only unique letters.
Sum of lengths of all substring is 1 + 1 + 1 + 2 + 2 + 3 = 10

Example 2:

Input: "ABA"
Output: 8
Explanation: The same as example 1, except uni("ABA") = 1.

Note: 0 <= S.length <= 10000

肯定不能一个个substring来做，换个角度，求每个字符的有效substring个数

https://www.cnblogs.com/f91og/p/9710808.html

首先以字符串XAXAXXAX为例，如果将第二个 A 变成唯一的character的话，只可能时某个唯一的substring包含了这个A。比如：

We can take "XA(XAXX)AX" and between "()" is our substring.

利用这种方式，可以使得第二个A成为唯一的character，那么从上可知我们要做的是：

We can see here, to make the second "A" counted as a uniq character, we need to:

insert "(" somewhere between the first and second A
insert ")" somewhere between the second and third A

For step 1 we have "A(XA" and "AX(A", 2 possibility.
For step 2 we have "A)XXA", "AX)XA" and "AXX)A", 3 possibilities.

So there are in total 2 * 3 = 6 ways to make the second A a unique character in a substring.
In other words, there are only 6 substring, in which this A contribute 1 point as unique string.

现在逆转下思维，一开始的思维是在原字符串的所有子串中寻找可能的唯一的character，我们现在就直接在S中来计算每个字符，看看对于每个unique char总公有多少种不同的找法。换而言之就是，对于S中的每个字符，如果他作为unique char的话，那么包含他的substring的范围是在前一个相同字符以及后一个相同字符这个区间内找的（参见上面的A），将所有可能的substring数量加起来即可，很有趣的逆向思维。

https://leetcode.com/problems/unique-letter-string/discuss/128952/One-pass-O(N)-Straight-Forward

Let's think about how a character can be found as a unique character.

Think about string "XAXAXXAX" and focus on making the second "A" a unique character.
We can take "XA(XAXX)AX" and between "()" is our substring.
We can see here, to make the second "A" counted as a uniq character, we need to:

insert "(" somewhere between the first and second A
insert ")" somewhere between the second and third A

For step 1 we have "A(XA" and "AX(A", 2 possibility.
For step 2 we have "A)XXA", "AX)XA" and "AXX)A", 3 possibilities.

So there are in total 2 * 3 = 6 ways to make the second A a unique character in a substring.
In other words, there are only 6 substring, in which this A contribute 1 point as unique string.

Instead of counting all unique characters and struggling with all possible substrings,
we can count for every char in S, how many ways to be found as a unique char.
We count and sum, and it will be out answer.

Explanation:

index[26][2] record last two occurrence index for every upper characters.
Initialise all values in index to -1.
Loop on string S, for every character c, update its last two occurrence index to index[c].
Count when loop. For example, if "A" appears twice at index 3, 6, 9 seperately, we need to count:
- For the first "A": (6-3) * (3-(-1))"
- For the second "A": (9-6) * (6-3)"
- For the third "A": (N-9) * (9-6)"

Complexity:
One pass, time complexity O(N).

    public int uniqueLetterString(String S) {
        int[][] index = new int[26][2];
        for (int i = 0; i < 26; ++i) Arrays.fill(index[i], -1);
        int res = 0, N = S.length(), mod = (int)Math.pow(10, 9) + 7;
        for (int i = 0; i < N; ++i) {
            int c = S.charAt(i) - 'A';
            res = (res + (i - index[c][1]) * (index[c][1] - index[c][0]) % mod) % mod;
            index[c] = new int[] {index[c][1], i};
        }
        for (int c = 0; c < 26; ++c)
            res = (res + (N - index[c][1]) * (index[c][1] - index[c][0]) % mod) % mod;
        return res;
    }

X.
https://leetcode.com/problems/unique-letter-string/discuss/129021/O(N)-Java-Solution-DP-Clear-and-easy-to-Understand

n each loop, We caculate cur[i], which represent the sum of Uniq() for all substrings whose last char is S.charAt(i).

For example,
S = 'ABCBD'
When i = 2, cur[2] = Uniq('ABC') + Uniq('BC') + Uniq('C')
When i = 3, cur[3] = Uniq('ABCB') + Uniq('BCB') + Uniq('CB') + Uniq('B')

Notice, we append char 'B' into each previous substrings. Only in substrings 'CB' and 'B', the char 'B' can be identified as uniq. The contribution of 'B' from cur[2] to cur[3] is i - showLastPosition['B']. At the same time, in substrings 'ABCB', 'BCB', the char 'B' can‘t’ be identified as uniq any more, the previous contribution of 'B' should be removed.

So we have'cur[i] = cur[i - 1] - contribution[S.charAt(i)] + (i - showLastPosition[S.charAt(i)])
Then the new contribution[S.charAt(i)] = i - showLastPosition[S.charAt(i)]

The final result is the sum of all cur[i].

The showLastPosition[x] represent the last showing positon of char x befor i. (The initial showLastPosition[x] for all chars should be '-1'. For convenience, I shift it to '0'. It may cause some confusion, notice I update "showLastPosition[x] = i + 1" for each loop.)

    public int uniqueLetterString(String S) {
        
        int res = 0;
        if (S == null || S.length() == 0)
            return res;    
        int[] showLastPosition = new int[128];
        int[] contribution = new int[128];
        int cur = 0;
        for (int i = 0; i < S.length(); i++) {
            char x = S.charAt(i);
            cur -= contribution[x]; 
            contribution[x] = (i - (showLastPosition[x] - 1));
            cur += contribution[x]; 
            showLastPosition[x] = i + 1;
            res += cur;
        }   
        return res;
    }

X.
public int uniqueLetterString(String S) { //上次出现位置 int[] lastAppearPos = new int[26]; //上上次出现位置 int[] lastLastAppearPos = new int[26]; long[] res = new long[S.length() + 1]; for (int i = 1; i < res.length; i++) { int letter = S.charAt(i - 1) - 'A'; res[i] = res[i - 1]; res[i] += (i - lastAppearPos[letter]); res[i]-=(lastAppearPos[letter]-lastLastAppearPos[letter]); lastLastAppearPos[letter]=lastAppearPos[letter]; lastAppearPos[letter]=i; if(i>>8==0) { res[i]%=1000000007; } } long ans=0; for(int i=0;i<res.length;i++)ans+=res[i]; return (int) (ans%1000000007); }

https://leetcode.com/problems/unique-letter-string/discuss/158378/Concise-DP-O(n)-solution

Let dp[i] is sum of unique char in all substring ending at i, then the answer is sum(dp[i]), i=[0..n-1]. It's not difficult to get the recursive formula:
dp[i] = dp[i-1] + (i - first_from_i(s[i])) - (first_from_i(s[i]) - second_from_i(s[i]))

Take the below example:

BBBBBBBBBBBBBBBBBOABCDOABCOABC
                 ^    ^   ^
                 s    f   i

dp[i] = dp[i-1] + (i-f) - (f-s)

When extending s[i] to all substrings ending with s[i-1], there are (i-f) more unique char s[i], and (f-s) less unique char because of duplicate of s[i].

The implementation is quite simple. Here is Java version:

    public int uniqueLetterString(String s) {
            final int MOD = 1000000007;
            int res = 0, dp = 0;
            int[] first = new int[26], second = new int[26];

            for (int i = 0; i < s.length(); i++) {
                int ci = s.charAt(i) - 'A';
                dp = dp + 1 + i - first[ci] * 2 + second[ci];
                res = (res + dp) % MOD;

                second[ci] = first[ci];
                first[ci] = i + 1;
            }

            return res;
    }

X. http://bookshadow.com/weblog/2018/05/06/leetcode-unique-letter-string/

字符统计

分别统计每个字母出现的下标

假设字母letter的下标数组为idx，将-1和len(S)插入idx的头部和尾部

则sum((idx[i] - idx[i - 1]) * (idx[i + 1] - idx[i]))为letter出现的总次数

X. https://leetcode.com/articles/unique-letter-string/
Approach #2: Split by Character [Accepted]

As in Approach #1, we have

U(x) = \sum_{c \in \mathcal{A}} U_c(x)

, where

\mathcal{A} = \{ \text{&quot;A&quot;}, \text{&quot;B&quot;}, \dots \}

is the alphabet, and we only need to answer the following question (26 times, one for each character): how many substrings have exactly one

\text{&quot;A&quot;}

Consider how many substrings have a specific

\text{&quot;A&quot;}

. For example, let's say S only has three "A"'s, at 'S[10] = S[14] = S[20] = "A"; and we want to know the number of substrings that contain S[14]. The answer is that there are 4 choices for the left boundary of the substring (11, 12, 13, 14), and 6 choices for the right boundary (14, 15, 16, 17, 18, 19). So in total, there are 24 substrings that have S[14] as their unique "A".

Continuing our example, if we wanted to count the number of substrings that have S[10], this would be 10 * 4 - note that when there is no more "A" characters to the left of S[10], we have to count up to the left edge of the string.

We can add up all these possibilities to get our final answer.
Time Complexity:

O(N)

, where

N

is the length of S

public int uniqueLetterString(String S) {

Map<Character, List<Integer>> index = new HashMap();

for (int i = 0; i < S.length(); ++i) {

char c = S.charAt(i);

index.computeIfAbsent(c, x -> new ArrayList<Integer>()).add(i);

}

long ans = 0;

for (List<Integer> A : index.values()) {

for (int i = 0; i < A.size(); ++i) {

long prev = i > 0 ? A.get(i - 1) : -1;

long next = i < A.size() - 1 ? A.get(i + 1) : S.length();

ans += (A.get(i) - prev) * (next - A.get(i));

}

return (int) ans % 1_000_000_007;

}

https://leetcode.com/problems/unique-letter-string/discuss/172041/Very-simple-O(N)-with-Prev-and-Next

Just save the prev and next occurrence of the char at the current index for all indices and then with simple math just calculate the number of substrings in which the current char is unique.

int uniqueLetterString(string S) {
        int n = S.size();
        unordered_map<char, int> Last; //index of last occurrence of char
        
        vector<int> Next(n, n); //index of next occurrence of S[i]
        vector<int> Prev(n, 0); //index of prev occurrence of S[i]
        
        for (int i = 0; i < n; ++i) {
            char c = S[i];
            if (Last.count(c)) {
                Prev[i] = Last[c] + 1;
                Next[Last[c]] = i;   
            }
            Last[c] = i;
        }
        
        int res = 0;
        for (int i = 0; i < n; ++i) {
        // the following is the number of substrings
        // in which the current char is unique
            res += ((i + 1) - Prev[i]) * (Next[i] - i);
        }
        return res % 1000000007;
    }

X. Left, right array
https://www.acwing.com/solution/leetcode/content/623/
预处理每个位置 i，左边第一个和自己一样字符的位置 left，右边第一个和自己一样字符的位置 right。该位置 i 贡献的答案就是 (i - left) * (right - i)。
对每个位置进行统计。
int uniqueLetterString(string S) {
const int mod = 1000000007;

vector<int> last(26, -1);
int n = S.length();

vector<int> left(n, -1), right(n, n);

for (int i = 0; i < n; i++) {
left[i] = last[S[i] - 'A'];
last[S[i] - 'A'] = i;
}

last = vector<int>(26, n);

for (int i = n - 1; i >= 0; i--) {
right[i] = last[S[i] - 'A'];
last[S[i] - 'A'] = i;
}

int ans = 0;
for (int i = 0; i < n; i++)
ans = (ans + (LL)(i - left[i]) * (right[i] - i)) % mod;

return ans;
}
X. http://tashi711.xyz/programming/reports/leetcode/leetcode-828/

比较简单的一道题目，考虑每个单独的字母对于最终答案的贡献。如果某个字母a对答案有贡献，那么包含他的子串一定在从这个a往两边延伸到边界或者另一个a之内位置的范围内。那么总共可能的子串有u*v个，其中u、v分别为延伸到边界或者另一个a之内位置的长度。对每个’A’到’Z’的字母a，找到每个a的位置，计算往外延伸的长度，乘起来累加到答案即可。

复杂度分析

　　时间复杂度为

O (N M)

，其中M为字母个数，可以认为是常数26。
　　空间复杂度为

O (N)

。

 int uniqueLetterString(string S) {
  return work(S);
 }

 int work(const string& s) {
  int n = static_cast<int>(s.size());
  long long ans = 0;
  for (char i = 'A'; i <= 'Z'; ++i) {
   for (int j = 0; j < n; ++j) {
    if (s[j] == i) {
     int cnt_pre = j + 1, cnt_next = 0;
     for (int k = j + 1; k < n; ++k) {
      ++cnt_next;
      if (s[k] == i) {
       ans = (ans + cnt_pre * cnt_next) % kM;
       cnt_pre = cnt_next;
       cnt_next = 0;
      }
     }
     ans = (ans + cnt_pre * (cnt_next + 1)) % kM;
     break;
    }
   }
  }
  return ans;
 }

X. https://www.gjxhlan.me/2018/05/09/leetcode-contest-83-solution/

因为只可能有大写字母，用三个指针数组 left, mid, right[26] 来保存某一个字母连续的三个位置，从左到右扫一遍，每次遇到一个字母，更新一次 left = mid，mid = right, right = i，并且将 mid 所在位置的字母的贡献存入 ans，因为 mid 所在位置的字母无法对包含 right 位置右边字符的子串产生影响（unique）。注意最后需要更新一次 ans。

    public int uniqueLetterString(String S) {
        if (S == null || S.length() == 0) return 0; 
        
        long ans = 0;
        final long cons = (long)(1e9 + 7);
        
        int[] left = new int[26], mid = new int[26];
        int[] right = new int[26];
        
        int l = S.length();
        for (int i = 0; i < left.length; i++) left[i] = mid[i] = right[i] = -1;
        
        for (int i = 0; i < l; i++) {
            int c = S.charAt(i) - 'A';
            left[c] = mid[c];
            mid[c] = right[c];
            right[c] = i;
            if (mid[c] != -1) {
                ans = (ans + (mid[c] - left[c]) * (right[c] - mid[c])) % cons;
            }
        }
        
        for (int i = 0; i < 26; i++) {
            if (right[i] != -1) {
                ans = (ans + (l - right[i]) * (right[i] - mid[i])) % cons;
            }
        }
        
        return (int)ans;
    }

X. Two Pointers
https://leetcode.com/problems/unique-letter-string/discuss/129246/Simple-Java-2-Pointer

    int MOD = 1_000_000_007;
    
    public int uniqueLetterString(String s) {
        
        int result = 0;
        
        for(char ch = 'A';ch <= 'Z'; ++ch){
            result = (result + uniqueSubstrings(s,ch))%MOD;
        }
        
        return result;
                
    }
    
    int uniqueSubstrings(String s, char ch){
        int l = 0;
        int prevChIndex = -1;
        int cnt = 0;
        int result = 0;
        
        for(int r=0;r<s.length();++r){
            if(s.charAt(r) == ch){                
                prevChIndex = r;
                cnt++;
            }
            
           
            while(cnt==2){
                if(s.charAt(l)==ch){
                    cnt--;                        
                }
                l++;
            }
        
            
            if(cnt==1){
                result = (result + (prevChIndex + 1 - l)) % MOD;
            }
        }
                
        
        return result;
    }

X. DP

https://leetcode.com/problems/unique-letter-string/discuss/158378/Concise-DP-O(n)-solution

Take the below example:

BBBBBBBBBBBBBBBBBOABCDOABCOABC
                 ^    ^   ^
                 s    f   i

dp[i] = dp[i-1] + (i-f) - (f-s)

When extending s[i] to all substrings ending with s[i-1], there are (i-f) more unique char s[i], and (f-s) less unique char because of duplicate of s[i]

    public int uniqueLetterString(String s) {
            final int MOD = 1000000007;
            int res = 0, dp = 0;
            int[] first = new int[26], second = new int[26];

            for (int i = 0; i < s.length(); i++) {
                int ci = s.charAt(i) - 'A';
                dp = dp + 1 + i - first[ci] * 2 + second[ci];
                res = (res + dp) % MOD;

                second[ci] = first[ci];
                first[ci] = i + 1;
            }

            return res;
    }

Approach #1: Maintain Answer of Suffix [Accepted]

LeetCode 828 - Unique Letter String

复杂度分析

Labels

Popular Posts