Massive Algorithms: LeetCode 466 - Count The Repetitions

LeetCode 466 - Count The Repetitions

https://leetcode.com/problems/count-the-repetitions/

Define S = [s,n] as the string S which consists of n connected strings s. For example, ["abc", 3] ="abcabcabc".

On the other hand, we define that string s1 can be obtained from string s2 if we can remove some characters from s2 such that it becomes s1. For example, “abc” can be obtained from “abdbec” based on our definition, but it can not be obtained from “acbbe”.

You are given two non-empty strings s1 and s2 (each at most 100 characters long) and two integers 0 ≤ n1 ≤ 106 and 1 ≤ n2 ≤ 106. Now consider the strings S1 and S2, where S1=[s1,n1] and S2=[s2,n2]. Find the maximum integer M such that [S2,M] can be obtained from S1.

Input:
s1="acb", n1=4
s2="ab", n2=2

Return:
2

https://discuss.leetcode.com/topic/70667/c-0ms-o-str1-length-str2-length

Given a str2, for each str, we can give a value v to this str such that, after greedily looking through str, our imaginary next step is to find str2[v].
In our problem, str is always (str1,n), with a given str1, so, we can take one more step and say that for each n, there is a unique v associated to n(i.e t0 (str,n)).

define a division and a modulo between two strings as follow:

str/str2=argmax{i, (str2,i) can be obtained by str}
str%str2=the v mentioned above associated with str.

All possible values of v is less than str2.size(),
so (str1,n)%str2 will begin to repeat a pattern after a certain n less than str2.size().
(the pattern is the same because in the cases with the same v, our situations are exactly the same),
so is (str1,n)/str2-(str1,n+1)/str2 for the same reason.
We can therefore precompute a table for all these values with O(str1.length*str2.length).

(str1,n) can be divided in three parts:

sth before pattern(A) + pattern parts(B) + sth after pattern(C)

The pattern does not necessarily begin in the first str1, we shall see if n is great enough so that there can be a pattern.

The last pattern(C) is not necessarily complete, we need to calculate it separately.

We can finish in just looking to the precomputed table and doing some simple maths.

https://discuss.leetcode.com/topic/70887/very-clean-and-short-7ms-java-solution-based-on-70664914-s-idea

    public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
        int[] reps = new int[102];
        int[] rests = new int[102];
        int posRest=0, repTime=0;
        int i=0, k=0;
        if(n1 <= 0) return 0;
        while(k==i) {
            i++;
            for(int j=0; j<s1.length(); j++) {
                if(s2.charAt(posRest) == s1.charAt(j)) {
                    posRest++;
                    if(posRest == s2.length()) {
                        repTime++;
                        posRest=0;
                    }
                }
            }
            if(i >= n1)
                return repTime / n2;
            for(k=0; k<i; k++){
                if(posRest == rests[k])
                    break;
            }
            reps[i] = repTime;
            rests[i] = posRest;
        }
        int interval = i-k;
        int repeatCount = (n1-k) / interval;
        int repeatTimes = repeatCount * (reps[i]-reps[k]);
        int remainTimes = reps[(n1-k) % interval + k];
        return (repeatTimes + remainTimes) / n2;
    }

https://discuss.leetcode.com/topic/71256/easy-understanding-java-solution-with-detailed-explanation-21ms

http://www.cnblogs.com/grandyang/p/6149294.html

我们需要使用重复模式来优化我们的方法，我们知道：

如果s2在S1中出现了N次，那么S2肯定在S1中出现了N/n2次，注意这里的大写表示字符串加上重复次数组成的大字符串。

所以我们得出结论，我们只要算出s2出现的次数，然后除以n2，就可以得出S2出现的次数了。

那么问题就是我们表示重复，我们遍历s1字符串n1次，表示每个s1字符串为一段，对于每段，我们有：

1. 出现在该段的s2字符串的累计出现次数

2. 一个nextIndex，其中s2[nextIndex]表示在下一段s1中你所要寻找的第一个字符。(比如说s1="abc", s2="bac", 由于要寻找s2的第一个字符b，而b在s1中的位置是1，所以nextIndex=1；同理，比如s1="abca", s2="bac"，那么nextIndex=2)

表示重复关键就在于nextIndex，比如对于下面这个例子：

Input:
s1="abacb", n1=6
s2="bcaa", n2=1

Return:
3

j --------------->  1 2    3 0 1      2    3 0 1      2    3 0   
S1 --------------> abacb | abacb | abacb | abacb | abacb | abacb 

repeatCount ----->    0  |   1   |   1   |   2   |   2   |   3

nextIndex ------->    2  |   1   |   2   |   1   |   2   |   1

nextIndex的范围从0到s2.size()-1，根据鸽巢原理(又称抽屉原理)，你一定会找到相同的两个nextIndex在遍历s1段s2.size()+1次之后。在上面的例子中，重复的nextIndex出现在第三段，和第一段一样都为2，那么重复的pattern就找到了，是第二段和第三段中的aabc，而且从第四段开始，每两段就有一个aabc，现在我们的目标就是统计出整个S1中有多少个s2。

由于pattern占用了两段，所以interval为2，我们然后看整个S1中有多少个这样的两段，repeat = (n1 - start) / interval。start表示pattern的起始段数，之前的不是pattern，然后我们算在整个S1中有多少个pattern出现，patternCnt = (repeatCnt[k] - repeatCnt[start]) * repeat，注意这里的repeatCnt[k] - repeatCnt[start]表示一个pattern中有多少个字符串s2，个人感觉一半来说都是1个。然后我们算出剩下的非pattern的字符串里能包含几个s2，remainCnt = repeatCnt[start + (n1 - start) % interval]，然后我们把patternCnt + remainCnt之和算出来除以n2就是需要的结果啦。如果pattern未曾出现，那么我们直接用repeatCnt[n1] / n2也能得到正确的结果

    int getMaxRepetitions(string s1, int n1, string s2, int n2) {
        vector<int> repeatCnt(s2.size() + 1, 0);
        vector<int> nextIdx(s2.size() + 1, 0);
        int j = 0, cnt = 0;
        for (int k = 1; k <= n1; ++k) {
            for (int i = 0; i < s1.size(); ++i) {
                if (s1[i] == s2[j]) {
                    ++j;
                    if (j == s2.size()) {  
                        j = 0;
                        ++cnt;
                    }
                }
            }
            repeatCnt[k] = cnt;
            nextIdx[k] = j;
            for (int start = 0; start < k; ++start) {
                if (nextIdx[start] == j) {
                    int interval = k - start;
                    int repeat = (n1 - start) / interval;
                    int patternCnt = (repeatCnt[k] - repeatCnt[start]) * repeat;
                    int remainCnt = repeatCnt[start + (n1 - start) % interval];
                    return (patternCnt + remainCnt) / n2;
                }
            }
        }
        return repeatCnt[n1] / n2;
    }

http://bookshadow.com/weblog/2016/12/04/leetcode-count-the-repetitions/

贪心算法 + 寻找循环节

利用贪心算法计算s1与s2对应字符的匹配位置，由于s1与s2的循环匹配呈现周期性规律，因此可以通过辅助数组dp进行记录

记l1, l2为s1, s2的长度；x1, x2为s1, s2的字符下标

令y1, y2 = x1 % l1, x2 % l2

当s1[y1] == s2[y2]时：

  若dp[y1][y2]不存在，则令dp[y1][y2] = x1, x2

  否则，记dx1, dx2 = dp[y1][y2]，循环节为s1[dx1 ... x1], s2[dx2 ... x2]

另请参阅LeetCode 418. Sentence Screen Fitting的解答。

def getMaxRepetitions(self, s1, n1, s2, n2): """ :type s1: str :type n1: int :type s2: str :type n2: int :rtype: int """ if not set(s2) <= set(s1): return 0 l1, l2 = len(s1), len(s2) dp = collections.defaultdict(dict) x1 = x2 = 0 while x1 < l1 * n1: while s1[x1 % l1] != s2[x2 % l2]: x1 += 1 if x1 >= l1 * n1: break y1, y2 = x1 % l1, x2 % l2 if y2 not in dp[y1]: dp[y1][y2] = x1, x2 else: dx1, dx2 = dp[y1][y2] round = (l1 * n1 - dx1) / (x1 - dx1) x1 = dx1 + round * (x1 - dx1) x2 = dx2 + round * (x2 - dx2) if x1 < l1 * n1: x1 += 1 x2 += 1 return x2 / (n2 * l2)

https://leetcode.com/articles/count-the-repetitions/

In Approach #1, we simply checked for repetition over the entire

[s1,n1]

. However,

n1

could be quiet large and thus, is inefficient to iterate over complete

S1

. We can take advantage of the fact that

s1

is repeating and hence, we could find a pattern of repetition of

s2

S1

. Once, we get the repetition pattern, we can easy calculate how many times the pattern repeats in

n2

O(1)

But what's the pattern!

In approach #1, we kept

\text{index}

which tells the index to search in

s2

. We try to see in the below illustration if this

\text{index}

repeats itself after some fixed iterations of

s1

or not and if so, then how can we leverage it.

After finding the repitition pattern, we can calculate the sum of repeating pattern, part before repitition and part left after repitition as the result in

O(1)

But will this repitition always take place?

Yes! By Pigeonhole principle, which states that if

n

items are put into

m

containers, with

n &gt; m

, then at least one container must contain more than one item. So, according to this, we are sure to find 2 same

index

after scanning at max

\text{size(s2)}

blocks of

s1

Algorithm

Intialize $count=0$ nd $index=0$ , which are same as in Approach #1.
Initialize 2 arrays, say $\text{indexr}$ and $\text{countr}$ of size $(\text{size(s2)}+1)$ , initialized with 0. The size $(\text{size(s2)}+1)$ is based on the Pigeonhole principle as discussed above. The 2 arrays specifies the $\text{index}$ and $\text{count}$ at the start of each $s1$ block.
Iterate over $i$ from $0$ to $n1-1$ :
- Iterate over $j$ from $0$ to $\text{size(s1)}-1$ :
  - If $\text{s1[j]} == \text{s2[index]}$ , increment $\text{index}$ .
  - If $\text{index}$ is equal to $\text{size(s2)}$ , set $\text{index} = 0$ and increment $\text{count}$ .
- Set $\text{countr[i]}=\text{count}$ and $\text{indexr[i]}=\text{index}$
- Iterate over $k$ from $0$ to $i-1$ :
  - If we find the repitition, i.e. current $\text{index} = \text{indexr[k]}$ , we calculate the count for block before the repitition starts, the repeating block and the block left after repitition pattern, which can be calculated as:
  $\begin{aligned} prev_count & = countr [k] \\ pattern_count & = (countr [i] - countr [k]) * \frac{n 1 - 1 - k}{i - k} \\ remain_count & = countr [k + (n 1 - 1 - k) % (i - k)] - countr [k] \end{aligned}$
  - Sum the 3 counts and return the sum divided by $n2$ , since $\text{S2 = [s2,n2]}$
  - If no repetition is found, return $\text{countr[n1-1]/n2}$ .

Time complexity: $\text{O(size(s1)*size(s2))}$ .
- According to the Pigeonhole principle, we need to iterate over $s1$ only $(\text{size(s2)+1})$ times at max.
Space complexity: $O(\text{size(s2)})$ extra space for $\text{indexr}$ and $\text{countr}$ string.

int getMaxRepetitions(string s1, int n1, string s2, int n2)

{

if (n1 == 0)

return 0;

int indexr[s2.size() + 1] = { 0 }; // index at start of each s1 block

int countr[s2.size() + 1] = { 0 }; // count of repititions till the present s1 block

int index = 0, count = 0;

for (int i = 0; i < n1; i++) {

for (int j = 0; j < s1.size(); j++) {

if (s1[j] == s2[index])

++index;

if (index == s2.size()) {

index = 0;

++count;

}

countr[i] = count;

indexr[i] = index;

for (int k = 0; k < i; k++) {

if (indexr[k] == index) {

int prev_count = countr[k];

int pattern_count = (countr[i] - countr[k]) * (n1 - 1 - k) / (i - k);

int remain_count = countr[k + (n1 - 1 - k) % (i - k)] - countr[k];

return (prev_count + pattern_count + remain_count) / n2;

}

return countr[n1 - 1] / n2;

}

X. Brute Force
https://discuss.leetcode.com/topic/70707/ugly-java-brute-force-solution-but-accepted-1088ms

How do we know "string s2 can be obtained from string s1"? Easy, use two pointers iterate through s2 and s1. If chars are equal, move both. Otherwise only move pointer1.
We repeat step 1 and go through s1 for n1 times and count how many times can we go through s2.
Answer to this problem is times go through s2 divide by n2.

    public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
        char[] array1 = s1.toCharArray(), array2 = s2.toCharArray();
        int count1 = 0, count2 = 0, i = 0, j = 0;
        
        while (count1 < n1) {
            if (array1[i] == array2[j]) {
                j++;
                if (j == array2.length) {
                    j = 0;
                    count2++;
                }
            }
            i++;
            if (i == array1.length) {
                i = 0;
                count1++;
            }
        }
        
        return count2 / n2;
    }

https://leetcode.com/articles/count-the-repetitions/

Time complexity:

O(n1*size(s1))

We iterate over the entire length of string $s1$ for $n1$ times.

https://leetcode.com/problems/count-the-repetitions/discuss/95401/Ugly-Java-brute-force-solution-but-accepted.-1088ms.

How do we know "string s2 can be obtained from string s1"? Easy, use two pointers iterate through s2 and s1. If chars are equal, move both. Otherwise only move pointer1.
We repeat step 1 and go through s1 for n1 times and count how many times can we go through s2.
Answer to this problem is times go through s2 divide by n2.

int getMaxRepetitions(string s1, int n1, string s2, int n2)

{

int index = 0, repeat_count = 0;

int s1_size = s1.size(), s2_size = s2.size();

for (int i = 0; i < n1; i++) {

for (int j = 0; j < s1_size; j++) {

if (s1[j] == s2[index])

++index;

if (index == s2_size) {

index = 0;

++repeat_count;

}

return repeat_count / n2;

}

LeetCode 466 - Count The Repetitions

Labels

Popular Posts