Massive Algorithms: LeetCode 686 - Repeated String Match

LeetCode 686 - Repeated String Match

https://leetcode.com/problems/repeated-string-match/description/

Given two strings A and B, find the minimum number of times A has to be repeated such that B is a substring of it. If no such solution, return -1.

For example, with A = "abcd" and B = "cdabcdab".

Return 3, because by repeating A three times (“abcdabcdabcd”), B is a substring of it; and B is not a substring of A repeated two times ("abcdabcd").

Note:
The length of A and B will be between 1 and 10000.

As in Approach #1, we've reduced the problem to deciding whether B is a substring of some A * k. Using the following technique, we can decide whether B is a substring in

O(len(A) * k)

time.

Algorithm

For strings

S

, consider each

S[i]

as some integer ASCII code. Then for some prime

p

, consider the following function modulo some prime modulus

\mathcal{M}

\text{hash}(S) = \sum_{0 \leq i < len(S)} p^i * S[i]

Notably,

\text{hash}(S[1:] + x) = \frac{(\text{hash}(S) - S[0])}{p} + p^{n-1} x

. This shows we can get the hash of every substring of A * q in time complexity linear to it's size. (We will also use the fact that

p^{-1} = p^{\mathcal{M}-2} \mod \mathcal{M}

However, hashes may collide haphazardly. To be absolutely sure in theory, we should check the answer in the usual way. The expected number of checks we make is in the order of

1 + \frac{s}{\mathcal{M}}

where

s

is the number of substrings we computed hashes for (assuming the hashes are equally distributed), which is effectively 1.

Time Complexity: $O(M+N)$ (at these sizes), where $M, N$ are the lengths of strings A, B. As in Approach #1, we justify that A * (q+1) will be of length $O(M + N)$ , and computing the rolling hashes was linear work. We will also do a linear $O(N)$ final check of our answer $1 + O(M) / \mathcal{M}$ times. In total, this is $O(M+N + N(1 + \frac{M}{\mathcal{M}}))$ work. Since $M \leq 10000 < \mathcal{M} = 10^9 + 7$ , we can consider this to be linear behavior.
Space complexity: $O(1)$ . Only integers were stored with additional memory.

public boolean check(int index, String A, String B) {

for (int i = 0; i < B.length(); i++) {

if (A.charAt((i + index) % A.length()) != B.charAt(i)) {

return false;

}

return true;

}

public int repeatedStringMatch(String A, String B) {

int q = (B.length() - 1) / A.length() + 1;

int p = 113, MOD = 1_000_000_007;

int pInv = BigInteger.valueOf(p).modInverse(BigInteger.valueOf(MOD)).intValue();

long bHash = 0, power = 1;

for (int i = 0; i < B.length(); i++) {

bHash += power * B.codePointAt(i);

bHash %= MOD;

power = (power * p) % MOD;

}

long aHash = 0; power = 1;

for (int i = 0; i < B.length(); i++) {

aHash += power * A.codePointAt(i % A.length());

aHash %= MOD;

power = (power * p) % MOD;

}

if (aHash == bHash && check(0, A, B)) return q;

power = (power * pInv) % MOD;

for (int i = B.length(); i < (q + 1) * A.length(); i++) {

aHash -= A.codePointAt((i - B.length()) % A.length());

aHash *= pInv;

aHash += power * A.codePointAt(i % A.length());

aHash %= MOD;

if (aHash == bHash && check(i - B.length() + 1, A, B)) {

return i < q * A.length() ? q : q + 1;

}

return -1;

}

http://www.cnblogs.com/grandyang/p/7631434.html
这道题让我们用字符串B来匹配字符串A，问字符串A需要重复几次，如果无法匹配，则返回-1。那么B要能成为A的字符串，那么A的长度肯定要大于等于B，所以当A的长度小于B的时候，我们可以先进行重复A，直到A的长度大于等于B，并且累计次数cnt。那么此时我们用find来找，看B是否存在A中，如果存在直接返回cnt。如果不存在，我们再加上一个A，再来找，这样可以处理这种情况A="abc", B="cab"，如果此时还找不到，说明无法匹配

https://leetcode.com/problems/repeated-string-match/discuss/108086/Java-Solution-Just-keep-building-(OJ-Missing-Test-Cases)

The idea is to keep string builder and appending until the length A is greater or equal to B.

The question can be summarized as "What is the smallest k for which B is a substring of A * k?" We can just try every k.

Algorithm

Imagine we wrote S = A+A+A+.... If B is to be a substring of S, we only need to check whether some S[0:], S[1:], ..., S[len(A) - 1:] starts with B, as S is long enough to contain B, and S has period at most len(A).

Now, suppose q is the least number for which len(B) <= len(A * q). We only need to check whether B is a substring of A * q or A * (q+1). If we try k < q, then B has larger length than A * q and therefore can't be a substring. When k = q+1, A * k is already big enough to try all positions for B; namely, A[i:i+len(B)] == B for i = 0, 1, ..., len(A) - 1.

Time Complexity: $O(N*(N+M))$ , where $M, N$ are the lengths of strings A, B. We create two strings A * q, A * (q+1) which have length at most O(M+N). When checking whether B is a substring of A, this check takes naively the product of their lengths.
Space complexity: As justified above, we created strings that used $O(M+N)$ space.

 public int repeatedStringMatch(String A, String B) {

    int count = 0;
    StringBuilder sb = new StringBuilder();
    while (sb.length() < B.length()) {
        sb.append(A);
        count++;
    }
    if(sb.toString().contains(B)) return count;
    if(sb.append(A).toString().contains(B)) return ++count;
    return -1;
}

https://leetcode.com/problems/repeated-string-match/discuss/108090/Intuitive-Python-2-liner

Let n be the answer, the minimum number of times A has to be repeated.

For B to be inside A, A has to be repeated sufficient times such that it is at least as long as B (or one more), hence we can conclude that the theoretical lower bound for the answer would be length of B / length of A.

Let x be the theoretical lower bound, which is ceil(len(B)/len(A)).

The answer n can only be x or x + 1 (in the case where len(B) is a multiple of len(A) like in A = "abcd" and B = "cdabcdab") and not more. Because if Bis already in A * n, B is definitely in A * (n + 1).

Hence we only need to check whether B in A * x or B in A * (x + 1), and if both are not possible return -1.

X. KMP

https://leetcode.com/problems/repeated-string-match/discuss/108084/C++-4-lines-O(m-*-n)-or-O(1)-and-7-lines-KMP-O(m-+-n)-or-O(n)

This is basically a modified version of string find, which does not stop at the end of A, but continue matching by looping through A.

int repeatedStringMatch(string A, string B) {
    for (auto i = 0, j = 0; i < A.size(); ++i) {
        for (j = 0; j < B.size() && A[(i + j) % A.size()] == B[j]; ++j);
        if (j == B.size()) return (i + j) / A.size() + ((i + j) % A.size() != 0 ? 1 : 0);
    }
    return -1;
}

As suggested by @k_j, I am also providing O(n + m) version that uses a prefix table (KMP). We first compute the prefix table using the suffix and prefix pointers. Then we are going through A only once, shifting B using the prefix table.

This solution requires O(n) extra memory for the prefix table, but it's the fastest out there (OJ runtime is 3 ms). However, we do not need extra memory to append A multiple times, as in many other solutions.

Also thanks to @mylzsd for the review and the optimization to move i faster (skip multiple search characters) rather incrementing it one by one.

int repeatedStringMatch(string a, string b) {
    vector<int> prefTable(b.size() + 1); // 1-based to avoid extra checks.
    for (auto sp = 1, pp = 0; sp < b.size(); prefTable[++sp] = pp) {
        pp = b[pp] == b[sp] ? pp + 1 :  prefTable[pp];
    }
    for (auto i = 0, j = 0; i < a.size(); i += max(1, j - prefTable[j]), j = prefTable[j]) {
        while (j < b.size() && a[(i + j) % a.size()] == b[j]) ++j;
        if (j == b.size()) return (i + j) / a.size() + ((i + j) % a.size() != 0 ? 1 : 0);
    }
    return -1;
}

Since several users had this question, I am copying here a simple example demonstrating how the KMP algorithm uses the prefix table to skip previously matched prefixes.

A: aabaabaac
B: aabaac

We will start matching from A[i = 0] and B[j = 0], match "aabaa" and stop on B[j = 5] ('c' != 'b'). Without the prefix table, we will have to match again starting from A[i = 1] and B[j = 0].

The prefix table for A looks like this (it's 1-based to avoid extra checks): [0, 0, 1, 0, 1, 2, 3, 4, 5, 2].It tells us that we already matched "aa", and we can continue matching from A[i + j = 5] and B[j = 2].

i += max(1, j - prefTable[j]); i = 0 + 5 - prefTable[5] = 0 + 5 - 2 = 3;
j = prefTable[5] = 2;

A: ~~aab~~aabaac

B: ___aabaac

https://leetcode.com/problems/repeated-string-match/discuss/108128/O(orAor-%2B-orBor)-time-Solution

    public int repeatedStringMatch(String A, String B) {
        StringBuilder builder = new StringBuilder(A);
        while (builder.length() < B.length())
            builder.append(A);

        int[] next = new int[B.length()];
        for (int i = 0, j = -1; i < B.length(); i++) {
            while (j != -1 && B.charAt(i) != B.charAt(j + 1))
                j = next[j];
            next[i] = i > 0 && B.charAt(i) == B.charAt(j + 1) ? j + 1 : -1;
            j = next[i];
        }

        for (int i = 0; i < 3; i++) {
            if (kmpMatch(builder.toString(), B, next))
                return builder.length() / A.length();
            builder.append(A);
        }
        return -1;
    }

    private boolean kmpMatch(String s, String pattern, int[] next) {
        for (int i = 0, j = -1; i < s.length(); i++) {
            while (j != -1 && s.charAt(i) != pattern.charAt(j + 1))
                j = next[j];
            if (s.charAt(i) == pattern.charAt(j + 1))
                j++;
            if (j == pattern.length() - 1)
                return true;
        }
        return false;
    }

下面这种解法还是由热心网友edyyy提供，没有用到字符串自带的find函数，而是逐个字符进行比较，循环字符串A中的所有字符，然后用个变量j，初始化为0，用来循环字符串B中的字符，每个字符和A中对应的字符进行比较，此时从A中取字符就要把A当作一个循环数组来处理，用(i+j)%m来取字符，还要确保j小于n，避免越界，如果字符匹配的话，j自增1。while 循环结束后，如果j等于n了，说明B中所有的字符都成功匹配了，那么我们来计算A的重复次数，通过(i+j-1)/m + 1来得到，注意i+j要减1再除以m，之后再加上一次。因为当i+j正好等于m时，说明此时不用重复A，那么(i+j-1)/m + 1还是等于1，当i+j>m的时候，需要重复A了，(i+j-1)/m + 1也可以得到正确的结构

    int repeatedStringMatch(string A, string B) {
        int m = A.size(), n = B.size();
        for (int i = 0; i < m; ++i) {
            int j = 0;
            while (j < n && A[(i + j) % m] == B[j]) ++j;
            if (j == n) return (i + j - 1) / m + 1;
        }
        return -1;
    }

LeetCode 686 - Repeated String Match

Labels

Popular Posts