Massive Algorithms: LeetCode 459 - Repeated Substring Pattern

LeetCode 459 - Repeated Substring Pattern

http://bookshadow.com/weblog/2016/11/13/leetcode-repeated-substring-pattern/

Given a non-empty string check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. You may assume the given string consists of lowercase English letters only and its length will not exceed 10000.

Example 1:

Input: "abab"

Output: True

Explanation: It's the substring "ab" twice.

Example 2:

Input: "aba"

Output: False

Example 3:

Input: "abcabcabcabc"

Output: True

Explanation: It's the substring "abc" four times. (And the substring "abcabc" twice.)

https://leetcode.com/problems/repeated-substring-pattern/discuss/94334/Easy-python-solution-with-explaination

First char of input string is first char of repeated substring
Last char of input string is last char of repeated substring
Let S1 = S + S (where S in input string)
Remove 1 and last char of S1. Let this be S2
If S exists in S2 then return true else false
Let i be index in S2 where S starts then repeated substring length i + 1 and repeated substring S[0: i+1]

The maximum length of a "repeated" substring that you could get from a string would be half it's length
For example, s = "abcdabcd", "abcd" of len = 4, is the repeated substring.
You cannot have a substring >(len(s)/2), that can be repeated.

So, when ss = s + s , we will have atleast 4 parts of "repeated substring" in ss.
(s+s)[1:-1], With this we are removing 1st char and last char => Out of 4 parts of repeated substring, 2 part will be gone (they will no longer have the same substring).
ss.find(s) != -1, But still we have 2 parts out of which we can make s. And that's how ss should have s, if s has repeated substring

首先证明充分性：
不妨设S = X1,X2...Xn，则X1X2...Xn∈X2...XnX1...Xn-1，令从Xi处开始匹配（2<=i<=n）
则Xi...XnX1...Xi-1 = X1X2...Xn，令Xi...Xn = a， X1...Xi-1 = b，则ab = ba，则a = b，所以S = aa，所以S可由a重复两次得到。
必要性类似可证

One way to think about this one-line Java solution return (s+s).substring(1,2*s.length()-1).indexOf(s)!=-1; is interleaving. Since we get rid of head and tail, if we still find the whole string in s+s, it must start from the left s string, cover the entire intersection of s and s, and end in the right s string since otherwise the length would not be enough.

Then we could move the whole piece parallel to left and right correspondingly to cover the start and end which we hided. Then the left uncovered part must = the first part in the whole string we found in s+s. Thus in the original string, we have the first two part the same, then we could use transitive rule to generalize this pattern to the whole string (the back to start has the same logic)

https://sikasjc.github.io/2017/09/10/RepeatedSubstring/

Let’s say T = S + S.
“S is Repeated => From T[1:-1] we can find S” is obvious.

即只要我们在T[1:-1]中可以找到S，S便为重复的。

下面是关于这句话的证明的翻译：

如果在T[1:-1]中可以找到S，且index为p-1, 那么在T和S中的index为p
令s1 = S[:p], S可以被表示为s1X1, 因为S为重复字符串，所以S = X1s1 = s1X1 (A)
1. len(X1) < len(s1)
  这种情况不存在，以为如果我们发现S在T[1:-1]，则index应为len(X1)，而不是len(s1) = p
2. len(X1) == len(s1)
  由A式可以推出X1 = s1，则S为重复的
3. len(X1) > len(s1)
  因为A式，X1有前缀s1，可以表示为s1X2
  X1 = s1X2，由A式得到s1s1X2 = s1X2s1, 即X2s1 = s1X2 (B)
  这下就简洁明了了，依照前面的步骤递推，最终我们可以得到一个Xn，len(Xn) == len(s1) 并且 Xns1 = s1Xn => Xn = s1 => S = s1X1 = s1s1X2 = ... = s1s1...s1Xn-1 = s1s1...s1s1Xn => S is Repeated.

https://leetcode.com/problems/repeated-substring-pattern/discuss/94344/Simple-Java-solution-2-lines

public boolean repeatedSubstringPattern(String str) {
        String s = str + str;
        return s.substring(1, s.length() - 1).contains(str);
}

https://blog.jing.do/4878

double现在的string
去掉头和尾巴（因为如果有substring，第一个肯定是substring的头部，最后一个肯定是他的尾部）
然后看double的string是否包含老的string（如果重复老string至少有2个substring，新的至少4个，去除掉了头和尾，刚好剩下两个（如果原先三次，则新的会有4个），所以肯定包含）

X. KMP

https://www.jiuzhang.com/solutions/repeated-substring-pattern/

KMP算法对于next数组的应用。
next[i]是指对于字符串s[0,i-1]中，前缀与后缀的最大匹配长度。
例如对于"abcabcabc"来说，其next[8] = 5，也即对于s[0,7]="abcabcab"，前缀与后缀最大匹配的串为"abcab"，长度为5。
用字符串长度减1减去最后一位的next数组值之后得到的应为重复串的长度

    public boolean repeatedSubstringPattern(String s) {
        int l = s.length();
        int[] next = new int[l];
        next[0] = -1;
        int i, j = -1;
        for (i = 1; i < l; i++) {
            while (j >= 0 && s.charAt(i) != s.charAt(j + 1)) {
                j = next[j];
            }
            if (s.charAt(i) == s.charAt(j + 1)) {
                j++;
            }
            next[i] = j;
        }
        int lenSub = l - 1 - next[l - 1];
        return lenSub != l && l % lenSub ==0;
    }
1

下面这种方法是参考的网上的这个帖子，原作者说是用的KMP算法，LeetCode之前也有一道应用KMP算法来解的题Shortest Palindrome，但是感觉那道题才是KMP算法。这道题也称为KMP算法感觉怪怪的(关于KMP的详细介绍请参见从头到尾彻底理解KMP，也可以看博主自己写的一篇KMP Algorithm 字符串匹配算法KMP小结)，KMP算法中的next数组是找当前位置的最大相同前缀后缀的个数，而这道题维护的一位数组dp[i]表示，到位置i-1为止的重复字符串的字符个数，不包括被重复的那个字符串，什么意思呢，我们举个例子，比如"abcabc"的dp数组为[0 0 0 0 1 2 3]，dp数组长度要比原字符串长度多一个。那么我们看最后一个位置数字为3，就表示重复的字符串的字符数有3个。如果是"abcabcabc"，那么dp数组为[0 0 0 0 1 2 3 4 5 6]，我们发现最后一个数字为6，那么表示重复的字符串为“abcabc”，有6个字符。那么怎么通过最后一个数字来知道原字符串是否由重复的子字符串组成的呢，首先当然是最后一个数字不能为0，而且还要满足dp[n] % (n - dp[n]) == 0才行，因为n - dp[n]是一个子字符串的长度，那么重复字符串的长度和肯定是一个子字符串的整数倍

    bool repeatedSubstringPattern(string str) {
        int i = 1, j = 0, n = str.size();
        vector<int> dp(n + 1, 0);
        while (i < n) {
            if (str[i] == str[j]) dp[++i] = ++j;
            else if (j == 0) ++i;
            else j = dp[j];
        }
        return dp[n] && (dp[n] % (n - dp[n]) == 0);
    }d

http://bookshadow.com/weblog/2016/11/13/leetcode-repeated-substring-pattern/

解法I KMP算法

时间复杂度 O(n)

记字符串长度为size

利用KMP算法求next数组，记next数组的最后一个元素为p

若p > 0 并且 size % (size - p) == 0，返回True

next数组具有如下性质：

str[ 0 : next[i] ] == str[ i + 1 - next[i] : i + 1 ]

例如：

a, b, c, d, a, b, c, a, b, c, d, a, b, c, d, c

0, 0, 0, 0, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 0

X. https://blog.csdn.net/Cloudox_/article/details/53240887

首先如果字符串的字母数小于两个，那也不用判断了，一定不可能是多个子字符串重复出来的。

设立两个标记，一个快一点一个慢一点。快的从第二个字母开始往后遍历，找到与第一个字母一样的，就停下来开始判断，慢的从第一个字母开始，两个标记一起往后遍历，看是不是可以完全一致，一直到最后一个字母，如果从后面开始的与从头开始的一模一样，说明是存在重复的字符串，并且由它重复组成的。

如果遍历时又出现不一样了，这时候说明还不是重复的，就要继续当做从头开始找了，继续往后遍历快的标记，找与第一个字母一样的，然后重复上面的过程，注意找到后慢的标记需要回到第一个字母。

不管找没找到，当快的遍历到最后字母了就停止遍历了，这时如果是没找到的状态，那就直接false了，如果是找到的状态，那么有可能是确实重复组成的，也有可能只是最后一小节和前面的一样，中间的一段还是没有重复的，这时候根据慢的标记来判断，如果慢的标记已经走过了整个字符串的一半，就说明至少是二分之一的子字符串重复两次得到的，或者更小的字符串重复多次。如果慢的标记还没走过一半，说明中间还有一部分并没有来得及重复，这时候就依然是false了。在判断慢的是否过半了时，由于存在整个字符串长度可能为基数也可能为偶数的情况，所以用慢标记的位置乘以二来和整体长度作比较进行判断比较合适。

这个方法只需要遍历一次字符串，时间复杂度只有O（n），还是很快的，实际结果也打败了95%的人~

public boolean repeatedSubstringPattern(String str) {

if (str.length() < 2) return false;

char[] arr = str.toCharArray();

int low = 0;

int fast = 1;

boolean match = false;// 匹配到与否的标记

while (fast < arr.length) {

if (arr[fast] == arr[0]) {

// 匹配到了第一个，开始判断是否重复

for (low = 0; fast < arr.length;) {

if (arr[low] == arr[fast]) {// 往后走进行不断判断

match = true;

low++;

fast++;

} else {// 匹配不一致，跳出去重新找

match = false;

break;

}

} else {// 没能匹配首字母

match = false;

fast++;

}

return match && low*2 >= arr.length;

}

X. O(n^2) solution

这道题给了我们一个字符串，问其是否能拆成n个重复的子串。那么既然能拆分成多个子串，那么每个子串的长度肯定不能大于原字符串长度的一半，那么我们可以从原字符串长度的一半遍历到1，如果当前长度能被总长度整除，说明可以分成若干个子字符串，我们将这些子字符串拼接起来看跟原字符串是否相等。如果拆完了都不相等，返回false

https://leetcode.com/problems/repeated-substring-pattern/discuss/94352/Java-Simple-Solution-with-Explanation

public boolean repeatedSubstringPattern(String str) {
 int l = str.length();
 for(int i=l/2;i>=1;i--) {
  if(l%i==0) {
   int m = l/i;
   String subS = str.substring(0,i);
   StringBuilder sb = new StringBuilder();
   for(int j=0;j<m;j++) {
    sb.append(subS);
   }
   if(sb.toString().equals(str)) return true;
  }
 }
 return false;
}

The length of the repeating substring must be a divisor of the length of the input string
Search for all possible divisor of str.length, starting for length/2
If i is a divisor of length, repeat the substring from 0 to i the number of times i is contained in s.length
If the repeated substring is equals to the input str return true

https://starllap.space/2017/02/12/Leetcode-459-Repeated-Substring-Pattern/


    public boolean repeatedSubstringPattern(String str) {

        if(str == null || str.length() <= 1) return false;

        int len = str.length();

        for ( int i=0; i<=len/2; i++) {

            boolean flag = true;

            if (len%(i+1) != 0 || len == i+1) continue;

            String subString = str.substring(0,(i+1));

            int start = i+1;

            int end = start * 2;

            while (end<=len) {

                if (!subString.equals(str.substring(start,end))){

                    flag = false; break;

                }

                start = start + i+1;

                end = end + i+1;

            }

            // System.out.println(flag);

            if (flag) return true;



        }

        return false;

    }

https://zhuanlan.zhihu.com/p/34090604

遍历所有n的约数，然后依次与原字符串进行比较。

我的做法是假设约数为m，那么可以每次比较原字符串的m个字符，看是否一致，如果都一致，则返回True，循环结束。
具体代码如下：

public boolean repeatedSubstringPattern(String s) {
    char[] charS = s.toCharArray();
    int n = charS.length;
    for (int i = n / 2; i >= 1; i--) {
        if (n % i == 0) {
            boolean temp = false;
            for (int j = i; j < charS.length; j = j + i) {
                for (int k = j; k < j + i; k++) {
                    if (charS[k] != charS[k - i]) {
                        temp = true;
                        break;
                    }
                }
                if (temp) {
                    break;
                }
            }
            if (!temp) {
                return true;
            }
        }
    }
    return false;
}

在讨论区看到另一种判断思路，就是得到重复次数n/m，然后依次重复，再与原字符串比较。这种代码实现比较简单。

具体代码如下：

public boolean repeatedSubstringPattern(String s) {
    int n = s.length();
    for (int i = n / 2; i >= 1; i--) {
        if (n % i == 0) {
            int k = n / i;
            String sub = s.substring(0, i);
            StringBuilder sb = new StringBuilder();
            int j;
            for (j = 1; j < k; j++) {
                if (!sub.equals(s.substring(j * i, i + j * i))) {
                    break;
                }
            }
            if (j == k) {
                return true;
            }
        }
    }
    return false;
}

有人一做字符串运算，就想到了正则表达式，ok，确实可行。不解释，直接借用别人的代码。

def repeatedSubstringPattern(self, str):
    """
    :type str: str
    :rtype: bool
    """
    import re
    return bool(re.match(r"^([a-z]+)\1+$", str))

作者：如烟花非花
链接：https://www.jianshu.com/p/4406bf26366e
来源：简书
简书著作权归作者所有，任何形式的转载都请联系作者获得授权并注明出处。

蛮力法（Brute Force）

时间复杂度 O(k * n)，其中n是字符串长度，k是n的约数个数

若字符串可以由其子串重复若干次构成，则子串的起点一定从原串的下标0开始

并且子串的长度一定是原串长度的约数

整数约数的个数可以通过统计其质因子的幂得到，而输入规模10000以内整数的约数个数很少

因此通过蛮力法，枚举子串长度即可

def repeatedSubstringPattern(self, str): """ :type str: str :rtype: bool """ size = len(str) for x in range(1, size / 2 + 1): if size % x: continue if str[:x] * (size / x) == str: return True return False

LeetCode 459 - Repeated Substring Pattern

Labels

Popular Posts