Massive Algorithms: LeetCode 718 - Maximum Length of Repeated Subarray

LeetCode 718 - Maximum Length of Repeated Subarray

https://leetcode.com/problems/maximum-length-of-repeated-subarray/

Given two integer arrays A and B, return the maximum length of an subarray that appears in both arrays.

Example 1:

Input:
A: [1,2,3,2,1]
B: [3,2,1,4,7]
Output: 3
Explanation: 
The repeated subarray with maximum length is [3, 2, 1].

Note:

1 <= len(A), len(B) <= 1000
0 <= A[i], B[i] < 100

Use dynamic programming. dp[i][j] will be the answer for inputs A[i:], B[j:].

X. Bisection
https://leetcode.com/problems/maximum-length-of-repeated-subarray/discuss/156891/Binary-Search-%2B-Rabin-Karp-%2B-Hash-Table-O(N-log-N)-Beats-100

    int findLength(vector<int>& A, vector<int>& B) {
        const int P = 19941229; //Rabin-Karp
        
        vector<int> hash1(A.size() + 1, 0), hash2(B.size() + 1, 0), p(max(A.size(), B.size()) + 1);
        int len = max(A.size(), B.size());
        p[0] = 1;
        for (int i = 1; i <= len; i++)
            p[i] = p[i - 1] * P;
        for (int i = 1; i <= A.size(); i++)
            hash1[i] = hash1[i - 1] + A[i - 1] * p[i];
        for (int i = 1; i <= B.size(); i++)
            hash2[i] = hash2[i - 1] + B[i - 1] * p[i];
        int l = 1, r = min(A.size(), B.size());
        while (l <= r) { //Binary search the answer
            int m = (l + r) / 2;
            unordered_set<int> hash;
            bool ok = false;
            for (int i = 1; i + m - 1 <= A.size(); i++) //Calculate Rabin-Karp hash for all substring s for the first string of length m, insert into hash table
                hash.insert((hash1[i + m - 1] - hash1[i - 1]) * p[len - (i + m - 1)]);
            for (int i = 1; i + m - 1 <= B.size(); i++){ //Calculate Rabin-Karp hash for the second string, query the hash table
                if (hash.count((hash2[i + m - 1] - hash2[i - 1]) * p[len - (i + m - 1)])) {
                    ok = true;
                    break;
                }
            }
            if (ok)
                l = m + 1;
            else
                r = m - 1;
        }
        return r;
    }

https://leetcode.com/articles/maximum-length-of-repeated-subarray/
Approach #4: Binary Search with Rolling Hash

As in Approach #2, we will binary search for the answer. However, we will use a rolling hash (Rabin-Karp algorithm) to store hashes in our set structure.

Algorithm

For some prime

p

, consider the following function modulo some prime modulus

\mathcal{M}

\text{hash}(S) = \sum_{0 \leq i &lt; len(S)} p^i * S[i]

Notably,

\text{hash}(S[1:] + x) = \frac{(\text{hash}(S) - S[0])}{p} + p^{n-1} x

. This shows we can get the hash of all

A[i:i+\text{guess}]

in linear time. We will also use the fact that

p^{-1} = p^{\mathcal{M}-2} \mod \mathcal{M}

For every i >= length - 1, we will want to record the hash of A[i-length+1], A[i-length+2], ..., A[i]. After, we will truncate the first element by h = (h - A[i - (length - 1)]) * Pinv % MOD to get ready to add the next element.

To make our algorithm air tight, we also make a naive check when our work with rolling hashes says that we have found a match.

Time Complexity: $O((M+N) * \log{(\min(M, N))})$ , where $M, N$ are the lengths of A, B. The log factor is contributed by the binary search, while creating the rolling hashes is $O(M + N)$ . The checks for duplicate hashes are $O(1)$ . If we perform a naive check to make sure our answer is correct, it adds a factor of $O(\min(M, N))$ to our cost of check, which keeps the complexity the same.
Space Complexity: $O(M)$ , the space used to store hashes and the subarrays in our final naive check.

    int P = 113;
    int MOD = 1_000_000_007;
    int Pinv = BigInteger.valueOf(P).modInverse(BigInteger.valueOf(MOD)).intValue();

    private int[] rolling(int[] source, int length) {
        int[] ans = new int[source.length - length + 1];
        long h = 0, power = 1;
        if (length == 0) return ans;
        for (int i = 0; i < source.length; ++i) {
            h = (h + source[i] * power) % MOD;
            if (i < length - 1) {
                power = (power * P) % MOD;
            } else {
                ans[i - (length - 1)] = (int) h;
                h = (h - source[i - (length - 1)]) * Pinv % MOD;
                if (h < 0) h += MOD;
            }
        }
        return ans;
    }

    private boolean check(int guess, int[] A, int[] B) {
        Map<Integer, List<Integer>> hashes = new HashMap();
        int k = 0;
        for (int x: rolling(A, guess)) {
            hashes.computeIfAbsent(x, z -> new ArrayList()).add(k++);
        }
        int j = 0;
        for (int x: rolling(B, guess)) {
            for (int i: hashes.getOrDefault(x, new ArrayList<Integer>()))
                if (Arrays.equals(Arrays.copyOfRange(A, i, i+guess),
                                  Arrays.copyOfRange(B, j, j+guess))) {
                    return true;
                }
            j++;
        }
        return false;
    }

    public int findLength(int[] A, int[] B) {
        int lo = 0, hi = Math.min(A.length, B.length) + 1;
        while (lo < hi) {
            int mi = (lo + hi) / 2;
            if (check(mi, A, B)) lo = mi + 1;
            else hi = mi;
        }
        return lo - 1;
    }

X. DP
Approach #3: Dynamic Programming

Since a common subarray of A and B must start at some A[i] and B[j], let dp[i][j] be the longest common prefix of A[i:] and B[j:]. Whenever A[i] == B[j], we know dp[i][j] = dp[i+1][j+1] + 1. Also, the answer is max(dp[i][j]) over all i, j.

We can perform bottom-up dynamic programming to find the answer based on this recurrence. Our loop invariant is that the answer is already calculated correctly and stored in dp for any larger i, j.

Time Complexity: $O(M*N)$ , where $M, N$ are the lengths of A, B.
Space Complexity: $O(M*N)$ , the space used by dp.

    public int findLength(int[] A, int[] B) {
        int ans = 0;
        int[][] memo = new int[A.length + 1][B.length + 1];
        for (int i = A.length - 1; i >= 0; --i) {
            for (int j = B.length - 1; j >= 0; --j) {
                if (A[i] == B[j]) {
                    memo[i][j] = memo[i+1][j+1] + 1;
                    if (ans < memo[i][j]) ans = memo[i][j];
                }
            }
        }
        return ans;
    }

https://leetcode.com/problems/maximum-length-of-repeated-subarray/discuss/109039/Concise-Java-DP%3A-Same-idea-of-Longest-Common-Substring

 * dp[i][j] = a[i] == b[j] ? 1 + dp[i + 1][j + 1] : 0;
 * dp[i][j] : max lenth of common subarray start at a[i] & b[j];

    public int findLength(int[] A, int[] B) {
        if(A == null||B == null) return 0;
        int m = A.length;
        int n = B.length;
        int max = 0;
        //dp[i][j] is the length of longest common subarray ending with nums[i] and nums[j]
        int[][] dp = new int[m + 1][n + 1];
        for(int i = 0;i <= m;i++){
            for(int j = 0;j <= n;j++){
                if(i == 0 || j == 0){
                    dp[i][j] = 0;
                }
                else{
                    if(A[i - 1] == B[j - 1]){
                        dp[i][j] = 1 + dp[i - 1][j - 1];
                        max = Math.max(max,dp[i][j]);
                    }
                }
            }
        }
        return max;
    }

O(n) space

  public int findLength2(int[] A, int[] B) {
        if(A == null || B == null || A.length == 0 || B.length == 0)return 0;
        int Rows = A.length;
        int Cols = B.length;

        // dp[j] is the length of longest common subarray ending with A[i-1], B[j-1]
        int[] dp = new int[Cols + 1];
        int maxLen = 0;
        for (int i = 1; i <=Rows ; i++) {
            for (int j =  Cols; j > 0 ; j--) {
                if(A[i -1] == B[j-1])
                {
                    dp[j] = 1 + dp[j-1] ;
                    maxLen = Math.max(maxLen, dp[j]);
                }else {
                    dp[j] = 0;
                }
            }
        }
        return maxLen;
    }

X. DP, O(1) space
https://leetcode.com/problems/maximum-length-of-repeated-subarray/discuss/109040/Java-O(mn)-time-O(1)-space

maxLen[i][j] --> the max length of common subarray ending at A[i] and B[j]
=> 0 , if A[i] != A[j]
maxLen[i-1][j-1] + 1, if A[i] = B[j]

So maxLen[i][j] only depends on maxLen[i-1][j-1], so you only need to keep maxLen[i-1][j-1] which is one element.

In below code we use maxLenEnding to keep maxLen[i-1][j-1] and traverse the matrix diagonally

public int findLength(int[] A, int[] B) {
        
        int maxLen = 0;
        
        
        for (int j = 0; j < B.length; j++) {
            int maxLenEnding = 0;
            for (int i = 0, k = j; i < A.length && k < B.length; i++, k++) {
                if (A[i] != B[k]) maxLenEnding = 0;
                else {
                    maxLenEnding++;
                    maxLen = Math.max(maxLen, maxLenEnding);
                }
            }
        }
        
        for (int i =1; i < A.length; i++) {
            int maxLenEnding = 0;
            for (int j = 0, k = i; k < A.length && j < B.length; j++, k++) {
                if (A[k] != B[j]) maxLenEnding = 0;
                else {
                    maxLenEnding++;
                    maxLen = Math.max(maxLen, maxLenEnding);
                }
            }
        }
        
        return maxLen;
        
        
        
    }

https://leetcode.com/problems/maximum-length-of-repeated-subarray/discuss/109064/Easy-O(n2)-Time-O(1)-Space-solution-No-DP

Approach #2: Binary Search with Naive Check

If there is a length k subarray common to A and B, then there is a length j <= k subarray as well.

Let check(length) be the answer to the question "Is there a subarray with length length, common to Aand B?" This is a function with range that must take the form [True, True, ..., True, False, False, ..., False] with at least one True. We can binary search on this function.

Algorithm

Focusing on the binary search, our invariant is that check(hi) will always be False. We'll start with hi = min(len(A), len(B)) + 1; clearly check(hi) is False.

Now we perform our check in the midpoint mi of lo and hi. When it is possible, then lo = mi + 1, and when it isn't, hi = mi. This maintains the invariant. At the end of our binary search, hi == lo and lo is the lowest value such that check(lo) is False, so we want lo - 1.

As for the check itself, we can naively check whether any A[i:i+k] == B[j:j+k] using set structures.

Time Complexity: $O((M + N) * \min(M, N) * \log{(\min(M, N))})$ , where $M, N$ are the lengths of A, B. The log factor comes from the binary search. The complexity of our naive check of a given $\text{length}$ is $O((M+N) * \text{length})$ , as we will create the seen strings with complexity $O(M * \text{length})$ , then search for them with complexity $O(N * \text{length})$ , and our total complexity when performing our check is the addition of these two.
Space Complexity: $O(M^2)$ , the space used by seen.

    public boolean check(int length, int[] A, int[] B) {
        Set<String> seen = new HashSet();
        for (int i = 0; i + length <= A.length; ++i) {
            seen.add(Arrays.toString(Arrays.copyOfRange(A, i, i+length)));
        }
        for (int j = 0; j + length <= B.length; ++j) {
            if (seen.contains(Arrays.toString(Arrays.copyOfRange(B, j, j+length)))) {
                return true;
            }
        }
        return false;
    }

    public int findLength(int[] A, int[] B) {
        int lo = 0, hi = Math.min(A.length, B.length) + 1;
        while (lo < hi) {
            int mi = (lo + hi) / 2;
            if (check(mi, A, B)) lo = mi + 1;
            else hi = mi;
        }
        return lo - 1;
    }

Approach #1: Brute Force with Initial Character Map

In a typical brute force, for all starting indices i of A and j of B, we will check for the longest matching subarray A[i:i+k] == B[j:j+k] of length k. This would look roughly like the following psuedocode:

ans = 0
for i in [0 .. A.length - 1]:
    for j in [0 .. B.length - 1]:
        k = 0
        while (A[i+k] == B[j+k]): k += 1 #and i+k < A.length etc.
        ans = max(ans, k)

Our insight is that in typical cases, most of the time A[i] != B[j]. We could instead keep a hashmap Bstarts[A[i]] = all j such that B[j] == A[i], and only loop through those in our j loop.

Time Complexity: $O(M*N*\min(M, N))$ , where $M, N$ are the lengths of A, B. The worst case is when all the elements are equal.
Space Complexity: $O(N)$ , the space used by Bstarts. (Of course, we could amend our algorithm to make this $O(\min(M, N))$ .)

    public int findLength(int[] A, int[] B) {
        int ans = 0;
        Map<Integer, ArrayList<Integer>> Bstarts = new HashMap();
        for (int j = 0; j < B.length; j++) {
            Bstarts.computeIfAbsent(B[j], x -> new ArrayList()).add(j);
        }

        for (int i = 0; i < A.length; i++) if (Bstarts.containsKey(A[i])) {
            for (int j: Bstarts.get(A[i])) {
                int k = 0;
                while (i+k < A.length && j+k < B.length && A[i+k] == B[j+k]) {
                    k++
                }
                ans = Math.max(ans, k);
            }
        }
        return ans;
    }

LeetCode 718 - Maximum Length of Repeated Subarray

Labels

Popular Posts