LeetCode 943 - Find the Shortest Superstring


http://massivealgorithms.blogspot.com/2015/06/travelling-salesman-problem-set-1-naive.html
https://leetcode.com/problems/find-the-shortest-superstring/
Given an array A of strings, find any smallest string that contains each string in A as a substring.
We may assume that no string in A is a substring of another string in A.
Example 1:
Input: ["alex","loves","leetcode"]
Output: "alexlovesleetcode"
Explanation: All permutations of "alex","loves","leetcode" would also be accepted.
Example 2:
Input: ["catg","ctaagt","gcta","ttca","atgcatc"]
Output: "gctaagttcatgcatc"
Note:


  1. 1 <= A.length <= 12
  2. 1 <= A[i].length <= 20
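
Because any shortest superstring is accepted, a returned answer is easiest to validate by checking the substring property directly. The sketch below is only an illustration: `isSuperstring` is a hypothetical helper name that does not appear in the original problem or solutions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if every word occurs in `candidate` as a contiguous substring.
// `isSuperstring` is a hypothetical helper, not part of the original post.
bool isSuperstring(const std::string& candidate,
                   const std::vector<std::string>& words) {
  for (const std::string& w : words)
    if (candidate.find(w) == std::string::npos) return false;
  return true;
}
```

This only confirms that a candidate is a valid superstring; it says nothing about whether the candidate is the shortest one.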


Approach 1: Dynamic Programming
We have to put the words into a row, where each word may overlap the previous word. This is because no word is contained in any other word.
Also, it is sufficient to try to maximize the total overlap of the words.
Say we have put some words down in our row, ending with word A[i]. Now say we put down word A[j] as the next word, where word j hasn't been put down yet. The overlap increases by overlap(A[i], A[j]).
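
To make the overlap bookkeeping concrete, here is a minimal C++ sketch, assuming overlap(a, b) means the longest suffix of a that is also a prefix of b. The names `overlap` and `mergeInOrder` are illustrative choices for this sketch, not the author's code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Length of the longest suffix of `a` that is also a prefix of `b`.
int overlap(const std::string& a, const std::string& b) {
  int m = static_cast<int>(std::min(a.size(), b.size()));
  for (int k = m; k > 0; --k)
    if (a.compare(a.size() - k, k, b, 0, k) == 0) return k;
  return 0;
}

// Lays the words down left to right in the given order, overlapping each
// word with the one placed immediately before it.
std::string mergeInOrder(const std::vector<std::string>& words) {
  assert(!words.empty());
  std::string row = words[0];
  for (std::size_t i = 1; i < words.size(); ++i)
    row += words[i].substr(overlap(words[i - 1], words[i]));
  return row;
}
```

For the order "gcta", "ctaagt", "ttca", "catg", "atgcatc" from Example 2, the consecutive overlaps are 3, 1, 2 and 3. The word lengths sum to 25, so the merged row has length 25 - 9 = 16, which is the length of the expected output "gctaagttcatgcatc".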
We can use dynamic programming to leverage this recursion. Let dp(mask, i) be the maximum total overlap after putting down the set of words represented by the bitmask mask, with A[i] as the last word put down. Then, the key recursion is dp(mask ^ (1<<j), j) = max(overlap(A[i], A[j]) + dp(mask, i)), where the jth bit is not set in mask, and i ranges over all bits set in mask.
Of course, this only tells us what the maximum overlap is for each set of words. We also need to remember each choice along the way (i.e., the specific i that made dp(mask ^ (1<<j), j) achieve its maximum) so that we can reconstruct the answer.
Algorithm
Our algorithm has 3 main components:
  • Precompute overlap(A[i], A[j]) for all possible i, j.
  • Calculate dp[mask][i], keeping track of the "parent" i for each j as described above.
  • Reconstruct the answer using parent information.
Please see the implementation for more details about each section.
  • Time Complexity: O(N^2 (2^N + W)), where N is the number of words, and W is the maximum length of a word.
  • Space Complexity: O(N (2^N + W))
  // Requires: import java.util.Arrays;
  public String shortestSuperstring(String[] A) {
    int N = A.length;

    // Populate overlaps
    int[][] overlaps = new int[N][N];
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < N; ++j)
        if (i != j) {
          int m = Math.min(A[i].length(), A[j].length());
          for (int k = m; k >= 0; --k)
            if (A[i].endsWith(A[j].substring(0, k))) {
              overlaps[i][j] = k;
              break;
            }
        }

    // dp[mask][i] = most overlap with mask, ending with ith element
    int[][] dp = new int[1 << N][N];
    int[][] parent = new int[1 << N][N];
    for (int mask = 0; mask < (1 << N); ++mask) {
      Arrays.fill(parent[mask], -1);

      for (int bit = 0; bit < N; ++bit)
        if (((mask >> bit) & 1) > 0) {
          // Let's try to find dp[mask][bit]. Previously, we had
          // a collection of items represented by pmask.
          int pmask = mask ^ (1 << bit);
          if (pmask == 0)
            continue;
          for (int i = 0; i < N; ++i)
            if (((pmask >> i) & 1) > 0) {
              // For each bit i in pmask, calculate the value
              // if we ended with word i, then added word 'bit'.
              int val = dp[pmask][i] + overlaps[i][bit];
              if (val > dp[mask][bit]) {
                dp[mask][bit] = val;
                parent[mask][bit] = i;
              }
            }
        }
    }

    // The answer will have length (sum of A[i].length() over all i)
    // minus the largest value in dp[(1 << N) - 1].
    // Reconstruct answer, first as a sequence 'perm' representing
    // the indices of each word from left to right.

    int[] perm = new int[N];
    boolean[] seen = new boolean[N];
    int t = 0;
    int mask = (1 << N) - 1;

    // p: the last element of perm (last word written left to right)
    int p = 0;
    for (int j = 0; j < N; ++j)
      if (dp[(1 << N) - 1][j] > dp[(1 << N) - 1][p])
        p = j;

    // Follow parents down backwards path that retains maximum overlap
    while (p != -1) {
      perm[t++] = p;
      seen[p] = true;
      int p2 = parent[mask][p];
      mask ^= 1 << p;
      p = p2;
    }

    // Reverse perm
    for (int i = 0; i < t / 2; ++i) {
      int v = perm[i];
      perm[i] = perm[t - 1 - i];
      perm[t - 1 - i] = v;
    }

    // Fill in remaining words not yet added
    for (int i = 0; i < N; ++i)
      if (!seen[i])
        perm[t++] = i;

    // Reconstruct final answer given perm
    StringBuilder ans = new StringBuilder(A[perm[0]]);
    for (int i = 1; i < N; ++i) {
      int overlap = overlaps[perm[i - 1]][perm[i]];
      ans.append(A[perm[i]].substring(overlap));
    }

    return ans.toString();

  }
g[i][j] is the cost of appending word[j] after word[i], i.e. the weight of edge (i, j): the number of characters of word[j] left over after its overlap with word[i].
We want the shortest path that visits every node from 0 to n - 1 exactly once. This is the path variant of the Travelling Salesman Problem, which is NP-hard.
We can solve it with a DP that runs in exponential time.
dp[s][i] := min distance to visit the nodes in s (a bitmask state) exactly once each, with the path ending at node i.
e.g. dp[7][1] is the min distance to visit nodes (0, 1, 2) ending with node 1; the possible paths are (0, 2, 1) and (2, 0, 1).
Time complexity: O(n^2 * 2^n)
Space complexity: O(n * 2^n)
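
The bitmask state used by dp[s][i] can be decoded with a couple of small helpers. This is a hedged sketch only: `nodesInState` and `previousState` are hypothetical names introduced here to illustrate the encoding, and they are not part of the referenced solution.

```cpp
#include <cassert>
#include <vector>

// Lists the node indices whose bits are set in state `s`, lowest bit first.
std::vector<int> nodesInState(int s) {
  std::vector<int> nodes;
  for (int i = 0; (s >> i) != 0; ++i)
    if ((s >> i) & 1) nodes.push_back(i);
  return nodes;
}

// State the path was in before node `last` was appended: clear that bit.
int previousState(int s, int last) { return s & ~(1 << last); }
```

With s = 7 (binary 111) the visited nodes are 0, 1 and 2. For dp[7][1], the previous state is 7 with bit 1 cleared, which is 5 (binary 101), so the path must have ended at node 0 or node 2 before node 1 was appended.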

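(State-compression dynamic programming) O(2^n n^2)
Let S denote a set, and let f(S, i) be the shortest length when all strings in S have been covered and the last string is the i-th string of A.
For the transition, enumerate a string i in S as the current ending, then enumerate a string j not in S; the overlap between the suffix of i and the prefix of j is precomputed.
Initially, S contains a single string, and the length is the length of that string.
The answer is the minimum over all ending strings i, with S equal to the full set.
S can be represented as a binary number: a bit of 0 means the string is absent, and a bit of 1 means it is present.
To build the final answer, trace back through the optimal values of the dynamic program.
Time complexity
There are O(2^n n) states and each transition takes O(n) time, so the total time complexity is O(2^n n^2).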

  string shortestSuperstring(vector<string>& A) {        
    const int n = A.size();
    vector<vector<int>> g(n, vector<int>(n));
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) {
        g[i][j] = A[j].length();
        for (int k = 1; k <= min(A[i].length(), A[j].length()); ++k)
          if (A[i].substr(A[i].size() - k) == A[j].substr(0, k))
            g[i][j] = A[j].length() - k;
      }
    
    vector<vector<int>> dp(1 << n, vector<int>(n, INT_MAX / 2));
    vector<vector<int>> parent(1 << n, vector<int>(n, -1));
    
    for (int i = 0; i < n; ++i) dp[1 << i][i] = A[i].length();
    
    for (int s = 1; s < (1 << n); ++s) {
      for (int j = 0; j < n; ++j) {
        if (!(s & (1 << j))) continue;
        int ps = s & ~(1 << j);
        for (int i = 0; i < n; ++i) {
          if (dp[ps][i] + g[i][j] < dp[s][j]) {
            dp[s][j] = dp[ps][i] + g[i][j];
            parent[s][j] = i;
          }
        }
      }
    }
    
    auto it = min_element(begin(dp.back()), end(dp.back()));
    int j = it - begin(dp.back());
    int s = (1 << n) - 1;
    string ans;
    while (s) {
      int i = parent[s][j];
      if (i < 0) ans = A[j] + ans;
      else ans = A[j].substr(A[j].length() - g[i][j]) + ans;
      s &= ~(1 << j);
      j = i;
    }
    return ans;
  }

https://zhanghuimeng.github.io/post/leetcode-943-find-the-shortest-superstring/
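I made these mistakes:
  • Got the range of the state variable wrong and, at the same time, did not initialize the variables properly
  • The function that computes the overlap of two strings missed one case
So that is how it went...

I think the simpler method is still state-compression DP. [1] dp[mask][i] is the minimum length of a string that contains exactly the strings in mask and ends with A[i] (or the maximum overlap length; for a fixed set of strings the two are equivalent). Then we can iterate: dp[mask ^ 1<<j][j] = max(dp[mask][i] + overlap(i, j)). Clearly, we do not need to store the concrete strings (knowing the last string is enough), and we can precompute the overlap between every pair of strings so that it is never recomputed. The downside is that the DP path has to be reconstructed at the end, but string processing is so time-consuming that this is an acceptable trade-off.
After this change the running time improved greatly (from 1324 ms to 28 ms).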
https://zxi.mytechroad.com/blog/searching/leetcode-943-find-the-shortest-superstring/

Solution 1: Search + Pruning

Try all permutations. Pre-process the cost from word[i] to word[j] and store it in g[i][j].
Time complexity: O(n!)
Space complexity: O(n)
  string shortestSuperstring(vector<string>& A) {    
    const int n = A.size();
    g_ = vector<vector<int>>(n, vector<int>(n));
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) {
        g_[i][j] = A[j].length();
        for (int k = 1; k <= min(A[i].length(), A[j].length()); ++k)
          if (A[i].substr(A[i].size() - k) == A[j].substr(0, k))            
            g_[i][j] = A[j].length() - k;
      }
    vector<int> path(n);
    best_len_ = INT_MAX;
    dfs(A, 0, 0, 0, path);    
    string ans = A[best_path_[0]];
    for (int k = 1; k < best_path_.size(); ++k) {
      int i = best_path_[k - 1];
      int j = best_path_[k];
      ans += A[j].substr(A[j].length() - g_[i][j]);
    }
    return ans;
  }
private:
  vector<vector<int>> g_;
  vector<int> best_path_;
  int best_len_;
  void dfs(const vector<string>& A, int d, int used, int cur_len, vector<int>& path) {
    if (cur_len >= best_len_) return;
    if (d == A.size()) {
      best_len_ = cur_len;
      best_path_ = path;
      return;
    }
    
    for (int i = 0; i < A.size(); ++i) {
      if (used & (1 << i)) continue;      
      path[d] = i;
      dfs(A,
          d + 1,
          used | (1 << i),
          d == 0 ? A[i].length() : cur_len + g_[path[d - 1]][i],
          path);
    }
  }
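
As a cross-check for the pruned search, a plain enumeration of every ordering with std::next_permutation gives the same optimal length on small inputs. This is a minimal sketch, not the code from the referenced post: `suffixPrefixOverlap` and `bruteForceShortestLength` are hypothetical names, and the function returns only the length, not the string.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <numeric>
#include <string>
#include <vector>

// Longest suffix of `a` that is also a prefix of `b`.
static int suffixPrefixOverlap(const std::string& a, const std::string& b) {
  int m = static_cast<int>(std::min(a.size(), b.size()));
  for (int k = m; k > 0; --k)
    if (a.compare(a.size() - k, k, b, 0, k) == 0) return k;
  return 0;
}

// Tries every ordering of the words and returns the shortest merged length.
// Assumes no word is a substring of another, as in the problem statement.
int bruteForceShortestLength(const std::vector<std::string>& words) {
  const int n = static_cast<int>(words.size());
  if (n == 0) return 0;
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);  // start from sorted order 0..n-1
  int best = INT_MAX;
  do {
    int len = static_cast<int>(words[order[0]].size());
    for (int k = 1; k < n; ++k) {
      const std::string& prev = words[order[k - 1]];
      const std::string& cur = words[order[k]];
      len += static_cast<int>(cur.size()) - suffixPrefixOverlap(prev, cur);
    }
    best = std::min(best, len);
  } while (std::next_permutation(order.begin(), order.end()));
  return best;
}
```

Without pruning this visits all n! orderings, which is 479,001,600 at the maximum n = 12, so it is only practical as a reference for small test cases.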




https://www.geeksforgeeks.org/shortest-superstring-problem/
Shortest Superstring Greedy Approximate Algorithm
The Shortest Superstring Problem is NP-hard. Known algorithms that always find the shortest superstring take exponential time. Below is an approximate greedy algorithm.
Let arr[] be given set of strings.

1) Create an auxiliary array of strings, temp[].  Copy contents
   of arr[] to temp[]

2) While temp[] contains more than one string
     a) Find the most overlapping string pair in temp[]. Let this
        pair be 'a' and 'b'. 
     b) Replace 'a' and 'b' with the string obtained after combining
        them.

3) The only string left in temp[] is the result, return it.
Two strings overlap if a prefix of one string is the same as a suffix of the other, or vice versa. The maximum overlap is the greatest length of such a matching prefix and suffix.
#include <algorithm>
#include <climits>
#include <string>
using namespace std;

// Function to calculate maximum overlap in two given strings
int findOverlappingPair(string str1, string str2, string &str)
{
    // max will store maximum overlap i.e maximum
    // length of the matching prefix and suffix
    int max = INT_MIN;
    int len1 = str1.length();
    int len2 = str2.length();
  
    // check suffix of str1 matches with prefix of str2
    for (int i = 1; i <= min(len1, len2); i++)
    {
        // compare last i characters in str1 with first i
        // characters in str2
        if (str1.compare(len1-i, i, str2, 0, i) == 0)
        {
            if (max < i)
            {
                //update max and str
                max = i;
                str = str1 + str2.substr(i);
            }
        }
    }
  
    // check prefix of str1 matches with suffix of str2
    for (int i = 1; i <= min(len1, len2); i++)
    {
        // compare first i characters in str1 with last i
        // characters in str2
        if (str1.compare(0, i, str2, len2-i, i) == 0)
        {
            if (max < i)
            {
                //update max and str
                max = i;
                str = str2 + str1.substr(i);
            }
        }
    }
  
    return max;
}
  
// Function to calculate smallest string that contains
// each string in the given set as substring.
string findShortestSuperstring(string arr[], int len)
{
    // run len-1 times to consider every pair
    while(len != 1)
    {
        int max = INT_MIN;  // to store  maximum overlap
        int l, r;   // to store array index of strings
        // involved in maximum overlap
        string resStr;  // to store resultant string after
        // maximum overlap
  
        for (int i = 0; i < len; i++)
        {
            for (int j = i + 1; j < len; j++)
            {
                string str;
  
                // res will store maximum length of the matching
                // prefix and suffix str is passed by reference and
                // will store the resultant string after maximum
                // overlap of arr[i] and arr[j], if any.
                int res = findOverlappingPair(arr[i], arr[j], str);
  
                // check for maximum overlap
                if (max < res)
                {
                    max = res;
                    resStr.assign(str);
                    l = i, r = j;
                }
            }
        }
  
        len--;  //ignore last element in next cycle
  
        // if no overlap, append arr[len] to arr[0]
        if (max == INT_MIN)
            arr[0] += arr[len];
        else
        {
            arr[l] = resStr;   // copy resultant string to index l
            arr[r] = arr[len];  // copy string at last index to index r
        }
    }
    return arr[0];
}
https://www.geeksforgeeks.org/shortest-superstring-problem-set-2-using-set-cover/

