Longest Common Subsequence


Related: LintCode 762 - Longest Common Subsequence II
Dynamic Programming | Set 4 (Longest Common Subsequence) | GeeksforGeeks
Given two sequences, find the length of longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous.

A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.
LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3.
1) Optimal Substructure: 
Let the input sequences be X[0..m-1] and Y[0..n-1] of lengths m and n respectively. And let L(X[0..m-1], Y[0..n-1]) be the length of LCS of the two sequences X and Y. Following is the recursive definition of L(X[0..m-1], Y[0..n-1]).
If last characters of both sequences match (or X[m-1] == Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])
If last characters of both sequences do not match (or X[m-1] != Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = MAX ( L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2])
Following is a tabulated implementation for the LCS problem.
http://www.csl.mtu.edu/cs2321/www/pdfslides/Goodrich_6e_Ch13_DynamicProgramming-handouts.pdf


int lcs( char *X, char *Y, int m, int n )
{
   int L[m+1][n+1];
   int i, j;
  
   /* Following steps build L[m+1][n+1] in bottom up fashion. Note
      that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
   for (i=0; i<=m; i++)
   {
     for (j=0; j<=n; j++)
     {
       if (i == 0 || j == 0)
         L[i][j] = 0;
  
       else if (X[i-1] == Y[j-1])
         L[i][j] = L[i-1][j-1] + 1;
  
       else
         L[i][j] = max(L[i-1][j], L[i][j-1]);
     }
   }
    
   /* L[m][n] contains length of LCS for X[0..n-1] and Y[0..m-1] */
   return L[m][n];
}

Print Solution:

https://www.jiuzhang.com/solution/longest-common-subsequence/
    public int longestCommonSubsequence(String A, String B) {
        int n = A.length();
        int m = B.length();
        int f[][] = new int[n + 1][m + 1];
        for(int i = 1; i <= n; i++){
            for(int j = 1; j <= m; j++){
                f[i][j] = Math.max(f[i - 1][j], f[i][j - 1]);
                if(A.charAt(i - 1) == B.charAt(j - 1))
                    f[i][j] = f[i - 1][j - 1] + 1;
            }
        }
        return f[n][m];
    }
    http://rosettacode.org/wiki/Longest_common_subsequence#Java
    public static String lcs(String a, String b){
        int aLen = a.length();
        int bLen = b.length();
        if(aLen == 0 || bLen == 0){
            return "";
        }else if(a.charAt(aLen-1) == b.charAt(bLen-1)){
            return lcs(a.substring(0,aLen-1),b.substring(0,bLen-1))
                + a.charAt(aLen-1);
        }else{
            String x = lcs(a, b.substring(0,bLen-1));
            String y = lcs(a.substring(0,aLen-1), b);
            return (x.length() > y.length()) ? x : y;
        }
    }
    DP
    public static String lcs(String a, String b) {
        int[][] lengths = new int[a.length()+1][b.length()+1];
     
        // row 0 and column 0 are initialized to 0 already
     
        for (int i = 0; i < a.length(); i++)
            for (int j = 0; j < b.length(); j++)
                if (a.charAt(i) == b.charAt(j))
                    lengths[i+1][j+1] = lengths[i][j] + 1;
                else
                    lengths[i+1][j+1] =
                        Math.max(lengths[i+1][j], lengths[i][j+1]);
     
        // read the substring out from the matrix
        StringBuffer sb = new StringBuffer();
        for (int x = a.length(), y = b.length();
             x != 0 && y != 0; ) {
            if (lengths[x][y] == lengths[x-1][y])
                x--;
            else if (lengths[x][y] == lengths[x][y-1])
                y--;
            else {
                assert a.charAt(x-1) == b.charAt(y-1);
                sb.append(a.charAt(x-1));
                x--;
                y--;
            }
        }
     
        return sb.reverse().toString();
    }
    http://www.geeksforgeeks.org/printing-longest-common-subsequence/
       int i = m, j = n;
       while (i > 0 && j > 0)
       {
          // If current character in X[] and Y are same, then
          // current character is part of LCS
          if (X[i-1] == Y[j-1])
          {
              lcs[index-1] = X[i-1]; // Put current character in result
              i--; j--; index--;     // reduce values of i, j and index
          }
          // If not same, then find the larger of two and
          // go in the direction of larger value
          else if (L[i-1][j] > L[i][j-1])
             i--;
          else
             j--;
       }
    Space Optimization:
    Use one array O(N):  backward
    http://www.devhui.com/2015/04/21/Longest-Common-Substring/
    int LCS_dp(char * X, int xlen, char * Y, int ylen) {
    memset(dp,0,sizeof(dp));
    int maxlen =0, maxindex = 0;
    for(int i = 0; i < xlen; ++i) {
    for(int j = ylen-1; j >= 0; --j) {
    if(X[i] == Y[j]) {
    if(i && j) {
    dp[j] = dp[j-1] + 1;
    }
    if(i == 0 || j == 0) {
    dp[j] = 1;
    }
    if(dp[j] > maxlen) {
    maxlen = dp[j];
    maxindex = i + 1 - maxlen;
    }
    } else {
    dp[j] = 0;
    }
    }
    }
    int maxlen;
    }
    Forward, use tmp variable:
    http://www.jurieo.com/1064.html
            int lenS1=strlen(s1), lenS2=strlen(s2);
            for(int i=0; i<lenS1; i++){
                old=0;
                //若s1[i]==s2[j], dp[i][j] = dp[i-1][j-1]+1
                //否则,dp[i][j] = max(dp[i-1][j], dp[i][j-1])
                //此处进行了空间优化,old 代表 dp[i-1][j-1]
                //dp[j-1] 代表 dp[i][j-1], dp[j] 代表 dp[i-1][j]
                for(int j=0; j<lenS2; j++){
                    tmp = dp[j];
                    if(s1[i]==s2[j])
                        dp[j] = old+1;
                    else
                        if(dp[j-1]>dp[j])dp[j]=dp[j-1];
                    old = tmp;
                }
            }

    Use two O(N) arrays:
    http://www.ics.uci.edu/~eppstein/161/960229.html
        int lcs_length(char * A, char * B)
        {
     allocate storage for one-dimensional arrays X and Y
     for (i = m; i >= 0; i--)
     {
         for (j = n; j >= 0; j--)
         {
      if (A[i] == '\0' || B[j] == '\0') X[j] = 0;
      else if (A[i] == B[j]) X[j] = 1 + Y[j+1];
      else X[j] = max(Y[j], X[j+1]);
         }
         Y = X;
     }
     return X[0];
        }
    http://ideone.com/R3lp88
    1. int lcs(string x, string y)
    2. {
    3. int m = x.length();
    4. int n = y.length();
    5. int curr[n+1], prev[n+1]; // take two array of shorter length to keep track of previous rows
    6. for (int i = 0; i <= m; i++) {
    7. for (int j = 0; j <= n; j++) {
    8. if (i == 0 || j == 0)
    9. curr[j] = 0;
    10. else {
    11. if (x[i-1] == y[j-1])
    12. curr[j]= 1 + prev[j-1];
    13. else
    14. curr[j] = max(prev[j], prev[j-1]);
    15. }
    16. }
    17. for (int start = 0; start <= n; start++)
    18. prev[start] = curr[start];
    19. }
    20. return curr[n];
    21. }
    22. int wrapperLCS(string x, string y) // to keep a check that always the shorter length string is passed as second parameter
    23. {
    24. if (x.length() >= y.length())
    25. return lcs(x, y);
    26. else
    27. return lcs(y, x);
    28. }
    http://www.programgo.com/article/96392147605/
    1. int lcs[2][MAXSIZE];  
    2. char s1[MAXSIZE],s2[MAXSIZE];  
    3.   
    4. int LCS()  
    5. {  
    6.     int i,j,len1,len2;  
    7.     int tag=1;  
    8.     len1 = strlen(s1);  
    9.     len2 = strlen(s2);  
    10.     memset(lcs[0],0,len2*sizeof(int));  
    11.     lcs[1][0] = 0;  
    12.     for(i=1;i<=len1;i++)  
    13.     {  
    14.         for(j=1;j<=len2;j++)  
    15.         {  
    16.             if(s1[i-1] == s2[j-1])  
    17.                 lcs[tag][j] = lcs[1-tag][j-1] + 1;  
    18.             else  
    19.                 lcs[tag][j] = max(lcs[1-tag][j],lcs[tag][j-1]);  
    20.         }  
    21.         tag = 1 - tag;  
    22.     }  
    23.     return lcs[1-tag][len2];  

    Memorizing LCS:
    http://www.ics.uci.edu/~eppstein/161/960229.html
        int lcs_length(char * AA, char * BB)
        {
     A = AA; B = BB;
     allocate storage for L;
     for (i = 0; i <= m; i++)
         for (j = 0; j <= m; j++)
      L[i,j] = -1;
    
     return subproblem(0, 0);
        }
    
        int subproblem(int i, int j)
        {
     if (L[i,j] < 0) {
         if (A[i] == '\0' || B[j] == '\0') L[i,j] = 0;
         else if (A[i] == B[j]) L[i,j] = 1 + subproblem(i+1, j+1);
         else L[i,j] = max(subproblem(i+1, j), subproblem(i, j+1));
     }
     return L[i,j];
        }

    http://www.hankcs.com/program/algorithm/implementation-and-application-of-nlp-longest-common-subsequence-longest-common-subsequence-of-java.html
    Backward to fronend:
    public class LongestCommonSubsequence
    {
        public static int compute(char[] str1, char[] str2)
        {
            int substringLength1 = str1.length;
            int substringLength2 = str2.length;
            // 构造二维数组记录子问题A[i]和B[j]的LCS的长度
            int[][] opt = new int[substringLength1 + 1][substringLength2 + 1];
            // 从后向前,动态规划计算所有子问题。也可从前到后。
            for (int i = substringLength1 - 1; i >= 0; i--)
            {
                for (int j = substringLength2 - 1; j >= 0; j--)
                {
                    if (str1[i] == str2[j])
                        opt[i][j] = opt[i + 1][j + 1] + 1;// 状态转移方程
                    else
                        opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]);// 状态转移方程
                }
            }
            int i = 0, j = 0;
            while (i < substringLength1 && j < substringLength2)
            {
                if (str1[i] == str2[j])
                {
                    System.out.print(str1[i]);
                    i++;
                    j++;
                }
                else if (opt[i + 1][j] >= opt[i][j + 1])
                    i++;
                else
                    j++;
            }
            System.out.println();
            return opt[0][0];
        }
        public static int compute(String str1, String str2)
        {
            return compute(str1.toCharArray(), str2.toCharArray());
        }
    }

    http://blog.csdn.net/v_july_v/article/details/6695482
    ,该算法主要是把求解公共字符串问题转化为求解矩阵L(p,m)的问题,在利用定理求解矩阵的元素过程中(1)while(i<k),L(k,i)=null,
                      (2)while(L(k,i)=k),L(k,i+1)=L(k,i+2)=…L(k,m)=k;
        求出每列元素,一直到发现第p+1 行都为null 时退出循环,得出矩阵L(k,m)后,B[L(1,m-p+1)]B[L(2,m-p+2)]…B[L(p,m)]即为A 和B 的LCS,其中p 为LCS 的长度。
    http://www.cnblogs.com/ider/p/longest-common-substring-problem-optimization.html
    暴力解法
    int longestCommonSubstring_n3(const string& str1, const string& str2)
     2 {   
     3     size_t size1 = str1.size();
     4     size_t size2 = str2.size();
     5     if (size1 == 0 || size2 == 0) return 0;
    
     7     // the start position of substring in original string
     8     int start1 = -1;
     9     int start2 = -1;
    10     // the longest length of common substring 
    11     int longest = 0; 
    13     // record how many comparisons the solution did;
    14     // it can be used to know which algorithm is better
    15     int comparisons = 0;
    16  
    17     for (int i = 0; i < size1; ++i)
    18     {
    19         for (int j = 0; j < size2; ++j)
    20         {
    21             // find longest length of prefix 
    22             int length = 0;
    23             int m = i;
    24             int n = j;
    25             while(m < size1 && n < size2)
    26             {
    27                 ++comparisons;
    28                 if (str1[m] != str2[n]) break;
    29  
    30                 ++length;
    31                 ++m;
    32                 ++n;
    33             }
    34  
    35             if (longest < length)
    36             {
    37                 longest = length;
    38                 start1 = i;
    39                 start2 = j;
    40             }
    41         }
    42     }
    48     return longest;
    49 }

    Todo: http://www.csie.ntnu.edu.tw/~u91029/LongestCommonSubsequence.html
    Read full article from Dynamic Programming | Set 4 (Longest Common Subsequence) | GeeksforGeeks

    Labels

    LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

    Popular Posts