Related: LeetCode 161 - One Edit Distance
http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/
https://www.dreamxu.com/books/dsa/dp/edit-distance.html
https://discuss.leetcode.com/topic/6965/standard-dp-solution
https://segmentfault.com/a/1190000003741294
http://www.jiuzhang.com/solutions/edit-distance/
https://discuss.leetcode.com/topic/5809/my-accepted-java-solution
假设
http://n00tc0d3r.blogspot.com/2013/03/edit-distance.html
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Let dp[i][j] stands for the edit distance between two strings with length i and j, i.e., word1[0,...,i-1] and word2[0,...,j-1].
There is a relation between dp[i][j] and dp[i-1][j-1]. Let's say we transform from one string to another.
X. 滚动数组: O(n) space
https://discuss.leetcode.com/topic/17639/20ms-detailed-explained-c-solutions-o-n-space/
https://discuss.leetcode.com/topic/3136/my-o-mn-time-and-o-n-space-solution-using-dp-with-explanation/5
http://gongxuns.blogspot.com/2012/12/edit-distance-given-two-words-word1.html
http://comproguide.blogspot.com/2013/12/finding-edit-distance-between-two.html
http://www.8bitavenue.com/2011/12/dynamic-programming-edit-distance/
http://blog.csdn.net/u014688145/article/details/71180132
dp[i][j] : 表示word1[0,i)和word2[0,j)的最短编辑距离 如: dp[1][0] = 1, "a"," " dp[1][1] = ?, "a","a" 相等,? = 0,"a","b" 不相等, ? = 1 初始化:(字符串为空串的情况下,最短编辑距离等于字符串长度) dp[i][0] = i; dp[0][j] = j; 递推式: dp[i][j] = dp[i-1][j-1]; if word1[i] == word2[j]; dp[i][j] = min(dp[i-1][j] + dp[i][j-1] + dp[i-1][j-1]) + 1; if word1[i] != word2[j]; 举个简单的例子说明它们的含义:(word1不去操作,所有操作在word2上进行) word1 : "a" word2 : "ab" dp[0][0] = 0; " " == " " dp[0][1] = 1; edit -> "&a", "&b" dp[0][2] = 2; edit -> "&&a", "&&" dp[1][1] = dp[0][0] = 0; "a" == "a", no edit dp[1][2] = ?; "a" != "b" 存在三种情况: a. dp[1][2] = dp[0][1] = 2; "&a","&b" edit -> "&a","&a" 我们已知道了dp[0][1]表示为"&a","&b",所以只能进行替换操作。 即:dp[i][j] = dp[i-1][j-1] dp[i-1][j-1]表示了最短编辑距离,也可以表示成两字符串经过编辑后已然相等,而此时为了能够更新到dp[i][j],只能对word2的第j个字符做替换操作。 b. dp[1][2] = dp[1][1] = 1; "a","ab" edit -> "a","a" 即:dp[i][j] = dp[i][j-1] 显然,dp[1][1] = 0,没有操作。而此时为了能够更新到dp[i][j],让两字符串相等,只能删除word2的第j个字符。 c. dp[1][2] = dp[0][2] = 3; "&&a","&&" edit -> "&&a","&&a" 即:dp[i][j] = dp[i-1][j]; 同理,dp[0][2] = 2, 说明了word2 = "&&",而为了能够更新到dp[i][j],只能添加到第j个字符后。 最后在这三种操作中,取最小的即可。
”&”空字符串占位符,所以有”&&” = “&”,且我们所有操作只针对一个对象
public int minDistance(String word1, String word2) { int m = word1.length(); int n = word2.length(); int[][] cost = new int[m + 1][n + 1]; for (int i = 0; i <= m; i++) cost[i][0] = i; for (int i = 1; i <= n; i++) cost[0][i] = i; for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { if (word1.charAt(i) == word2.charAt(j)) cost[i + 1][j + 1] = cost[i][j]; else { int a = cost[i][j]; int b = cost[i][j + 1]; int c = cost[i + 1][j]; cost[i + 1][j + 1] = a < b ? (a < c ? a : c) : (b < c ? b : c); cost[i + 1][j + 1]++; } } } return cost[m][n]; }
Application:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/
File Revision
Remote Screen Update Problem
Spelling Correction
Plagiarism Detection
Molecular Biology
Speech Recognition
Read full article from Edit Distance in Java
http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
b) Delete a character
c) Replace a character
用分治的思想解决比较简单,将复杂的问题分解成相似的子问题
假设字符串 a, 共 m 位,从
字符串 b, 共 n 位,从
a[1]
到 a[m]
字符串 b, 共 n 位,从
b[1]
到 b[n]
d[i][j]
表示字符串 a[1]-a[i]
转换为 b[1]-b[j]
的编辑距离
那么有如下递归规律(
a[i]
和 b[j]
分别是字符串 a 和 b 的最后一位):- 当
a[i]
等于b[j]
时,d[i][j] = d[i-1][j-1]
, 比如 fxy -> fay 的编辑距离等于 fx -> fa 的编辑距离 - 当
a[i]
不等于b[j]
时,d[i][j]
等于如下 3 项的最小值:d[i-1][j]
+ 1(删除a[i]
), 比如 fxy -> fab 的编辑距离 = fx -> fab 的编辑距离 + 1d[i][j-1]
+ 1(插入b[j]
), 比如 fxy -> fab 的编辑距离 = fxyb -> fab 的编辑距离 + 1 = fxy -> fa 的编辑距离 + 1d[i-1][j-1]
+ 1(将a[i]
替换为b[j]
), 比如 fxy -> fab 的编辑距离 = fxb -> fab 的编辑距离 + 1 = fx -> fa 的编辑距离 + 1
递归边界:
a[i][0] = i
, b 字符串为空,表示将a[1]-a[i]
全部删除,所以编辑距离为 ia[0][j] = j
, a 字符串为空,表示 a 插入b[1]-b[j]
,所以编辑距离为 j
This is a classic problem of Dynamic Programming. We define the state
dp[i][j]
to be the minimum number of operations to convert word1[0..i - 1]
to word2[0..j - 1]
. The state equations have two cases: the boundary case and the general case. Note that in the above notations, both i
and j
take values starting from 1
.
For the boundary case, that is, to convert a string to an empty string, it is easy to see that the mininum number of operations to convert
word1[0..i - 1]
to ""
requires at least i
operations (deletions). In fact, the boundary case is simply:dp[i][0] = i
;dp[0][j] = j
.
Now let's move on to the general case, that is, convert a non-empty
word1[0..i - 1]
to another non-empty word2[0..j - 1]
. Well, let's try to break this problem down into smaller problems (sub-problems). Suppose we have already known how to convert word1[0..i - 2]
to word2[0..j - 2]
, which is dp[i - 1][j - 1]
. Now let's consider word[i - 1]
and word2[j - 1]
. If they are euqal, then no more operation is needed and dp[i][j] = dp[i - 1][j - 1]
. Well, what if they are not equal?
If they are not equal, we need to consider three cases:
- Replace
word1[i - 1]
byword2[j - 1]
(dp[i][j] = dp[i - 1][j - 1] + 1 (for replacement)
); - Delete
word1[i - 1]
andword1[0..i - 2] = word2[0..j - 1]
(dp[i][j] = dp[i - 1][j] + 1 (for deletion)
); - Insert
word2[j - 1]
toword1[0..i - 1]
andword1[0..i - 1] + word2[j - 1] = word2[0..j - 1]
(dp[i][j] = dp[i][j - 1] + 1 (for insertion)
).
Make sure you understand the subtle differences between the equations for deletion and insertion. For deletion, we are actually converting
word1[0..i - 2]
to word2[0..j - 1]
, which costs dp[i - 1][j]
, and then deleting the word1[i - 1]
, which costs 1
. The case is similar for insertion.
Putting these together, we now have:
dp[i][0] = i
;dp[0][j] = j
;dp[i][j] = dp[i - 1][j - 1]
, ifword1[i - 1] = word2[j - 1]
;dp[i][j] = min(dp[i - 1][j - 1] + 1, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
, otherwise.
Let following be the function definition :-
f(i, j) := minimum cost (or steps) required to convert first i characters of word1 to first j characters of word2
Case 1: word1[i] == word2[j], i.e. the ith the jth character matches.
f(i, j) = f(i - 1, j - 1)
Case 2: word1[i] != word2[j], then we must either insert, delete or replace, whichever is cheaper
f(i, j) = 1 + min { f(i, j - 1), f(i - 1, j), f(i - 1, j - 1) }
- f(i, j - 1) represents insert operation
- f(i - 1, j) represents delete operation
- f(i - 1, j - 1) represents replace operation
Here, we consider any operation from word1 to word2. It means, when we say insert operation, we insert a new character after word1 that matches the jth character of word2. So, now have to match i characters of word1 to j - 1 characters of word2. Same goes for other 2 operations as well.
Note that the problem is symmetric. The insert operation in one direction (i.e. from word1 to word2) is same as delete operation in other. So, we could choose any direction.
Above equations become the recursive definitions for DP.
Base Case:
f(0, k) = f(k, 0) = k
Below is the direct bottom-up translation of this recurrent relation. It is only important to take care of 0-based index with actual code :-
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();
int[][] cost = new int[m + 1][n + 1];
for(int i = 0; i <= m; i++)
cost[i][0] = i;
for(int i = 1; i <= n; i++)
cost[0][i] = i;
for(int i = 0; i < m; i++) {
for(int j = 0; j < n; j++) {
if(word1.charAt(i) == word2.charAt(j))
cost[i + 1][j + 1] = cost[i][j];
else {
int a = cost[i][j];
int b = cost[i][j + 1];
int c = cost[i + 1][j];
cost[i + 1][j + 1] = a < b ? (a < c ? a : c) : (b < c ? b : c);
cost[i + 1][j + 1]++;
}
}
}
return cost[m][n];
}
https://segmentfault.com/a/1190000003741294
http://www.jiuzhang.com/solutions/edit-distance/
https://discuss.leetcode.com/topic/5809/my-accepted-java-solution
假设
dp[i-1][j-1]
表示一个长为i-1的字符串str1变为长为j-1的字符串str2的最短距离 public int minDistance(String word1, String word2) {
int m = word1.length(), n = word2.length();
int[][] dp = new int[m + 1][n + 1];
// 初始化空字符串的情况
for(int i = 1; i <= m; i++){
dp[i][0] = i;
}
for(int i = 1; i <= n; i++){
dp[0][i] = i;
}
for(int i = 1; i <= m; i++){
for(int j = 1; j <= n; j++){
// 增加操作:str1a变成str2后再加上b,得到str2b
int insertion = dp[i][j-1] + 1;
// 删除操作:str1a删除a后,再由str1变为str2b
int deletion = dp[i-1][j] + 1;
// 替换操作:先由str1变为str2,然后str1a的a替换为b,得到str2b
int replace = dp[i-1][j-1] + (word1.charAt(i - 1) == word2.charAt(j - 1) ? 0 : 1);
// 三者取最小
dp[i][j] = Math.min(replace, Math.min(insertion, deletion));
}
}
return dp[m][n];
}
Edit Distance in Javahttp://n00tc0d3r.blogspot.com/2013/03/edit-distance.html
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Let dp[i][j] stands for the edit distance between two strings with length i and j, i.e., word1[0,...,i-1] and word2[0,...,j-1].
There is a relation between dp[i][j] and dp[i-1][j-1]. Let's say we transform from one string to another.
- if x == y, then dp[i][j] == dp[i-1][j-1]
- if x != y, and we insert y for word1, then dp[i][j] = dp[i][j-1] + 1
- if x != y, and we delete x for word1, then dp[i][j] = dp[i-1][j] + 1
- if x != y, and we replace x with y for word1, then dp[i][j] = dp[i-1][j-1] + 1
- When x!=y, dp[i][j] is the min of the three situations.
Initial condition:
dp[i][0] = i, dp[0][j] = j
From http://codercareer.blogspot.com/2011/12/no-25-edit-distance.html
Edit Distance - Algorithmist
dp[i][0] = i, dp[0][j] = j
From http://codercareer.blogspot.com/2011/12/no-25-edit-distance.html
Edit Distance - Algorithmist
int count(string s1, string s2) { int m = s1.length(); int n = s2.length(); for (int i = 0; i <= m; i++) { v[i][0] = i; } for (int j = 0; j <= n; j++) { v[0][j] = j; } for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1[i-1] == s2[j-1]) v[i][j] = v[i-1][j-1]; else v[i][j] = 1 + min(min(v[i][j-1],v[i-1][j]),v[i-1][j-1]); } } return v[m][n]; }
public static int minDistance(String word1, String word2) { int len1 = word1.length(); int len2 = word2.length(); // len1+1, len2+1, because finally return dp[len1][len2] int[][] dp = new int[len1 + 1][len2 + 1]; for (int i = 0; i <= len1; i++) { dp[i][0] = i; } for (int j = 0; j <= len2; j++) { dp[0][j] = j; } //iterate though, and check last char for (int i = 0; i < len1; i++) { char c1 = word1.charAt(i); for (int j = 0; j < len2; j++) { char c2 = word2.charAt(j); //if last two chars equal if (c1 == c2) { //update dp value for +1 length dp[i + 1][j + 1] = dp[i][j]; } else { int replace = dp[i][j] + 1; int insert = dp[i][j + 1] + 1; int delete = dp[i + 1][j] + 1; int min = replace > insert ? insert : replace; min = delete > min ? min : delete; dp[i + 1][j + 1] = min; } } } return dp[len1][len2]; }
public int minDistance(String word1, String word2) {
int[][] distance = new int[word1.length() + 1][word2.length() + 1];
for (int i = 0; i < word1.length() + 1; i++)
for (int j = 0; j < word2.length() + 1; j++)
distance[i][j] = Integer.MAX_VALUE;
for (int i = 0; i < word2.length() + 1; i++)
distance[0][i] = i;
for (int i = 0; i < word1.length() + 1; i++)
distance[i][0] = i;
for (int i = 0; i < word1.length(); i++) {
for (int j = 0; j < word2.length(); j++) {
int min = Math.min(distance[i + 1][j], distance[i][j + 1]) + 1;
if (word1.charAt(i) == word2.charAt(j))
min = Math.min(min, distance[i][j]);
else
min = Math.min(min, distance[i][j] + 1);
distance[i + 1][j + 1] = min;
}
}
return distance[word1.length()][word2.length()];
}
X. 滚动数组: O(n) space
https://discuss.leetcode.com/topic/17639/20ms-detailed-explained-c-solutions-o-n-space/
Well, you may have noticed that each time when we update
dp[i][j]
, we only need dp[i - 1][j - 1], dp[i][j - 1], dp[i - 1][j]
. In fact, we need not maintain the full m*n
matrix. Instead, maintaing one column is enough. The code can be optimized to O(m)
or O(n)
space, depending on whether you maintain a row or a column of the original matrix. public int minDistance(String word1, String word2) {
int opt[] = new int[word2.length()+1];
// base case
for(int j = 0;j <= word2.length();j++) opt[j] = j;
// iteration
for(int i = 1;i <= word1.length();i++){
int pre = i, corner = i-1;
for(int j = 1;j <= word2.length();j++){
int temp = corner;
corner = opt[j];
temp += (word1.charAt(i-1)==word2.charAt(j-1)?0:1);
opt[j] = Math.min(temp,Math.min(opt[j],pre)+1);
pre = opt[j];
}
opt[word2.length()] = pre;
}
return opt[word2.length()];
}
https://discuss.leetcode.com/topic/11723/my-clean-java-solution-with-o-n-space-in-17-lines public int minDistance(String word1, String word2) {
int[] d = new int[word2.length() + 1];
for (int i = 0; i <= word2.length(); ++i) d[i] = i;
for (int i = 1; i <= word1.length(); ++i) {
int prev = d[0];
d[0] = i;
for (int j = 1; j <= word2.length(); ++j) {
int tmp = d[j];
d[j] = Math.min(d[j - 1], d[j]) + 1;
d[j] = Math.min(d[j], prev + (word1.charAt(i -1) == word2.charAt(j - 1) ? 0: 1));
prev = tmp;
}
}
return d[word2.length()];
}
https://leetcode.com/discuss/10426/my-o-mn-time-and-o-n-space-solution-using-dp-with-explanationhttp://gongxuns.blogspot.com/2012/12/edit-distance-given-two-words-word1.html
public int minDistance(String word1, String word2) { int m = word1.length(), n = word2.length(); if(m==0) return n; if(n==0) return m; int[] distances = new int[n+1],old = new int[n+1]; for(int i=0;i<=n;i++) old[i]=i; for(int i=1;i<=m;i++){ for(int j=1;j<=n;j++){ if(word1.charAt(i-1)==word2.charAt(j-1)) distances[j] = j>1?old[j-1]:(i-1); else distances[j] = 1+Math.min(j>1?old[j-1]:(i-1),Math.min(old[j],j>1?distances[j-1]:i)); } for(int j=1;j<=n;j++) old[j]=distances[j];// no need to copy, just old=distances } return old[n]; }
http://n00tc0d3r.blogspot.com/2013/03/edit-distance.html
Furthermore, since we only need values of (i-1, j), (i, j-1), and (i-1, j-1), we don't need to maintain the entire table. We only need a table of two rows to maintain the results from last row. Plus, use bit manipulation to select a row can be way faster than mod operation.
The algorithm runs in time O(l1*l2) and uses O(l1) spaces.
public int minDistance(String word1, String word2) {
int l1=word1.length(), l2=word2.length();
if (l1 == 0) return l2;
if (l2 == 0) return l1;
int[][] distTable = new int[l1][2];
distTable[0][0] = (word1.charAt(0)==word2.charAt(0) ? 0 : 1);
for (int i=1; i<l1; ++i) {
distTable[i][0] = Math.min(distTable[i-1][0] + 1,
(word1.charAt(i)==word2.charAt(0) ? i : i+1));
}
for (int j=1; j<l2; ++j) {
int last = (j-1)&1, cur = j&1;
distTable[0][cur] = Math.min(distTable[0][last] + 1,
(word1.charAt(0)==word2.charAt(j) ? j : j+1));
for (int i=1; i<l1; ++i) {
distTable[i][cur] = Math.min(
Math.min(distTable[i][last] + 1, distTable[i-1][cur] + 1),
distTable[i-1][last] + (word1.charAt(i)==word2.charAt(j) ? 0 : 1));
}
}
return distTable[l1-1][(l2-1)&1];
}
A small optimization is to find the shorter string and reduce the space usage to O(min(l1, l2)).
public int minDistance(String word1, String word2) {
String w1, w2;
// find the shorter one
if (word1.length() < word2.length()) {
w1 = word1; w2 = word2;
} else {
w1 = word2; w2 = word1;
}
int l1 = w1.length(); l2 = w2.length();
if (l1 == 0) return l2;
int[][] dist = new int[l1][2];
int row = 0;
dist[0][row] = (w1.charAt(0) == w2.charAt(0)) ? 0 : 1;
// initialize first row
for (int j=1; j<l1; ++j) {
dist[j][row] = Math.min(dist[j-1][row] + 1, (w1.charAt(j) == w2.charAt(0)) ? j : j+1);
}
// the rest rows
for (int i=1; i<l2; ++i) {
int last = row;
row = 1 - row;
dist[0][row] = Math.min(dist[0][last] + 1, (w1.charAt(0) == w2.charAt(i)) ? i : i+1);
for (int j=1; j<l1; ++j) {
dist[j][row] = dist[j-1][last] + (w1.charAt(j) == w2.charAt(i) ? 0 : 1);
dist[j][row] = Math.min(dist[j][row], dist[j][last] + 1);
dist[j][row] = Math.min(dist[j][row], dist[j-1][row] + 1);
}
}
return dist[l1-1][row];
}
http://jlordiales.me/2014/03/01/dynamic-programming-edit-distance/
X. Recursive + Cache
int _minDistance(string word1, string word2, vector<vector<int> > &m, int l1, int l2) {
if (m[l1][l2] != -1)
return m[l1][l2];
if (l1 == 0 || l2 == 0) {
m[l1][l2] = l1 > l2 ? l1 : l2;
return m[l1][l2];
}
int d1 = _minDistance(word1, word2, m, l1 - 1, l2) + 1;
int d2 = _minDistance(word1, word2, m, l1, l2 - 1) + 1;
int d3 = _minDistance(word1, word2, m, l1 - 1, l2 - 1) + (word1[l1 - 1] != word2[l2 - 1] ? 1 : 0);
m[l1][l2] = d1 < d2 ? d1 : d2;
m[l1][l2] = m[l1][l2] < d3 ? m[l1][l2] : d3;
return m[l1][l2];
}
int minDistance(string word1, string word2) {
int nr = word1.size();
int nc = word2.size();
vector<vector<int> > matrix;
vector<int> *curr;
int i, j;
for (i = 0; i <= nr; ++i) {
curr = new vector<int>(nc + 1);
for (j = 0; j <= nc; ++j)
(*curr)[j] = -1;
matrix.push_back(*curr);
}
return _minDistance(word1, word2, matrix, nr, nc);
}
Recusive Version
public int minDistance(String word1, String word2) {
return minDist(word1, word2, word1.length(), word2.length());
}
private int minDist(String w1, String w2, int l1, int l2) {
if (l1 == 0) return l2;
if (l2 == 0) return l1;
return Math.min(
Math.min(minDist(w1, w2, l1-1, l2) + 1,
minDist(w1, w2, l1, l2-1) + 1),
minDist(w1, w2, l1-1, l2-1) + (w1.charAt(l1-1)==w2.charAt(l2-1) ? 0 : 1));
}
public int editDistance(String word1, String word2) {
if (word1.isEmpty()) return word2.length();
if (word2.isEmpty()) return word1.length();
int replace = editDistance(word1.substring(1), word2.substring(1)) + Util.replaceCost(word1, word2, 0, 0);
int delete = editDistance(word1.substring(1), word2) + 1;
int insert = editDistance(word1, word2.substring(1)) + 1;
return Util.min(replace, delete, insert);
}
public static int replaceCost(String w1, String w2, int w1Index, int w2Index) {
return (w1.charAt(w1Index) == w2.charAt(w2Index)) ? 0 : 1;
}
http://huntfor.iteye.com/blog/2077940
- public int minDistanceWithRecurse(String word1, String word2) {
- if(word1.length() == 0){
- return word2.length();
- }
- if(word2.length() == 0){
- return word1.length();
- }
- if(word1.charAt(0) == word2.charAt(0)){
- return minDistance(word1.substring(1), word2.substring(1));
- }else{
- int t1 = minDistance(word1.substring(1), word2);
- int t2 = minDistance(word1, word2.substring(1));
- int t3 = minDistance(word1.substring(1), word2.substring(1));
- return Math.min(t1, Math.min(t2, t3)) + 1;
- }
- }
The time complexity of dynamic programming method is O(mn) as we need to construct the table fully. The space complexity is also O(mn). If we need only the cost of edit, we just need O(min(m, n)) space as it is required only to keep track of the current row and previous row.
Usually the costs D, I and R are not same. In such case the problem can be represented as an acyclic directed graph (DAG) with weights on each edge, and finding shortest path gives edit distance.
int
editDistDP(string str1, string str2,
int
m,
int
n)
{
// Create a table to store results of subproblems
int
dp[m+1][n+1];
// Fill d[][] in bottom up manner
for
(
int
i=0; i<=m; i++)
{
for
(
int
j=0; j<=n; j++)
{
// If first string is empty, only option is to
// isnert all characters of second string
if
(i==0)
dp[i][j] = j;
// Min. operations = j
// If second string is empty, only option is to
// remove all characters of second string
else
if
(j==0)
dp[i][j] = i;
// Min. operations = i
// If last characters are same, ignore last char
// and recur for remaining string
else
if
(str1[i-1] == str2[j-1])
dp[i][j] = dp[i-1][j-1];
// If last character are different, consider all
// possibilities and find minimum
else
dp[i][j] = 1 + min(dp[i][j-1],
// Insert
dp[i-1][j],
// Remove
dp[i-1][j-1]);
// Replace
}
}
return
dp[m][n];
}
- If last characters of two strings are same, nothing much to do. Ignore last characters and get count for remaining strings. So we recur for lengths m-1 and n-1.
- Else (If last characters are not same), we consider all operations on ‘str1′, consider all three operations on last character of first string, recursively compute minimum cost for all three operations and take minimum of three values.
- Insert: Recur for m and n-1
- Remove: Recur for m-1 and n
- Replace: Recur for m-1 and n-1
int
editDist(string str1 , string str2 ,
int
m ,
int
n)
{
// If first string is empty, the only option is to
// insert all characters of second string into first
if
(m == 0)
return
n;
// If second string is empty, the only option is to
// remove all characters of first string
if
(n == 0)
return
m;
// If last characters of two strings are same, nothing
// much to do. Ignore last characters and get count for
// remaining strings.
if
(str1[m-1] == str2[n-1])
return
editDist(str1, str2, m-1, n-1);
// If last characters are not same, consider all three
// operations on last character of first string, recursively
// compute minimum cost for all three operations and take
// minimum of three values.
return
1 + min ( editDist(str1, str2, m, n-1),
// Insert
editDist(str1, str2, m-1, n),
// Remove
editDist(str1, str2, m-1, n-1)
// Replace
);
}
http://www.8bitavenue.com/2011/12/dynamic-programming-edit-distance/
http://blog.csdn.net/u014688145/article/details/71180132
dp[i][j] : 表示word1[0,i)和word2[0,j)的最短编辑距离 如: dp[1][0] = 1, "a"," " dp[1][1] = ?, "a","a" 相等,? = 0,"a","b" 不相等, ? = 1 初始化:(字符串为空串的情况下,最短编辑距离等于字符串长度) dp[i][0] = i; dp[0][j] = j; 递推式: dp[i][j] = dp[i-1][j-1]; if word1[i] == word2[j]; dp[i][j] = min(dp[i-1][j] + dp[i][j-1] + dp[i-1][j-1]) + 1; if word1[i] != word2[j]; 举个简单的例子说明它们的含义:(word1不去操作,所有操作在word2上进行) word1 : "a" word2 : "ab" dp[0][0] = 0; " " == " " dp[0][1] = 1; edit -> "&a", "&b" dp[0][2] = 2; edit -> "&&a", "&&" dp[1][1] = dp[0][0] = 0; "a" == "a", no edit dp[1][2] = ?; "a" != "b" 存在三种情况: a. dp[1][2] = dp[0][1] = 2; "&a","&b" edit -> "&a","&a" 我们已知道了dp[0][1]表示为"&a","&b",所以只能进行替换操作。 即:dp[i][j] = dp[i-1][j-1] dp[i-1][j-1]表示了最短编辑距离,也可以表示成两字符串经过编辑后已然相等,而此时为了能够更新到dp[i][j],只能对word2的第j个字符做替换操作。 b. dp[1][2] = dp[1][1] = 1; "a","ab" edit -> "a","a" 即:dp[i][j] = dp[i][j-1] 显然,dp[1][1] = 0,没有操作。而此时为了能够更新到dp[i][j],让两字符串相等,只能删除word2的第j个字符。 c. dp[1][2] = dp[0][2] = 3; "&&a","&&" edit -> "&&a","&&a" 即:dp[i][j] = dp[i-1][j]; 同理,dp[0][2] = 2, 说明了word2 = "&&",而为了能够更新到dp[i][j],只能添加到第j个字符后。 最后在这三种操作中,取最小的即可。
”&”空字符串占位符,所以有”&&” = “&”,且我们所有操作只针对一个对象
public int minDistance(String word1, String word2) { int m = word1.length(); int n = word2.length(); int[][] cost = new int[m + 1][n + 1]; for (int i = 0; i <= m; i++) cost[i][0] = i; for (int i = 1; i <= n; i++) cost[0][i] = i; for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { if (word1.charAt(i) == word2.charAt(j)) cost[i + 1][j + 1] = cost[i][j]; else { int a = cost[i][j]; int b = cost[i][j + 1]; int c = cost[i + 1][j]; cost[i + 1][j + 1] = a < b ? (a < c ? a : c) : (b < c ? b : c); cost[i + 1][j + 1]++; } } } return cost[m][n]; }
Application:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/
File Revision
Remote Screen Update Problem
Spelling Correction
Plagiarism Detection
Molecular Biology
Speech Recognition
Read full article from Edit Distance in Java