Related: LeetCode 161 - One Edit Distance
LeetCode 72 - Edit Distance
Buttercola: Airbnb: K Edit Distance
https://github.com/jiadaizhao/LintCode/tree/master/0623-K%20Edit%20Distance
Given a set of strings which just has lower case letters and a target string, output all the strings for each the edit distance with the target no greater than k.
https://www.jiuzhang.com/qa/4572/
https://www.jiuzhang.com/qa/9656/
假设最长的单词长度是L,那么节点最多有26^L个,而转移每个节点都需要O(n)的操作
所以总的时间复杂度大概是O(n*26^L)
https://github.com/chendddong/LintCode/blob/master/623.%20K%20Edit%20Distance.java
https://www.jiuzhang.com/qa/7009/
http://lhearen.site/2016/10/16/K-Edit-Distance/
这题是将所有的原串放进一棵字典树,然后遍历这棵字典树(相当于遍历所有的前缀),dp[i] 表示从Trie的root节点走到当前node节点,形成的Prefix和 target的前i个字符的最小编辑距离。最后通过dp[i]就可以得到和target的最小距离。
Given a list of word and a target word, output all the words for each the edit distance with the target no greater than k.
e.g. [abc, abd, abcd, adc], target "ac", k = 1,
output = [abc, adc]
Naive Solution:
A naive solution would be, for each word in the list, calculate the edit distance with the target word. If it is equal or less than k, output to the result list.
If we assume that the average length of the words is m, and the total number of words in the list is n, the total time complexity is O(n * m^2).
A better solution with a trie:
The problem with the previous solution is if the given list of the words is like ab, abc, abcd, each time we need to repeatedly calculate the edit distance with the target word. If we can combine the prefix of all words together, we can save lots of time.
https://robinliu.gitbooks.io/leetcode/content/trie.html
http://elise.co.nf/?p=955
和LC的 edit distance一样,只不过加入了trie和dfs,与dp结合,难度更上一层
X. Brute Force
https://github.com/awangdev/LintCode/blob/master/Java/K%20Edit%20Distance.java
We can calculate min moves needed to reach target with dp
dp[i][j] represents # steps to convert S[0, i - 1] to T[0, j - 1]
dp[m][n] is the minimum moves
It's double sequence dp, initialize with [m+1][n+1]
dp[0][0] = 0. no sequence to compare/edit.
init: if i == 0, take j steps to become T. if j == 0, takes i steps to become S.
Apply the dp for all words
Some pre-validations:
- length diff > k, skip
- equal to target: just return
http://cs.au.dk/~cstorm/courses/StrAlg_f14/slides/Wu-Mamber.pdf
https://github.com/allaboutjst/airbnb/blob/master/src/main/java/k_edit_distance/KEditDistance.java
LeetCode 72 - Edit Distance
Buttercola: Airbnb: K Edit Distance
https://github.com/jiadaizhao/LintCode/tree/master/0623-K%20Edit%20Distance
Given a set of strings which just has lower case letters and a target string, output all the strings for each the edit distance with the target no greater than k.
You have the following 3 operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example
Given words = ["abc", "abd", "abcd", "adc"] and target = "ac", k = 1 Return ["abc", "adc"]
http://cyjump.com/2017/09/13/K-edit-distance/Given words = ["abc", "abd", "abcd", "adc"] and target = "ac", k = 1 Return ["abc", "adc"]
当然这一题也可以,就要借助到trie树,trie树就不细说了吧。大概意思就是,abc abd 不用重复计算ab
以 abc abd 为例,构建的trie树就是
|
|
等同于我们要构建动态规划的表格就是如下的两个
0 | a | b | c | 0 | a | b | d | ||
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
a | 1 | a | 1 | ||||||
c | 2 | c | 2 |
于是我们从a,b,c,d 以此开始比较,递归调用,每次传递前一列的结果
从 0 与 ac开始,生成 a 列,a 列用于计算b列,b列用于计算 c列与d列,如此递归下去
从 0 与 ac开始,生成 a 列,a 列用于计算b列,b列用于计算 c列与d列,如此递归下去
理解这个问题首先要理解 edit distance的状态转移方程。
这题本质是在Trie树上做动态规划,类似于每一个节点上,表示从root走到这个节点形成的前缀与字符串的每一个前缀的编辑距离,这是基本思路。
这题本质是在Trie树上做动态规划,类似于每一个节点上,表示从root走到这个节点形成的前缀与字符串的每一个前缀的编辑距离,这是基本思路。
dp[i]是当前结点对应的单词到target第i位第最小花费
next与dp一个含义,相当于dp'接下来的查找如果还用dp,那么会对上一层有影响,所以需要另一个数组
next与dp一个含义,相当于dp'接下来的查找如果还用dp,那么会对上一层有影响,所以需要另一个数组
a. 首先是
我们都知道在最一般的edit distance问题里面,
捋清了这个,再来理解
我们注意到上面的转移方程任意时候都只需要至多前一步的信息,所以我们只需要两个数组,一个标记前一步的情形,一个标记现在的情形 i.e. 用
dp
和next
。我们都知道在最一般的edit distance问题里面,
dp[i][j]
代表着 word1[0:i]
到word2[0:j]
的编辑距离,并且很容易知道这样的转移方程:if (word1.charAt(i - 1) == word2.char(j - 1)) {
dp[i][j] = dp[i - 1][j - 1]; // 字符相同,不需要编辑
} else {
dp[i][j] = 1 + Math.min(dp[i - 1][j - 1]
, Math.min(dp[i][j - 1], dp[i - 1][j]));
// 字符不一,那么存在insertion或subsitution,
// 意味着编辑距离为之前最小可能的编辑距离+1
}
捋清了这个,再来理解
dp
和next
就很容易:我们注意到上面的转移方程任意时候都只需要至多前一步的信息,所以我们只需要两个数组,一个标记前一步的情形,一个标记现在的情形 i.e. 用
dp[j]
标记 以前的dp[i - 1][j]
, 以next[j]
代表以前的dp[i][j]
。 那么也必然有:if (word1.charAt(i - 1) == word2.char(j - 1)) {
next[j] = dp[j - 1]; // 老的dp[i][j] => next[j],
//老的dp[i - 1][j - 1] => dp[j - 1]
} else {
next[j]= 1 + Math.min(dp[j - 1],
Math.min(next[j - 1], dp[j]));
// 老的dp[i][j-1] => next[j - 1],
//老的dp[i - 1][j] => dp[j]
}
b. 其次是具体到本题标答的转移方程问题。
我也不太理解在九章给出的答案里为什么字符相同时是
我也不太理解在九章给出的答案里为什么字符相同时是
next[j] = min(dp[j-1], next(j-1)+1, dp[j]+1)
, 我的体会是用next[j] = dp[j - 1]
就足够了,也可以通过OJ。事实上,next[j] = dp[j - 1]
意味着不进行任何编辑,那么必然是要比next(j-1)+1
和 dp[j]+1
都要小https://www.jiuzhang.com/qa/9656/
假设最长的单词长度是L,那么节点最多有26^L个,而转移每个节点都需要O(n)的操作
所以总的时间复杂度大概是O(n*26^L)
https://github.com/chendddong/LintCode/blob/master/623.%20K%20Edit%20Distance.java
class TrieNode {
/* Attributes in Trie */
public TrieNode[] children;
public boolean hasWord;
public String str;
/* Initialize the children and the hasWord */
public TrieNode() {
children = new TrieNode[26];
for (int i = 0; i < 26; ++i)
children[i] = null;
hasWord = false;
}
/* Adds a word into the data structure. */
static public void addWord(TrieNode root, String word) {
TrieNode now = root; /* Traverse pointer */
for (int i = 0; i < word.length(); i++) { /* traverse every char */
Character c = word.charAt(i);
if (now.children[c - 'a'] == null) {
now.children[c - 'a'] = new TrieNode(); /* Create new child */
}
now = now.children[c - 'a']; /* Or get the child */
}
now.str = word; /* The whole word */
now.hasWord = true; /* Mark the word */
}
}
public List<String> kDistance(String[] words, String target, int k) {
TrieNode root = new TrieNode(); /* Need a virtual root */
for (int i = 0; i < words.length; i++)
TrieNode.addWord(root, words[i]); /* Add words to the Trie */
List<String> result = new ArrayList<String>();
int n = target.length();
/*
* State: dp[i] means walking down the trie from the virtual root to the current node, the
* min edit distance between the formed prefix and the target's 0 to ith characters.
*/
int[] dp = new int[n + 1];
for (int i = 0; i <= n; ++i)
dp[i] = i;
find(root, result, k, target, dp);
return result;
}
public void find(TrieNode node, List<String> result, int k, String target, int[] dp) {
int n = target.length();
if (node.hasWord && dp[n] <= k) { /* Where the answer satisfies */
result.add(node.str);
}
/* Each search we need to create a new dp */
int[] next = new int[n + 1];
for (int i = 0; i <= n; ++i)
next[i] = 0;
for (int i = 0; i < 26; ++i) /* 26 Characters */
if (node.children[i] != null) {
next[0] = dp[0] + 1; /* Init */
for (int j = 1; j <= n; j++) {
if (target.charAt(j - 1) - 'a' == i) { /* Matches */
next[j] = Math.min(dp[j - 1], Math.min(next[j - 1] + 1,
dp[j] + 1)); /* Check two j - 1 check one dp j */
} else { /* Does not match */
next[j] = Math.min(dp[j - 1] + 1, Math.min(next[j - 1] + 1, dp[j] + 1));
/* Check two j - 1 check one dp j */
}
}
find(node.children[i], result, k, target, next);
}
}
https://github.com/jiadaizhao/LintCode/blob/master/0623-K%20Edit%20Distance/0623-K%20Edit%20Distance.cpphttps://www.jiuzhang.com/qa/7009/
http://lhearen.site/2016/10/16/K-Edit-Distance/
这题是将所有的原串放进一棵字典树,然后遍历这棵字典树(相当于遍历所有的前缀),dp[i] 表示从Trie的root节点走到当前node节点,形成的Prefix和 target的前i个字符的最小编辑距离。最后通过dp[i]就可以得到和target的最小距离。
Given a list of word and a target word, output all the words for each the edit distance with the target no greater than k.
e.g. [abc, abd, abcd, adc], target "ac", k = 1,
output = [abc, adc]
Naive Solution:
A naive solution would be, for each word in the list, calculate the edit distance with the target word. If it is equal or less than k, output to the result list.
If we assume that the average length of the words is m, and the total number of words in the list is n, the total time complexity is O(n * m^2).
A better solution with a trie:
The problem with the previous solution is if the given list of the words is like ab, abc, abcd, each time we need to repeatedly calculate the edit distance with the target word. If we can combine the prefix of all words together, we can save lots of time.
public
List<String> getKEditDistance(String[] words, String target,
int
k) {
List<String> result =
new
ArrayList<>();
if
(words ==
null
|| words.length ==
0
||
target ==
null
|| target.length() ==
0
||
k <
0
) {
return
result;
}
Trie trie =
new
Trie();
for
(String word : words) {
trie.add(word);
}
TrieNode root = trie.root;
int
[] prev =
new
int
[target.length() +
1
];
for
(
int
i =
0
; i < prev.length; i++) {
prev[i] = i;
}
getKEditDistanceHelper(
""
, target, k, root, prev, result);
return
result;
}
private
void
getKEditDistanceHelper(String curr, String target,
int
k, TrieNode root,
int
[] prevDist, List<String> result) {
if
(root.isLeaf) {
if
(prevDist[target.length()] <= k) {
result.add(curr);
}
else
{
return
;
}
}
for
(
int
i =
0
; i <
26
; i++) {
if
(root.children[i] ==
null
) {
continue
;
}
int
[] currDist =
new
int
[target.length() +
1
];
currDist[
0
] = curr.length() +
1
;
for
(
int
j =
1
; j < prevDist.length; j++) {
if
(target.charAt(j -
1
) == (
char
) (i +
'a'
)) {
currDist[j] = prevDist[j -
1
];
}
else
{
currDist[j] = Math.min(Math.min(prevDist[j -
1
], prevDist[j]),
currDist[j -
1
]) +
1
;
}
}
getKEditDistanceHelper(curr + (
char
) (i +
'a'
), target, k,
root.children[i], currDist, result);
}
}
public
static
void
main(String[] args) {
Solution solution =
new
Solution();
String[] input =
new
String[]{
"abcd"
,
"abc"
,
"abd"
,
"ad"
};
String target =
"ac"
;
int
k =
1
;
List<String> result = solution.getKEditDistance(input, target, k);
for
(String s : result) {
System.out.println(s);
}
}
class
TrieNode {
TrieNode[] children;
boolean
isLeaf;
public
TrieNode() {
children =
new
TrieNode[
26
];
}
}
class
Trie {
TrieNode root;
public
Trie() {
root =
new
TrieNode();
}
// Add a word into trie
public
void
add(String s) {
if
(s ==
null
|| s.length() ==
0
) {
return
;
}
TrieNode p = root;
for
(
int
i =
0
; i < s.length(); i++) {
char
c = s.charAt(i);
if
(p.children[c -
'a'
] ==
null
) {
p.children[c -
'a'
] =
new
TrieNode();
}
if
(i == s.length() -
1
) {
p.children[c -
'a'
].isLeaf =
true
;
}
p = p.children[c -
'a'
];
}
}
}
直接用edit distance挨个遍历一遍也能做,但是如果word list很大,那重复计算会非常多,这时候可以用trie来优化。 下面举个例,假设字典为["cs", "ct", "cby"],target word为"cat",k=1。 首先建Trie:
c
/ | \
b s t
/
y
从根节点开始搜索,遍历的单词分别为:c -> cb -> (cby) -> cs -> ct。 与普通的Edit distance动态规划算法一样,我们对每一个“路过”的单词记录一个DP table。那么所遍历的单词的DP table应该为(假设当前遍历单词,也就是下面代码中的path为”c”):
c a t
[ 0 1 2 3 ] <- prev_dist
-> c [ 1 0 1 2 ] <- cur_dist
cb [ 2 1 1 2 ]
cs [ 2 1 1 2 ]
ct [ 2 1 1 1 ]
每一行的最后一位数字即为当前单词与target word的edit distance。显然,如果该值小于等于k,则加入到结果中。最终返回的结果只有一个单词,即ct。 注意,遍历到单词cb时,edit distance已经为2,且长度已经与cat相等,也就意味着该节点的子树中所包含的单词与target word的edit distance无论如何都不可能小于等于k,因此直接从该子树返回。所以单词cby是没有被遍历到的。这也就是Trie做这题的便利之处,当字典较大时会明显提高效率。
http://ninefu.github.io/blog/KEditDistance/http://elise.co.nf/?p=955
和LC的 edit distance一样,只不过加入了trie和dfs,与dp结合,难度更上一层
public List<String> findKEditDistance(String[] strs, String target, int k) { List<String> res = new ArrayList<>(); if(strs.length==0) return res; Trie trie = new Trie(); TrieNode root = trie.buildTrie(strs); int[] distance = new int[target.length()+1]; for(int i = 0;i<distance.length;i++) distance[i] = i; helper(target,res,root,distance,k,""); return res; } private void helper(String target, List<String> res, TrieNode root, int[] distance, int k, String tmp) { if(root.isWord) { if(distance[target.length()]<=k) res.add(tmp); return; } for(int i = 0;i<26;i++) { if(root.children[i]==null) continue; int[] curdistance = new int[target.length()+1]; //this +1 is to add the processing character, which is (char)(i+'a') curdistance[0] = tmp.length()+1; //use curdistance and distance to represent edit distance for tmp and target //distance[j] means edit distance between tmp and target[0,j) for(int j = 1;j<distance.length;j++) { if(target.charAt(j-1)==(char)(i+'a')) curdistance[j] = distance[j-1]; // if character matches, no insert, delete or replace. so no increasing edit distance //current edit distance for tmp and target[0,j) is distance[j-1] else { //not match, either delete, insert or replace character on tmp curdistance[j] = Math.min(Math.min(distance[j-1], distance[j]), curdistance[j-1])+1; //+1? //distance[j-1] means replace tmp.charAt(j-1) with target.charAt(j-1) //distance[j] means delete tmp.charAt(j-1) //curdistance[j-1] means add target.charAt(j-1) after tmp.charAt(j-1) } } helper(target,res,root.children[i],curdistance,k,tmp+(char)(i+'a')); } } } class TrieNode { public TrieNode[] children = new TrieNode[26]; public boolean isWord = false; } class Trie { public void insert(TrieNode root, String str) { TrieNode cur = root; for(int i = 0;i<str.length();i++) { char c = str.charAt(i); if(cur.children[c-'a']==null) cur.children[c-'a'] = new TrieNode(); cur = cur.children[c-'a']; } cur.isWord = true; } public TrieNode buildTrie(String[] strs) { TrieNode root = new TrieNode(); for(String str:strs) insert(root,str); return root; } }
X. Brute Force
https://github.com/awangdev/LintCode/blob/master/Java/K%20Edit%20Distance.java
We can calculate min moves needed to reach target with dp
dp[i][j] represents # steps to convert S[0, i - 1] to T[0, j - 1]
dp[m][n] is the minimum moves
It's double sequence dp, initialize with [m+1][n+1]
dp[0][0] = 0. no sequence to compare/edit.
init: if i == 0, take j steps to become T. if j == 0, takes i steps to become S.
Apply the dp for all words
Some pre-validations:
- length diff > k, skip
- equal to target: just return
public List<String> kDistance(String[] words, String target, int k) {
List<String> rst = new ArrayList<>();
if (words == null || words.length == 0 || target == null || k <= 0) {
return rst;
}
for (String word: words) {
if (validate(word, target, k)) {
rst.add(word);
}
}
return rst;
}
private boolean validate(String word, String target, int k) {
if (word.equals(target)) {
return true;
}
if (Math.abs(word.length() - target.length()) > k) {
return false;
}
int m = word.length();
int n = target.length();
int[][] dp = new int[2][n + 1];
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0) {
dp[i % 2][j] = j;
continue;
}
if (j == 0) {
dp[i % 2][j] = i;
continue;
}
dp[i % 2][j] = Math.min(dp[(i - 1) % 2][j - 1],
Math.min(dp[i % 2][j - 1], dp[(i - 1) % 2][j])) + 1;
if (word.charAt(i - 1) == target.charAt(j - 1)) {
dp[i % 2][j] = Math.min(dp[i % 2][j], dp[(i - 1) % 2][j - 1]);
}
}
}
return dp[m % 2][n] <= k;
}
http://stevehanov.ca/blog/index.php?id=114http://cs.au.dk/~cstorm/courses/StrAlg_f14/slides/Wu-Mamber.pdf
https://github.com/allaboutjst/airbnb/blob/master/src/main/java/k_edit_distance/KEditDistance.java
private void search(String curr, String target, int k, TrieNode root,
int[] prevDist, List<String> result) {
if (root.isLeaf) {
if (prevDist[target.length()] <= k) {
result.add(curr);
} else {
return;
}
}
for (int i = 0; i < 26; i++) {
if (root.children[i] == null) {
continue;
}
int[] currDist = new int[target.length() + 1];
currDist[0] = curr.length() + 1;
for (int j = 1; j < prevDist.length; j++) {
if (target.charAt(j - 1) == (char) (i + 'a')) {
currDist[j] = prevDist[j - 1];
} else {
currDist[j] = Math.min(Math.min(prevDist[j - 1], prevDist[j]), currDist[j - 1]) + 1;
}
}
search(curr + (char) (i + 'a'), target, k, root.children[i], currDist, result);
}
}
public List<String> getKEditDistance(String[] words, String target, int k) {
List<String> res = new ArrayList<>();
if (words == null || words.length == 0 || target == null ||
target.length() == 0 || k < 0) {
return res;
}
Trie trie = new Trie();
for (String word : words) {
trie.insert(word);
}
TrieNode root = trie.root;
// The edit distance from curr to target
int[] prev = new int[target.length() + 1];
for (int i = 0; i < prev.length; i++) {
prev[i] = i;
}
search("", target, k, root, prev, res);
return res;
}
class TrieNode {
TrieNode[] children;
boolean isLeaf;
public TrieNode() {
children = new TrieNode[26];
}
}
class Trie {
TrieNode root;
public Trie() {
root = new TrieNode();
}
// Add a word into trie
public void insert(String s) {
if (s == null || s.length() == 0) {
return;
}
TrieNode p = root;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (p.children[c - 'a'] == null) {
p.children[c - 'a'] = new TrieNode();
}
if (i == s.length() - 1) {
p.children[c - 'a'].isLeaf = true;
}
p = p.children[c - 'a'];
}
}
}
}