Damerau–Levenshtein distance - Wikipedia, the free encyclopedia
The Damerau–Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations. The classical Levenshtein distance only allows insertion, deletion, and substitution operations.[5] Modifying this distance by including transpositions of adjacent symbols produces a different distance measure, known as the Damerau–Levenshtein distance
http://www.cnblogs.com/finallyliuyu/archive/2010/09/13/1825258.html
L和D自己对编辑距离的定义是没有问题的,符合泛函理论中对距离定义的三个要素条件。后来一些人就想将L和D的距离定义融合在一起,成为了Damerau–Levenshtein distance(以下简称D-L距离),认为这样就既可以克服了D定义只能识别单一编辑操作引起的错误的局限,又弥补了L定义不包含相邻字符互换操作的遗憾。其实上面的公式1计算的就是D-L距离。但是这个D-L距离并不满足泛函理论中所要求的距离定义的三要素标准,它不满足三角不等式,所以这个定义是有问题的,数学上具有不严谨性。于是也就有了将abc与ca的编辑距离错算为3的情况。但是由于这个错误并不影响工程中的应用,并且这个公式能够给实际工作带来便利,就一直沿用了下来。
http://www.hankcs.com/program/java/several-string-edit-distance-achieved.html
Read full article from Damerau–Levenshtein distance - Wikipedia, the free encyclopedia
The Damerau–Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations. The classical Levenshtein distance only allows insertion, deletion, and substitution operations.[5] Modifying this distance by including transpositions of adjacent symbols produces a different distance measure, known as the Damerau–Levenshtein distance
http://www.cnblogs.com/finallyliuyu/archive/2010/09/13/1825258.html
L和D自己对编辑距离的定义是没有问题的,符合泛函理论中对距离定义的三个要素条件。后来一些人就想将L和D的距离定义融合在一起,成为了Damerau–Levenshtein distance(以下简称D-L距离),认为这样就既可以克服了D定义只能识别单一编辑操作引起的错误的局限,又弥补了L定义不包含相邻字符互换操作的遗憾。其实上面的公式1计算的就是D-L距离。但是这个D-L距离并不满足泛函理论中所要求的距离定义的三要素标准,它不满足三角不等式,所以这个定义是有问题的,数学上具有不严谨性。于是也就有了将abc与ca的编辑距离错算为3的情况。但是由于这个错误并不影响工程中的应用,并且这个公式能够给实际工作带来便利,就一直沿用了下来。
http://www.hankcs.com/program/java/several-string-edit-distance-achieved.html
public
static
int
ed(String wrongWord, String rightWord)
{
final
int
m = wrongWord.length();
final
int
n = rightWord.length();
int
[][] d =
new
int
[m +
1
][n +
1
];
for
(
int
j =
0
; j <= n; ++j)
{
d[
0
][j] = j;
}
for
(
int
i =
0
; i <= m; ++i)
{
d[i][
0
] = i;
}
// for (int[] l : d)
// {
// System.out.println(Arrays.toString(l));
// }
for
(
int
i =
1
; i <= m; ++i)
{
char
ci = wrongWord.charAt(i -
1
);
for
(
int
j =
1
; j <= n; ++j)
{
char
cj = rightWord.charAt(j -
1
);
if
(ci == cj)
{
d[i][j] = d[i -
1
][j -
1
];
}
else
if
(i >
1
&& j >
1
&& ci == rightWord.charAt(j -
2
) && cj == wrongWord.charAt(i -
2
))
{
// 交错相等
d[i][j] =
1
+ Math.min(d[i -
2
][j -
2
], Math.min(d[i][j -
1
], d[i -
1
][j]));
}
else
{
// 等号右边的分别代表 将ci改成cj 错串加cj 错串删ci
d[i][j] = Math.min(d[i -
1
][j -
1
] +
1
, Math.min(d[i][j -
1
] +
1
, d[i -
1
][j] +
1
));
}
}
}
// System.out.println();
// for (int[] l : d)
// {
// System.out.println(Arrays.toString(l));
// }
return
d[m][n];
}