Cracking the coding interview--Q20.5
You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?
译文:
有一个很大的文本文件,里面包含许多英文单词。给出两个单词,找到它们的最短距离 (以它们之间隔了多少个单词计数)。你能在O(1)的时间内返回任意两个单词间的最短距离吗? 你的解法空间复杂度是多少?
如果只要找一次就用第一种O(n)解法.
如果要找多次就多用一个Hashtable,把所有的组合都保存起来.
https://tianrunhe.wordpress.com/2012/06/04/shortest-distances-between-two-words-in-a-file/
X. We can build and store the mapping between pairs of words to their shortest distance. That is space complexity. But the query is constant since it’s just one step of look-up.
X. O(N)
https://github.com/careercup/CtCI-6th-Edition/tree/master/Java/Ch%2017.%20Hard/Q17_11_Word_Distance
public static HashMapList<String, Integer> getWordLocations(String[] words) {
HashMapList<String, Integer> locations = new HashMapList<String, Integer>();
for (int i = 0; i < words.length; i++) {
locations.put(words[i], i);
}
return locations;
}
public static LocationPair findMinDistancePair(ArrayList<Integer> array1, ArrayList<Integer> array2) {
if (array1 == null || array2 == null || array1.size() == 0 || array2.size() == 0) {
return null;
}
int index1 = 0;
int index2 = 0;
LocationPair best = new LocationPair(array1.get(0), array2.get(0));
LocationPair current = new LocationPair(array1.get(0), array2.get(0));
while (index1 < array1.size() && index2 < array2.size()) {
current.setLocations(array1.get(index1), array2.get(index2));
best.updateWithMin(current);
if (current.location1 < current.location2) {
index1++;
} else {
index2++;
}
}
return best;
}
public static LocationPair findClosest(String word1, String word2, HashMapList<String, Integer> locations) {
ArrayList<Integer> locations1 = locations.get(word1);
ArrayList<Integer> locations2 = locations.get(word2);
return findMinDistancePair(locations1, locations2);
}
public static LocationPair findClosest(String[] words, String word1, String word2) {
LocationPair best = new LocationPair(-1, -1);
LocationPair current = new LocationPair(-1, -1);
for (int i = 0; i < words.length; i++) {
String word = words[i];
if (word.equals(word1)) {
current.location1 = i;
best.updateWithMin(current);
} else if (word.equals(word2)) {
current.location2 = i;
best.updateWithMin(current);
}
}
return best;
}
Read full article from Cracking the coding interview--Q20.5
You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?
译文:
有一个很大的文本文件,里面包含许多英文单词。给出两个单词,找到它们的最短距离 (以它们之间隔了多少个单词计数)。你能在O(1)的时间内返回任意两个单词间的最短距离吗? 你的解法空间复杂度是多少?
首先,我们遇到的第一个问题是:是否要考虑顺序?我们求的是is和name间的距离, 那么文本中先出现name再出现is的情况要不要算进来。这一点是要和面试官进行交流确认的。 这里我们假设不考虑顺序,并且认为本文中只有单词,没有标点。 为了进一步简化问题,我们可以用一个字符串数组来保存单词, 接下来考虑如何计算两个单词间的最短距离。
如果只要找一次就用第一种O(n)解法.
如果要找多次就多用一个Hashtable,把所有的组合都保存起来.
https://tianrunhe.wordpress.com/2012/06/04/shortest-distances-between-two-words-in-a-file/
X. We can build and store the mapping between pairs of words to their shortest distance. That is space complexity. But the query is constant since it’s just one step of look-up.
private
static
Map<HashSet<String>, Integer> distances =
new
HashMap<HashSet<String>, Integer>();
public
static
int
query(String word1, String word2) {
return
distances.get(
new
Pair<String>(word1, word2));
}
public
static
void
buildMap(String text) {
// clean up the text
String[] temp = text.split(
"[\\s\\p{Punct}]"
);
List<String> words =
new
ArrayList<String>();
for
(String word : temp) {
word = word.trim();
if
(!word.isEmpty()) {
words.add(word);
}
}
// build the mapping between pairs of words to
// their shortest distances
for
(
int
i =
0
; i < words.size(); ++i) {
for
(
int
j = i +
1
; j < words.size(); ++j) {
if
(!words.get(i).equals(words.get(j))) {
HashSet<String> pair =
new
HashSet<String>();
pair.add(words.get(i));
pair.add(words.get(j));
if
(distances.keySet().contains(pair)) {
int
curr = distances.get(pair);
if
(j - i < curr)
distances.put(pair, j - i);
}
else
{
distances.put(pair, j - i);
}
}
}
}
}
方法一
遍历一次文本,用哈希函数将每个单词映射到不同结点,结点后保存该单词出现的位置。 比如对于上面的例子
What is your name My name is Hawstein
0 1 2 3 4 5 6 7
遍历一次并处理后,我们得到每个单词在文本中出现的位置:(哈希值是随便写的,示意用)
单词 哈希值 出现位置
What: 3 0
is: 7 1, 6
your: 13 2
name: 14 3, 5
My: 25 4
Hawstein: 27 7
求两个单词间的最小距离时,首先用O(1)时间通过哈希函数映射到指定结点, 然后对于其中一个单词的每个位置,去与第二个单词的所有位置比较,找到最小的差值。 由于位置是递增的,因此可以修改二分查找进行搜索。
该方法的平均查找复杂度应该是O(1)的,但最坏情况下无法保证O(1)的查找时间, 考虑一种极端情况,文本中的单词就只有is和name,它们的数量各为(½)n, 使用这种算法,我们需要O(nlogn)的时间。
X. O(N)
int ShortestDist(string text[], int n, string word1, string word2){
int min = kMaxInt / 2;
int pos1 = -min;
int pos2 = -min;
for(int pos=0; pos<n; ++pos){
if(text[pos] == word1){
pos1 = pos;
int dist = pos1 - pos2;
if(dist < min)
min = dist;
}
else if(text[pos] == word2){
pos2 = pos;
int dist = pos2 - pos1;
if(dist < min)
min = dist;
}
}
return min;
}
https://github.com/careercup/CtCI-6th-Edition/tree/master/Java/Ch%2017.%20Hard/Q17_11_Word_Distance
public static HashMapList<String, Integer> getWordLocations(String[] words) {
HashMapList<String, Integer> locations = new HashMapList<String, Integer>();
for (int i = 0; i < words.length; i++) {
locations.put(words[i], i);
}
return locations;
}
public static LocationPair findMinDistancePair(ArrayList<Integer> array1, ArrayList<Integer> array2) {
if (array1 == null || array2 == null || array1.size() == 0 || array2.size() == 0) {
return null;
}
int index1 = 0;
int index2 = 0;
LocationPair best = new LocationPair(array1.get(0), array2.get(0));
LocationPair current = new LocationPair(array1.get(0), array2.get(0));
while (index1 < array1.size() && index2 < array2.size()) {
current.setLocations(array1.get(index1), array2.get(index2));
best.updateWithMin(current);
if (current.location1 < current.location2) {
index1++;
} else {
index2++;
}
}
return best;
}
public static LocationPair findClosest(String word1, String word2, HashMapList<String, Integer> locations) {
ArrayList<Integer> locations1 = locations.get(word1);
ArrayList<Integer> locations2 = locations.get(word2);
return findMinDistancePair(locations1, locations2);
}
public static LocationPair findClosest(String[] words, String word1, String word2) {
LocationPair best = new LocationPair(-1, -1);
LocationPair current = new LocationPair(-1, -1);
for (int i = 0; i < words.length; i++) {
String word = words[i];
if (word.equals(word1)) {
current.location1 = i;
best.updateWithMin(current);
} else if (word.equals(word2)) {
current.location2 = i;
best.updateWithMin(current);
}
}
return best;
}