https://gist.github.com/gcrfelix/308dd8e3fa7c88e1852f
Given URL lists with relevance scores, each sorted in descending order and coming from two different sources,
how do you combine them and return unique URLs in descending order? (merge sort with de-duplication)
Solution:
Just apply the idea of merge sort. The code uses list.get(i) (get element by index), which pre-assumes the input is an ArrayList.
Use merge sort with de-duplication:
In merge sort you take two (or more) already-sorted lists and repeatedly apply the following rules:
find the lesser/least of the items at the top of each of the input lists, choosing any one of the lowest items if there is a tie
remove that item from its list
add it to your output list
To remove duplicates, you simply modify the rules very slightly:
find the lesser/least of the items at the top of each of the input lists, choosing any one of the lowest items if there is a tie
remove that item from its list
if it is the same as the last item you added to your output list, throw it away
otherwise, add it to your output list
This ensures that no two consecutive items in your output list are the same, and that the items are in order, which is what you were after. Since both input lists here are sorted in descending order, take the greater/greatest item at each step instead of the least, and the merged output stays in descending order.
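A minimal Java sketch of these rules, assuming a hypothetical UrlResult class holding a url and its relevance score (the original only mentions list.get(i) on an ArrayList, so the class and field names here are illustrative):

import java.util.ArrayList;
import java.util.List;

// Hypothetical item type: a URL plus its relevance score.
class UrlResult {
    final String url;
    final double score;
    UrlResult(String url, double score) { this.url = url; this.score = score; }
}

public class MergeDedup {
    // Merge two lists already sorted by score in descending order,
    // dropping an item whose URL equals the last URL added to the output.
    static List<UrlResult> merge(List<UrlResult> a, List<UrlResult> b) {
        List<UrlResult> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            UrlResult next;
            if (j >= b.size() || (i < a.size() && a.get(i).score >= b.get(j).score)) {
                next = a.get(i++);   // take the greater (or only remaining) item
            } else {
                next = b.get(j++);
            }
            // de-duplication rule: skip if it repeats the last output URL
            if (out.isEmpty() || !out.get(out.size() - 1).url.equals(next.url)) {
                out.add(next);
            }
        }
        return out;
    }
}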
For this problem, I think you can also build a separate HashSet to store the URLs, and during the merge simply skip any URL that has already been seen. Comparing only against the last output item misses duplicates that are not adjacent (e.g. when the two sources give the same URL different scores), so a HashSet is safer.
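A sketch of that HashSet variant, reusing the hypothetical UrlResult class from the sketch above:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeDedupWithSet {
    // Same descending-order merge, but de-duplication uses a HashSet of URLs,
    // so duplicates are dropped even when they are not adjacent in the merged order.
    static List<UrlResult> merge(List<UrlResult> a, List<UrlResult> b) {
        List<UrlResult> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            UrlResult next;
            if (j >= b.size() || (i < a.size() && a.get(i).score >= b.get(j).score)) {
                next = a.get(i++);
            } else {
                next = b.get(j++);
            }
            if (seen.add(next.url)) {   // add() returns false if the URL was seen before
                out.add(next);
            }
        }
        return out;
    }
}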