geometric median - linyx - 博客园
The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions.
也就是说,中位数就是一个数组里到所有其他数据点的距离之和达到最小值的点。n维的也一样。
一维的中位数满足这个性质,证明的话可以用反证法。可以证明的到的是,中位数往左一点或者往右一点都会造成距离之和增加,所以中位数是到其他点的距离之和最小。
然后,问题来了。。。
Q:Given set of points in 2d grid space. Find a grid point such that sum of distance from all the points to this common point is minimum.
eg: p1: [0, 0] p2: [3, 0] p3: [0, 3]
ans: r: [0,0]
sum: 0 + 3 + 3 = 6
这题naive 方法就是,求出所有点到其他点的距离之和,再取最小。
这里指的是曼哈顿距离。manhattan distance. 欧式距离不好求,网上人家直接用kmeans。。
参考:
现在我们就能够在O(1)得到所有点到其他点的距离之和(曼哈顿距离)。所以就能够在O(n)中求出最小值了。(最大值都行啊)
http://stackoverflow.com/questions/12905663/given-list-of-2d-points-find-the-point-closest-to-all-other-points/12905913
Read full article from geometric median - linyx - 博客园
The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions.
也就是说,中位数就是一个数组里到所有其他数据点的距离之和达到最小值的点。n维的也一样。
一维的中位数满足这个性质,证明的话可以用反证法。可以证明的到的是,中位数往左一点或者往右一点都会造成距离之和增加,所以中位数是到其他点的距离之和最小。
然后,问题来了。。。
Q:Given set of points in 2d grid space. Find a grid point such that sum of distance from all the points to this common point is minimum.
eg: p1: [0, 0] p2: [3, 0] p3: [0, 3]
ans: r: [0,0]
sum: 0 + 3 + 3 = 6
这题naive 方法就是,求出所有点到其他点的距离之和,再取最小。
这里指的是曼哈顿距离。manhattan distance. 欧式距离不好求,网上人家直接用kmeans。。
参考:
- http://stackoverflow.com/questions/12934213/how-to-find-out-geometric-median
- http://stackoverflow.com/questions/12905663/given-list-of-2d-points-find-the-point-closest-to-all-other-points/12905913#12905913
现在我们就能够在O(1)得到所有点到其他点的距离之和(曼哈顿距离)。所以就能够在O(n)中求出最小值了。(最大值都行啊)
Note that in 1-D the point that minimizes the sum of distances to all the points is the median.
In 2-D the problem can be solved in
O(n log n)
as follows:
Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.
Now for each point in the original array, we can compute the total cost to all other points in
O(1)
time by adding the horizontal and vertical costs. So we can compute the optimal point in O(n)
. Thus the total running time is O(n log n)
.1 bool compareByX(const Point &p1, const Point &p2) { 2 return p1.x < p2.x; 3 } 4 5 bool compareByY(const Point &p1, const Point &p2) { 6 return p1.y < p2.y; 7 } 8 9 int maxDistance(vector<Point> &points) { 10 if (points.empty()) return 0; 11 sort(points.begin(), points.end(), compareByX); 12 int n = points.size(); 13 vector<int> xdistances(n, 0), ydistances(n, 0); 14 for (int i = 1; i < n; ++i) { 15 xdistances[i] = xdistances[i - 1] + i * (points[i].x - points[i - 1].x); 16 } 17 int right = 0; 18 for (int i = n - 2; i >= 0; --i) { 19 right = right + (n - i - 1) * (points[i + 1].x - points[i].x); 20 xdistances[i] += right; 21 } 22 23 // preprocessing based on y 24 sort(points.begin(), points.end(), compareByY); 25 for (int i = 1; i < n; ++i) { 26 ydistances[i] = ydistances[i - 1] + i * (points[i].y - points[i - 1].y); 27 } 28 29 int top = 0; 30 for (int i = n - 2; i >= 0; --i) { 31 top = top + (n - i - 1) * (points[i + 1].y - points[i].y); 32 ydistances[i] += top; 33 } 34 35 int max = 0; 36 for (int i = 0; i < n; ++i) { 37 if (xdistances[i] + ydistances[i] > max) { 38 max = xdistances[i] + ydistances[i]; 39 } 40 } 41 return max; 42 }