## Friday, May 20, 2016

### geometric median - linyx - 博客园

geometric median - linyx - 博客园
The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions.

$GeometricMedian=\underset{y\in {\mathbb{R}}^{n}}{\mathrm{a}\mathrm{r}\mathrm{g}\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{i}\mathrm{n}}\sum _{i=1}^{m}{\parallel {x}_{i}-y\parallel }_{2}$$Geometric Median =\underset{y \in \mathbb{R}^n}{\operatorname{arg\,min}} \sum_{i=1}^m \left \| x_i-y \right \|_2$

Q:Given set of points in 2d grid space. Find a grid point such that sum of distance from all the points to this common point is minimum.
eg: p1: [0, 0] p2: [3, 0] p3: [0, 3]
ans: r: [0,0]
sum: 0 + 3 + 3 = 6

这里指的是曼哈顿距离。manhattan distance. 欧式距离不好求，网上人家直接用kmeans。。

• http://stackoverflow.com/questions/12934213/how-to-find-out-geometric-median
• http://stackoverflow.com/questions/12905663/given-list-of-2d-points-find-the-point-closest-to-all-other-points/12905913#12905913

http://stackoverflow.com/questions/12905663/given-list-of-2d-points-find-the-point-closest-to-all-other-points/12905913
Note that in 1-D the point that minimizes the sum of distances to all the points is the median.
In 2-D the problem can be solved in O(n log n) as follows:
Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.
Now for each point in the original array, we can compute the total cost to all other points in O(1)time by adding the horizontal and vertical costs. So we can compute the optimal point in O(n). Thus the total running time is O(n log n).
 1 bool compareByX(const Point &p1, const Point &p2) {
2     return p1.x < p2.x;
3 }
4
5 bool compareByY(const Point &p1, const Point &p2) {
6     return p1.y < p2.y;
7 }
8
9 int maxDistance(vector<Point> &points) {
10     if (points.empty()) return 0;
11     sort(points.begin(), points.end(), compareByX);
12     int n = points.size();
13     vector<int> xdistances(n, 0), ydistances(n, 0);
14     for (int i = 1; i < n; ++i) {
15         xdistances[i] = xdistances[i - 1] + i * (points[i].x - points[i - 1].x);
16     }
17     int right = 0;
18     for (int i = n - 2; i >= 0; --i) {
19         right = right + (n - i - 1) * (points[i + 1].x - points[i].x);
20         xdistances[i] += right;
21     }
22
23     // preprocessing based on y
24     sort(points.begin(), points.end(), compareByY);
25     for (int i = 1; i < n; ++i) {
26         ydistances[i] = ydistances[i - 1] + i * (points[i].y - points[i - 1].y);
27     }
28
29     int top = 0;
30     for (int i = n - 2; i >= 0; --i) {
31         top = top + (n - i - 1) * (points[i + 1].y - points[i].y);
32         ydistances[i] += top;
33     }
34
35     int max = 0;
36     for (int i = 0; i < n; ++i) {
37         if (xdistances[i] + ydistances[i] > max) {
38             max = xdistances[i] + ydistances[i];
39         }
40     }
41     return max;
42 }