Massive Algorithms: Closest Pair of Points

Closest Pair of Points

https://www.geeksforgeeks.org/closest-pair-of-points-using-divide-and-conquer-algorithm/
We are given an array of n points in the plane, and the problem is to find out the closest pair of points in the array.
https://www.hackerearth.com/practice/math/geometry/line-sweep-technique/tutorial/
A sweep line is an imaginary vertical line which is swept across the plane rightwards. That's why, the algorithms based on this concept are sometimes also called plane sweep algorithms. We sweep the line based on some events, in order to discretize the sweep.
The events are based on the problem we are considering , we'll see them in the algorithms discussed below. Other than events, we maintain a data structure which stores the events generally sorted by y coordinates (the criteria for ordering of data structure may vary sometimes) which is helpful in the processing when we encounter some event. At any instance, the data structure stores only the active events.
One other thing to note is that the efficiency of this technique depends on the data structures we use. Generally , we can use set in C++ but sometimes we require some extra information to be stored, so we go for balanced binary tree.

For this problem, we can consider the points in the array as our events.
And in a set, we store the already visited points sorted by y coordinate.
So, first we sort the points in x direction as we want our line to move towards right.
Now, suppose we have processed the points from 1 to N-1, and let h be the shortest distance we have got so far. For Nth point, we want to find points whose distance from Nth point is less than or equal to h.
Now, we know we can only go till h distance from

x_{N}

to find such point, and in the y direction we can go in h distance upwards and h distance downwards. So, all such points whose x coordinate lie in [

x_{N} - h, x_{N}

] and y coordinates lie in [

y_{N} - h, y_{N} + h

] are what we are concerned with and these form the active events of the set . All points in the set with x coordinates less than

x_{N} - h

are to be deleted . After this processing, we'll add the Nth point to the set.
One thing to note is that at any instance, the number of points which are active events is O(1)(there can be atmost 5 points around a point which are active excluding the point itself).
Okay, that's a lot of theory, let's see this algo running
enter image description here

Initialize h as the distance between first two points Update the value of h if present distance between points is less than h
enter image description here

The red region in the image is the region containing points which are to be evaluated with the current point.The points left to this region are removed from the set.

Let's break down the points in the code:
1. First ,we have sorted the array of points on x coordinates.
2. Then we inserted the first point in the pnts array to the set box. Note we have defined py as the first in the pair, so set will be sorted by y coordinates.
3. In the loop , for each point in pnts, we are removing the points to the left of the current point whose x coordinate has more distance than h(the current minimum distance) from

x_{N}

.This loop runs for overall

O (N)

as we have only N elements in the set. The complexity of erase is

O (l o g N)

. So, the overall complexity of this loop is

O (N * l o g N)

4. In the second for loop , we are iterating over all points whose x coordinates lie in [

x_{N} - h, x_{N}

] and y coordinates lie in [

y_{N} - h, y_{N} + h

]. Finding the lower_bound takes

O (l o g N)

and this loop runs for atmost 5 times.
5. For each point, insert into the set. This step takes

O (l o g N)

So, overall time complexity of the algorithm is

O (N * l o g N)

#define px second
#define py first
typedef pair<long long, long long> pairll;
pairll pnts [MAX];
int compare(pairll a, pairll b)
{ 
        return a.px<b.px; 
}
double closest_pair(pairll pnts[],int n)
{
        sort(pnts,pnts+n,compare);
        double best=INF;
        set<pairll> box;
        box.insert(pnts[0]);
        int left = 0;
        for (int i=1;i<n;++i)
        {
            while (left<i && pnts[i].px-pnts[left].px > best)
                box.erase(pnts[left++]);
            for(typeof(box.begin()) it=box.lower_bound(make_pair(pnts[i].py-best, pnts[i].px-best));it!=box.end() && pnts[i].py+best>=it->py;it++)
                best = min(best, sqrt(pow(pnts[i].py - it->py, 2.0)+pow(pnts[i].px - it->px, 2.0)));
            box.insert(pnts[i]);
        }
        return best;
}

https://baptiste-wicht.com/posts/2010/04/closest-pair-of-point-plane-sweep-algorithm.html

With that algorithm, we'll sweep the plane from left to right (right to left is also a possibility) and when we reach a point we'll compute all the interesting candidates (the candidates that can be in the closest pair).

For that we'll make the following operations :

We sort the list of points from left to right in the x axis
And then for each point :
1. We remove from the candidates all the point that are further in x axis that the current min distance
2. We take all the candidates that are located more or less current min distance from the current point in y axis
3. We test for the min distance all the founded candidates with the current point
4. And finally we add the current point to the list of candidates

So when we found a new min distance, we can make the rectangle of candidates smaller in the x axis and smaller in the y axis. So we made a lot less comparisons between the points.

Here is a picture illustrating that :

The red points are the closest pair at this time of the algorithm. The red rectangle is the rectangle of the candidates delimited in right by the current point. And the yellow rectangle contains only the candidates interesting for the current point.

There is always a maximum of 6 points in the yellow rectangle, the 4 vertices, the point with the same coordinates as the current point and finally the point in the same y coordinate and in the limit of the x axis. Even if the maximum is 6, you'll almost never have more than 2 points in that list (the maximum is see in my test was 3 with a collection of 1'000'000 random points). You can see this 6 points here :

If all that stuff is not really clear for you, you can watch it in action here in a Java applet.

The candidates must also be always sorted. For that, we'll use a Binary Search Tree for good performances.

If we look at the complexity :

Sorting all the points in right axis : Cost O(n ln n) with Quick Sort by example
Shrinking the list of candidates take O(n) from start to end of the algorithm because we add n points to the candidates and we can remove only n points. So this is constant for each point : O(1).
Searching all the candidates between two values in y axis cost O(ln n) with binary search
The comparisons with at maximum 6 points are made in O(1)
Add the candidates and keep the list of candidates sorted cost O(ln n).

So the total complexity is O(n ln(n) + n * ( 1 + ln(n) + 1 + ln(n) ) ) = O(n ln n).

It's really easy to solve. We just have to found the number before which the naive algorithm is quicker than the sweeping and when the size of the points collection is smaller than this pivot number we use the naive implementation. On my computer I found that before 75 elements, the naive implementation was faster than the sweeping algorithm, so we can refactor our method :

public static Point[] closestPair(Point[] points) {
    if(points.length < 75){
        return naiveClosestPair(points);
    }

    //No changes
}

public static Point[] closestPair(Point[] points) {
    Point[] closestPair = new Point[2];

    //When we start the min distance is the infinity
    double crtMinDist = Double.POSITIVE_INFINITY;

    //Get the points and sort them
    Point[] sorted = Arrays.copyOf(points, points.length);
    Arrays.sort(sorted, HORIZONTAL_COMPARATOR);

    //When we start the left most candidate is the first one
    int leftMostCandidateIndex = 0;

    //Vertically sorted set of candidates
    SortedSet<Point> candidates = new TreeSet<Point>(VERTICAL_COMPARATOR);

    //For each point from left to right
    for (Point current : sorted) {
        //Shrink the candidates
        while (current.x - sorted[leftMostCandidateIndex].x > crtMinDist) {
            candidates.remove(sorted[leftMostCandidateIndex]);
            leftMostCandidateIndex++;
        }

        //Compute the y head and the y tail of the candidates set
        Point head = new Point(current.x, (int) (current.y - crtMinDist));
        Point tail = new Point(current.x, (int) (current.y + crtMinDist));

        //We take only the interesting candidates in the y axis
        for (Point point : candidates.subSet(head, tail)) {
            double distance = current.distance(point);

            //Simple min computation
            if (distance < crtMinDist) {
                crtMinDist = distance;

                closestPair[0] = current;
                closestPair[1] = point;
            }
        }

        //The current point is now a candidate
        candidates.add(current);
    }

    return closestPair;
}

https://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairPS.html

X. Divide and Conquer
Following are the detailed steps of a O(n (Logn)^2) algortihm.

As a pre-processing step, input array is sorted according to x coordinates.

1) Find the middle point in the sorted array, we can take P[n/2] as middle point.

2) Divide the given array in two halves. The first subarray contains points from P[0] to P[n/2]. The second subarray contains points from P[n/2+1] to P[n-1].

3) Recursively find the smallest distances in both subarrays. Let the distances be dl and dr. Find the minimum of dl and dr. Let the minimum be d.

4) From above 3 steps, we have an upper bound d of minimum distance. Now we need to consider the pairs such that one point in pair is from left half and other is from right half. Consider the vertical line passing through passing through P[n/2] and find all points whose x coordinate is closer than d to the middle vertical line. Build an array strip[] of all such points.

5) Sort the array strip[] according to y coordinates. This step is O(nLogn). It can be optimized to O(n) by recursively sorting and merging.

6) Find the smallest distance in strip[]. This is tricky. From first look, it seems to be a O(n^2) step, but it is actually O(n). It can be proved geometrically that for every point in strip, we only need to check at most 7 points after it (note that strip is sorted according to Y coordinate). See this for more analysis.

7) Finally return the minimum of d and distance calculated in above step (step 6)

Time Complexity Let Time complexity of above algorithm be T(n). Let us assume that we use a O(nLogn) sorting algorithm. The above algorithm divides all points in two sets and recursively calls for two sets. After dividing, it finds the strip in O(n) time, sorts the strip in O(nLogn) time and finally finds the closest points in strip in O(n) time. So T(n) can expressed as follows
T(n) = 2T(n/2) + O(n) + O(nLogn) + O(n)
T(n) = 2T(n/2) + O(nLogn)
T(n) = T(n x Logn x Logn)

struct Point 

{

    int x, y; 

};

/* Following two functions are needed for library function qsort(). 

   Refer: http://www.cplusplus.com/reference/clibrary/cstdlib/qsort/ */

// Needed to sort array of points according to X coordinate 

int compareX(const void* a, const void* b) 

{

    Point *p1 = (Point *)a,  *p2 = (Point *)b; 

    return (p1->x - p2->x); 

}

// Needed to sort array of points according to Y coordinate 

int compareY(const void* a, const void* b) 

{

    Point *p1 = (Point *)a,   *p2 = (Point *)b; 

    return (p1->y - p2->y); 

}

// A utility function to find the distance between two points 

float dist(Point p1, Point p2) 

{

    return sqrt( (p1.x - p2.x)*(p1.x - p2.x) + 

                 (p1.y - p2.y)*(p1.y - p2.y) 

);

}

// A Brute Force method to return the smallest distance between two points 

// in P[] of size n 

float bruteForce(Point P[], int n) 

{

    float min = FLT_MAX; 

    for (int i = 0; i < n; ++i) 

        for (int j = i+1; j < n; ++j) 

            if (dist(P[i], P[j]) < min) 

                min = dist(P[i], P[j]); 

    return min; 

}

// A utility function to find minimum of two float values 

float min(float x, float y) 

{

    return (x < y)? x : y; 

}

// A utility function to find the distance beween the closest points of 

// strip of given size. All points in strip[] are sorted accordint to 

// y coordinate. They all have an upper bound on minimum distance as d. 

// Note that this method seems to be a O(n^2) method, but it's a O(n) 

// method as the inner loop runs at most 6 times 

float stripClosest(Point strip[], int size, float d) 

{

    float min = d;  // Initialize the minimum distance as d 

    qsort(strip, size, sizeof(Point), compareY);  

    // Pick all points one by one and try the next points till the difference 

    // between y coordinates is smaller than d. 

    // This is a proven fact that this loop runs at most 6 times 

    for (int i = 0; i < size; ++i) 

        for (int j = i+1; j < size && (strip[j].y - strip[i].y) < min; ++j) 

            if (dist(strip[i],strip[j]) < min) 

                min = dist(strip[i], strip[j]); 

    return min; 

}

// A recursive function to find the smallest distance. The array P contains 

// all points sorted according to x coordinate 

float closestUtil(Point P[], int n) 

{

    // If there are 2 or 3 points, then use brute force 

    if (n <= 3) 

        return bruteForce(P, n); 

    // Find the middle point 

    int mid = n/2; 

    Point midPoint = P[mid]; 

    // Consider the vertical line passing through the middle point 

    // calculate the smallest distance dl on left of middle point and 

    // dr on right side 

    float dl = closestUtil(P, mid); 

    float dr = closestUtil(P + mid, n-mid); 

    // Find the smaller of two distances 

    float d = min(dl, dr); 

    // Build an array strip[] that contains points close (closer than d) 

    // to the line passing through the middle point 

    Point strip[n]; 

    int j = 0; 

    for (int i = 0; i < n; i++) 

        if (abs(P[i].x - midPoint.x) < d) 

            strip[j] = P[i], j++; 

    // Find the closest points in strip.  Return the minimum of d and closest 

    // distance is strip[] 

    return min(d, stripClosest(strip, j, d) ); 

}

// The main functin that finds the smallest distance 

// This method mainly uses closestUtil() 

float closest(Point P[], int n) 

{

    qsort(P, n, sizeof(Point), compareX); 

    // Use recursive function closestUtil() to find the smallest distance 

    return closestUtil(P, n); 

}

https://www.geeksforgeeks.org/closest-pair-of-points-onlogn-implementation/

The great thing about the above approach is, if the array strip[] is sorted according to y coordinate, then we can find the smallest distance in strip[] in O(n) time. In the implementation discussed in previous post, strip[] was explicitly sorted in every recursive call that made the time complexity O(n (Logn)^2), assuming that the sorting step takes O(nLogn) time.
In this post, we discuss an implementation where the time complexity is O(nLogn). The idea is to presort all points according to y coordinates. Let the sorted array be Py[]. When we make recursive calls, we need to divide points of Py[] also according to the vertical line. We can do that by simply processing every point and comparing its x coordinate with x coordinate of middle line.

T(n) = 2T(n/2) + O(n) + O(n) + O(n)
T(n) = 2T(n/2) + O(n)
T(n) = T(nLogn)

// A utility function to find the distance beween the closest points of 

// strip of given size. All points in strip[] are sorted accordint to 

// y coordinate. They all have an upper bound on minimum distance as d. 

// Note that this method seems to be a O(n^2) method, but it's a O(n) 

// method as the inner loop runs at most 6 times 

float stripClosest(Point strip[], int size, float d) 

{ 

    float min = d;  // Initialize the minimum distance as d 

    // Pick all points one by one and try the next points till the difference 

    // between y coordinates is smaller than d. 

    // This is a proven fact that this loop runs at most 6 times 

    for (int i = 0; i < size; ++i) 

        for (int j = i+1; j < size && (strip[j].y - strip[i].y) < min; ++j) 

            if (dist(strip[i],strip[j]) < min) 

                min = dist(strip[i], strip[j]); 

    return min; 

} 

// A recursive function to find the smallest distance. The array Px contains 

// all points sorted according to x coordinates and Py contains all points 

// sorted according to y coordinates 

float closestUtil(Point Px[], Point Py[], int n) 

{ 

    // If there are 2 or 3 points, then use brute force 

    if (n <= 3) 

        return bruteForce(Px, n); 

    // Find the middle point 

    int mid = n/2; 

    Point midPoint = Px[mid]; 

    // Divide points in y sorted array around the vertical line. 

    // Assumption: All x coordinates are distinct. 

    Point Pyl[mid+1];   // y sorted points on left of vertical line 

    Point Pyr[n-mid-1];  // y sorted points on right of vertical line 

    int li = 0, ri = 0;  // indexes of left and right subarrays 

    for (int i = 0; i < n; i++) 

    { 

      if (Py[i].x <= midPoint.x) 

         Pyl[li++] = Py[i]; 

      else

         Pyr[ri++] = Py[i]; 

    } 

    // Consider the vertical line passing through the middle point 

    // calculate the smallest distance dl on left of middle point and 

    // dr on right side 

    float dl = closestUtil(Px, Pyl, mid); 

    float dr = closestUtil(Px + mid, Pyr, n-mid); 

    // Find the smaller of two distances 

    float d = min(dl, dr); 

    // Build an array strip[] that contains points close (closer than d) 

    // to the line passing through the middle point 

    Point strip[n]; 

    int j = 0; 

    for (int i = 0; i < n; i++) 

        if (abs(Py[i].x - midPoint.x) < d) 

            strip[j] = Py[i], j++; 

    // Find the closest points in strip.  Return the minimum of d and closest 

    // distance is strip[] 

    return min(d, stripClosest(strip, j, d) ); 

} 

// The main functin that finds the smallest distance 

// This method mainly uses closestUtil() 

float closest(Point P[], int n) 

{ 

    Point Px[n]; 

    Point Py[n]; 

    for (int i = 0; i < n; i++) 

    { 

        Px[i] = P[i]; 

        Py[i] = P[i]; 

    } 

    qsort(Px, n, sizeof(Point), compareX); 

    qsort(Py, n, sizeof(Point), compareY); 

    // Use recursive function closestUtil() to find the smallest distance 

    return closestUtil(Px, Py, n); 

}

https://algs4.cs.princeton.edu/code/edu/princeton/cs/algs4/ClosestPair.java

    public ClosestPair(Point2D[] points) {
        if (points == null) throw new IllegalArgumentException("constructor argument is null");
        for (int i = 0; i < points.length; i++) {
            if (points[i] == null) throw new IllegalArgumentException("array element " + i + " is null");
        }

        int n = points.length;
        if (n <= 1) return;

        // sort by x-coordinate (breaking ties by y-coordinate)
        Point2D[] pointsByX = new Point2D[n];
        for (int i = 0; i < n; i++)
            pointsByX[i] = points[i];
        Arrays.sort(pointsByX, Point2D.X_ORDER);

        // check for coincident points
        for (int i = 0; i < n-1; i++) {
            if (pointsByX[i].equals(pointsByX[i+1])) {
                bestDistance = 0.0;
                best1 = pointsByX[i];
                best2 = pointsByX[i+1];
                return;
            }
        }

        // sort by y-coordinate (but not yet sorted) 
        Point2D[] pointsByY = new Point2D[n];
        for (int i = 0; i < n; i++)
            pointsByY[i] = pointsByX[i];

        // auxiliary array
        Point2D[] aux = new Point2D[n];

        closest(pointsByX, pointsByY, aux, 0, n-1);
    }

    // find closest pair of points in pointsByX[lo..hi]
    // precondition:  pointsByX[lo..hi] and pointsByY[lo..hi] are the same sequence of points
    // precondition:  pointsByX[lo..hi] sorted by x-coordinate
    // postcondition: pointsByY[lo..hi] sorted by y-coordinate
    private double closest(Point2D[] pointsByX, Point2D[] pointsByY, Point2D[] aux, int lo, int hi) {
        if (hi <= lo) return Double.POSITIVE_INFINITY;

        int mid = lo + (hi - lo) / 2;
        Point2D median = pointsByX[mid];

        // compute closest pair with both endpoints in left subarray or both in right subarray
        double delta1 = closest(pointsByX, pointsByY, aux, lo, mid);
        double delta2 = closest(pointsByX, pointsByY, aux, mid+1, hi);
        double delta = Math.min(delta1, delta2);

        // merge back so that pointsByY[lo..hi] are sorted by y-coordinate
        merge(pointsByY, aux, lo, mid, hi);

        // aux[0..m-1] = sequence of points closer than delta, sorted by y-coordinate
        int m = 0;
        for (int i = lo; i <= hi; i++) {
            if (Math.abs(pointsByY[i].x() - median.x()) < delta)
                aux[m++] = pointsByY[i];
        }

        // compare each point to its neighbors with y-coordinate closer than delta
        for (int i = 0; i < m; i++) {
            // a geometric packing argument shows that this loop iterates at most 7 times
            for (int j = i+1; (j < m) && (aux[j].y() - aux[i].y() < delta); j++) {
                double distance = aux[i].distanceTo(aux[j]);
                if (distance < delta) {
                    delta = distance;
                    if (distance < bestDistance) {
                        bestDistance = delta;
                        best1 = aux[i];
                        best2 = aux[j];
                        // StdOut.println("better distance = " + delta + " from " + best1 + " to " + best2);
                    }
                }
            }
        }
        return delta;
    }

    // stably merge a[lo .. mid] with a[mid+1 ..hi] using aux[lo .. hi]
    // precondition: a[lo .. mid] and a[mid+1 .. hi] are sorted subarrays
    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        // copy to aux[]
        for (int k = lo; k <= hi; k++) {
            aux[k] = a[k];
        }
    
        // merge back to a[] 
        int i = lo, j = mid+1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)              a[k] = aux[j++];
            else if (j > hi)               a[k] = aux[i++];
            else if (less(aux[j], aux[i])) a[k] = aux[j++];
            else                           a[k] = aux[i++];
        }
    }

TODO:
https://en.wikipedia.org/wiki/Closest_pair_of_points_problem

The dynamic version for the closest-pair problem is stated as follows:

Given a dynamic set of objects, find algorithms and data structures for efficient recalculation of the closest pair of objects each time the objects are inserted or deleted.

If the bounding box for all points is known in advance and the constant-time floor function is available, then the expected O(n) space data structure was suggested that supports expected-time O(log n) insertions and deletions and constant query time. When modified for the algebraic decision tree model, insertions and deletions would require O(log² n) expected time.^[7] It is worth noting, though, that the complexity of the dynamic closest pair algorithm cited above is exponential in the dimension d, and therefore such an algorithm becomes less suitable for high-dimensional problems.

An algorithm for the dynamic closet-pair problem in d dimensional space was developed by Sergey Bespamyatnikh in 1998 ^[8]. Points can be inserted and deleted in O(logn) time per point (in the worst case).

Closest Pair of Points

Labels

Popular Posts