## Tuesday, May 31, 2016

### Kd-Trees - Princeton Programming Assignment 5

http://coursera.cs.princeton.edu/algs4/assignments/kdtree.html
http://coursera.cs.princeton.edu/algs4/checklists/kdtree.html
Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.

Geometric primitives. To get started, use the following geometric primitives for points and axis-aligned rectangles in the plane.

Use the immutable data type Point2D (part of algs4.jar) for points in the plane. Here is the subset of its API that you may use:
```public class Point2D implements Comparable<Point2D> {
public Point2D(double x, double y)              // construct the point (x, y)
public  double x()                              // x-coordinate
public  double y()                              // y-coordinate
public  double distanceTo(Point2D that)         // Euclidean distance between two points
public  double distanceSquaredTo(Point2D that)  // square of Euclidean distance between two points
public     int compareTo(Point2D that)          // for use in an ordered symbol table
public boolean equals(Object that)              // does this point equal that object?
public    void draw()                           // draw to standard draw
public  String toString()                       // string representation
}
```
Use the immutable data type RectHV (part of algs4.jar) for axis-aligned rectangles. Here is the subset of its API that you may use:
```public class RectHV {
public    RectHV(double xmin, double ymin,      // construct the rectangle [xmin, xmax] x [ymin, ymax]
double xmax, double ymax)      // throw a java.lang.IllegalArgumentException if (xmin > xmax) or (ymin > ymax)
public  double xmin()                           // minimum x-coordinate of rectangle
public  double ymin()                           // minimum y-coordinate of rectangle
public  double xmax()                           // maximum x-coordinate of rectangle
public  double ymax()                           // maximum y-coordinate of rectangle
public boolean contains(Point2D p)              // does this rectangle contain the point p (either inside or on boundary)?
public boolean intersects(RectHV that)          // does this rectangle intersect that rectangle (at one or more points)?
public  double distanceTo(Point2D p)            // Euclidean distance from point p to closest point in rectangle
public  double distanceSquaredTo(Point2D p)     // square of Euclidean distance from point p to closest point in rectangle
public boolean equals(Object that)              // does this rectangle equal that object?
public    void draw()                           // draw to standard draw
public  String toString()                       // string representation
}
```
Do not modify these data types.
Brute-force implementation. Write a mutable data type PointSET.java that represents a set of points in the unit square. Implement the following API by using a red-black BST (using either SET from algs4.jar orjava.util.TreeSet).
```public class PointSET {
public         PointSET()                               // construct an empty set of points
public           boolean isEmpty()                      // is the set empty?
public               int size()                         // number of points in the set
public              void insert(Point2D p)              // add the point to the set (if it is not already in the set)
public           boolean contains(Point2D p)            // does the set contain point p?
public              void draw()                         // draw all points to standard draw
public Iterable<Point2D> range(RectHV rect)             // all points that are inside the rectangle
public           Point2D nearest(Point2D p)             // a nearest neighbor in the set to point p; null if the set is empty

public static void main(String[] args)                  // unit testing of the methods (optional)
}
```
Corner cases.  Throw a java.lang.NullPointerException if any argument is null. Performance requirements.  Your implementation should support insert() and contains() in time proportional to the logarithm of the number of points in the set in the worst case; it should support nearest() and range() in time proportional to the number of points in the set.
2d-tree implementation. Write a mutable data type KdTree.java that uses a 2d-tree to implement the same API (but replace PointSET with KdTree). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence.

• Search and insert. The algorithms for search and insert are similar to those for BSTs, but at the root we use the x-coordinate (if the point to be inserted has a smaller x-coordinate than the point at the root, go left; otherwise go right); then at the next level, we use the y-coordinate (if the point to be inserted has a smaller y-coordinate than the point in the node, go left; otherwise go right); then at the next level the x-coordinate, and so forth.

 insert (0.7, 0.2) insert (0.5, 0.4) insert (0.2, 0.3) insert (0.4, 0.7) insert (0.9, 0.6)

• Draw. A 2d-tree divides the unit square in a simple way: all the points to the left of the root go in the left subtree; all those to the right go in the right subtree; and so forth, recursively. Your draw() method should draw all of the points to standard draw in black and the subdivisions in red (for vertical splits) and blue (for horizontal splits). This method need not be efficient—it is primarily for debugging.
The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root corresponds to the two rectangles split by the x-coordinate of the point at the root; and so forth.

• Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). A subtree is searched only if it might contain a point contained in the query rectangle.
• Nearest neighbor search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, a node is searched only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize your recursive method so that when there are two possible subtrees to go down, you always choose the subtree that is on the same side of the splitting line as the query point as the first subtree to explore—the closest point found while exploring the first subtree may enable pruning of the second subtree.
http://www.cs.princeton.edu/courses/archive/spr13/cos226/lectures/99GeometricSearch.pdf
1d range search
Range search: find all keys between k1 and k2.
Range count: number of keys between k1 and k2.
Unordered list. Fast insert, slow range search.
Ordered array. Slow insert, binary search for k1 and k2 to do range search.
N = number of keys
R = number of keys that match
public int size(Key lo, Key hi)
{
if (contains(hi)) return rank(hi) - rank(lo) + 1;
else return rank(hi) - rank(lo);
}

2-d orthogonal range search
Find/count points in a given h-v rectangle

2d orthogonal range search: grid implementation
Grid implementation.
ɾDivide space into M-by-M grid of squares.
ɾCreate list of points contained in each square.
ɾUse 2d array to directly index relevant square.
ɾInsert: add (x, y) to list for corresponding square.
ɾRange search: examine only squares that intersect 2d range query.

ɾSpace: M 2 + N.
ɾTime: 1 + N / M 2 per square examined, on average.
Choose grid square size to tune performance.
ɾToo small: wastes space.
ɾToo large: too many points per square.
ɾRule of thumb: √N-by-√N grid.
Running time. [if points are evenly distributed]
ɾInitialize data structure: N.
ɾInsert point: 1.
ɾRange search: 1 per point in range

Grid implementation. Fast, simple solution for evenly-distributed points.
Problem. Clustering a well-known phenomenon in geometric data.
ɾLists are too long, even though average length is short.
ɾNeed data structure that adapts gracefully to data.

2d tree. Recursively divide space into two halfplanes.
BSP tree. Recursively divide space into two regions.

2d tree implementation
Data structure. BST, but alternate using x- and y-coordinates as key.
ɾSearch gives rectangle containing point.
ɾInsert further subdivides the plane.

Range search in a 2d tree demo
Find all points in a query axis-aligned rectangle.
ɾCheck if point in node lies in given rectangle.
ɾRecursively search left/bottom (if any could fall in rectangle).
ɾRecursively search right/top (if any could fall in rectangle)

https://segmentfault.com/a/1190000005345079

• 节点的奇偶性是一个不小的麻烦，编写需要十分谨慎，我的做法是在Node类中添加了一个boolean用以表示奇偶，相信一定有更好的方法（不存储RectHV），还会回来探索；
• Nearest Neighbor的查找过程需要思考清楚，剪枝的界限十分清晰，尤其是剪裁其中一个子树的条件：如果本子树中找到的最近点与给定点距离 已小于 给定点到另一子树矩形区域的距离（点到点距离比点到矩形距离还近时），才能跳过另一子树的遍历过程。
https://github.com/Revil/algs-kdtree
public class KdTree {

private static class Node {
private Point2D p;
private RectHV  rect;
private Node    left;
private Node    right;

public Node(Point2D p, RectHV rect) {
RectHV r = rect;
if (r == null)
r = new RectHV(0, 0, 1, 1);
this.rect   = r;
this.p      = p;
}
}

private Node    root;
private int     size;

// construct an empty set of points
public KdTree() {
root = null;
size = 0;
}

// is the set empty?
public boolean isEmpty() { return root == null; }

// number of points in the set
public int size() { return size; }

// larger or equal keys go right
private Node insertH(Node x, Point2D p, RectHV rect) {
if (x == null) {
size++;
return new Node(p, rect);
}
if (x.p.equals(p))  return x;

RectHV r;
int cmp = Point2D.Y_ORDER.compare(x.p, p);
if (cmp > 0) {
if (x.left == null)
r = new RectHV(rect.xmin(), rect.ymin(), rect.xmax(), x.p.y());
else
r = x.left.rect;
x.left = insertV(x.left, p, r);
} else {
if (x.right == null)
r = new RectHV(rect.xmin(), x.p.y(), rect.xmax(), rect.ymax());
else
r = x.right.rect;
x.right = insertV(x.right, p, r);
}

return x;
}

// larger or equal keys go right
private Node insertV(Node x, Point2D p, RectHV rect) {
if (x == null) {
size++;
return new Node(p, rect);
}
if (x.p.equals(p))  return x;

RectHV r;
int cmp = Point2D.X_ORDER.compare(x.p, p);
if (cmp > 0) {
if (x.left == null)
r = new RectHV(rect.xmin(), rect.ymin(), x.p.x(), rect.ymax());
else
r = x.left.rect;
x.left = insertH(x.left, p, r);
} else {
if (x.right == null)
r = new RectHV(x.p.x(), rect.ymin(), rect.xmax(), rect.ymax());
else
r = x.right.rect;
x.right = insertH(x.right, p, r);
}

return x;
}

// add the point p to the set (if it is not already in the set)
public void insert(Point2D p) {
if (isEmpty())
root = insertV(root, p, null);
else
root = insertV(root, p, root.rect);
}

// larger or equal keys go right
private boolean contains(Node x, Point2D p, boolean vert) {
if (x == null)      return false;
if (x.p.equals(p))  return true;
int cmp;
if (vert)   cmp = Point2D.X_ORDER.compare(x.p, p);
else        cmp = Point2D.Y_ORDER.compare(x.p, p);
if (cmp > 0)        return contains(x.left, p, !vert);
else                return contains(x.right, p, !vert);
}

// does the set contain the point p?
public boolean contains(Point2D p) {
return contains(root, p, true);
}

private void draw(Node x, boolean vert) {
if (x.left != null)     draw(x.left, !vert);
if (x.right != null)    draw(x.right, !vert);

// draw the point first
StdDraw.setPenColor(StdDraw.BLACK);
StdDraw.point(x.p.x(), x.p.y());

// draw the line
double xmin, ymin, xmax, ymax;
if (vert) {
StdDraw.setPenColor(StdDraw.RED);
xmin = x.p.x();
xmax = x.p.x();
ymin = x.rect.ymin();
ymax = x.rect.ymax();
} else {
StdDraw.setPenColor(StdDraw.BLUE);
ymin = x.p.y();
ymax = x.p.y();
xmin = x.rect.xmin();
xmax = x.rect.xmax();
}
StdDraw.line(xmin, ymin, xmax, ymax);
}

// draw all of the points to standard draw
public void draw() {
// draw the rectangle
StdDraw.rectangle(0.5, 0.5, 0.5, 0.5);
if (isEmpty()) return;
draw(root, true);
}

private void range(Node x, RectHV rect, Queue<Point2D> q) {
if (x == null)
return;
if (rect.contains(x.p))
q.enqueue(x.p);
if (x.left != null && rect.intersects(x.left.rect))
range(x.left, rect, q);
if (x.right != null && rect.intersects(x.right.rect))
range(x.right, rect, q);
}

// all points in the set that are inside the rectangle
public Iterable<Point2D> range(RectHV rect) {
Queue<Point2D> q = new Queue<Point2D>();
range(root, rect, q);
return q;
}

private Point2D nearest(Node x, Point2D p, Point2D mp, boolean vert) {
Point2D min = mp;

if (x == null)    return min;
if (p.distanceSquaredTo(x.p) < p.distanceSquaredTo(min))
min = x.p;

// choose the side that contains the query point first
if (vert) {
if (x.p.x() < p.x()) {
min = nearest(x.right, p, min, !vert);
if (x.left != null
&& (min.distanceSquaredTo(p)
> x.left.rect.distanceSquaredTo(p)))
min = nearest(x.left, p, min, !vert);
} else {
min = nearest(x.left, p, min, !vert);
if (x.right != null
&& (min.distanceSquaredTo(p)
> x.right.rect.distanceSquaredTo(p)))
min = nearest(x.right, p, min, !vert);
}
} else {
if (x.p.y() < p.y()) {
min = nearest(x.right, p, min, !vert);
if (x.left != null
&& (min.distanceSquaredTo(p)
> x.left.rect.distanceSquaredTo(p)))
min = nearest(x.left, p, min, !vert);
} else {
min = nearest(x.left, p, min, !vert);
if (x.right != null
&& (min.distanceSquaredTo(p)
> x.right.rect.distanceSquaredTo(p)))
min = nearest(x.right, p, min, !vert);
}
}
return min;
}

// a nearest neighbor in the set to p: null if set is empty
public Point2D nearest(Point2D p) {
if (isEmpty()) return null;
return nearest(root, p, root.p, true);
}
}

Brute Force:
public class PointSET {

private SET<Point2D> set;

// construct an empty set of points
public PointSET() { set = new SET<Point2D>(); }

// is the set empty?
public boolean isEmpty() { return set.isEmpty(); }

// number of points in the set
public int size() { return set.size(); }

// add the point p to the set (if it is not already in the set)
// proportional to logarithm of N in the worst case
public void insert(Point2D p) { set.add(p); }

// does the set contain the point p?
// proportional to logarithm of N in the worst case
public boolean contains(Point2D p) { return set.contains(p); }

// draw all of the points to standard draw
public void draw() {
StdDraw.setPenColor(StdDraw.BLACK);

for (Point2D p : set)
StdDraw.point(p.x(), p.y());

StdDraw.show(0);
}

// all points in the set that are inside the rectangle
// proportional to N in the worst case
public Iterable<Point2D> range(RectHV rect) {
Stack<Point2D> s = new Stack<Point2D>();

for (Point2D p : set)
if (rect.contains(p))
s.push(p);

return s;
}

// a nearest neighbor in the set to p: null if set is empty
// proportional to N
public Point2D nearest(Point2D p) {
if (isEmpty()) return null;

double dis, minDis = Double.MAX_VALUE;
Point2D q = null;

for (Point2D ip : set) {
dis = p.distanceSquaredTo(ip);
if (dis < minDis) {
minDis = dis;
q = ip;
}
}

return q;
}
}
https://github.com/Revil/algs-kdtree/blob/master/RectHV.java
public double distanceTo(Point2D p) {
return Math.sqrt(this.distanceSquaredTo(p));
}

// distance squared from p to closest point on this axis-aligned rectangle
public double distanceSquaredTo(Point2D p) {
double dx = 0.0, dy = 0.0;
if      (p.x() < xmin) dx = p.x() - xmin;
else if (p.x() > xmax) dx = p.x() - xmax;
if      (p.y() < ymin) dy = p.y() - ymin;
else if (p.y() > ymax) dy = p.y() - ymax;
return dx*dx + dy*dy;
}

// does this axis-aligned rectangle contain p?
public boolean contains(Point2D p) {
return (p.x() >= xmin) && (p.x() <= xmax)
&& (p.y() >= ymin) && (p.y() <= ymax);
}

https://github.com/iman89/princeton/blob/master/KDTrees/src/KdTree.java