## Monday, May 30, 2016

### Percolation - Princeton Algorithm 4th

http://coursera.cs.princeton.edu/algs4/assignments/percolation.html
The model. We model a percolation system using an N-by-N grid of sites. Each site is either open or blocked. A full site is an open site that can be connected to an open site in the top row via a chain of neighboring (left, right, up, down) open sites. We say the system percolates if there is a full site in the bottom row. In other words, a system percolates if we fill all open sites connected to the top row and that process fills some open site on the bottom row. (For the insulating/metallic materials example, the open sites correspond to metallic materials, so that a system that percolates has a metallic path from top to bottom, with full sites conducting. For the porous substance example, the open sites correspond to empty space through which water might flow, so that a system that percolates lets water fill open sites, flowing from top to bottom.)

The problem. In a famous scientific problem, researchers are interested in the following question: if sites are independently set to be open with probability p (and therefore blocked with probability 1 − p), what is the probability that the system percolates? When p equals 0, the system does not percolate; when p equals 1, the system percolates. The plots below show the site vacancy probability p versus the percolation probability for 20-by-20 random grid (left) and 100-by-100 random grid (right).

When N is sufficiently large, there is a threshold value p* such that when p < p* a random N-by-N grid almost never percolates, and when p > p*, a random N-by-N grid almost always percolates. No mathematical solution for determining the percolation threshold p* has yet been derived. Your task is to write a computer program to estimate p*.
Percolation data type. To model a percolation system, create a data type Percolation with the following API:
```java
public class Percolation {
    public Percolation(int N)               // create N-by-N grid, with all sites blocked
    public void open(int i, int j)          // open site (row i, column j) if it is not open already
    public boolean isOpen(int i, int j)     // is site (row i, column j) open?
    public boolean isFull(int i, int j)     // is site (row i, column j) full?
    public boolean percolates()             // does the system percolate?

    public static void main(String[] args)  // test client (optional)
}
```
Corner cases.  By convention, the row and column indices i and j are integers between 1 and N, where (1, 1) is the upper-left site. Throw a java.lang.IndexOutOfBoundsException if any argument to open(), isOpen(), or isFull() is outside its prescribed range. The constructor should throw a java.lang.IllegalArgumentException if N ≤ 0.
Performance requirements.  The constructor should take time proportional to N²; all methods should take constant time plus a constant number of calls to the union-find methods union(), find(), connected(), and count().
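
As a rough illustration of the index convention and the exceptions described above, the 1-based (i, j) pair can be validated and mapped to a 0-based array index in a single helper. This is only a sketch: the class name `SiteIndex` and the method `toIndex` are hypothetical and not part of the assignment API.

```java
// Hypothetical helper, not part of the assignment API.
class SiteIndex {
    // Maps 1-based (row i, column j) on an n-by-n grid to a 0-based index,
    // rejecting out-of-range arguments as the corner cases require.
    static int toIndex(int i, int j, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        if (i < 1 || i > n) {
            throw new IndexOutOfBoundsException("row index " + i + " out of bounds");
        }
        if (j < 1 || j > n) {
            throw new IndexOutOfBoundsException("column index " + j + " out of bounds");
        }
        return (i - 1) * n + (j - 1);
    }
}
```

Doing the check inside the mapping function means every public method that converts coordinates gets the validation for free, at constant cost.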
Monte Carlo simulation. To estimate the percolation threshold, consider the following computational experiment:

• Initialize all sites to be blocked.
• Repeat the following until the system percolates:
  • Choose a site (row i, column j) uniformly at random among all blocked sites.
  • Open the site (row i, column j).
• The fraction of sites that are opened when the system percolates provides an estimate of the percolation threshold.
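
The experiment above can be sketched in a self-contained way. Note the assumptions: instead of the course's `WeightedQuickUnionUF`, this sketch uses a bare parent array with path halving, and instead of retrying random picks it shuffles all site indices once and opens them in that order, which also selects uniformly among the still-blocked sites. The class name `ThresholdSketch` and its method are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of one Monte Carlo experiment; not the course's code.
class ThresholdSketch {
    private static int find(int[] parent, int p) {
        while (p != parent[p]) {
            parent[p] = parent[parent[p]]; // path halving
            p = parent[p];
        }
        return p;
    }

    // Runs one experiment on an n-by-n grid and returns the fraction of open
    // sites at the moment the system first percolates.
    static double oneExperiment(int n, Random rng) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        int sites = n * n, top = sites, bottom = sites + 1; // two virtual sites
        int[] parent = new int[sites + 2];
        for (int k = 0; k < parent.length; k++) parent[k] = k;
        boolean[] open = new boolean[sites];

        // Fisher-Yates shuffle: opening sites in this order is equivalent to
        // picking uniformly among the still-blocked sites at every step.
        int[] order = new int[sites];
        for (int k = 0; k < sites; k++) order[k] = k;
        for (int k = sites - 1; k > 0; k--) {
            int r = rng.nextInt(k + 1);
            int tmp = order[k]; order[k] = order[r]; order[r] = tmp;
        }

        for (int opened = 1; opened <= sites; opened++) {
            int s = order[opened - 1], row = s / n, col = s % n;
            open[s] = true;
            // Union with the virtual sites and with any open neighbor.
            if (row == 0) parent[find(parent, s)] = find(parent, top);
            if (row == n - 1) parent[find(parent, s)] = find(parent, bottom);
            if (row > 0 && open[s - n]) parent[find(parent, s)] = find(parent, s - n);
            if (row < n - 1 && open[s + n]) parent[find(parent, s)] = find(parent, s + n);
            if (col > 0 && open[s - 1]) parent[find(parent, s)] = find(parent, s - 1);
            if (col < n - 1 && open[s + 1]) parent[find(parent, s)] = find(parent, s + 1);
            if (find(parent, top) == find(parent, bottom)) {
                return (double) opened / sites;
            }
        }
        throw new IllegalStateException("unreachable: a fully open grid percolates");
    }
}
```

Because this sketch only asks whether the system percolates and never asks whether an individual site is full, using both virtual sites in one structure is harmless here.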

For example, if sites are opened in a 20-by-20 lattice according to the snapshots below, then our estimate of the percolation threshold is 204/400 = 0.51 because the system percolates when the 204th site is opened.
By repeating this computational experiment T times and averaging the results, we obtain a more accurate estimate of the percolation threshold. Let x_t be the fraction of open sites in computational experiment t. The sample mean μ provides an estimate of the percolation threshold; the sample standard deviation σ measures the sharpness of the threshold.

Assuming T is sufficiently large (say, at least 30), the following provides a 95% confidence interval for the percolation threshold:
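
The formulas themselves did not survive in these notes. As a reconstruction from the standard definitions of sample mean and sample standard deviation, and consistent with the 1.96 factor used in the PercolationStats code further down, they are presumably:

```latex
\mu = \frac{x_1 + x_2 + \cdots + x_T}{T},
\qquad
\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_T - \mu)^2}{T - 1}

\left[\; \mu - \frac{1.96\,\sigma}{\sqrt{T}},\;\; \mu + \frac{1.96\,\sigma}{\sqrt{T}} \;\right]
```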

To perform a series of computational experiments, create a data type PercolationStats with the following API.

```java
public class PercolationStats {
    public PercolationStats(int N, int T)     // perform T independent experiments on an N-by-N grid
    public double mean()                      // sample mean of percolation threshold
    public double stddev()                    // sample standard deviation of percolation threshold
    public double confidenceLo()              // low  endpoint of 95% confidence interval
    public double confidenceHi()              // high endpoint of 95% confidence interval

    public static void main(String[] args)    // test client (described below)
}
```

Now, implement the Percolation data type using the weighted quick union algorithm in WeightedQuickUnionUF. Answer the questions in the previous paragraph.
http://www.jyuan92.com/blog/coursera-algorithmprinceton-hw1-percolation/
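1) First, the assignment requires using Weighted Quick Union (union-find) to decide whether two sites are connected. To track whether each individual site is open, we create a boolean matrix[][] array that stores each site's open state, which isOpen then reads.
2) As described in the video, to test for connectivity we can create two virtual sites, top and bottom. top is connected to every site in the first row of matrix[][], and bottom to every site in the last row. Then we only need to check whether top and bottom are connected to decide whether the system percolates.
Problem: this setup causes backwash. Some sites are connected to bottom but have no path of open sites to top. Once the system percolates, bottom and top are connected, so those sites are also reported as full, which contradicts the problem statement.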
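
To make the backwash effect concrete, here is a minimal sketch on a 3-by-3 grid. It assumes a plain quick-union with path halving (`TinyUF`) as a stand-in for `WeightedQuickUnionUF`; both class names and the `demo` method are hypothetical.

```java
// Hypothetical stand-in for WeightedQuickUnionUF (no weighting, path halving).
class TinyUF {
    private final int[] parent;

    TinyUF(int n) {
        parent = new int[n];
        for (int k = 0; k < n; k++) parent[k] = k;
    }

    int find(int p) {
        while (p != parent[p]) {
            parent[p] = parent[parent[p]];
            p = parent[p];
        }
        return p;
    }

    void union(int p, int q) { parent[find(p)] = find(q); }

    boolean connected(int p, int q) { return find(p) == find(q); }
}

// One union-find with both virtual sites: percolates() works, isFull() does not.
class BackwashDemo {
    static final int N = 3;
    static final int TOP = 0, BOTTOM = N * N + 1;
    final TinyUF uf = new TinyUF(N * N + 2);
    final boolean[] open = new boolean[N * N + 2];

    static int index(int i, int j) { return (i - 1) * N + j; }

    void open(int i, int j) {
        int s = index(i, j);
        open[s] = true;
        if (i == 1) uf.union(s, TOP);
        if (i == N) uf.union(s, BOTTOM);
        if (i > 1 && open[index(i - 1, j)]) uf.union(s, index(i - 1, j));
        if (i < N && open[index(i + 1, j)]) uf.union(s, index(i + 1, j));
        if (j > 1 && open[index(i, j - 1)]) uf.union(s, index(i, j - 1));
        if (j < N && open[index(i, j + 1)]) uf.union(s, index(i, j + 1));
    }

    // Naive fullness test: "connected to the virtual top".
    boolean naiveIsFull(int i, int j) { return uf.connected(TOP, index(i, j)); }

    static boolean demo() {
        BackwashDemo d = new BackwashDemo();
        d.open(1, 1); d.open(2, 1); d.open(3, 1); // left column percolates
        d.open(3, 3);                             // isolated bottom-right site
        // (3, 3) has no open path to the top row, yet it is reported full
        // because it reaches TOP through the virtual BOTTOM site.
        return d.naiveIsFull(3, 3);
    }
}
```

Here `BackwashDemo.demo()` returns true, even though site (3, 3) touches no open neighbor: the only link to the top goes through the virtual bottom site.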

## Improvement 1) Use two Weighted Quick Union union-find objects to solve this problem

– The first WQUUF maintains only the top site
– The second WQUUF maintains both the top and bottom sites

https://github.com/vgoodvin/princeton-algs4/blob/master/Percolation/Percolation.java
```java
import edu.princeton.cs.algs4.WeightedQuickUnionUF;

// Note: this version uses a single union-find with both virtual sites,
// so isFull() is subject to the backwash problem described above.
public class Percolation {

    private boolean[][] opened;
    private int top = 0;
    private int bottom;
    private int size;
    private WeightedQuickUnionUF qf;

    /**
     * Creates N-by-N grid, with all sites blocked.
     */
    public Percolation(int N) {
        if (N <= 0) {
            throw new IllegalArgumentException("N must be positive");
        }
        size = N;
        bottom = size * size + 1;
        qf = new WeightedQuickUnionUF(size * size + 2);
        opened = new boolean[size][size];
    }

    /**
     * Opens site (row i, column j) if it is not already.
     */
    public void open(int i, int j) {
        opened[i - 1][j - 1] = true;
        if (i == 1) {
            qf.union(getQFIndex(i, j), top);
        }
        if (i == size) {
            qf.union(getQFIndex(i, j), bottom);
        }

        if (j > 1 && isOpen(i, j - 1)) {
            qf.union(getQFIndex(i, j), getQFIndex(i, j - 1));
        }
        if (j < size && isOpen(i, j + 1)) {
            qf.union(getQFIndex(i, j), getQFIndex(i, j + 1));
        }
        if (i > 1 && isOpen(i - 1, j)) {
            qf.union(getQFIndex(i, j), getQFIndex(i - 1, j));
        }
        if (i < size && isOpen(i + 1, j)) {
            qf.union(getQFIndex(i, j), getQFIndex(i + 1, j));
        }
    }

    /**
     * Is site (row i, column j) open?
     */
    public boolean isOpen(int i, int j) {
        return opened[i - 1][j - 1];
    }

    /**
     * Is site (row i, column j) full?
     */
    public boolean isFull(int i, int j) {
        if (0 < i && i <= size && 0 < j && j <= size) {
            return qf.connected(top, getQFIndex(i, j));
        } else {
            throw new IndexOutOfBoundsException();
        }
    }

    /**
     * Does the system percolate?
     */
    public boolean percolates() {
        return qf.connected(top, bottom);
    }

    // Maps 1-based (i, j) to 1..N*N; 0 and N*N+1 are the virtual sites.
    private int getQFIndex(int i, int j) {
        return size * (i - 1) + j;
    }
}
```
https://github.com/vgoodvin/princeton-algs4/blob/master/Percolation/PercolationStats.java
```java
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.StdRandom;
import edu.princeton.cs.algs4.StdStats;

public class PercolationStats {

    private int experimentsCount;
    private Percolation pr;
    private double[] fractions;

    /**
     * Performs T independent computational experiments on an N-by-N grid.
     */
    public PercolationStats(int N, int T) {
        if (N <= 0 || T <= 0) {
            throw new IllegalArgumentException("Given N <= 0 || T <= 0");
        }
        experimentsCount = T;
        fractions = new double[experimentsCount];
        for (int expNum = 0; expNum < experimentsCount; expNum++) {
            pr = new Percolation(N);
            int openedSites = 0;
            while (!pr.percolates()) {
                // Pick any site at random and skip it if it is already open.
                int i = StdRandom.uniform(1, N + 1);
                int j = StdRandom.uniform(1, N + 1);
                if (!pr.isOpen(i, j)) {
                    pr.open(i, j);
                    openedSites++;
                }
            }
            double fraction = (double) openedSites / (N * N);
            fractions[expNum] = fraction;
        }
    }

    /**
     * Sample mean of percolation threshold.
     */
    public double mean() {
        return StdStats.mean(fractions);
    }

    /**
     * Sample standard deviation of percolation threshold.
     */
    public double stddev() {
        return StdStats.stddev(fractions);
    }

    /**
     * Returns lower bound of the 95% confidence interval.
     */
    public double confidenceLo() {
        return mean() - ((1.96 * stddev()) / Math.sqrt(experimentsCount));
    }

    /**
     * Returns upper bound of the 95% confidence interval.
     */
    public double confidenceHi() {
        return mean() + ((1.96 * stddev()) / Math.sqrt(experimentsCount));
    }

    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);
        int T = Integer.parseInt(args[1]);
        PercolationStats ps = new PercolationStats(N, T);

        String confidence = ps.confidenceLo() + ", " + ps.confidenceHi();
        StdOut.println("mean                    = " + ps.mean());
        StdOut.println("stddev                  = " + ps.stddev());
        StdOut.println("95% confidence interval = " + confidence);
    }
}
```
https://segmentfault.com/a/1190000005345079
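1. It requires no complicated syntax (only array operations), yet it is a real step up from basic language-level exercises;
2. The attractive visualizer animation motivates beginners to finish the task;
4. For fast learners it also hides a big challenge: solving the backwash problem while using only one WeightedQuickUnionUF object.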

I thought that the solution I coded last weekend was fine because it was correctly computing the percolation thresholds for many different grid sizes.
However, when I tested it today using the provided PercolationVisualizer class, I realized that it was suffering from the "backwash" problem.
It took me a couple of hours but, in the end, I managed to fix it by using two WeightedQuickUnionUF objects instead of one.
1) The easiest way to tackle this problem is to use two different Weighted Quick Union union-find objects. The only difference between them is that one has only the top virtual site (let's call it uf_A), while the other is the normal object with both virtual sites, top and bottom (let's call it uf_B), as suggested in the course. uf_A has no backwash problem because we "block" the bottom virtual site by removing it. uf_B serves to determine efficiently whether the system percolates, as described in the course. So every time we open a site (i, j) by calling open(i, j), we need to call union() twice within that method: uf_A.union() and uf_B.union(). The obvious drawback is that we store the same information twice, which does not seem like a good pattern. The upside is that it is probably the most straightforward and natural approach to think of.
```java
import edu.princeton.cs.algs4.WeightedQuickUnionUF;

public class Percolation {

    // grid has both virtual sites; auxGrid has only the virtual top site
    private WeightedQuickUnionUF grid, auxGrid;
    private boolean[] state;
    private int N;

    // create N-by-N grid, with all sites blocked
    public Percolation(int N) {
        if (N <= 0)
            throw new IllegalArgumentException("N must be positive");

        int siteCount = N * N;
        this.N = N;

        // index 0 and N^2+1 are reserved for virtual top and bottom sites
        grid    = new WeightedQuickUnionUF(siteCount + 2);
        auxGrid = new WeightedQuickUnionUF(siteCount + 1);
        state   = new boolean[siteCount + 2];

        // Initialize all sites to be blocked.
        for (int i = 1; i <= siteCount; i++)
            state[i] = false;
        // Initialize virtual top and bottom site with open state
        state[0] = true;
        state[siteCount + 1] = true;
    }

    // return array index of given row i and column j
    private int xyToIndex(int i, int j) {
        // Attention: i and j are of range 1 ~ N, NOT 0 ~ N-1.
        // Throw IndexOutOfBoundsException if i or j is not valid
        if (i <= 0 || i > N)
            throw new IndexOutOfBoundsException("row i out of bound");
        if (j <= 0 || j > N)
            throw new IndexOutOfBoundsException("column j out of bound");

        return (i - 1) * N + j;
    }

    private boolean isTopSite(int index) {
        return index <= N;
    }

    private boolean isBottomSite(int index) {
        return index >= (N - 1) * N + 1;
    }

    // open site (row i, column j) if it is not already
    public void open(int i, int j) {
        // All input sites are blocked at first.
        // Check the state of site before invoking this method.
        int idx = xyToIndex(i, j);
        state[idx] = true;

        // Traverse surrounding sites, connect all open ones.
        // Make sure we do not index sites out of bounds.
        if (i != 1 && isOpen(i - 1, j)) {
            grid.union(idx, xyToIndex(i - 1, j));
            auxGrid.union(idx, xyToIndex(i - 1, j));
        }
        if (i != N && isOpen(i + 1, j)) {
            grid.union(idx, xyToIndex(i + 1, j));
            auxGrid.union(idx, xyToIndex(i + 1, j));
        }
        if (j != 1 && isOpen(i, j - 1)) {
            grid.union(idx, xyToIndex(i, j - 1));
            auxGrid.union(idx, xyToIndex(i, j - 1));
        }
        if (j != N && isOpen(i, j + 1)) {
            grid.union(idx, xyToIndex(i, j + 1));
            auxGrid.union(idx, xyToIndex(i, j + 1));
        }
        // if site is on top or bottom, connect to corresponding virtual site.
        if (isTopSite(idx)) {
            grid.union(0, idx);
            auxGrid.union(0, idx);
        }
        // only grid knows about the virtual bottom site
        if (isBottomSite(idx))
            grid.union(state.length - 1, idx);
    }

    // is site (row i, column j) open?
    public boolean isOpen(int i, int j) {
        int idx = xyToIndex(i, j);
        return state[idx];
    }

    // is site (row i, column j) full?
    public boolean isFull(int i, int j) {
        // Check if this site is connected to virtual top site
        int idx = xyToIndex(i, j);
        return grid.connected(0, idx) && auxGrid.connected(0, idx);
    }

    // does the system percolate?
    public boolean percolates() {
        // Check whether virtual top and bottom sites are connected
        return grid.connected(0, state.length - 1);
    }
}
```

A final remark: one key observation is that to test whether a site is full, we only need to know the status of the root of the connected component containing that site. Asking whether a site is full is the same as asking whether its connected component is full. This keeps things simple and saves time. Someone in the forum proposed a linear scan of the whole connected component to update the status of each site, which is unnecessary and inefficient. Each time, we only update the status of the root site rather than every site in that component.
2) There turned out to be more elegant solutions that do not use two WQUUF objects, provided we can modify the API or write our own UF algorithm from scratch. The solution is from "Bobs Notes 1: Union-Find and Percolation (Version 2)": Store two bits with each component root. One of these bits is true if the component contains a cell in the top row. The other bit is true if the component contains a cell in the bottom row. Updating the bits after a union takes constant time. Percolation occurs when the component containing a newly opened cell has both bits true, after the unions that result from the cell becoming open. Excellent! However, this one involves modifying the original API. Based on this and some other threads in the discussion forum, I came up with the following approach, which does not need to modify the given API but adopts a similar idea: associate each connected component root with information about its connection to the top and/or bottom sites.
3) Here we go with the approach based on Bob's notes that involves no modification of the original API:
Bob's original notes say we have to modify the API to achieve this, but that is not actually necessary. We create ONE WQUUF object of size N * N and allocate a separate array of size N * N to keep the status of each site: blocked, open, connected to top, connected to bottom. I use bit operations for the status, so each site can have a combined status such as open and connected to top.
The most important operation is open(int i, int j): we need to union the newly opened site (let's call it S) with its four adjacent neighbor sites where possible. For each open neighbor site (let's call it neighbor), we first call find(neighbor) to get the root of its connected component and retrieve the status of that root (let's call it status). Next, we call union(S, neighbor). We repeat this at most 4 times, then do a 5th find(S) to get the root of the new connected component that results from opening site S. Finally, we update the status of the new root by combining the old status information into it in constant time. I leave the details of how to combine the information to the reader; it is not hard to work out.
For isFull(int i, int j), we need to find the root site of the connected component that contains site (i, j) and check the status of the root.
For isOpen(int i, int j), we directly return the status.
For percolates(), there is a way to make it constant time even though we do not have virtual top or bottom sites: think about why?
So the most important operation, open(int i, int j), involves at most 4 union() and 5 find() API calls.
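
One way the pieces above might fit together is sketched below. It is an interpretation, not the original author's code: `MiniUF` is a hypothetical stand-in for `WeightedQuickUnionUF` (only `find` and `union` are used), the bit values are arbitrary, and the boolean flag set inside `open` is just one possible answer to the constant-time `percolates()` question.

```java
// Hypothetical weighted quick-union, standing in for WeightedQuickUnionUF.
class MiniUF {
    private final int[] parent, size;

    MiniUF(int n) {
        parent = new int[n];
        size = new int[n];
        for (int k = 0; k < n; k++) { parent[k] = k; size[k] = 1; }
    }

    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void union(int p, int q) {
        int a = find(p), b = find(q);
        if (a == b) return;
        if (size[a] < size[b]) { parent[a] = b; size[b] += size[a]; }
        else                   { parent[b] = a; size[a] += size[b]; }
    }
}

// Sketch: one union-find, no virtual sites, status bits read at the root.
class BitPercolation {
    private static final byte OPEN = 1, TOP = 2, BOTTOM = 4; // assumed encoding
    private final int n;
    private final MiniUF uf;
    private final byte[] status;   // 0 means blocked
    private boolean percolates;

    BitPercolation(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
        uf = new MiniUF(n * n);
        status = new byte[n * n];
    }

    private int index(int i, int j) {
        if (i < 1 || i > n || j < 1 || j > n) throw new IndexOutOfBoundsException();
        return (i - 1) * n + (j - 1);
    }

    void open(int i, int j) {
        int s = index(i, j);
        if (status[s] != 0) return;            // already open
        int bits = OPEN;
        if (i == 1) bits |= TOP;
        if (i == n) bits |= BOTTOM;
        status[s] = (byte) bits;
        int[][] nbrs = {{i - 1, j}, {i + 1, j}, {i, j - 1}, {i, j + 1}};
        for (int[] nb : nbrs) {
            if (nb[0] < 1 || nb[0] > n || nb[1] < 1 || nb[1] > n) continue;
            int t = index(nb[0], nb[1]);
            if (status[t] == 0) continue;      // blocked neighbor
            bits |= status[uf.find(t)];        // collect the neighbor root's bits
            uf.union(s, t);
        }
        status[uf.find(s)] = (byte) bits;      // store merged bits at the new root
        if ((bits & TOP) != 0 && (bits & BOTTOM) != 0) percolates = true;
    }

    boolean isOpen(int i, int j) { return status[index(i, j)] != 0; }

    boolean isFull(int i, int j) {
        return (status[uf.find(index(i, j))] & TOP) != 0;
    }

    boolean percolates() { return percolates; }
}
```

Since a blocked site is its own root with status 0, `isFull` needs no separate open check, and a bottom-row site that reaches the top only through other bottom-row components is never marked full.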
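This problem does not look hard, but it still takes some thought when you actually do it. The first question is that this is a water-filling problem rather than a graph-connectivity problem, so how do we turn it into a graph-connectivity problem? My approach is:
(1) Open a site.
(2) Check whether the sites above, below, left, and right of the newly opened site are also open.
If one is, connect the two open sites.
With that, the problem becomes much simpler.

The second question is how to model an N*N reservoir with the provided API. The answer is certainly a coordinate transformation. My approach represents each site (i, j) as i*N+j in a one-dimensional array, so the N*N reservoir starts at 1*N+1 and ends at N*N+N. First connect 0*N+1 through 0*N+N to one another, and likewise (N+1)*N+1 through (N+1)*N+N. The benefit is that checking whether the system percolates only requires checking whether the two points 0*N+1 and (N+1)*N+1 are connected. Otherwise, we would have to loop over all of 0*N+1 through 0*N+N and (N+1)*N+1 through (N+1)*N+N to check whether any pair is connected.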

```java
import edu.princeton.cs.algs4.WeightedQuickUnionUF;

public class Percolation {
    private WeightedQuickUnionUF uf;            // virtual top row and virtual bottom row
    private WeightedQuickUnionUF uf_backwash;   // virtual top row only
    private int N;
    private boolean[] arrayOpen;

    // create N-by-N grid, with all sites blocked
    public Percolation(int N) {
        if (N <= 0) {
            throw new IllegalArgumentException("N must be positive");
        }
        this.N = N;
        uf = new WeightedQuickUnionUF((N+1)*(N)+N+1);
        uf_backwash = new WeightedQuickUnionUF(N*N+N+1);
        arrayOpen = new boolean[(N+1)*(N)+N+1];
        // Row 0 is a virtual top row and row N+1 a virtual bottom row;
        // both are open and each is unioned into a single component.
        for (int i = 1; i <= N; i++) {
            uf.union(0*N+1, 0*N+i);
            uf_backwash.union(0*N+1, 0*N+i);
            arrayOpen[0*N+i] = true;
            uf.union((N+1)*N+1, (N+1)*N+i);
            arrayOpen[(N+1)*N+i] = true;
        }
    }

    private void validate(int i, int j) {
        if (i < 1 || i > N) {
            throw new IndexOutOfBoundsException("row index " + i + " out of bounds");
        }
        if (j < 1 || j > N) {
            throw new IndexOutOfBoundsException("column index " + j + " out of bounds");
        }
    }

    // open site (row i, column j) if it is not already
    public void open(int i, int j) {
        validate(i, j);
        if (arrayOpen[i*N+j]) {
            return;
        }
        arrayOpen[i*N+j] = true;

        if (arrayOpen[(i-1)*N+j]) {
            uf.union(i*N+j, (i-1)*N+j);
            uf_backwash.union(i*N+j, (i-1)*N+j);
        }
        if (arrayOpen[(i+1)*N+j]) {
            uf.union(i*N+j, (i+1)*N+j);
            // uf_backwash has no virtual bottom row
            if (i != N) {
                uf_backwash.union(i*N+j, (i+1)*N+j);
            }
        }
        if (j != 1 && arrayOpen[i*N+j-1]) {
            uf.union(i*N+j, i*N+j-1);
            uf_backwash.union(i*N+j, i*N+j-1);
        }
        if (j != N && arrayOpen[i*N+j+1]) {
            uf.union(i*N+j, i*N+j+1);
            uf_backwash.union(i*N+j, i*N+j+1);
        }
    }

    // is site (row i, column j) open?
    public boolean isOpen(int i, int j) {
        validate(i, j);
        return arrayOpen[i*N+j];
    }

    // is site (row i, column j) full?
    public boolean isFull(int i, int j) {
        validate(i, j);
        return uf_backwash.connected(i*N+j, 0*N+1) && arrayOpen[i*N+j];
    }

    // does the system percolate?
    public boolean percolates() {
        return uf.connected(0*N+1, (N+1)*N+1);
    }
}
```