Massive Algorithms: Union-Find

Showing posts with label Union-Find - Weight. Show all posts

结点A与根关系	结点B与根关系	A与B关系
0	0	0
0	1	2
0	2	1
1	0	1
1	1	0
1	2	2
2	0	2
2	1	1
2	2	0

结点A与根关系	结点B与A关系	B与根关系
0	0	0
0	1	1
0	2	2
1	0	1
1	1	2
1	2	0
2	0	2
2	1	0
2	2	1

int par[maxn],N,M; int sum[maxn];//sum[i]表示i到i的父结点的距离，即i的父结点比i高的分数 void init() { for(int i=0;i<=N;i++) par[i]=i; memset(sum,0,sizeof(sum)); } int find(int x) { if(par[x]!=x) { int tmp=par[x]; par[x]=find(par[x]); sum[x]+=sum[tmp]; } return par[x]; } void unit(int x,int y,int v) { int fx=find(x),fy=find(y); if(fx==fy)//在一棵树上 { return ; } else { par[fx]=fy; sum[fx]=sum[y]-sum[x]-v; } return ; } bool same(int x,int y) { return find(x)==find(y); } int main() { int q; while(~scanf("%d%d%d",&N,&M,&q)) { init(); while(M--) { int x,y,v; scanf("%d%d%d",&x,&y,&v); unit(x,y,v); } while(q--) { int x,y; scanf("%d%d",&x,&y); printf("%d\n",same(x,y)?sum[y]-sum[x]:-1); } } return 0; }

同样使用带权并查集，用数组p记录同学们的权值（与根的分数差），那么路径压缩时，p[a] = p[a] + p]，合并时，p[rb] = p[a] + S - p[b]

Union-Find Data Structure Summary

https://zhuanlan.zhihu.com/p/34929966

然后与传统的不同，带权并查集有一个新的数组叫做 $v[i]$ ，他表示的意义是：从当前节点到根节点的有向距离（或者在这题称为盈亏利润，这里定义A到B的有向距离为dis时，B到A的有向距离为-dis），而对于两个点对 $(x,y)$ 的距离，如果他们不在一个集合里则肯定没有距离（显然），而他们在集合里的有向距离则为 $v[y]-v[x]$ （相当于是y到根节点的距离与根节点到x点的距离之和）。

X. 带权并查集

并查集的本质是一个森林，每棵树代表一个集合，树根为集合的代表元。支持两种操作：

查询一个元素所处的集合
合并两个集合

查询一个元素所处的集合，只需要不断寻找父节点，直到找到代表元。
合并两个集合时，先找到两个集合的代表元x、y，然后令fa[x]=y即可。

优化

路径压缩：沿着树根的路径找到元素a所在集合的代表元b后，对这条路径上的所有元素执行fa[x]=b
按rank启发式合并：为了避免退化，对于每个集合维护一个rank值，每次将较小的合并到较大的，相同时则rank=rank+1

void init(int n)
{
    for(int i = 0; i < n; i++)
    fa[i] = i;
}

int find(int v)
{
    return fa[v] = fa[v] == v ? v : find(fa[v]);
}

void merge(int x, int y)
{
    int a = find(x), b = find(y);
    if(rank[a] < rank[b])
        fa[a] = b;
    else {
        fa[b] = a;
        if(rank[a] == rank[b])
            rank[a]++;
    }
}

带权并查集

带权并查集即是结点存有权值信息的并查集。

当两个元素之间的关系可以量化，并且关系可以合并时，可以使用带权并查集来维护元素之间的关系。

带权并查集每个元素的权通常描述其与并查集中祖先的关系，这种关系如何合并，路径压缩时就如何压缩。

带权并查集可以推算集合内点的关系，而一般并查集只能判断属于某个集合。

http://www.corelab.ece.ntua.gr/acac10/ACAC2010_Talks/SimonYoffe.pdf

Delete(a) – remove a from its containing set

Union links the root of the shallower tree to the root of the taller tree (by rank) O(1)

To increase the amortized efficiency of the find operation we perform path compression

Link all the nodes in the path directly to the root

- Delete operation in constant time

https://en.wikipedia.org/wiki/Ackermann_function

A(m,n)={\begin{cases}n+1&{\mbox{if }}m=0\\A(m-1,1)&{\mbox{if }}m>0{\mbox{ and }}n=0\\A(m-1,A(m,n-1))&{\mbox{if }}m>0{\mbox{ and }}n>0.\end{cases}}

Its value grows rapidly, even for small inputs. For example, A(4, 2) is an integer of 19,729 decimal digits

https://en.wikipedia.org/wiki/Disjoint-set_data_structure#Time_complexity

Using both path compression, splitting, or halving and union by rank or size ensures that the amortized time per operation is only

O(\alpha (n))

,^[4]^[5] which is optimal,^[6] where

\alpha (n)

is the inverse Ackermann function. This function has a value

\alpha (n)<5

for any value of

n

that can be written in this physical universe, so the disjoint-set operations take place in essentially constant time.

Since the function f(n) = A(n, n) considered above grows very rapidly, its inverse function, f⁻¹, grows very slowly. This inverse Ackermann function f⁻¹ is usually denoted by α. In fact, α(n) is less than 5 for any practical input size n, since A(4, 4) is on the order of

2^{2^{2^{2^{16}}}}

This inverse appears in the time complexity of some algorithms, such as the disjoint-set data structure and Chazelle's algorithm for minimum spanning trees. Sometimes Ackermann's original function or other variations are used in these settings, but they all grow at similarly high rates. In particular, some modified functions simplify the expression by eliminating the −3 and similar terms.

https://cs.stackexchange.com/questions/96401/why-time-complexity-of-union-find-is-olgn-with-only-union-by-rank

I saw time complexity of union and find function depends on some conditions.

without anything: $O (N)$
with Union by Rank: $O (\log N)$
with Path Compression: $O (α (N))$ which is inverse Ackermann function.

https://www.geeksforgeeks.org/union-find-algorithm-set-2-union-by-rank/

The above operations can be optimized to O(Log n) in worst case. The idea is to always attach smaller depth tree under the root of the deeper tree. This technique is called union by rank. The term rank is preferred instead of height because if path compression technique (we have discussed it below) is used, then rank is not always equal to height. Also, size (in place of height) of trees can also be used as rank. Using size as rank also yields worst case time complexity as O(Logn)

The second optimization to naive method is Path Compression. The idea is to flatten the tree when find() is called. When find() is called for an element x, root of the tree is returned. The find() operation traverses up from x to find root. The idea of path compression is to make the found root as parent of x so that we don’t have to traverse all intermediate nodes again. If x is root of a subtree, then path (to root) from all nodes under x also compresses.

// class to represent Subset 

class subset 

{ 

    int parent; 

    int rank; 

} 

// A utility function to find  

// set of an element i (uses  

// path compression technique) 

int find(subset [] subsets , int i) 

{ 

if (subsets[i].parent != i) 

    subsets[i].parent = find(subsets,  

                             subsets[i].parent); 

    return subsets[i].parent; 

} 

// A function that does union 

// of two sets of x and y 

// (uses union by rank) 

void Union(subset [] subsets,  

           int x , int y ) 

{ 

    int xroot = find(subsets, x); 

        int yroot = find(subsets, y); 

    if (subsets[xroot].rank < subsets[yroot].rank) 

        subsets[xroot].parent = yroot; 

    else if (subsets[yroot].rank < subsets[xroot].rank) 

        subsets[yroot].parent = xroot; 

    else

    { 

        subsets[xroot].parent = yroot; 

        subsets[yroot].rank++; 

    } 

}

https://www.slideshare.net/WeiLi73/time-complexity-of-union-find-55858534

Union-Find Algorithm | Set 1 (Detect Cycle in a an Undirected Graph) | GeeksforGeeks

A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.

Union: Join two subsets into a single subset.

It supports two useful operations:

Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the result of two Find operations, one can determine whether two elements are in the same subset.
Union: Join two subsets into a single subset.

The other important operation, MakeSet, which makes a set containing only a given element (asingleton), is generally trivial.

Union-Find Algorithm can be used to check whether an undirected graph contains cycle or not. This method assumes that graph doesn’t contain any self-loops.
We can keeps track of the subsets in a 1D array, lets call it parent[].

For each edge, make subsets using both the vertices of the edge. If both the vertices are in the same subset, a cycle is found.

class Graph

{

    int V, E;    // V-> no. of vertices & E->no.of edges

    Edge edge[]; // /collection of all edges

    class Edge

    {

        int src, dest;

    };

    // Creates a graph with V vertices and E edges

    Graph(int v,int e)

    {

        V = v;

        E = e;

        edge = new Edge[E];

        for (int i=0; i<e; ++i)

            edge[i] = new Edge();

    }

    // A utility function to find the subset of an element i

    int find(int parent[], int i)

    {

        if (parent[i] == -1)

            return i;

        return find(parent, parent[i]);

    }

    // A utility function to do union of two subsets

    void Union(int parent[], int x, int y)

    {

        int xset = find(parent, x);

        int yset = find(parent, y);

        parent[xset] = yset;

    }

    // The main function to check whether a given graph

    // contains cycle or not

    int isCycle( Graph graph)

    {

        // Allocate memory for creating V subsets

        int parent[] = new int[graph.V];

        // Initialize all subsets as single element sets

        for (int i=0; i<graph.V; ++i)

            parent[i]=-1;

        // Iterate through all edges of graph, find subset of both

        // vertices of every edge, if both subsets are same, then

        // there is cycle in graph.

        for (int i = 0; i < graph.E; ++i)

        {

            int x = graph.find(parent, graph.edge[i].src);

            int y = graph.find(parent, graph.edge[i].dest);

            if (x == y)

                return 1;

            graph.Union(parent, x, y);

        }

        return 0;

    }

}

http://www.geeksforgeeks.org/union-find-algorithm-set-2-union-by-rank/
Note that the implementation of union() and find() is naive and takes O(n) time in worst case. These methods can be improved to O(Logn) using Union by Rank or Height.

The idea is to always attach smaller depth tree under the root of the deeper tree. This technique is called union by rank. The term rank is preferred instead of height because if path compression technique (we have discussed it below) is used, then rank is not always equal to height. Also, size (in place of height) of trees can also be used as rank. Using size as rank also yields worst case time complexity as O(Logn)

Let us see the above example with union by rank
Initially all elements are single element subsets.
0 1 2 3 

Do Union(0, 1)
   1   2   3  
  /
 0

Do Union(1, 2)
   1    3
 /  \
0    2

Do Union(2, 3)
    1    
 /  |  \
0   2   3

Let the subset {0, 1, .. 9} be represented as below and find() is called
for element 3.
              9
         /    |    \  
        4     5      6
     /     \        /  \
    0        3     7    8
            /  \
           1    2  

When find() is called for 3, we traverse up and find 9 as representative
of this subset. With path compression, we also make 3 as child of 9 so 
that when find() is called next time for 1, 2 or 3, the path to root is 
reduced.

               9
         /    /  \    \
        4    5    6     3 
     /           /  \   /  \
    0           7    8  1   2

// a structure to represent an edge in graph

struct Edge

{

    int src, dest;

}

// a structure to represent a graph

struct Graph

{

    // V-> Number of vertices, E-> Number of edges

    int V, E;

    // graph is represented as an array of edges

    struct Edge* edge;

};

struct subset

{

    int parent;

    int rank;

};

// Creates a graph with V vertices and E edges

struct Graph* createGraph(int V, int E)

{

    struct Graph* graph = (struct Graph*) malloc( sizeof(struct Graph) );

    graph->V = V;

    graph->E = E;

    graph->edge = (struct Edge*) malloc( graph->E * sizeof( struct Edge ) );

    return graph;

}

// A utility function to find set of an element i

// (uses path compression technique)

int find(struct subset subsets[], int i)

{

    // find root and make root as parent of i (path compression)

    if (subsets[i].parent != i)

        subsets[i].parent = find(subsets, subsets[i].parent);

    return subsets[i].parent;

}

// A function that does union of two sets of x and y

// (uses union by rank)

void Union(struct subset subsets[], int x, int y)

{

    int xroot = find(subsets, x);

    int yroot = find(subsets, y);

    // Attach smaller rank tree under root of high rank tree

    // (Union by Rank)

    if (subsets[xroot].rank < subsets[yroot].rank)

        subsets[xroot].parent = yroot;

    else if (subsets[xroot].rank > subsets[yroot].rank)

        subsets[yroot].parent = xroot;

    // If ranks are same, then make one as root and increment

    // its rank by one

    else

    {

        subsets[yroot].parent = xroot;

        subsets[xroot].rank++;

    }

}

// The main function to check whether a given graph contains cycle or not

int isCycle( struct Graph* graph )

{

    int V = graph->V;

    int E = graph->E;

    // Allocate memory for creating V sets

    struct subset *subsets =

        (struct subset*) malloc( V * sizeof(struct subset) );

    for (int v = 0; v < V; ++v)

    {

        subsets[v].parent = v;

        subsets[v].rank = 0;

    }

    // Iterate through all edges of graph, find sets of both

    // vertices of every edge, if sets are same, then there is

    // cycle in graph.

    for(int e = 0; e < E; ++e)

    {

        int x = find(subsets, graph->edge[e].src);

        int y = find(subsets, graph->edge[e].dest);

        if (x == y)

            return 1;

        Union(subsets, x, y);

    }

    return 0;

}

http://blog.csdn.net/dm_vincent/article/details/7655764

在对问题进行建模的时候，我们应该尽量想清楚需要解决的问题是什么。因为模型中选择的数据结构和算法显然会根据问题的不同而不同，就动态连通性这个场景而言，我们需要解决的问题可能是：

给出两个节点，判断它们是否连通，如果连通，不需要给出具体的路径
给出两个节点，判断它们是否连通，如果连通，需要给出具体的路径

就上面两种问题而言，虽然只有是否能够给出具体路径的区别，但是这个区别导致了选择算法的不同，本文主要介绍的是第一种情况，即不需要给出具体路径的Union-Find算法，而第二种情况可以使用基于DFS的算法。

Quick-Find 算法：

上述代码的find方法十分高效，因为仅仅需要一次数组读取操作就能够找到该节点的组号，但是问题随之而来，对于需要添加新路径的情况，就涉及到对于组号的修改，因为并不能确定哪些节点的组号需要被修改，因此就必须对整个数组进行遍历，找到需要修改的节点，逐一修改，这一下每次添加新路径带来的复杂度就是线性关系了，如果要添加的新路径的数量是M，节点数量是N，那么最后的时间复杂度就是MN，显然是一个平方阶的复杂度，对于大规模的数据而言，平方阶的算法是存在问题的，这种情况下，每次添加新路径就是“牵一发而动全身”，想要解决这个问题，关键就是要提高union方法的效率，让它不再需要遍历整个数组。

private int[] id; // access to component id (site indexed)
private int count; // number of components
public UF(int N)
{
// Initialize component id array.
count = N;
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}
public int count()
{ return count; }
public boolean connected(int p, int q)
{ return find(p) == find(q); }
public int find(int p)
{ return id[p]; }
public void union(int p, int q)
{
// 获得p和q的组号
int pID = find(p);
int qID = find(q);
// 如果两个组号相等，直接返回
if (pID == qID) return;
// 遍历一次，改变组号使他们属于一个组
for (int i = 0; i < id.length; i++)
if (id[i] == pID) id[i] = qID;
count--;
}

如果不改变底层数据结构，即不改变使用数组的表示方法的话。可以采用parent-link的方式将节点组织起来，举例而言，id[p]的值就是p节点的父节点的序号，如果p是树根的话，id[p]的值就是p，因此最后经过若干次查找，一个节点总是能够找到它的根节点，即满足id[root] = root的节点也就是组的根节点了，然后就可以使用根节点的序号来表示组号。所以在处理一个pair的时候，将首先找到pair中每一个节点的组号(即它们所在树的根节点的序号)，如果属于不同的组的话，就将其中一个根节点的父节点设置为另外一个根节点，相当于将一颗独立的树编程另一颗独立的树的子树。直观的过程如下图所示。但是这个时候又引入了问题。

private int find(int p)
{
// 寻找p节点所在组的根节点，根节点具有性质id[root] = root
while (p != id[p]) p = id[p];
return p;
}
public void union(int p, int q)
{
// Give p and q the same root.
int pRoot = find(p);
int qRoot = find(q);
if (pRoot == qRoot)
return;
id[pRoot] = qRoot; // 将一颗树(即一个组)变成另外一课树(即一个组)的子树
count--;
}

树这种数据结构容易出现极端情况，因为在建树的过程中，树的最终形态严重依赖于输入数据本身的性质，比如数据是否排序，是否随机分布等等。比如在输入数据是有序的情况下，构造的BST会退化成一个链表

在Quick-Union算法中的任何操作，都不可避免的需要调用find方法，而该方法的执行效率依赖于树的高度。树的高度减小了，find方法的效率就增加了，从而也就增加了整个Quick-Union算法的效率。

在find方法的执行过程中，不是需要进行一个while循环找到根节点嘛？如果保存所有路过的中间节点到一个数组中，然后在while循环结束之后，将这些中间节点的父节点指向根节点，不就行了么？但是这个方法也有问题，因为find操作的频繁性，会造成频繁生成中间节点数组，相应的分配销毁的时间自然就上升了。那么有没有更好的方法呢？还是有的，即将节点的父节点指向该节点的爷爷节点，这一点很巧妙，十分方便且有效，相当于在寻找根节点的同时，对路径进行了压缩，使整个树结构扁平化。相应的实现如下，实际上只需要添加一行代码：

[java] view plain copy


 print?
private int find(int p)  
{  
    while (p != id[p])  
    {  
        // 将p节点的父节点设置为它的爷爷节点  
        id[p] = id[id[p]];  
        p = id[p];  
    }  
    return p;  
}  

http://www.geeksforgeeks.org/union-find-algorithm-set-2-union-by-rank/

struct subset

{

    int parent;

    int rank;

};

// A utility function to find set of an element i

// (uses path compression technique)

int find(struct subset subsets[], int i)

{

    // find root and make root as parent of i (path compression)

    if (subsets[i].parent != i)

        subsets[i].parent = find(subsets, subsets[i].parent);

    return subsets[i].parent;

}

// A function that does union of two sets of x and y

// (uses union by rank)

void Union(struct subset subsets[], int x, int y)

{

    int xroot = find(subsets, x);

    int yroot = find(subsets, y);

    // Attach smaller rank tree under root of high rank tree

    // (Union by Rank)

    if (subsets[xroot].rank < subsets[yroot].rank)

        subsets[xroot].parent = yroot;

    else if (subsets[xroot].rank > subsets[yroot].rank)

        subsets[yroot].parent = xroot;

    // If ranks are same, then make one as root and increment

    // its rank by one

    else

{

        subsets[yroot].parent = xroot;

        subsets[xroot].rank++;

}

}

// The main function to check whether a given graph contains cycle or not

int isCycle( struct Graph* graph )

{

    int V = graph->V;

    int E = graph->E;

    // Allocate memory for creating V sets

    struct subset *subsets =

        (struct subset*) malloc( V * sizeof(struct subset) );

    for (int v = 0; v < V; ++v)

{

        subsets[v].parent = v;

        subsets[v].rank = 0;

}

    // Iterate through all edges of graph, find sets of both

    // vertices of every edge, if sets are same, then there is

    // cycle in graph.

    for(int e = 0; e < E; ++e)

{

        int x = find(subsets, graph->edge[e].src);

        int y = find(subsets, graph->edge[e].dest);

        if (x == y)

            return 1;

        Union(subsets, x, y);

}

    return 0;

}

http://algs4.cs.princeton.edu/15uf/WeightedQuickUnionUF.java.html
http://algs4.cs.princeton.edu/15uf/UF.java.html
http://vancexu.github.io/2015/07/13/intro-to-union-find-data-structure.html

http://hehejun.blogspot.com/2015/02/data-structureunion-find.html

1. union

最简单的union操作就是从当前 p 和 q 一直找到 pRoot 和 qRoot，然后把 pRoot 的 parent设置成qRoot，这样时间复杂度是 O (n)，
如果连接的时候我们一直把 size 较小的那个 tree 连向 size 较大的 tree，可以优化到 O(log n)

2. find

size balancing优化后的复杂度是 O(log n)，我们可以做的优化的是，在我们从 p 到 root 的过程中，把路径上所有节点指向 root，
通过一个简单的递归可以实现，从底部到 root，拿到 root 后再一层一层 return 回来，同时把路径上的节点指向 root。
经过 path compression 的优化后，时间复杂度，可以达到 O(lg *n)，十分接近常数时间复杂度。

3. connected

判断 p 和 q 的 root是不是相等即可

public class UnionFind {

private int[] id; //parent
private int[] sz; //rank

public UnionFind(int size) {
id = new int[size];
sz = new int[size];
for(int i = 0; i < size; i++) {
id[i] = i;
sz[i] = 1;
}
}

public void union(int p, int q) {
int pRoot = find(p);
int qRoot = find(q);
//size balancing
if (sz[pRoot] <= sz[qRoot]) {
id[pRoot] = qRoot;
sz[qRoot] += sz[pRoot];
}
else {
id[qRoot] = pRoot;
sz[pRoot] += sz[qRoot];
}
}

public int find(int i) {
if (id[i] == i)
return i;
//path compression
int parent = find(id[i]);
id[i] = parent;
return parent;
}

public boolean connected(int p, int q) {
return find(p) == find(q);
}

public int size() {
return id.length;
}
}
http://www.cnblogs.com/googlerla/p/4831239.html

几个需要思考的问题：

这种算法通常应用于什么场景？怎么用？

如何将问题归约到union-find或者说union的目的是什么？

怎么union，即两个元素满足什么条件时可以union？

>>两个元素有关联

题目	union 什么	union目的
Graph Valid Tree	一条边的两个顶点	若union两个顶点时发现根一样，说明已经在同一棵树中，说明输入graph存在环，非tree；union结束后，计数有多少个不同的根，当且仅当存在一个根时事vaild tree
Subtree Sum	节点同其父节点	将以指定id节点为根的所有节点union在一起，对这个union中的元素值求和
Number of Islands	两个相邻的1元素	union后计数union集合数量（通过计数union数组中根节点数量）
Surrounded Regions	所有从边界可达的O元素union在一起	union完成后那些没有在边界可达O集合中的O是需要翻转的

https://www.jianshu.com/p/70b721a7e08b

https://medium.com/jeremy-gottfrieds-tech-blog/weighted-union-find-a-fascinating-and-elegant-algorithm-ae61aa0ab1e1

https://algs4.cs.princeton.edu/15uf/WeightedQuickUnionUF.java.html

POJ 1182 - 食物链

带权并查集

分析

代码

hiho#1515 - 分数调查（带权并查集的经典应用)

Union-Find Data Structure Summary

带权并查集

Labels

Popular Posts