Applications of Union-Find (Disjoint-Set Union)

I. What's that?

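Union-find is a tree-shaped data structure for handling merge and query operations on disjoint sets. In practice it is usually represented as a forest. (Source: Baidu Baike)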

II. How to maintain it

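0. What we need

We start with a family of sets (a set of sets). The sets in the family are pairwise disjoint, and each set contains exactly one element.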
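~~Baidu Baike tells us:~~
A union-find structure must support at least two operations:

1. Merge two sets a and b.

2. Ask whether two elements x and y belong to the same set.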

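1. A workable idea

We can store each set as a tree. A merge then amounts to changing the parent of one tree's root to the root of the other tree.

We record a value f[i], the parent of node i (when i is a root, we treat it as its own parent, that is, f[i]=i).
This gives us a property: if f[i]=i, then i is the root of its tree.
Initially f[i]=i for every node.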

2. Initialization

void init(){ for (int i=1;i<=n;i++) f[i]=i; }

3. Finding the root

So how do we find the root of the tree that contains a given node?
A: Keep jumping up along f[i] until f[i]=i. That node i is the root.

int find(int x){ return f[x]==x?x:find(f[x]); }
// 1. If f[x]==x, x is the root, so return it. 2. Otherwise, continue the search from x's parent f[x].
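The recursive find above can recurse as deep as the tree is tall. As a minimal sketch, the same walk can also be written as a loop. It assumes the parent array is passed in as a std::vector, and the name find_iter is made up for this illustration, not part of the original code.

```cpp
#include <vector>

// Walk parent pointers until we reach a node that is its own parent (the root).
// `f` plays the role of the parent array from the text; `find_iter` is an
// illustrative name only.
int find_iter(const std::vector<int>& f, int x) {
    while (f[x] != x) x = f[x];
    return x;
}
```

It returns exactly what the recursive version returns; it just avoids deep recursion on long chains.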

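4. Merging

Next we maintain the merge operation.
For example, suppose we need to merge the sets containing 2 and 6:

Then we only need to link the root of 2's tree (node 1) to the root of 6's tree (node 5):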

void Union(int x,int y)
{
    int xx=find(x),yy=find(y);    // find the roots of the two trees
    if (xx!=yy) { f[yy]=xx; }     // if the roots differ, merge the two trees
}
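To make the 2-and-6 example concrete, here is a small sketch that puts init, find and Union together. The starting forest is an assumption (2 hangs directly under root 1, 6 directly under root 5), since the original figure is not available; same and build_example are helper names added for this illustration.

```cpp
#include <vector>

std::vector<int> f;   // f[i] = parent of i; f[i] == i means i is a root

void init(int n) {
    f.resize(n + 1);
    for (int i = 0; i <= n; i++) f[i] = i;
}

int find(int x) { return f[x] == x ? x : find(f[x]); }

void Union(int x, int y) {
    int xx = find(x), yy = find(y);
    if (xx != yy) f[yy] = xx;          // hang y's root under x's root
}

// Illustrative helper: are x and y in the same set?
bool same(int x, int y) { return find(x) == find(y); }

// Assumed starting forest: 2 is a child of root 1, 6 is a child of root 5.
void build_example() {
    init(6);
    f[2] = 1;
    f[6] = 5;
}
```

After build_example() and Union(2,6), node 5 becomes a child of node 1, so find(6) now returns 1.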

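5. Optimizing the basic union-find

That is the "basic version" of union-find.
But a single query can take O(n) time:
when a tree degenerates into a chain, a find on the end of the chain takes O(size) steps.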
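A short sketch of how that chain can arise. With the naive rule f[root(y)] = root(x), calling Union(i+1, i) for i = 1..n-1 puts every new node on top of the previous root. The function names here are made up for the illustration.

```cpp
#include <vector>

// Number of parent hops the plain (uncompressed) find makes from x to its root.
int hops_to_root(const std::vector<int>& f, int x) {
    int hops = 0;
    while (f[x] != x) { x = f[x]; ++hops; }
    return hops;
}

// Apply the naive merge rule for Union(i+1, i), i = 1..n-1.
// Each call makes node i+1 the parent of the current root i.
std::vector<int> build_chain(int n) {
    std::vector<int> f(n + 1);
    for (int i = 0; i <= n; ++i) f[i] = i;
    for (int i = 1; i < n; ++i) {
        int xx = i + 1, yy = i;
        while (f[xx] != xx) xx = f[xx];   // root of i+1
        while (f[yy] != yy) yy = f[yy];   // root of i
        if (xx != yy) f[yy] = xx;
    }
    return f;
}
```

For n nodes the result is the chain 1 -> 2 -> ... -> n, so a find on node 1 needs n-1 hops.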

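a. Union by rank

So this data structure is not yet good enough. Let's try to optimize it.
When a taller tree x is merged with a shorter tree y, the root of the shorter tree should be attached under the root of the taller tree (f[y]=x), so the merged tree does not get any taller. This is union by rank.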

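void Union(int x,int y)
{
    int xx=find(x),yy=find(y);
    if (xx!=yy)
    {
        if (rank[xx]>rank[yy]) f[yy]=xx;         // union by rank; rank[xx] is the height of the tree rooted at xx
        else
        {
            if (rank[xx]==rank[yy]) rank[yy]++;  // equal heights: the merged tree grows by one
            f[xx]=yy;
        }
    }
}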

With this optimization the height of every tree stays O(log n), so a find takes O(log n) time in the worst case.
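As a rough check of that bound, here is a self-contained sketch of union by rank without path compression. The struct name RankDSU, the array name rk and the depth helper are assumptions made for this example.

```cpp
#include <utility>
#include <vector>

struct RankDSU {
    std::vector<int> f, rk;   // rk[r] = height of the tree rooted at r

    explicit RankDSU(int n) : f(n + 1), rk(n + 1, 0) {
        for (int i = 0; i <= n; ++i) f[i] = i;
    }

    int find(int x) const {
        while (f[x] != x) x = f[x];
        return x;
    }

    // Number of edges between x and its root.
    int depth(int x) const {
        int d = 0;
        while (f[x] != x) { x = f[x]; ++d; }
        return d;
    }

    void unite(int x, int y) {
        int xx = find(x), yy = find(y);
        if (xx == yy) return;
        if (rk[xx] < rk[yy]) std::swap(xx, yy);   // make xx the taller root
        f[yy] = xx;                                // shorter tree goes under taller
        if (rk[xx] == rk[yy]) ++rk[xx];            // equal heights: height grows by one
    }
};
```

Merging 8 singletons pairwise (1-2, 3-4, 5-6, 7-8, then 1-3, 5-7, then 1-5) gives a tree of height 3 = log2(8), which is the worst case for 8 nodes.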

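b. Path compression

There is an even simpler and more effective optimization: path compression.
We don't actually care which path we take when jumping up along f[i]; we only need the final root. So we redefine f[i] to record some ancestor of i, and every time we find the root we update f[i] to be that root. The next find(i) then reaches the root in one step.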

int find(int x){ return f[x]=(f[x]==x?x:find(f[x])); }    
// simply assign the result back to f[x]

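With path compression, find costs amortized O(log n) per call, and Union is just two finds plus O(1) work. Combining path compression with union by rank brings this down to amortized O(α(n)) per operation, where α is the inverse Ackermann function, which is at most 4 for any practical n. That gives us an excellent data structure for maintaining this subset of set operations.
(In practice path compression alone is usually fast enough, so the more cumbersome union by rank is often left out. All the code below drops it.)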
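To see what compression does, here is a small sketch that builds the worst-case chain by hand and runs the compressed find once. make_chain is a helper invented for this example.

```cpp
#include <vector>

std::vector<int> f;   // parent array, as in the text

// find with path compression: every node visited ends up pointing at the root.
int find(int x) { return f[x] = (f[x] == x ? x : find(f[x])); }

// Illustrative helper: build the chain 1 -> 2 -> ... -> n (n is the root).
void make_chain(int n) {
    f.assign(n + 1, 0);
    for (int i = 1; i < n; ++i) f[i] = i + 1;
    f[n] = n;
}
```

After make_chain(5) and a single find(1), nodes 1 to 4 all point directly at 5, so every later find on them takes one step.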

III. How to use it?

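Union-find is a short and powerful data structure, so it shows up fairly often in all kinds of contests, and it comes with a few clever tricks of its own.

An ordinary union-find can maintain:

Information about a subtree (usually kept at the root).
Information along the path from a node to the root.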
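The first kind of information can be sketched with set sizes. The array sz and the helper set_size are names assumed for this example; sz[r] is only meaningful while r is a root.

```cpp
#include <vector>

std::vector<int> f, sz;   // sz[r] = number of nodes in the tree rooted at r

void init(int n) {
    f.resize(n + 1);
    sz.assign(n + 1, 1);                 // every set starts with one element
    for (int i = 0; i <= n; ++i) f[i] = i;
}

int find(int x) { return f[x] = (f[x] == x ? x : find(f[x])); }

void Union(int x, int y) {
    int xx = find(x), yy = find(y);
    if (xx != yy) {
        f[xx] = yy;
        sz[yy] += sz[xx];                // fold the old root's info into the new root
    }
}

// Size of the set that contains x, read from the root.
int set_size(int x) { return sz[find(x)]; }
```

The second kind (information from a node up to the root) needs an extra update inside find, as the Cube Stacking problem later in this article shows.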

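1. First, a bare-bones problem: Relatives

Problem description
When a family is very large, it is hard to tell whether two people are relatives. Given a graph of kinship relations, determine whether any two given people are relatives.
Rules
If x and y are relatives and y and z are relatives, then x and z are relatives too. If x and y are relatives, then every relative of x is a relative of y, and every relative of y is a relative of x.

Input:
Line 1: three integers n, m, p (n<=5000, m<=5000, p<=5000): there are n people, m kinship relations, and p queries.
Next m lines: two integers Mi and Mj per line (1<=Mi, Mj<=n), meaning Mi and Mj are relatives.
Next p lines: two integers Pi and Pj per line, asking whether Pi and Pj are relatives.
Output:
p lines, each 'Yes' or 'No': whether the pair in the i-th query are relatives.
Sample: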
input
6 5 3
1 2
1 5
3 4
5 2
1 3
1 4
2 3
5 6
output
Yes
Yes
No

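Solution:

A bare-bones union-find problem, included only to show a complete program. All it needs is merging sets and answering connectivity queries.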

#include <vector>
#include <list>
#include <map>
#include <set>
#include <deque>
#include <queue>
#include <stack>
#include <bitset>
#include <algorithm>
#include <functional>
#include <numeric>
#include <utility>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cmath>
#include <cstdlib>
#include <cctype>
#include <string>
#include <cstring>
#include <ctime>
#include <cassert>
#include <string.h>
//#include <unordered_set>
//#include <unordered_map>
#define MP(A,B) make_pair(A,B)
#define PB(A) push_back(A)
#define SIZE(A) ((int)A.size())
#define LEN(A) ((int)A.length())
#define FOR(i,a,b) for(int i=(a);i<(b);++i)
#define fi first
#define se second
using namespace std;
template<typename T>inline bool upmin(T &x,T y) { return y<x?x=y,1:0; }
template<typename T>inline bool upmax(T &x,T y) { return x<y?x=y,1:0; }
typedef long long ll;
typedef unsigned long long ull;
typedef long double lod;
typedef pair<int,int> PR;
typedef vector<int> VI;
const lod eps=1e-11;
const lod pi=acos(-1);
const int oo=1<<30;
const ll loo=1ll<<62;
const int mods=1e9+7;
const int INF=0x3f3f3f3f;//1061109567
const int MAXN=20005;
/*--------------------------------------------------------------------*/
// Everything above is a personal template and can be ignored.
int f[MAXN];
inline int read()
{
    int f=1,x=0; char c=getchar();
    while (c<'0'||c>'9') { if (c=='-') f=-1; c=getchar(); }
    while (c>='0'&&c<='9') { x=(x<<3)+(x<<1)+c-'0'; c=getchar(); }
    return x*f;
}
int find(int x){ return f[x]=(f[x]==x?x:find(f[x])); }
void Union(int x,int y)
{
    int xx=find(x),yy=find(y);
    if (xx!=yy) f[xx]=yy;
}
void init(int n){ for (int i=1;i<=n;i++) f[i]=i; }
int main()
{
    int n=read(),m=read(),Case=read();
    init(n);
    for (int i=1;i<=m;i++)
    {
        int x=read(),y=read();
        Union(x,y);
    }
    while (Case--)
    {
        int x=read(),y=read();
        if (find(x)==find(y)) puts("Yes");
        else puts("No");
    }
    return 0;
}
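For checking the logic without going through standard input, the same solution can be sketched as a plain function. The name relatives and its signature are assumptions for this illustration. It also swaps the recursive find for an iterative one with path halving (each visited node is re-pointed to its grandparent), a close cousin of the full compression used above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Answer each query with "Yes" or "No" after merging all given pairs.
std::vector<std::string> relatives(int n,
                                   const std::vector<std::pair<int, int>>& pairs,
                                   const std::vector<std::pair<int, int>>& queries) {
    std::vector<int> f(n + 1);
    for (int i = 0; i <= n; ++i) f[i] = i;

    // Iterative find with path halving (not the recursive version above).
    auto find = [&f](int x) {
        while (f[x] != x) {
            f[x] = f[f[x]];   // point x at its grandparent
            x = f[x];
        }
        return x;
    };

    for (const auto& p : pairs) {
        int xx = find(p.first), yy = find(p.second);
        if (xx != yy) f[xx] = yy;
    }

    std::vector<std::string> out;
    for (const auto& q : queries)
        out.push_back(find(q.first) == find(q.second) ? "Yes" : "No");
    return out;
}
```

Feeding it the sample (n=6, the five relations, the three queries) gives Yes, Yes, No, matching the expected output.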

2. A slightly different problem: POJ1988 Cube Stacking

Description
Farmer John and Betsy are playing a game with N (1 <= N <= 30,000) identical cubes labeled 1 through N. They start with N stacks, each containing a single cube. Farmer John asks Betsy to perform P (1 <= P <= 100,000) operations. There are two types of operations: moves and counts.

In a move operation, Farmer John asks Bessie to move the stack containing cube X on top of the stack containing cube Y.
In a count operation, Farmer John asks Bessie to count the number of cubes on the stack with cube X that are under the cube X and report that value.

Write a program that can verify the results of the game.

Input

Line 1: A single integer, P
Lines 2..P+1: Each of these lines describes a legal operation. Line 2 describes the first operation, etc. Each line begins with an 'M' for a move operation or a 'C' for a count operation. For move operations, the line also contains two integers: X and Y. For count operations, the line also contains a single integer: X.

Note that the value for N does not appear in the input file. No move operation will request to move a stack onto itself.

Output
Print the output from each of the count operations in the same order as the input file.

Sample Input
6
M 1 6
C 1
M 2 4
M 2 6
C 3
C 4

Sample Output
1
0
2
Source
USACO 2004 U S Open

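Solution

The task:
There are two operations:

Move the stack containing cube x on top of the stack containing cube y.
Report how many cubes lie below cube x.

This problem needs two extra pieces of information:

r[i]: the number of cubes above i in its stack, that is, the accumulated distance from node i to the root. The root is always the top cube of its stack, so r[root]=0.
num[i]: the number of nodes in the tree rooted at i (only meaningful when i is a root).

The answer to a count query on x is then num[find(x)]-r[x]-1.
Only find needs a small change: while compressing the path, add the old parent's r to r[x].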

#include <vector>
#include <list>
#include <map>
#include <set>
#include <deque>
#include <queue>
#include <stack>
#include <bitset>
#include <algorithm>
#include <functional>
#include <numeric>
#include <utility>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cmath>
#include <cstdlib>
#include <cctype>
#include <string>
#include <cstring>
#include <ctime>
#include <cassert>
#include <string.h>
//#include <unordered_set>
//#include <unordered_map>
#define MP(A,B) make_pair(A,B)
#define PB(A) push_back(A)
#define SIZE(A) ((int)A.size())
#define LEN(A) ((int)A.length())
#define FOR(i,a,b) for(int i=(a);i<(b);++i)
#define fi first
#define se second
using namespace std;
template<typename T>inline bool upmin(T &x,T y) { return y<x?x=y,1:0; }
template<typename T>inline bool upmax(T &x,T y) { return x<y?x=y,1:0; }
typedef long long ll;
typedef unsigned long long ull;
typedef long double lod;
typedef pair<int,int> PR;
typedef vector<int> VI;
const lod eps=1e-11;
const lod pi=acos(-1);
const int oo=1<<30;
const ll loo=1ll<<62;
const int mods=1e9+7;
const int INF=0x3f3f3f3f;//1061109567
const int MAXN=30005;
/*--------------------------------------------------------------------*/
// Everything above is a personal template and can be ignored.
int f[MAXN<<1],r[MAXN<<1],num[MAXN<<1];   // parent, cubes above i, size of the tree rooted at i
inline int read()
{
    int f=1,x=0; char c=getchar();
    while (c<'0'||c>'9') { if (c=='-') f=-1; c=getchar(); }
    while (c>='0'&&c<='9') { x=(x<<3)+(x<<1)+c-'0'; c=getchar(); }
    return x*f;
}
inline char readc()
{
    char c=getchar();
    while (!isalnum(c)) c=getchar();
    return c;
}
inline int find(int x)
{
    if (f[x]==x) return x;
    int tmp=f[x];
    f[x]=find(f[x]);
    r[x]+=r[tmp];                 // tmp now hangs directly under the root, so r[tmp] is its full distance
    return f[x];
}
int main()
{
    int Case=read();
    for (int i=1;i<=MAXN;i++) f[i]=i,num[i]=1,r[i]=0;
    for (int i=1;i<=Case;i++)
    {
        char c=readc();
        if (c=='M')
        {
            int x=read(),y=read();
            int xx=find(x),yy=find(y);
            if (xx!=yy) { f[yy]=xx; r[yy]=num[xx]; num[xx]+=num[yy]; }   // y's stack goes under x's stack
        }
        else
        {
            int x=read();
            int xx=find(x);
            printf("%d\n",num[xx]-r[x]-1);   // total - cubes above x - x itself
        }
    }
    return 0;
}
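The same bookkeeping can be wrapped in a small struct to make it easier to test against the sample. This is only a sketch: the struct name CubeStacks and the method names move and count_below are made up here, while f, r and num keep the meanings used in the program above.

```cpp
#include <vector>

struct CubeStacks {
    std::vector<int> f, r, num;   // parent, cubes above i, size of tree rooted at i

    explicit CubeStacks(int n) : f(n + 1), r(n + 1, 0), num(n + 1, 1) {
        for (int i = 0; i <= n; ++i) f[i] = i;
    }

    int find(int x) {
        if (f[x] == x) return x;
        int parent = f[x];
        int root = find(parent);   // afterwards r[parent] = cubes above parent
        r[x] += r[parent];         // extend x's distance through its old parent
        f[x] = root;
        return root;
    }

    // Put the stack containing x on top of the stack containing y.
    void move(int x, int y) {
        int xx = find(x), yy = find(y);
        if (xx == yy) return;
        f[yy] = xx;                // the top stack's root stays the root
        r[yy] = num[xx];           // all of x's stack now sits above yy
        num[xx] += num[yy];
    }

    // Number of cubes below x in its stack.
    int count_below(int x) {
        int root = find(x);
        return num[root] - r[x] - 1;
    }
};
```

Replaying the sample (M 1 6, C 1, M 2 4, M 2 6, C 3, C 4) yields 1, 0, 2, the same as the expected output.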