Xiaofan's Union-Find Set Study Notes

License: CC 4.0 BY-SA

Tags: #learning #python #programming-languages #data-structures

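Union-find (disjoint-set) is a data structure for merging and querying disjoint sets, usually represented as a forest. This article walks through the Find and Union operations and two implementations, Quick Find and Quick Union. To address their performance problems, it presents path compression and union by size or rank as optimizations. It closes with applications to LeetCode problems.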

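Union-find is a tree-based data structure for merging and querying disjoint sets. In practice it is usually represented as a forest.

Union-find operations

Find: determine which subset an element belongs to. It can be used to check whether two elements are in the same subset.
Union: merge two subsets into a single set.
Union first uses Find to check whether the two elements are already in the same subset, and merges only if they are not.
In a union-find structure, only a root node (the representative of its set) is its own parent.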

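Union-find implementation

Initialization

Initialization treats every element as its own set, with the element itself as the set's representative.

parent = [0] * N  # assume there are N nodes
for i in range(N):
    parent[i] = i  # initially, every node is its own root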

Find

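A query finds the representative element of the set that contains the given element.

Recursive implementation

def find(x):
    if parent[x] == x:  # a root is its own parent
        return x
    else:
        return find(parent[x])

Iterative implementation

def find(x):
    while parent[x] != x:  # climb until reaching the root
        x = parent[x]
    return x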

Union

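Union-find implemented with Quick Find

A Quick Find union-find has fast queries but comparatively slow merges.
We implement it with a single array whose indices are the element numbers and whose values are the set IDs.

In this example, elements 0, 1, 2, 3, 4 belong to one set and elements 5, 6, 7, 8, 9 belong to another.

Elements 4 and 5 are not in the same set (they are not connected) because their set IDs differ: 4 has ID 2 and 5 has ID 4.
To merge the two sets with union(1, 5), note that 1 and 5 belong to different sets, so every element that carries the same ID as 1 is relabeled with the ID of 5.
After the merge, all elements previously connected to 1 and all elements previously connected to 5 are connected to one another:
![[Pasted image 20230117165349.png]]

From this description, a query takes $\mathcal{O}(1)$ time and a merge takes $\mathcal{O}(n)$ time, because a merge scans the whole array.
Python implementation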

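parent = [0] * N  # assume there are N nodes
for i in range(N):
    parent[i] = i  # initially, each node's set ID is itself

def find(x):
    return parent[x]  # the set ID is stored directly

def Union(p, q):
    p = find(p)
    q = find(q)

    if p == q:
        return

    # relabel every member of p's set with q's set ID
    for i in range(len(parent)):
        if parent[i] == p:
            parent[i] = q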
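
To make the union(1, 5) walkthrough concrete, here is a minimal self-contained sketch. The class name `QuickFind` and the attribute `ids` are hypothetical names introduced for illustration, and set IDs start as each element's own index rather than the 2 and 4 used in the example above.

```python
class QuickFind:
    """Quick Find sketch: ids[i] stores the set ID of element i."""

    def __init__(self, n):
        self.ids = list(range(n))  # every element starts in its own set

    def find(self, x):
        return self.ids[x]  # O(1): the set ID is stored directly

    def union(self, p, q):
        p_id, q_id = self.find(p), self.find(q)
        if p_id == q_id:
            return
        # O(n): relabel every element in p's set with q's ID
        for i, current in enumerate(self.ids):
            if current == p_id:
                self.ids[i] = q_id


qf = QuickFind(10)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # build the set {0..4}
    qf.union(a, b)
for a, b in [(5, 6), (6, 7), (7, 8), (8, 9)]:  # build the set {5..9}
    qf.union(a, b)

before = qf.find(4) == qf.find(5)  # 4 and 5 are in different sets
qf.union(1, 5)
after = qf.find(4) == qf.find(5)   # merging 1 and 5 connects 4 and 5 too
print(before, after)  # False True
```

Every call to `union` scans the whole array, which is where the $\mathcal{O}(n)$ merge cost comes from.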

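Union-find implemented with Quick Union

Relationships between nodes are described by a tree structure, which is stored in an array underneath.

In the trees we have seen before, parent nodes point to their children. Here each child points to its parent, and a root points to itself.

Array indices are the element numbers, and each slot stores the number of that element's parent, as shown below:
![[Pasted image 20230117172238.png]]
In the figure above, every node's parent is itself, so every node is a root and the array represents a forest:
![[Pasted image 20230117172300.png]]

For example, after merging 1 with 2 and 3 with 4, the forest becomes:
![[Pasted image 20230117172322.png]]
To merge 1 and 3, find the roots of 1 and 3, then make the root of 1 point to the root of 3:
![[Pasted image 20230117172346.png]]

From this analysis, both merge and find take $\mathcal{O}(h)$ time, where $h$ is the height of the tree.
Compared with Quick Find, Quick Union gives up a little query performance in exchange for faster merges.
Python implementation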

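parent = [0] * N  # assume there are N nodes
for i in range(N):
    parent[i] = i  # initially, every node is its own root

def find(x):
    r = x  # assume the current node is the root
    while r != parent[r]:  # if the assumption fails (r is not a root), keep looping
        r = parent[r]  # move one level up the tree to r's parent
    return r  # the loop ends once the root is found

def Union(p, q):
    p = find(p)
    q = find(q)

    if p == q:
        return

    parent[p] = q  # p's root now points to q's root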
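
The merge sequence described above (1 with 2, 3 with 4, then 1 with 3) can be traced with a small self-contained sketch. The helper `make_sets` and the choice to pass `parent` as an argument are assumptions made so the block runs on its own; the figures may draw the arrows in a different direction than this code produces.

```python
def make_sets(n):
    return list(range(n))  # parent[i] == i means i is a root


def find(parent, x):
    while parent[x] != x:  # climb until reaching a root
        x = parent[x]
    return x


def union(parent, p, q):
    p_root, q_root = find(parent, p), find(parent, q)
    if p_root != q_root:
        parent[p_root] = q_root  # p's root now points at q's root


parent = make_sets(5)
union(parent, 1, 2)  # 1 -> 2
union(parent, 3, 4)  # 3 -> 4
union(parent, 1, 3)  # root of 1 (which is 2) -> root of 3 (which is 4)
print(parent)  # [0, 2, 4, 4, 4]
```

After the last merge, finding the root of 1 takes two hops (1 to 2, then 2 to 4), which illustrates the $\mathcal{O}(h)$ cost.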

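Path compression

Optimizing the Find function

When the structure holds many elements and a tree degenerates into a chain, Find becomes very slow. Optimizing Find solves this problem.
Consider the following example: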

father[1] = 1;
father[2] = 1;
father[3] = 2;
father[4] = 3;

If all we need is each node's root, every node's father can be set to point directly at the root:

father[1] = 1;
father[2] = 1;
father[3] = 1;
father[4] = 1;

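The transformation is shown in the figure:
![[Pasted image 20230118171000.png]]

This makes every node on the Find path point to the same node, the root.
The transformation has two steps:

Find the root r of the current node x.
Starting again from x, walk up the father links and set the father of every node on the Find path to r.
Python implementation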

def find(x):
    a = x
    while x != father[x]:
        x = father[x]

    # x now holds the root and a holds the original node; walk up from a
    # again, saving each visited node in z and setting its father to the root
    while a != father[a]:
        z = a
        a = father[a]
        father[z] = x

    return x
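
Path compression is also commonly written recursively. The sketch below is an alternative formulation, not the two-pass loop above, applied to the same four-node chain. Passing `father` as an argument and leaving index 0 as unused padding are assumptions made so the block is self-contained.

```python
def find(father, x):
    # Recursive path compression: after the call, x and all of its
    # ancestors point directly at the root.
    if father[x] != x:
        father[x] = find(father, father[x])
    return father[x]


father = [0, 1, 1, 2, 3]  # index 0 is padding; the chain is 4 -> 3 -> 2 -> 1
root = find(father, 4)
print(root, father)  # 1 [0, 1, 1, 1, 1]
```

The recursive form is shorter, but on very long chains it can hit Python's recursion limit, so the iterative two-pass version is the safer choice for large inputs.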

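Optimizing Union with a size marker

This method attaches a size marker to each root, recording the number of nodes in the tree (set) that the root represents.
The basic idea is to merge the tree with fewer nodes into the tree with more nodes.
Note that the size markers must be set up during initialization.
Python implementation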

parent = [0] * len(M)  # len(M) is the number of nodes
size = [0] * len(M)  # size records the number of nodes in each tree
for i in range(len(M)):
    parent[i] = i
    size[i] = 1  # each tree starts as a single node, so its size is 1

def union(x, y):
    x_root = find(x)
    y_root = find(y)

    if x_root != y_root:
        # the root of the larger tree becomes the new root
        if size[x_root] > size[y_root]:
            parent[y_root] = x_root
            size[x_root] += size[y_root]
        else:
            parent[x_root] = y_root
            size[y_root] += size[x_root]
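
To see what the size marker buys, the following sketch runs the same sequence of merges with the plain Quick Union rule and with union by size, then compares the deepest node in each forest. The helper names `union_naive`, `union_by_size`, and `max_depth` are hypothetical, and no path compression is used so that the effect of the size rule is isolated.

```python
def find(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def union_naive(parent, x, y):
    x_root, y_root = find(parent, x), find(parent, y)
    if x_root != y_root:
        parent[x_root] = y_root  # always attach x's root under y's root


def union_by_size(parent, size, x, y):
    x_root, y_root = find(parent, x), find(parent, y)
    if x_root == y_root:
        return
    if size[x_root] > size[y_root]:
        parent[y_root] = x_root
        size[x_root] += size[y_root]
    else:
        parent[x_root] = y_root
        size[y_root] += size[x_root]


def max_depth(parent):
    """Largest number of edges from any node up to its root."""
    def depth(x):
        steps = 0
        while parent[x] != x:
            x = parent[x]
            steps += 1
        return steps
    return max(depth(i) for i in range(len(parent)))


n = 8
naive_parent = list(range(n))
sized_parent, size = list(range(n)), [1] * n
for i in range(1, n):  # keep merging node i into the set containing 0
    union_naive(naive_parent, 0, i)
    union_by_size(sized_parent, size, 0, i)

print(max_depth(naive_parent), max_depth(sized_parent))  # 7 1
```

With the naive rule this merge order builds a single chain, while the size rule keeps attaching each new single-node tree under the existing large root.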

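Optimizing Union with a rank marker

The rank marker differs from the size marker only in what it records.
rank records the height of the tree rooted at a root node.
For example, consider merging the two sets below:
![[Pasted image 20230118173553.png]]
Since rank(A) > rank(F), we set pre[F] = A. The merged structure is shown below:
![[Pasted image 20230118173644.png]]
The taller of the two trees had height 3 before the merge, and the merged tree still has height 3, which is exactly what we want. If we had set pre[A] = F instead, the height of the merged tree would increase.
Python implementation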

parent = [0] * len(M)  # len(M) is the number of nodes
rank = [0] * len(M)  # rank records the height of each tree
for i in range(len(M)):
    parent[i] = i
    rank[i] = 1  # each tree starts as a single node, so its height is 1

def union(x, y):
    x_root = find(x)
    y_root = find(y)

    if x_root != y_root:
        # the root of the taller tree becomes the new root
        if rank[x_root] > rank[y_root]:
            parent[y_root] = x_root
        else:
            if rank[x_root] == rank[y_root]:
                rank[y_root] += 1  # equal heights: the merged tree grows by one
            parent[x_root] = y_root
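
Path compression and union by rank are usually combined. The class below is one possible sketch of that combination; the class name `UnionFind`, the `connected` helper, and the tie-breaking choice (the first argument's root wins on equal rank) are assumptions, not the author's code. Once path compression is in play, `rank` is only an upper bound on the true height.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [1] * n  # height bound of the tree rooted at i

    def find(self, x):
        root = x
        while self.parent[root] != root:  # first pass: locate the root
            root = self.parent[root]
        while self.parent[x] != root:     # second pass: compress the path
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        x_root, y_root = self.find(x), self.find(y)
        if x_root == y_root:
            return False  # already in the same set
        if self.rank[x_root] < self.rank[y_root]:
            x_root, y_root = y_root, x_root  # make x_root the taller tree
        self.parent[y_root] = x_root
        if self.rank[x_root] == self.rank[y_root]:
            self.rank[x_root] += 1  # equal heights: the result grows by one
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(0, 2)
print(uf.connected(1, 3), uf.connected(1, 4))  # True False
```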

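LeetCode 990: Satisfiability of Equality Equations

Union: iterate over the equations containing an equals sign and merge both sides into one set.
Find: for each inequality, look up the roots of both sides; if they are equal, return False immediately.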

from typing import List

class Solution:
    def equationsPossible(self, equations: List[str]) -> bool:
        # one slot per lowercase letter; each letter starts as its own root
        union_find = [0]*26
        for i in range(26):
            union_find[i] = i
        def find(x):
            while x != union_find[x]:
                x = union_find[x]
            return x
        def union(x,y):
            x = find(x)
            y = find(y)

            if x == y:return
            union_find[x] = y

        # pass 1: merge both sides of every "==" equation
        for s in equations:
            if s[1] == '=':
                union(ord(s[0])-ord('a'),ord(s[3])-ord('a'))
        
        # pass 2: a "!=" between two variables in the same set is a contradiction
        for s in equations:
            if s[1] == '!':
                if find(ord(s[0])-ord('a')) == find(ord(s[3])-ord('a')):
                    return False
        
        return True

With the optimized Find function

from typing import List

class Solution:
    def equationsPossible(self, equations: List[str]) -> bool:
        union_find = [0]*26
        for i in range(26):
            union_find[i] = i

        def find(x):
            a = x
            while x != union_find[x]:
                x = union_find[x]
            
            while a != union_find[a]:
                z = a
                a = union_find[a]
                union_find[z] = x
            
            return x

        def union(x,y):
            x = find(x)
            y = find(y)

            if x == y:return
            union_find[x] = y

        for s in equations:
            if s[1] == '=':
                union(ord(s[0])-ord('a'),ord(s[3])-ord('a'))
        
        for s in equations:
            if s[1] == '!':
                if find(ord(s[0])-ord('a')) == find(ord(s[3])-ord('a')):
                    return False
        
        return True
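
For a quick local check of the two-pass idea outside the LeetCode harness, the approach can be condensed into a standalone function. The name `equations_possible` is hypothetical, and this sketch uses the plain loop-based Find, since with only 26 variables compression makes little difference.

```python
def equations_possible(equations):
    parent = list(range(26))  # one slot per lowercase letter

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Pass 1: merge the two sides of every "==" equation.
    for eq in equations:
        if eq[1] == "=":
            parent[find(ord(eq[0]) - ord("a"))] = find(ord(eq[3]) - ord("a"))

    # Pass 2: a "!=" between two variables in the same set is a contradiction.
    for eq in equations:
        if eq[1] == "!":
            if find(ord(eq[0]) - ord("a")) == find(ord(eq[3]) - ord("a")):
                return False
    return True


print(equations_possible(["a==b", "b!=a"]))          # False
print(equations_possible(["b==a", "a==b"]))          # True
print(equations_possible(["a==b", "b==c", "a!=c"]))  # False
```

All equalities must be processed before any inequality is checked; otherwise an inequality could be tested before a later equality connects its two sides.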

LeetCode Interview Question 17.07: Baby Names
This union-find uses a chain structure: a dict maps each name to its parent.

import collections
from typing import List

class Solution:
    def trulyMostPopular(self, names: List[str], synonyms: List[str]) -> List[str]:
        union_find_set = {}
        res_dic = collections.defaultdict(int)
        result = []

        # Build the union-find from synonyms: each pair of equivalent names is
        # merged into one set whose root is the lexicographically smallest name.
        # Find walks upward from a name until it reaches the root.
        for name in synonyms:
            a , b = name.strip('(').strip(')').split(',') # parse "(a,b)"
            while a in union_find_set:
                a = union_find_set[a]
            while b in union_find_set: # find the root of each name
                b = union_find_set[b]

            if a == b:continue

            if a > b:
                union_find_set[a] = b # the smaller name becomes the parent
            else:
                union_find_set[b] = a

        for s in names:
            name , count =  s.strip(')').split('(')
            count = int(count)
            # the entry is now split into a str name and an int count
            while name in union_find_set:
                name = union_find_set[name]
            # name now holds the root of its set
            res_dic[name] += count

        for key , val in res_dic.items():
            result.append(key + '(' + str(val) + ')') # encode as "name(count)"

        return result