Python Sequence Types: The Set (set)

Original article · last modified 2025-05-18 22:36:02

License: CC 4.0 BY-SA

Tags: #Machine Learning Prerequisites

First published 2025-04-21 22:19:07

Source: the “码农不会写诗” official account
Link: Python Sequence Types: The Set (set)

Contents

01 Basic Concepts
02 Creating Sets
03 Set Operations
04 Set Algebra
05 Extension 1: Advantages of set

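Sets

A set is an unordered container that holds no duplicate elements. It is commonly used for deduplication and for set algebra such as union, intersection, and difference.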

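01 Basic Concepts

A set is an unordered collection with no duplicate elements. The set itself is mutable: elements can be added and removed after it is created. Each element, however, must be hashable, so an element cannot be modified in place once it is in the set.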
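
A minimal sketch of that distinction, with illustrative variable names: the set can grow and shrink, but only hashable values are accepted as elements.

```python
# A set is mutable: elements can be added and removed after creation.
s = {1, 2, 3}
s.add(4)
s.discard(1)
print(s == {2, 3, 4})  # True

# Elements must be hashable, so a mutable list cannot be an element.
try:
    s.add([5, 6])
    added_list = True
except TypeError:
    added_list = False
print(added_list)  # False

# A tuple of hashable values or a frozenset (an immutable set) is fine.
s.add((5, 6))
s.add(frozenset({7, 8}))
print(len(s))  # 5
```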

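02 Creating Sets

Creating a set with set()
Note that set literals are written with braces {}, the same delimiters as dict, and a bare {} creates an empty dict. An empty set must therefore be created with set(), not with {}.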

empty_set = set()
print(type(set()))  # <class 'set'>
print(type({}))     # <class 'dict'>

Creating a set with {}

# When {} holds plain values rather than key: value pairs, Python creates a set
s = {"hello", "python", "world"}
print(type(s))  # <class 'set'>

Creating a set from a list, tuple, or other iterable, or with a set comprehension

s1 = set([1, 2, 2, 3, 4, 4, 4])  # duplicates are removed automatically
print(s1)  # {1, 2, 3, 4}
s2 = set((1, 2, 2))
print(s2)  # {1, 2}
s3 = {x**2 for x in range(5)}
print(s3)  # {0, 1, 4, 9, 16}
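
Because a set is unordered, deduplicating through set() does not keep the original order of the items. If order matters, one common alternative (not from the original article) is dict.fromkeys, sketched here with an illustrative list:

```python
items = [3, 1, 3, 2, 1]

# set() removes duplicates but does not promise any particular order.
unique = set(items)
print(sorted(unique))  # [1, 2, 3]

# dict keys keep insertion order (Python 3.7+), so first occurrences survive.
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)  # [3, 1, 2]
```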

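03 Set Operations

The following briefly lists a few points worth noting.

Add a single element: s.add(x)
Adds element x to set s. If the element is already present, nothing happens.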

s = {"hello", "python"}
s.add("world")
print(s)  # {'hello', 'world', 'python'}

s = {"hello", "python"}
s.add("hello")
print(s)   # {'hello', 'python'}

s = {"hello", "python"}
s.add(1)
print(s)  # {1, 'hello', 'python'}

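Add multiple elements: s.update(xs)
The argument xs can be a list, tuple, dict, or any other iterable (for a dict, its keys are added). Several arguments can be passed, separated by commas.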

s = {"hello", "python"}
s.update({"hello", "world"})
print(s)  # {'hello', 'world', 'python'}

s = {"hello", "python"}
s.update([1, 4], [5, 6])
print(s)  # {1, 'hello', 4, 5, 6, 'python'}
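
Two details of update() are easy to trip over: a string counts as an iterable of characters, and a dict contributes only its keys. A small sketch with made-up values:

```python
s = {"hello"}

# A string is iterable, so update() adds its characters one by one.
s.update("ab")
print(s == {"hello", "a", "b"})  # True

# add() treats the string as a single element.
s.add("ab")
print("ab" in s)  # True

# Iterating over a dict yields its keys, so only the keys are added.
s.update({"k1": 1, "k2": 2})
print("k1" in s, 1 in s)  # True False
```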

Remove an element: s.remove(x)
Removes element x from set s. If the element is not present, a KeyError is raised.

s = {"hello", "python", "world"}
s.remove("python")
print(s)  # {'hello', 'world'}
s.remove("Hi")   # raises KeyError because "Hi" is not in the set
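
If a missing element should not stop the program, the KeyError can be caught. A minimal sketch (the name missing is illustrative):

```python
s = {"hello", "world"}

try:
    s.remove("Hi")  # "Hi" is not in the set
except KeyError as exc:
    missing = exc.args[0]  # the key that was not found
    print(f"not found: {missing}")  # not found: Hi

print(s == {"hello", "world"})  # True, the set is unchanged
```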

Remove a specified element without raising: s.discard(x)
If the element is not present, no error is raised.

s = {"hello", "python", "world"}
s.discard("Hi")  # no error even though "Hi" is not in the set
print(s)  # {'hello', 'world', 'python'}
s.discard("python") 
print(s)  # {'hello', 'world'}

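Remove and return an arbitrary element: s.pop()

s = {"hello", "python", "world"}
x = s.pop()
print(x)  # e.g. hello (which element is removed is arbitrary)
print(s)  # the two remaining elements, e.g. {'world', 'python'}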
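
Since pop() gives no control over which element comes out, it is mostly useful for draining a set. The sketch below also shows that calling it on an empty set raises KeyError:

```python
s = {"hello", "python", "world"}

# Drain the set; the order in which elements come out is arbitrary.
popped = []
while s:
    popped.append(s.pop())
print(sorted(popped))  # ['hello', 'python', 'world']

# pop() on an empty set raises KeyError.
try:
    s.pop()
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```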

Check whether x and y have no elements in common: x.isdisjoint(y)

s1 = {"hello", "python"}
s2 = {"hello", "world"}
s3 = {"python"}
print(s1.isdisjoint(s2))  # False
print(s2.isdisjoint(s3))  # True
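
Put differently, two sets are disjoint exactly when their intersection is empty. A short check on the same data:

```python
s1 = {"hello", "python"}
s2 = {"hello", "world"}
s3 = {"python"}

# isdisjoint() agrees with "the intersection is empty".
print(s1.isdisjoint(s2), len(s1 & s2) == 0)  # False False
print(s2.isdisjoint(s3), len(s2 & s3) == 0)  # True True
```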

Check whether x is a subset of y: x.issubset(y)

s1 = {"hello", "python", "world"}
s2 = {"hello", "world"}
s3 = {"python", "Hi"}
print(s2.issubset(s1))  # True
print(s3.issubset(s1))  # False

Check whether x is a superset of y: x.issuperset(y)

s1 = {"hello", "python", "world"}
s2 = {"hello", "world"}
s3 = {"python", "Hi"}
print(s1.issuperset(s2))  # True
print(s1.issuperset(s3))  # False
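
The same subset and superset tests can also be written with comparison operators. A brief sketch showing the correspondence, including the strict forms:

```python
s1 = {"hello", "python", "world"}
s2 = {"hello", "world"}

print(s2 <= s1)  # True, same as s2.issubset(s1)
print(s1 >= s2)  # True, same as s1.issuperset(s2)

# < and > test for a proper subset/superset, so equal sets give False.
print(s2 < s1)   # True
print(s1 < s1)   # False
print(s1 <= s1)  # True

# The methods accept any iterable; the operators need sets on both sides.
print(s2.issubset(["hello", "world", "python"]))  # True
```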

04 Set Algebra

Union: s.union() or the | operator

s1 = {"hello", "world"}
s2 = s1.union({"hello", "python"})
print(s2)  # {'hello', 'world', 'python'}

s1 = {"hello", "world"}
s2 = {"hello", "python"}
print(s1 | s2)  # {'hello', 'world', 'python'}

Intersection: s.intersection() or the & operator

s1 = {"hello", "world"}
s2 = s1.intersection({"hello", "python"})
print(s2)  # {'hello'}

s1 = {"hello", "world"}
s2 = {"hello", "python"} 
print(s1 & s2)   # {'hello'}

Difference: s.difference() or the - operator

s1 = {"hello", "python", "world"}
s2 = s1.difference({"hello", "world"})
print(s2)  # {'python'}

s1 = {"hello", "python", "world"}
s2 = {"hello", "world"}
print(s1 - s2)  # {'python'}

Symmetric difference: s.symmetric_difference() or the ^ operator
That is, the elements that belong to exactly one of the two sets.

s1 = {"hello", "python", "world"}
s2 = s1.symmetric_difference({"hello", "world"})
print(s2)  # {'python'}

s1 = {"hello", "python", "world"}
s2 = {"hello", "world"}
print(s1 ^ s2)  # {'python'}
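
A few related details, sketched with small integer sets chosen for illustration: a symmetric difference where both sides contribute, the in-place forms of the operators, and the fact that the methods accept any iterable while the operators require sets.

```python
a = {1, 2, 3}
b = {3, 4}

# Symmetric difference keeps what is in exactly one of the two sets.
print(a ^ b == {1, 2, 4})          # True
print(a ^ b == (a | b) - (a & b))  # True

# Each operator has an in-place form that modifies the left-hand set.
c = set(a)   # copy, so a itself stays unchanged
c |= b       # same as c.update(b)
print(c == {1, 2, 3, 4})  # True
c &= a       # same as c.intersection_update(a)
print(c == {1, 2, 3})     # True
c -= b       # same as c.difference_update(b)
print(c == {1, 2})        # True
c ^= {2, 9}  # same as c.symmetric_difference_update({2, 9})
print(c == {1, 9})        # True

# Methods accept any iterable; operators require a set on both sides.
print(a.union([3, 4]) == {1, 2, 3, 4})  # True
try:
    a | [3, 4]
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)  # False
```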

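05 Extension 1: Advantages of set

An important use of sets is quickly removing duplicates from a list. Sets are also well suited to membership tests: because they are implemented on top of a hash table, lookups are very fast.
1. Underlying implementation
Sets are implemented as hash tables. When an element is added, a hash function computes its hash value, and that value determines the slot in the table where the element is stored. Hash values are not guaranteed to be unique: two different elements can produce the same one.
2. Element lookup
To look up an element, the set applies the hash function to it and goes straight to the corresponding slot to check whether the element is there, so a lookup takes roughly constant time on average, i.e. O(1). When two elements have the same hash value (a hash collision), extra handling is needed; that is a topic for another time.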
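
The two points above can be observed with a rough experiment. The sketch below times a membership test on a list against the same test on a set, then uses a hypothetical class, SameHash, to force hash collisions. The sizes and names are illustrative, the timings depend on the machine, and the code shows only the visible behavior, not how the interpreter resolves collisions internally.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Membership test for a value near the end: the list scans, the set hashes.
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
print(f"list: {t_list:.4f}s, set: {t_set:.6f}s")  # timings vary by machine


class SameHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # force a collision between every pair of instances

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.name == other.name


# Equal hashes alone do not make elements duplicates; equality decides.
collided = {SameHash("a"), SameHash("b"), SameHash("a")}
print(len(collided))  # 2
```

Even with every hash identical, the set still holds two distinct elements, because equality is checked after the hash. The price is speed: with many collisions, lookups drift away from O(1).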

That's all for today. Bye for now!