Python 3 Data Types: Dictionaries and Sets

License: CC 4.0 BY-SA

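This article covers the basic concepts, interface definitions, implementation, and usage of the dictionary type dict and the set type set in Python 3. dict implements the MutableMapping interface and set implements the MutableSet interface. Both support the Container and Iterable interfaces, but dict supports Mapping while set supports Set. The article explains where the two types agree and differ in mapping behavior, set operations, and usage, including subscripting, iteration, hashing, and set operations such as intersection, union, difference, and symmetric difference. It also discusses how len() and truth values apply to dict and set.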

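The built-in collection types in Python 3 include the sequences, the dictionary dict, and the sets set/frozenset. Sequences are covered in detail in the article at https://editor.youkuaiyun.com/md/?articleId=112730263; this article summarizes dict and set/frozenset.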

Interface Definitions and Implementations of Mappings and Sets

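dict, set, and frozenset are concrete implementation types in the builtins module, while their interface classes (abstract base classes) are defined in the collections.abc module. dict implements the methods defined by collections.abc.MutableMapping, so it is a mapping type; set implements the methods defined by collections.abc.MutableSet; frozenset implements the methods defined by collections.abc.Set. As the interface names suggest, dict and set are mutable, so neither supports hash(); frozenset is immutable and supports hash(). The relevant interface class hierarchy is shown below.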

    builtins.object
        Container
        Hashable
        Iterable
            Iterator
                Generator
            Reversible
                Sequence(Reversible, Collection)
                    ByteString
                    MutableSequence
        Sized
            Collection(Sized, Iterable, Container)
                Mapping
                    MutableMapping
                Set
                    MutableSet

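The hierarchy shows that sequences, dictionaries, and sets share the common base interface collections.abc.Collection, so all of them support the methods that Collection defines. However, collections.abc.Sequence is supported by the sequence types but not by dict or set; collections.abc.Mapping is supported by dict but not by sequences or set; and collections.abc.Set is supported by set but not by sequences or dict. This is the root of the differences in how these types are used. The main interface methods are summarized below.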

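Container: the main method is __contains__(), which tests whether the container holds a given element. Note that for dict this test applies to keys.
Iterable: the main method is __iter__(), which returns an iterator. Calling iter() on an object calls its __iter__() method. Note that iterating over a dict yields its keys.
Reversible: the main method is __reversed__(), which returns a reverse iterator. Calling reversed() on an object calls its __reversed__() method. As the test code below shows, set is unordered and therefore does not support reverse iteration. dict supports it (since Python 3.8), and again the iteration yields keys.
Sized: the main method is __len__(), which returns the number of elements in the container. Calling len() on an object calls its __len__() method.
Iterator: the base class of iterators. Besides the __iter__() method inherited from Iterable, it defines __next__(). Iteration is implemented through these two methods.
Mapping: the main interface method is __getitem__(), which is the basis of subscripting for dict. The argument passed to dict's implementation is a key, so a dict is subscripted by key. The Set interface does not define this method, so a set cannot be subscripted.
Set: the main methods define the set operations __and__(), __or__(), __sub__(), and __xor__(). Objects of type set can therefore use the operators '&', '|', '^', and '-', which correspond to the mathematical intersection, union, symmetric difference, and difference, respectively.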
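
The interface methods listed above can be observed directly on a small dict and set. The following is a minimal sketch; the variable names are made up for illustration, and the reversed() call on a dict assumes Python 3.8 or later.

```python
d = {"name": "Tom", "age": 30}
s = {1, 2, 3}

# Container: membership on a dict checks keys, not values.
key_found = "name" in d
value_found = "Tom" in d

# Iterable / Sized: iterating a dict yields keys; len() counts entries.
keys_forward = list(d)
sizes = (len(d), len(s))

# Reversible: a dict can be reversed (Python 3.8+), a set cannot.
keys_reversed = list(reversed(d))
try:
    reversed(s)
    set_reversible = True
except TypeError:
    set_reversible = False

# Mapping defines __getitem__, Set does not, so a set cannot be subscripted.
try:
    s[0]
    set_subscriptable = True
except TypeError:
    set_subscriptable = False

print(key_found, value_found, keys_forward, keys_reversed, sizes)
print(set_reversible, set_subscriptable)
```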

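The main differences between the interface methods of sequences, dictionaries, and sets are summarized once more below.

Sequences and dict both support subscripting, but a dict is subscripted by key. set does not support subscripting. The underlying reason is whether the interface method __getitem__() is supported.
Sequences and dict support both forward iteration with iter() and reverse iteration with reversed(); set supports only iter(), not reversed(). The underlying reason is whether the __reversed__() method of the Reversible interface is implemented.
list, dict, and set do not support hash(). Immutable types such as str, bytes, tuple (when all of its elements are hashable), and frozenset do. The underlying reason is whether the __hash__() method of the Hashable interface is implemented.
set and frozenset support the set operators '&', '|', '^', and '-'; sequences do not, and dict offers no set algebra either (since Python 3.9, '|' on two dicts merges them rather than computing a union of elements). Sequences support concatenation and repetition with '+' and '*'; dict and set do not.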

import collections.abc  # import the submodule explicitly; "import collections" alone does not guarantee it
print(issubclass(dict,collections.abc.MutableMapping))
print(issubclass(dict,collections.abc.Hashable))
print(issubclass(dict,collections.abc.Reversible))
print(issubclass(set,collections.abc.MutableSet))
print(issubclass(set,collections.abc.Hashable))
print(issubclass(set,collections.abc.Reversible))
print(issubclass(frozenset,collections.abc.Set))
print(issubclass(frozenset,collections.abc.Hashable))
print(issubclass(frozenset,collections.abc.Reversible))

True
False
True
True
False
False
True
True
False

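The concrete classes dict and set also provide extra methods that may exist only in these particular implementations. The earlier article on sequences shows the same design: an interface class defines a set of methods, and the implementation class provides its own methods on top of the ones the interface requires. Python thus keeps interface methods and implementation-specific methods clearly apart. A user who wants to write a custom collection class needs, at a minimum, to implement the methods defined by the interface class. The methods of dict and set can be listed with help(dict) and help(set).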
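
As an illustration of that minimum requirement, the sketch below subclasses collections.abc.Set and implements only its three abstract methods. The class name ListBackedSet and its list-based storage are hypothetical choices made for this example; the operators '&' and '|' are supplied by the mixin methods of the abstract base class, which build their result by calling the class with an iterable.

```python
from collections.abc import Set


class ListBackedSet(Set):
    """Hypothetical immutable set stored in a list (illustration only)."""

    def __init__(self, iterable=()):
        # Keep the first occurrence of each element, drop duplicates.
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    # The three abstract methods required by collections.abc.Set:
    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


a = ListBackedSet([1, 2, 3, 3])
b = ListBackedSet([3, 4])

# __and__ and __or__ are inherited mixin methods, not written above.
common = a & b
combined = a | b
print(sorted(common), sorted(combined), len(a))
```

Because the elements are compared with == rather than hashed, this particular sketch also accepts unhashable elements, at the cost of linear-time lookups.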

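Using Dictionaries and Sets

Apart from mutability, set and frozenset behave almost identically, so the examples below use set. Both dict and set literals are written with {}, but dict entries take the form key: value.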

# set
print(type({1,2,3}))
s={'hello'}
print(type(s),s)
s={*'hello'}
print(type(s),s)
#dict
print(type({1:'val1', 2:'val2'}))
d={'name': 'Tom', 'age':30}
print(type(d), d)

<class 'set'>
<class 'set'> {'hello'}
<class 'set'> {'e', 'h', 'o', 'l'}
<class 'dict'>
<class 'dict'> {'name': 'Tom', 'age': 30}

Since both set and dict use {}, is the literal {} a set or a dict? It is a dict. An empty set must be written as set(). An empty dictionary can also be written as dict().

print(type({}))
print(type(set()))
print(type(dict()))

<class 'dict'>
<class 'set'>
<class 'dict'>

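Elements of the sequence types tuple and list can be objects of any type, but this does not fully hold for set and dict. A set requires every element to support hash(), so list, set, and dict objects cannot be set elements. Likewise, dict requires every key to support hash(); the values are unrestricted. The reason is that both types are hash tables: a set uses the hash value of an element, and a dict the hash value of a key, to locate the entry.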

# set
s={(1,2,3), 'abc', 'def', 123}
for x in s:
    print('set element:', type(x), x)
    
try:
    s={[1, 2, 3], 'abc'}
except TypeError as e:
    print(e)
    pass

# dict
d={'list': [1, 2, 3], 'set': {*'hello'}, (1, 2):'tuple indexed value'}
print('list in dict:', type(d['list']), d['list'])
print('set in dict:', type(d['set']), d['set'])
print('tuple indexed item in dict:', type(d[(1,2)]), d[(1,2)])

try:
    d={[1, 2, 3]:'list indexed value'}
except TypeError as e:
    print(e)
    pass

set element: <class 'int'> 123
set element: <class 'str'> abc
set element: <class 'tuple'> (1, 2, 3)
set element: <class 'str'> def
unhashable type: 'list'
list in dict: <class 'list'> [1, 2, 3]
set in dict: <class 'set'> {'e', 'h', 'o', 'l'}
tuple indexed item in dict: <class 'str'> tuple indexed value
unhashable type: 'list'
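
Two consequences of the hashability requirement are worth a short sketch: a tuple is hashable only if everything inside it is, and frozenset is the usual way to put a set inside a set or use one as a dict key. The helper is_hashable() is a hypothetical name introduced for this example.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


plain_tuple = (1, 2, 3)
tuple_with_list = (1, [2, 3])  # the inner list makes the tuple unhashable

# frozenset is hashable, so it can be a set element or a dict key.
nested = {frozenset({1, 2}), frozenset({2, 3})}
lookup = {frozenset({"a", "b"}): "pair"}

print(is_hashable(plain_tuple), is_hashable(tuple_with_list))
print(frozenset({2, 1}) in nested, lookup[frozenset({"b", "a"})])
```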

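It follows that set and dict identify an element or key by its hash value together with an equality comparison (==). Two objects that compare equal (and therefore must have equal hash values) count as the same element: a set stores only one of them, and in a dict a later value for an equal key overwrites the earlier one. Objects that merely share a hash value but do not compare equal are still kept apart. In the example below, the string 'hello' and the byte string b'hello' print the same hash value, but a str never compares equal to a bytes object, so both can coexist as set elements and as dict keys. The concrete hash values differ from run to run because string hashing is randomized per process by default.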

print(hash('hello'))
print(hash(b'hello'))
s={'hello', b'hello', 'hello', 'world'}
print(s, len(s))
d={'hello':1, b'hello':2, 'hello': 3, 'world': 4}
print(d, len(d))

-4264248251498942331
-4264248251498942331
{'world', 'hello', b'hello'} 3
{'hello': 3, b'hello': 2, 'world': 4} 3
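
The role of equality can be made explicit with a small sketch. Objects of different types that compare equal collapse into one entry, while unequal objects with identical hash values stay separate. The class SameHash is a hypothetical example type that forces a hash collision on purpose.

```python
# 1, 1.0 and True are different types but compare equal and hash equally,
# so a set keeps one of them and a dict treats them as the same key.
mixed = {1, 1.0, True}
by_number = {1: "int", 1.0: "float", True: "bool"}


class SameHash:
    """Hypothetical type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberate collision

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.name == other.name


# "a" appears twice and is stored once; "b" collides but is not equal to "a".
colliding = {SameHash("a"), SameHash("b"), SameHash("a")}

print(len(mixed), by_number, len(colliding))
```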

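set supports the usual set operations: intersection, union, difference, and symmetric difference. Each can be computed by calling a method or by using an operator. The correspondence is as follows.

'-', difference(): difference, the elements that belong to the first set but not to the second
'&', intersection(): intersection, the elements that belong to both sets
'|', union(): union, the elements that belong to at least one of the sets
'^', symmetric_difference(): symmetric difference, the elements that belong to exactly one of the sets

There are also methods that test relationships between sets.

isdisjoint(): tests whether the two sets have no element in common
issubset(): tests whether the current set is a subset of the other set
issuperset(): tests whether the current set contains the other set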

s1={1,2,3,4,5}
s2={4,5,6,7,8}
s3={4,5}
print(s1-s2)
print(s2-s1)
print(s1&s2)
print(s1|s2)
print(s1^s2)
print(s1.isdisjoint(s3))
print(s1.issuperset(s3), s1 > s3)
print(s1.issubset(s3), s1 < s3)

{1, 2, 3}
{8, 6, 7}
{4, 5}
{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 6, 7, 8}
False
True True
False False
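
A few details of the method and operator forms are easy to miss, so here is a minimal sketch of them. The methods accept any iterable, the operators require sets on both sides, issubset() matches '<=' rather than the strict '<', and each operator has an in-place variant. The variable names are illustrative only.

```python
s1 = {1, 2, 3, 4, 5}

# Methods accept any iterable; operators require both operands to be sets.
via_method = s1.union([5, 6])
try:
    s1 | [5, 6]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# issubset()/issuperset() correspond to <= and >=; < and > are the strict forms.
same = {1, 2, 3, 4, 5}
subset_flags = (s1.issubset(same), s1 <= same, s1 < same)

# In-place variants modify the left-hand set instead of building a new one.
acc = {1, 2}
acc |= {3}        # acc is now {1, 2, 3}
acc &= {2, 3, 4}  # acc is now {2, 3}
acc -= {2}        # acc is now {3}
acc ^= {3, 4}     # acc is now {4}

print(via_method, operator_accepts_list, subset_flags, acc)
```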

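Other commonly used set operations are listed below.

add(): adds an element
discard(): removes an element if it is present
clear(): removes all elements
copy(): makes a shallow copy, like the copy() method of list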

s={1,2,3}
s.add(4)
print(s)
s.discard(1)
print(s)
s.clear()
print(s)
s1={1,2,3}
s2={3,2,1}
print(s1==s2, s1>=s2, s1<=s2, s1!=s2, s1>s2,s1<s2)

{1, 2, 3, 4}
{2, 3, 4}
set()
True True True False False False
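
Besides discard(), set offers remove() and pop() for deleting elements; the sketch below contrasts the three and shows that copy() yields an independent set. Variable names are illustrative.

```python
s = {1, 2, 3}

# discard() silently ignores a missing element ...
s.discard(99)

# ... whereas remove() raises KeyError for it.
try:
    s.remove(99)
    remove_raised = False
except KeyError:
    remove_raised = True

# copy() returns a new set; later changes to s do not affect it.
snapshot = s.copy()

# pop() removes and returns an arbitrary element.
popped = s.pop()

print(remove_raised, popped, s, snapshot)
```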

Besides literals, a dict can also be built with its constructor, which takes several forms.

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)

d=dict(key1='value1', key2='value2', key3='value3')
print(type(d), d)
d=dict([(1, 'a'), (2, 'b'), (3, 'c')])
print(type(d), d)
d=dict(((4, 'd'), (5, 'e'), (6, 'f')))
print(type(d), d)

<class 'dict'> {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
<class 'dict'> {1: 'a', 2: 'b', 3: 'c'}
<class 'dict'> {4: 'd', 5: 'e', 6: 'f'}
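
A few other common ways to build a dictionary follow from the constructor forms above; this is a short sketch with made-up sample data. zip() produces the (key, value) pairs that dict(iterable) expects, dict.fromkeys() is a class method of dict, and a dict comprehension is the literal-style counterpart.

```python
keys = ["a", "b", "c"]
values = [1, 2, 3]

# dict(iterable): zip() pairs each key with its value.
from_zip = dict(zip(keys, values))

# Dict comprehension: compute each value from its key.
squares = {n: n * n for n in range(4)}

# dict.fromkeys(): every key gets the same initial value.
counters = dict.fromkeys(keys, 0)

# dict(mapping, **kwargs): the two constructor forms can be combined.
combined = dict({"a": 1}, b=2)

print(from_zip, squares, counters, combined)
```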

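A dictionary lets you work with keys and values separately, or with (key, value) pairs.

keys() returns the keys, values() returns the values, and items() returns the (key, value) pairs
pop() removes the entry with the given key and returns its value
clear() empties the dictionary
copy() makes a shallow copy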

d=dict(key1='value1', key2='value2', key3='value3')
keys=d.keys()
values=d.values()
items=d.items()
print(type(keys), keys)
print(type(values), values)
print(type(items), items)
it=iter(d)
print(type(it))
# key iterator
for k in it:
    print(k)
for k in keys:
    print(k)
# value
for v in values:
    print(v)
# (key, value) pair
for k,v in items:
    print(k,':',v)
# pop
d.pop('key1')
print(d)
# add
d['key4'] = 'value4'
print(d)
# del
del d['key3']
print(d)
# clear
d.clear()
print(d)

<class 'dict_keys'> dict_keys(['key1', 'key2', 'key3'])
<class 'dict_values'> dict_values(['value1', 'value2', 'value3'])
<class 'dict_items'> dict_items([('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')])
<class 'dict_keyiterator'>
key1
key2
key3
key1
key2
key3
value1
value2
value3
key1 : value1
key2 : value2
key3 : value3
{'key2': 'value2', 'key3': 'value3'}
{'key2': 'value2', 'key3': 'value3', 'key4': 'value4'}
{'key2': 'value2', 'key4': 'value4'}
{}
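
The example above always uses keys that exist. The following sketch covers the missing-key case and one property of the view objects: subscripting raises KeyError, get() and pop() accept a default, setdefault() inserts on demand, and the object returned by keys() is a live view that follows later changes and supports set operators. Variable names are illustrative.

```python
d = {"key1": "value1"}

# get() returns a default instead of raising for a missing key.
fallback = d.get("missing", "default")

# Plain subscripting raises KeyError.
try:
    d["missing"]
    subscript_raised = False
except KeyError:
    subscript_raised = True

# pop() with a default does not raise either.
popped = d.pop("missing", None)

# setdefault() inserts the key if absent and returns the stored value.
created = d.setdefault("key2", "value2")

# keys() is a live view: it reflects entries added afterwards.
keys = d.keys()
d["key3"] = "value3"
live_keys = list(keys)

# Key views behave like sets for operators such as &.
shared = keys & {"key1", "other"}

print(fallback, subscript_raised, popped, created, live_keys, shared)
```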

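Another important use of dictionaries is passing function arguments. When a parameter in a function definition is prefixed with '**', that parameter is a dict: the keyword arguments given at the call site are collected into it. If the caller already holds a dictionary, it must be unpacked with '**' in the call, as shown below.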

def test_func_kw(**kwargs):
    print('test_func_kw kwargs', type(kwargs), kwargs)
    
test_func_kw(arg1=1, arg2=2, arg3=3)
test_func_kw(**{'one':'1', 'two': '2'})

test_func_kw kwargs <class 'dict'> {'arg1': 1, 'arg2': 2, 'arg3': 3}
test_func_kw kwargs <class 'dict'> {'one': '1', 'two': '2'}
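
Two related points, sketched under the assumption of a hypothetical function describe(): '**' unpacking also works inside a dict literal, where later entries override earlier ones, and a dictionary unpacked into a call must have string keys.

```python
def describe(**kwargs):
    """Hypothetical function: return the received keyword names, sorted."""
    return sorted(kwargs)


defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}

# Unpacking inside a literal merges dicts; the rightmost value wins.
merged = {**defaults, **overrides}
names = describe(**merged)

# Keyword names must be strings, so a non-string key is rejected.
try:
    describe(**{1: "x"})
    non_string_key_rejected = False
except TypeError:
    non_string_key_rejected = True

print(merged, names, non_string_key_rejected)
```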

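len() and Truth Values

Conditions rely on truth values. For numeric types, the numeric interfaces (numbers.Complex and the classes derived from it) define the __bool__() method, so numbers convert to bool automatically and can be used directly in a condition. The interface classes in collections.abc do not define __bool__(), but the Sized interface defines __len__(), and Python falls back to it: an object without __bool__() is considered false when its __len__() returns 0. The built-in collection types can therefore be used directly in a condition as well, where the test checks whether the collection is empty. See the example below.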

def collection_is_empty(c):
    if c:
        print(type(c), c, 'is not empty')
    else:
        print(type(c), c, 'is empty')
        
collection_is_empty('')
collection_is_empty('123')
collection_is_empty(())
collection_is_empty((1,2,3))
collection_is_empty([])
collection_is_empty([1,2,3])
collection_is_empty(set())
collection_is_empty({1,2,3})
collection_is_empty({})
collection_is_empty({1:'a',2:'b',3:'c'})

<class 'str'>  is empty
<class 'str'> 123 is not empty
<class 'tuple'> () is empty
<class 'tuple'> (1, 2, 3) is not empty
<class 'list'> [] is empty
<class 'list'> [1, 2, 3] is not empty
<class 'set'> set() is empty
<class 'set'> {1, 2, 3} is not empty
<class 'dict'> {} is empty
<class 'dict'> {1: 'a', 2: 'b', 3: 'c'} is not empty
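
The same fallback applies to user-defined classes. In the sketch below, the hypothetical class Basket defines only __len__(), so its truth value follows its length; the subclass AlwaysTrueBasket adds __bool__(), which takes precedence over __len__().

```python
class Basket:
    """Hypothetical container that defines __len__ but not __bool__."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class AlwaysTrueBasket(Basket):
    """Hypothetical subclass: __bool__ overrides the __len__ fallback."""

    def __bool__(self):
        return True


empty_is_true = bool(Basket([]))
filled_is_true = bool(Basket([1]))
override_is_true = bool(AlwaysTrueBasket([]))

print(empty_is_true, filled_is_true, override_is_true)
```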