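1. Serialization basics
Serialization: the process of turning an object (a variable) in memory into an intermediate format that can be stored or transmitted.
In other words, writing Python data to a file for storage is a form of serialization. Python calls this pickling; other languages call it serialization, marshalling, flattening, and so on.
Note: reading the content back from the serialized form into memory is called deserialization, or unpickling.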
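Benefits of serialization
- Persisting state
- Exchanging data across platforms
Ways to serialize, with their pros and cons
- json
  - Pro: understood by virtually every language, so data can be exchanged across platforms
  - Con: supports only the common Python types, not all of them
- pickle
  - Pro: supports almost all Python data types, including instances of user-defined classes
  - Con: Python only; the data cannot be exchanged with other languages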
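The trade-off listed above can be seen in a small sketch. The `data` dictionary is a made-up example; the point is only that pickle round-trips a set and a tuple, while the json encoder rejects the set.

```python
import json
import pickle

# Hypothetical sample data containing a set and a tuple
data = {"tags": {"a", "b"}, "point": (1, 2)}

# pickle round-trips the set and the tuple unchanged
restored = pickle.loads(pickle.dumps(data))
print(restored == data, type(restored["point"]))

# JSON has no set type, so the encoder raises TypeError
try:
    json.dumps(data)
    json_error = None
except TypeError as exc:
    json_error = str(exc)
print(json_error)
```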
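2. The pickle module: handles nearly all Python data, Python only
pickle can serialize almost any Python object, but the result can only be read by Python, and data written by one Python version may not load in another.
For that reason, use pickle only for data that is not critical, where a failed deserialization does no harm.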
2-1 Serializing with pickle

    # Serialization
    import pickle

    dic = {'name': 'alvin', 'age': 23, 'sex': 'male'}
    print(type(dic))  # <class 'dict'>

    j = pickle.dumps(dic)
    print(type(j))  # <class 'bytes'>

    # Mode 'w' writes str and 'wb' writes bytes; j is bytes, so open with 'wb'
    with open('序列化对象_pickle', 'wb') as f:
        f.write(j)  # equivalent to pickle.dump(dic, f)
2-2 Deserializing with pickle

    # Deserialization
    import pickle

    # Open in binary mode, because pickle data is bytes
    with open('序列化对象_pickle', 'rb') as f:
        data = pickle.loads(f.read())  # equivalent to data = pickle.load(f)

    print(data['age'])
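The comments above mention that `pickle.dump` and `pickle.load` are equivalent to the `dumps`/`write` and `read`/`loads` pairs. A minimal sketch of that file-based form follows; the temporary directory and the file name `pickled_object` are assumptions made so the example does not touch the working directory.

```python
import os
import pickle
import tempfile

dic = {'name': 'alvin', 'age': 23, 'sex': 'male'}

# Hypothetical file location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'pickled_object')

# dump() serializes and writes in one step
with open(path, 'wb') as f:
    pickle.dump(dic, f)

# load() reads and deserializes in one step
with open(path, 'rb') as f:
    data = pickle.load(f)

print(data['age'])
```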
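3. The json module: working with JSON strings
json module: the module for working with JSON strings
JSON: a general-purpose, lightweight data-interchange format; in essence it is a string
- To pass objects between programming languages, they must be serialized into a standard format, such as XML (which is verbose).
- A better choice is JSON, because a JSON document is just a string that every language can read.
- It is also easy to store on disk or send over a network.
- The data structures JSON supports are those of JavaScript.
- Besides being a general interchange format, JSON is usually smaller and quicker to parse than XML, and web pages can read it directly.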
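Mapping between Python and JSON data types
- Python - JSON
- dict - object ({})
- list, tuple - array ([])
- str - string ("" only; double quotes are required)
- int, float - number (for example 123.4)
- True, False - true, false
- None - null
- Note: JSON is strict about types. It has no tuple type (tuples are written as arrays and read back as lists), and it does not accept single-quoted '' or triple-quoted ''' ''' strings.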
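The mapping above can be checked with a short round trip. The `value` dictionary is an invented example; note that the tuple comes back as a list and the integer key comes back as a string.

```python
import json

# Hypothetical sample covering the types in the mapping
value = {"t": (1, 2), "ok": True, "none": None, "pi": 3.14, 1: "int key"}

text = json.dumps(value)
print(text)

back = json.loads(text)
print(back)
```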
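Common functions in the json module
- Serialization
  - dump: writes to a file, dump(object, file object)
  - dumps: returns a string, dumps(object)
- Deserialization
  - load: reads from a file, load(file object)
  - loads: reads from a string, loads(string)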
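A minimal sketch of the four functions side by side. An in-memory `io.StringIO` buffer stands in for a real file here, which is an assumption made to keep the example self-contained.

```python
import io
import json

record = {"name": "alvin", "age": 23}

# dumps / loads work with strings
text = json.dumps(record)
same = json.loads(text)

# dump / load work with file-like objects
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading
from_file = json.load(buffer)

print(type(text), same == from_file)
```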
3-0 The source code in detail
3-0-1 dump

    def dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True,
             allow_nan=True, cls=None, indent=None, separators=None,
             default=None, sort_keys=False, **kw):
        """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
        ``.write()``-supporting file-like object).

        If ``skipkeys`` is true then ``dict`` keys that are not basic types
        (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
        instead of raising a ``TypeError``.

        If ``ensure_ascii`` is false, then the strings written to ``fp`` can
        contain non-ASCII characters if they appear in strings contained in
        ``obj``. Otherwise, all such characters are escaped in JSON strings.

        If ``check_circular`` is false, then the circular reference check
        for container types will be skipped and a circular reference will
        result in an ``OverflowError`` (or worse).

        If ``allow_nan`` is false, then it will be a ``ValueError`` to
        serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
        in strict compliance of the JSON specification, instead of using the
        JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

        If ``indent`` is a non-negative integer, then JSON array elements and
        object members will be pretty-printed with that indent level. An indent
        level of 0 will only insert newlines. ``None`` is the most compact
        representation.

        If specified, ``separators`` should be an ``(item_separator, key_separator)``
        tuple.  The default is ``(', ', ': ')`` if *indent* is ``None`` and
        ``(',', ': ')`` otherwise.  To get the most compact JSON representation,
        you should specify ``(',', ':')`` to eliminate whitespace.

        ``default(obj)`` is a function that should return a serializable version
        of obj or raise TypeError. The default simply raises TypeError.

        If *sort_keys* is true (default: ``False``), then the output of
        dictionaries will be sorted by key.

        To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
        ``.default()`` method to serialize additional types), specify it with
        the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

        """
3-0-2 dumps

    def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
              allow_nan=True, cls=None, indent=None, separators=None,
              default=None, sort_keys=False, **kw):
        """Serialize ``obj`` to a JSON formatted ``str``.

        If ``skipkeys`` is true then ``dict`` keys that are not basic types
        (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
        instead of raising a ``TypeError``.

        If ``ensure_ascii`` is false, then the return value can contain non-ASCII
        characters if they appear in strings contained in ``obj``. Otherwise, all
        such characters are escaped in JSON strings.

        If ``check_circular`` is false, then the circular reference check
        for container types will be skipped and a circular reference will
        result in an ``OverflowError`` (or worse).

        If ``allow_nan`` is false, then it will be a ``ValueError`` to
        serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
        strict compliance of the JSON specification, instead of using the
        JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

        If ``indent`` is a non-negative integer, then JSON array elements and
        object members will be pretty-printed with that indent level. An indent
        level of 0 will only insert newlines. ``None`` is the most compact
        representation.

        If specified, ``separators`` should be an ``(item_separator, key_separator)``
        tuple.  The default is ``(', ', ': ')`` if *indent* is ``None`` and
        ``(',', ': ')`` otherwise.  To get the most compact JSON representation,
        you should specify ``(',', ':')`` to eliminate whitespace.

        ``default(obj)`` is a function that should return a serializable version
        of obj or raise TypeError. The default simply raises TypeError.

        If *sort_keys* is true (default: ``False``), then the output of
        dictionaries will be sorted by key.

        To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
        ``.default()`` method to serialize additional types), specify it with
        the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

        """

Note: the sort_keys parameter sorts the dictionary keys in the output.
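The parameters described in the docstring can be tried out in a short sketch. The sample data and the helper name `encode_extra` are hypothetical; the keyword arguments are the ones listed in the signature above.

```python
import json
from datetime import date

# Hypothetical sample data with a non-ASCII string
data = {"name": "王者", "b": 1, "a": [1, 2]}

escaped = json.dumps(data)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII kept as-is
compact = json.dumps(data, ensure_ascii=False,
                     separators=(",", ":"), sort_keys=True)
pretty = json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)


def encode_extra(obj):
    """Hypothetical default hook: turn dates into ISO strings."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


with_date = json.dumps({"day": date(2020, 1, 2)}, default=encode_extra)

print(escaped)
print(readable)
print(compact)
print(pretty)
print(with_date)
```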
3-0-3 load

    def load(fp, *, cls=None, object_hook=None, parse_float=None,
             parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
        """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
        a JSON document) to a Python object.

        ``object_hook`` is an optional function that will be called with the
        result of any object literal decode (a ``dict``). The return value of
        ``object_hook`` will be used instead of the ``dict``. This feature
        can be used to implement custom decoders (e.g. JSON-RPC class hinting).

        ``object_pairs_hook`` is an optional function that will be called with the
        result of any object literal decoded with an ordered list of pairs.  The
        return value of ``object_pairs_hook`` will be used instead of the ``dict``.
        This feature can be used to implement custom decoders that rely on the
        order that the key and value pairs are decoded (for example,
        collections.OrderedDict will remember the order of insertion). If
        ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

        To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
        kwarg; otherwise ``JSONDecoder`` is used.

        """
3-0-4 loads

    def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
              parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
        """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
        containing a JSON document) to a Python object.

        ``object_hook`` is an optional function that will be called with the
        result of any object literal decode (a ``dict``). The return value of
        ``object_hook`` will be used instead of the ``dict``. This feature
        can be used to implement custom decoders (e.g. JSON-RPC class hinting).

        ``object_pairs_hook`` is an optional function that will be called with the
        result of any object literal decoded with an ordered list of pairs.  The
        return value of ``object_pairs_hook`` will be used instead of the ``dict``.
        This feature can be used to implement custom decoders that rely on the
        order that the key and value pairs are decoded (for example,
        collections.OrderedDict will remember the order of insertion). If
        ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

        ``parse_float``, if specified, will be called with the string
        of every JSON float to be decoded. By default this is equivalent to
        float(num_str). This can be used to use another datatype or parser
        for JSON floats (e.g. decimal.Decimal).

        ``parse_int``, if specified, will be called with the string
        of every JSON int to be decoded. By default this is equivalent to
        int(num_str). This can be used to use another datatype or parser
        for JSON integers (e.g. float).

        ``parse_constant``, if specified, will be called with one of the
        following strings: -Infinity, Infinity, NaN.
        This can be used to raise an exception if invalid JSON numbers
        are encountered.

        To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
        kwarg; otherwise ``JSONDecoder`` is used.

        The ``encoding`` argument is ignored and deprecated.

        """
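The decoding hooks described above can be illustrated with a small sketch. The JSON text and the helper `tag_objects` are made up for illustration; `parse_float`, `object_hook`, and `object_pairs_hook` are the parameters from the docstring.

```python
import json
from collections import OrderedDict
from decimal import Decimal

# Hypothetical JSON document
text = '{"price": 1.10, "qty": 3, "item": {"id": 7}}'

# parse_float receives the raw string of each JSON float
exact = json.loads(text, parse_float=Decimal)


def tag_objects(d):
    """Hypothetical hook: mark every decoded JSON object."""
    d["_seen"] = True
    return d


# object_hook is called for every decoded object, nested ones included
hooked = json.loads(text, object_hook=tag_objects)

# object_pairs_hook receives an ordered list of (key, value) pairs
ordered = json.loads(text, object_pairs_hook=OrderedDict)

print(exact["price"], hooked["item"], list(ordered))
```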
3-1 Serializing with json

    # Serialization
    import json

    dic = {'name': 'alvin', 'age': 23, 'sex': 'male'}
    print(type(dic))  # <class 'dict'>

    j = json.dumps(dic)
    print(type(j))  # <class 'str'>

    # j is a str, so the file is opened in text mode
    with open('序列化对象', 'w') as f:
        f.write(j)  # equivalent to json.dump(dic, f)
3-2 Deserializing with json

    # Deserialization
    import json

    with open('序列化对象') as f:
        data = json.loads(f.read())  # equivalent to data = json.load(f)
3-3 Formatting JSON data for output with the pprint module
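A minimal sketch of what this can look like, using an invented JSON string. Note that `pprint` formats the decoded Python object (so strings appear with single quotes); when the goal is indented JSON text, `json.dumps` with `indent` is an alternative.

```python
import json
from pprint import pformat

# Hypothetical JSON document
text = ('{"name": "alvin", "langs": ["python", "js"], '
        '"address": {"city": "Beijing", "zip": "100000"}}')
data = json.loads(text)

# pformat returns the string that pprint would print
formatted = pformat(data, width=40)
print(formatted)

# Indented JSON text, as an alternative
via_json = json.dumps(data, indent=2, ensure_ascii=False)
print(via_json)
```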
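4. The shelve module: Python's built-in persistent dictionary
shelve module: a serialization tool that ships with Python and serializes values automatically
- It can be used directly with import shelve.
A shelf is like a persistent dictionary backed by a file, and it is used much like a dict.
Usage
- open
- read and write
- close
Note:
- The shelve module exposes a single open function, which returns a dict-like object that can be read and written.
- Keys must be strings; values can be any Python object that pickle can handle.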
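4-1 Serializing with shelve

    # Save objects to a shelve file (serialization)
    import shelve

    db = shelve.open('shelveDict')  # open the file
    # Add entries the same way as adding key-value pairs to a dict
    db['wangzhe'] = {'age': 24, 'name': 'wangzhe'}
    db['lijianguo'] = {'age': 25, 'name': 'lijianguo'}
    db.close()  # close the file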
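4-2 Deserializing with shelve

    # Read objects back from the file (deserialization)
    import shelve

    db = shelve.open('shelveDict')  # open the file
    a = db.get('wangzhe')           # get() works as it does on a dict
    print(db['wangzhe'])            # read by key, the same way as with a dict
    print(db['lijianguo'])          # prints {'age': 25, 'name': 'lijianguo'}
    db.close()  # close the file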
4-3 Updating data in a shelve file

    # Update data stored in the file
    import shelve

    db = shelve.open('shelveDict')  # open the file
    wangzhe = db['wangzhe']         # read the previously stored object
    wangzhe['name'] = 'wang zhe'    # modify the in-memory copy
    db['wangzhe'] = wangzhe         # store it back under the same key
    print(db['wangzhe'])            # prints {'age': 24, 'name': 'wang zhe'}
    db.close()  # close the file
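The reason the object is stored back under its key is that a shelf hands out a fresh copy on every read, so in-place changes are lost by default. A minimal sketch, assuming a temporary directory for the shelf file; `writeback=True` is an optional argument of `shelve.open` that caches entries and writes them back on close.

```python
import os
import shelve
import tempfile

# Hypothetical shelf location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'shelveDict')

with shelve.open(path) as db:
    db['wangzhe'] = {'age': 24, 'name': 'wangzhe'}

# Default mode: mutating the returned copy is not persisted
with shelve.open(path) as db:
    db['wangzhe']['name'] = 'lost change'
    unchanged = db['wangzhe']['name']

# writeback=True: accessed entries are cached and saved on close
with shelve.open(path, writeback=True) as db:
    db['wangzhe']['name'] = 'wang zhe'

with shelve.open(path) as db:
    final = db['wangzhe']

print(unchanged, final)
```

The cache used by `writeback=True` costs memory and makes closing slower, so the explicit read, modify, store pattern shown earlier remains a reasonable default.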
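5. The xml module: a data-exchange format between languages and programs
XML: a format for exchanging data between different languages or programs. It plays much the same role as JSON, but JSON is simpler to use.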
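Syntax
- 1. Every start tag must have a matching end tag.
- 2. A shorthand syntax lets a single tag act as both start and end tag: put a slash (/) immediately before the closing angle bracket, for example <百度百科词条/>. An XML parser treats this as <百度百科词条></百度百科词条>.
- 3. Tags must be nested properly, so end tags must mirror the order of the start tags:

    <tag1>
        <tag2>
        </tag2>
    </tag1>
    <!-- Think of start and end tags as left and right parentheses in math -->
    <!-- The outer one cannot be closed before all inner ones are closed -->

- 4. Every attribute must have a value.
- 5. Every attribute value must be enclosed in quotes (single or double).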
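These rules can be checked with the standard-library parser, which raises `ParseError` on input that is not well-formed. The helper name `is_well_formed` and the sample snippets are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {
    "closed": is_well_formed("<a><b></b></a>"),           # rule 1
    "self_closing": is_well_formed("<a><b/></a>"),        # rule 2
    "bad_nesting": is_well_formed("<a><b></a></b>"),      # rule 3 violated
    "no_value": is_well_formed("<a checked></a>"),        # rule 4 violated
    "unquoted": is_well_formed("<a checked=yes></a>"),    # rule 5 violated
    "single_quotes": is_well_formed("<a checked='yes'></a>"),
}
print(results)
```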
Use cases
- 1. Configuration files
- 2. General data exchange
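Differences from JSON
- Advantages of XML
  - (1) A uniform, standardized format
  - (2) Easy remote interaction with other systems, so sharing data is convenient
- Disadvantages of XML
  - (1) XML files are large and the format is complex, so transmission uses more bandwidth
  - (2) Both server and client need a lot of code to parse XML, which makes that code complex and hard to maintain
  - (3) Parsing XML costs the client and the server more time and resources
- Advantages of JSON
  - (1) The format is simple, easy to read and write, and compact, so it uses little bandwidth
  - (2) Easy to parse; JavaScript can even read JSON data with eval(), although JSON.parse() is the safer choice
- Disadvantages of JSON
  - (1) Less general than XML
  - (2) When this comparison was written, JSON was still gaining adoption; it has since become very widely used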
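ElementTree represents the node tree of a whole document
- Element represents a single node
  - Attributes
    - 1. text: the text inside the tag, as in <tag>text</tag>
    - 2. attrib: a dict of all the node's attributes
    - 3. tag: the name of the tag
  - Methods
    - get: returns the value of one attribute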
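A minimal sketch of these attributes on a single node. The XML snippet is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document
node = ET.fromstring(
    '<country name="Liechtenstein" rank="1">text here</country>'
)

tag = node.tag            # name of the tag
attrs = node.attrib       # dict of all attributes
text = node.text          # text between the start and end tags
name = node.get('name')   # value of one attribute
missing = node.get('population', 'unknown')  # default if absent

print(tag, attrs, text, name, missing)
```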
5-1 Basic XML operations

    import xml.etree.ElementTree as ET

    # Load the node tree
    tree = ET.parse('TEST.xml')  # read the XML into memory as a node tree
    print(tree)

    # Find tags
    root = tree.getroot()             # get the root tag
    print(root.iter('year'))          # searches the whole document, returns an iterator
    print(root.find('country'))       # searches root's children, returns only the first match
    print(root.findall('country'))    # searches root's children, returns all matches

    # Walk the XML file
    for country in root:
        print(country.tag, country.attrib, country.text)
        for t in country:
            print(t.tag, t.attrib, t.text)

    # Read one attribute
    print(root.find('country').get('name'))

    # Modify the XML file: load it into memory, change it there, write it back
    tree = ET.parse('TEST.xml')
    root = tree.getroot()  # refresh root so it belongs to the newly parsed tree
    for country in tree.findall('country'):
        yeartag = country.find('year')
        yeartag.text = str(int(yeartag.text) + 1)
    tree.write('TEST.xml', encoding='utf-8', xml_declaration=False)

    # Modify text and attributes of every year node
    for node in root.iter('year'):
        new_year = int(node.text) + 1
        node.text = str(new_year)
        node.set('updated', 'yes')
        node.set('version', '1.0')
    tree.write('test.xml')

    # Delete nodes
    tree = ET.parse('TEST.xml')
    root = tree.getroot()
    for country in root.findall('country'):
        rank = int(country.find('rank').text)
        if rank > 50:
            root.remove(country)
    tree.write('output.xml')

    # Add nodes: append a year2 node inside each matching country
    tree = ET.parse("a.xml")
    root = tree.getroot()
    for country in root.findall('country'):
        for year in country.findall('year'):
            if int(year.text) > 2000:
                year2 = ET.Element('year2')
                year2.text = '新年'
                year2.attrib = {'update': 'yes'}
                country.append(year2)  # add a child node under country
    tree.write('a.xml.swap')
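The operations above read files such as TEST.xml that are not shown in the text. As a self-contained sketch, the same find, modify, and delete steps can be run on an in-memory document; the `SAMPLE` string below is an assumed stand-in for that file, not its actual content.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the XML file used in the text
SAMPLE = """<data>
  <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
  <country name="Singapore"><rank>4</rank><year>2011</year></country>
  <country name="Panama"><rank>68</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(SAMPLE)

# Find: direct children, first match, and a whole-document search
names = [c.get('name') for c in root.findall('country')]
first = root.find('country').get('name')
years = [y.text for y in root.iter('year')]

# Modify: bump every year and mark it as updated
for year in root.iter('year'):
    year.text = str(int(year.text) + 1)
    year.set('updated', 'yes')

# Delete: drop countries ranked below 50th place
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root.findall('country')]
output = ET.tostring(root, encoding='unicode')
print(output)
```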
5-2 Creating an XML file: approach 1

    # Approach 1
    import xml.etree.ElementTree as ET

    # Create the root tag
    new_xml = ET.Element("namelist")

    # Build the tree with SubElement
    name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
    age = ET.SubElement(name, "age", attrib={"checked": "no"})
    sex = ET.SubElement(name, "sex")
    sex.text = '33'
    name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
    age = ET.SubElement(name2, "age")
    age.text = '19'

    # Write to a file
    et = ET.ElementTree(new_xml)  # create the document object
    et.write("test1.xml", encoding="utf-8", xml_declaration=True)

    ET.dump(new_xml)  # print the generated XML
5-3 Creating an XML file: approach 2

    # Approach 2
    import xml.etree.ElementTree as ET

    new_xml = ET.Element("namelist")

    # Create the node tree et
    et = ET.ElementTree(new_xml)  # create the document object

    # Build a node on its own, then attach it to the root
    person = ET.Element('person')
    person.attrib['name'] = 'test'
    person.attrib['sex'] = 'male'
    person.attrib['age'] = '18'
    person.attrib['text'] = '这是一个person标签'
    new_xml.append(person)

    # Write to a file
    et.write("test1.xml", encoding="utf-8", xml_declaration=True)

    ET.dump(new_xml)  # print the generated XML
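The two approaches differ only in how nodes are attached: `SubElement` creates and attaches in one call, while `Element` followed by `append` does it in two steps. A small sketch with invented element content shows that they produce the same XML; `ET.tostring` is used here instead of writing a file.

```python
import xml.etree.ElementTree as ET

# Approach 1: create and attach in one call
root_a = ET.Element('namelist')
name_a = ET.SubElement(root_a, 'name', attrib={'enrolled': 'yes'})
name_a.text = 'alvin'

# Approach 2: create the node first, then append it
root_b = ET.Element('namelist')
name_b = ET.Element('name')
name_b.attrib['enrolled'] = 'yes'
name_b.text = 'alvin'
root_b.append(name_b)

xml_a = ET.tostring(root_a, encoding='unicode')
xml_b = ET.tostring(root_b, encoding='unicode')
print(xml_a)
print(xml_a == xml_b)
```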