Data Persistence: From Simple Serialization to Relational Serialization

Simple Serialization and ZODB

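In data processing we sometimes only need to save Python objects and store them for later use. The script introduced earlier imported the yaml and custom_class modules, created a readable file object from the previously created YAML file, loaded the YAML file into an object, and printed that object. When run, its output matches that of the earlier deserialization example.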

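Besides the familiar pickle and YAML serializers, Zope's ZODB module is another option for serializing data. ZODB stands for "Zope Object Database". For simple use it works much like pickle or YAML serialization, but it can grow with your needs. For example, if operations must be atomic, ZODB provides transactions; if you need more scalable persistent storage, you can use ZEO, Zope's distributed object store.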

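ZODB could arguably have been placed in the "relational persistence" section, but this object database does not really fit the relational database model we have come to know over the years, even though it makes relationships between objects easy to establish. In the examples here it behaves more like shelve than like a relational database, so we cover it under "simple persistence".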
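For comparison with the ZODB examples that follow, here is a minimal sketch of the shelve behaviour referred to above, using only the standard library. The function name shelve_roundtrip and the keys are illustrative assumptions, not part of the original scripts.

```python
import shelve


def shelve_roundtrip(path):
    """Store a list and a dict under string keys, then read them back."""
    # Writing: the shelf behaves like a dict whose values are pickled to disk.
    with shelve.open(path) as shelf:
        shelf['list'] = ['this', 'is', 'a', 'list']
        shelf['dict'] = {'this': 'is', 'a': 'dictionary'}
    # Reading: reopen the same file and copy its contents into a plain dict.
    with shelve.open(path) as shelf:
        return dict(shelf)
```

Calling shelve_roundtrip with a writable file path returns the two stored values under their keys. Unlike ZODB, shelve offers no transactions, so a crash halfway through a series of writes can leave only some of them on disk.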

Installing ZODB

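Installing ZODB is simple: run easy_install ZODB3. The ZODB module has a few dependencies, but easy_install resolves them well, downloading and installing everything required. On current Python installations, where easy_install is deprecated, pip install ZODB is the usual equivalent.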

A Simple ZODB Example

The following code serializes a dictionary and a list to ZODB:

#!/usr/bin/env python
import ZODB
import ZODB.FileStorage
import transaction
filestorage = ZODB.FileStorage.FileStorage('zodb_filestorage.db')
db = ZODB.DB(filestorage)
conn = db.open()
root = conn.root()
root['list'] = ['this', 'is', 'a', 'list']
root['dict'] = {'this': 'is', 'a': 'dictionary'}
transaction.commit()
conn.close()

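Getting started with ZODB takes a few more lines of code than the other options, but once the persistent store has been created and initialized, it is used in much the same way. The steps are:
1. Import the ZODB, ZODB.FileStorage, and transaction modules.
2. Create a FileStorage object, naming the database file to use.
3. Create a DB object and connect it to the FileStorage object.
4. Open the database object and get its root node.
5. Update the root object with your data structures.
6. Commit the changes with transaction.commit().
7. Close the database connection with conn.close().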

Reading Data from ZODB

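Once you have created a ZODB storage container and committed data to it, you will probably want to get the data back out. The following example reads it: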

#!/usr/bin/env python
import ZODB
import ZODB.FileStorage
filestorage = ZODB.FileStorage.FileStorage('zodb_filestorage.db')
db = ZODB.DB(filestorage)
conn = db.open()
root = conn.root()
print(list(root.items()))
conn.close()

Running this code prints the data stored in ZODB.

Serializing Custom Classes

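Serializing a custom class in ZODB works much as it does in the other frameworks. Here is a custom Account class; it subclasses persistent.Persistent so that ZODB can track changes to its attributes: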

#!/usr/bin/env python
import persistent
class OutOfFunds(Exception):
    pass
class Account(persistent.Persistent):
    def __init__(self, name, starting_balance=0):
        self.name = name
        self.balance = starting_balance
    def __str__(self):
        return "Account %s, balance %s" % (self.name, self.balance)
    def __repr__(self):
        return "Account %s, balance %s" % (self.name, self.balance)
    def deposit(self, amount):
        self.balance += amount
        return self.balance
    def withdraw(self, amount):
        if amount > self.balance:
            raise OutOfFunds
        self.balance -= amount
        return self.balance

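The following code serializes instances of the custom class to ZODB (the import below implies that the class above is saved as custom_class_zodb.py):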

#!/usr/bin/env python
import ZODB
import ZODB.FileStorage
import transaction
import custom_class_zodb
filestorage = ZODB.FileStorage.FileStorage('zodb_filestorage.db')
db = ZODB.DB(filestorage)
conn = db.open()
root = conn.root()
noah = custom_class_zodb.Account('noah', 1000)
print(noah)
root['noah'] = noah
jeremy = custom_class_zodb.Account('jeremy', 1000)
print(jeremy)
root['jeremy'] = jeremy
transaction.commit()
conn.close()

Running this code creates two Account objects and saves them to the ZODB database.

Updating ZODB Data

The following code opens the database and transfers 300 from the noah account to the jeremy account:

#!/usr/bin/env python
import ZODB
import ZODB.FileStorage
import transaction
import custom_class_zodb
filestorage = ZODB.FileStorage.FileStorage('zodb_filestorage.db')
db = ZODB.DB(filestorage)
conn = db.open()
root = conn.root()
noah = root['noah']
print("BEFORE WITHDRAWAL")
print("=================")
print(noah)
jeremy = root['jeremy']
print(jeremy)
print("-----------------")
transaction.begin()
noah.withdraw(300)
jeremy.deposit(300)
transaction.commit()
print("AFTER WITHDRAWAL")
print("================")
print(noah)
print(jeremy)
print("----------------")
conn.close()

After this script runs, the balance of the noah account is 300 lower and the balance of the jeremy account is 300 higher.

Transaction Example

The following example transfers money in a loop, wrapping each transfer in a transaction so that no funds are lost:

#!/usr/bin/env python
import ZODB
import ZODB.FileStorage
import transaction
import custom_class_zodb
filestorage = ZODB.FileStorage.FileStorage('zodb_filestorage.db')
db = ZODB.DB(filestorage)
conn = db.open()
root = conn.root()
noah = root['noah']
print("BEFORE TRANSFER")
print("===============")
print(noah)
jeremy = root['jeremy']
print(jeremy)
print("-----------------")
while True:
    try:
        transaction.begin()
        # Deposit first: if the withdrawal then fails, abort() must undo it.
        jeremy.deposit(300)
        noah.withdraw(300)
        transaction.commit()
    except custom_class_zodb.OutOfFunds:
        print("OutOfFunds Error")
        print("Current account information:")
        print(noah)
        print(jeremy)
        transaction.abort()
        break
print("AFTER TRANSFER")
print("==============")
print(noah)
print(jeremy)
print("----------------")
conn.close()

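This script keeps transferring 300 from the noah account to the jeremy account until noah runs out of funds. At that point withdraw raises OutOfFunds, the script prints the error and the current account information, and transaction.abort() discards the deposit already made in that iteration, so no money is created or lost.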

Relational Serialization

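Simple serialization is sometimes not enough, and we may need the power of relational analysis. Relational serialization means either serializing Python objects together with their relationships to other Python objects, or storing relational data in a relational database and exposing it through a Python-object-like interface.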

SQLite

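SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Unlike a traditional database, the SQLite engine runs in the same process as your code and stores its data in a single file, so there is no host name, port, user name, or password to configure. Its main advantages are that it is easy to use, does much of the same work as a "real" database, and is widely supported.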
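To make "transactional" and "zero-configuration" concrete, here is a small sketch using Python's standard sqlite3 module and an in-memory database. The account table and the helper names make_demo_db, transfer, and balances are assumptions made for this illustration; they are not part of the inventory schema used in the rest of this section.

```python
import sqlite3


class OutOfFunds(Exception):
    pass


def make_demo_db():
    """Create an in-memory database holding two accounts of 1000 each."""
    conn = sqlite3.connect(':memory:')  # no host, port, user, or password
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [('noah', 1000), ('jeremy', 1000)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move amount from source to target, or change nothing at all."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target))
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (source,)).fetchone()
        if balance < 0:
            raise OutOfFunds(source)


def balances(conn):
    """Return a {name: balance} dict for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

As in the ZODB transfer loop, the deposit is applied before the withdrawal is checked, so a failed transfer only leaves the balances untouched because the rollback undoes both updates.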

To create a SQLite database, follow these steps:
1. Prepare a SQL file containing the table definitions, for example inventory.sql, with the following contents:

BEGIN;
CREATE TABLE "inventory_ipaddress" (
    "id" integer NOT NULL PRIMARY KEY,
    "address" text NULL,
    "server_id" integer NOT NULL
);
CREATE TABLE "inventory_hardwarecomponent" (
    "id" integer NOT NULL PRIMARY KEY,
    "manufacturer" varchar(50) NOT NULL,
    "type" varchar(50) NOT NULL,
    "model" varchar(50) NULL,
    "vendor_part_number" varchar(50) NULL,
    "description" text NULL
);
CREATE TABLE "inventory_operatingsystem" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(50) NOT NULL,
    "description" text NULL
);
CREATE TABLE "inventory_service" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(50) NOT NULL,
    "description" text NULL
);
CREATE TABLE "inventory_server" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(50) NOT NULL,
    "description" text NULL,
    "os_id" integer NOT NULL REFERENCES "inventory_operatingsystem" ("id")
);
CREATE TABLE "inventory_server_services" (
    "id" integer NOT NULL PRIMARY KEY,
    "server_id" integer NOT NULL REFERENCES "inventory_server" ("id"),
    "service_id" integer NOT NULL REFERENCES "inventory_service" ("id"),
    UNIQUE ("server_id", "service_id")
);
CREATE TABLE "inventory_server_hardware_component" (
    "id" integer NOT NULL PRIMARY KEY,
    "server_id" integer NOT NULL REFERENCES "inventory_server" ("id"),
    "hardwarecomponent_id" integer 
      NOT NULL REFERENCES "inventory_hardwarecomponent" ("id"),
    UNIQUE ("server_id", "hardwarecomponent_id")
);
COMMIT;

2. Create the SQLite database with the following command:

jmjones@dinkgutsy:~/code$ sqlite3 inventory.db < inventory.sql
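If the sqlite3 command-line tool is not at hand, the same result can be approximated from Python with the standard sqlite3 module. This is a sketch under the assumption that the SQL file contains only statements SQLite accepts; create_database and list_tables are hypothetical helper names.

```python
import sqlite3


def create_database(db_path, sql_path):
    """Run every statement in the SQL file against the database file."""
    with open(sql_path) as sql_file:
        script = sql_file.read()
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        # executescript runs multiple semicolon-separated statements,
        # including the BEGIN and COMMIT that wrap the table definitions.
        conn.executescript(script)
    finally:
        conn.close()


def list_tables(db_path):
    """Return the names of all tables in the database, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling create_database('inventory.db', 'inventory.sql') followed by list_tables('inventory.db') should list the tables defined in the SQL file, which is a quick way to confirm that the schema loaded.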

Installing SQLite:
- On Ubuntu and Debian systems, use apt-get install sqlite3.
- On Red Hat systems, use yum install sqlite.
- For other Linux distributions, UNIX systems, or Windows, source code and precompiled binaries can be downloaded from http://www.sqlite.org/download.html.

Working with the SQLite Database

The following code connects to the SQLite database and inserts a row:

import sqlite3
conn = sqlite3.connect('inventory.db')
cursor = conn.execute("insert into inventory_operatingsystem (name, description) values ('Linux', '2.0.34 kernel');")
cursor.fetchall()
conn.commit()

The following code reads the data back from the SQLite database:

import sqlite3
conn = sqlite3.connect('inventory.db')
cursor = conn.execute('select * from inventory_operatingsystem;')
print(cursor.fetchall())

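SQLite is a good fit when the data is accessed by a single script or a handful of users, but the interface offered by the sqlite3 module is fairly low-level: you write SQL strings by hand and get back lists of tuples.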
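Two standard features of the sqlite3 module soften that low-level feel: ? placeholders, which keep values out of the SQL string, and sqlite3.Row, which lets columns be read by name. The sketch below assumes the inventory_operatingsystem table defined earlier; the helper names open_inventory, add_operating_system, and list_operating_systems are made up for this example.

```python
import sqlite3


def open_inventory(path='inventory.db'):
    """Open the database so that rows can be accessed by column name."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    return conn


def add_operating_system(conn, name, description):
    """Insert one row using ? placeholders and return its new id."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "INSERT INTO inventory_operatingsystem (name, description) "
            "VALUES (?, ?)",
            (name, description),
        )
    return cursor.lastrowid


def list_operating_systems(conn):
    """Return every operating system row as a plain dict."""
    rows = conn.execute(
        "SELECT id, name, description "
        "FROM inventory_operatingsystem ORDER BY id")
    return [dict(row) for row in rows]
```

Because the values travel separately from the SQL text, a name containing a quote character is stored as-is instead of breaking the statement.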

Storm ORM

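A plain SQL interface is enough to retrieve, update, insert, and delete data, but it is often more convenient to work with the data directly as Python objects. A trend in database access in recent years has been to create an object-oriented representation of the data stored in the database, known as object-relational mapping (ORM).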
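To give a feel for what such a mapping does, here is a deliberately tiny hand-rolled sketch built on the standard sqlite3 module. It is not how Storm or any real ORM is implemented; TinyStore and its add, find, and commit methods are hypothetical names that only loosely mirror the shape of the calls used later in this section.

```python
import sqlite3


class OperatingSystem(object):
    """Plain Python object standing in for one inventory_operatingsystem row."""
    table = 'inventory_operatingsystem'
    columns = ('id', 'name', 'description')

    def __init__(self, name, description=None, id=None):
        self.id = id
        self.name = name
        self.description = description


class TinyStore(object):
    """Translate between mapped objects and rows of their table."""

    def __init__(self, conn):
        self.conn = conn

    def add(self, obj):
        # Build an INSERT from the class's column list, leaving out the id
        # so that SQLite assigns it.
        cols = [c for c in obj.columns if c != 'id']
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (
            obj.table, ', '.join(cols), ', '.join('?' for _ in cols))
        cursor = self.conn.execute(sql, [getattr(obj, c) for c in cols])
        obj.id = cursor.lastrowid
        return obj

    def find(self, cls):
        # Turn each row back into an instance of the mapped class.
        sql = "SELECT %s FROM %s ORDER BY id" % (
            ', '.join(cls.columns), cls.table)
        for row in self.conn.execute(sql):
            yield cls(**dict(zip(cls.columns, row)))

    def commit(self):
        self.conn.commit()
```

The calling code never writes SQL: it creates an OperatingSystem, hands it to the store, and later iterates over objects instead of tuples. A real ORM adds typed columns, change tracking, and relationships between classes on top of this idea.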

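Storm is an ORM open-sourced by Canonical. At the time of writing it was a relative newcomer among Python database libraries, but it had already attracted a user base and looked set to become one of the leading Python ORMs.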

To access the SQLite database with Storm, follow these steps:
1. Create the mapping between a Python class and the database table:

import storm.locals
class OperatingSystem(object):
    __storm_table__ = 'inventory_operatingsystem'
    id = storm.locals.Int(primary=True)
    name = storm.locals.Unicode()
    description = storm.locals.Unicode()

2. Add data to the database:

import storm.locals
import storm_model
import os
operating_system = storm_model.OperatingSystem()
operating_system.name = u'Windows'
operating_system.description = u'3.1.1'
db = storm.locals.create_database('sqlite:///%s' % os.path.join(os.getcwd(), 'inventory.db'))
store = storm.locals.Store(db)
store.add(operating_system)
store.commit()

3. Retrieve data from the database:

import storm.locals
import storm_model
import os
db = storm.locals.create_database('sqlite:///%s' % os.path.join(os.getcwd(), 'inventory.db'))
store = storm.locals.Store(db)
for o in store.find(storm_model.OperatingSystem):
    print("%s %s %s" % (o.id, o.name, o.description))

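In summary, each approach to data persistence has its own strengths and weaknesses, and the right choice depends on the requirements at hand. Simple serialization suits saving and reloading Python objects, while relational serialization suits data that needs to be queried across relationships.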

Comparison and Summary of Persistence Approaches

Choosing a Persistence Approach

In a real project, choosing the right persistence approach is a key decision. The following flowchart (in mermaid syntax) gives a simple selection process:

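graph TD;
    A[Only need simple storage of Python objects?] -->|Yes| B[Need transaction support?];
    A -->|No| C[Need relational analysis?];
    B -->|Yes| D[Choose ZODB];
    B -->|No| E[Choose YAML or pickle];
    C -->|Yes| F[Small application or single-user scenario?];
    C -->|No| G[Consider other large relational databases];
    F -->|Yes| H[Choose SQLite];
    F -->|No| I[Choose Storm ORM or another ORM framework];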
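The same decision flow can be written as a small function, which is sometimes easier to read than the diagram. This is only a transcription of the flowchart above; the function name choose_persistence and its parameter names are assumptions made for the example.

```python
def choose_persistence(simple_objects, needs_transactions=False,
                       needs_relational=False, small_or_single_user=False):
    """Walk the selection flowchart and return the suggested approach."""
    if simple_objects:
        # Branch A -> B: simple storage of Python objects.
        return 'ZODB' if needs_transactions else 'YAML or pickle'
    if not needs_relational:
        # Branch A -> C -> G.
        return 'other large relational databases'
    # Branch A -> C -> F.
    if small_or_single_user:
        return 'SQLite'
    return 'Storm ORM or another ORM framework'
```

For instance, choose_persistence(True, needs_transactions=True) follows the left-hand branch of the diagram and suggests ZODB.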