Documentation\device-mapper\thin-provisioning.txt

zjgsu_linux

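This document describes thin provisioning and snapshots in device-mapper. These techniques allow many virtual devices to be stored on the same data volume, which simplifies administration and reduces disk usage. It also covers setting up a pool device and creating and using thinly-provisioned volumes and snapshots.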

Chinese translated version of Documentation\device-mapper\thin-provisioning.txt

If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help. Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.

Chinese maintainer: huneng <huneng1991@163.com>
Chinese translator: huneng <huneng1991@163.com>
Chinese proofreader: huneng <huneng1991@163.com>
---------------------------------------------------------------------

Introduction
============

This document describes a collection of device-mapper targets that
between them implement thin-provisioning and snapshots.

The main highlight of this implementation, compared to the previous
implementation of snapshots, is that it allows many virtual devices to
be stored on the same data volume. This simplifies administration and
allows the sharing of data between volumes, thus reducing disk usage.

Another significant feature is support for an arbitrary depth of
recursive snapshots (snapshots of snapshots of snapshots ...). The
previous implementation of snapshots did this by chaining together
lookup tables, and so performance was O(depth). This new
implementation uses a single data structure to avoid this degradation
with depth. Fragmentation may still be an issue, however, in some
scenarios.

Metadata is stored on a separate device from data, giving the
administrator some freedom, for example to:

- Improve metadata resilience by storing metadata on a mirrored volume
but data on a non-mirrored one.

- Improve performance by storing the metadata on SSD.

Status
======

These targets are very much still in the EXPERIMENTAL state. Please
do not yet rely on them in production. But do experiment and offer us
feedback. Different use cases will have different performance
characteristics, for example due to fragmentation of the data volume.

If you find this software is not performing as expected please mail
dm-devel@redhat.com with details and we'll try our best to improve
things for you.

Userspace tools for checking and repairing the metadata are under
development.

Cookbook
========

This section describes some quick recipes for using thin provisioning.
They use the dmsetup program to control the device-mapper driver
directly. End users will be advised to use a higher-level volume
manager such as LVM2 once support has been added.

Pool device
-----------

The pool device ties together the metadata volume and the data volume.
It maps I/O linearly to the data volume and updates the metadata via
two mechanisms:

- Function calls from the thin targets

- Device-mapper 'messages' from userspace which control the creation of new
virtual devices amongst other things.

Setting up a fresh pool device
------------------------------

Setting up a pool device requires a valid metadata device, and a
data device. If you do not have an existing metadata device you can
make one by zeroing the first 4k to indicate empty metadata.

    dd if=/dev/zero of=$metadata_dev bs=4096 count=1

The amount of metadata you need will vary according to how many blocks
are shared between thin devices (i.e. through snapshots). If you have
less sharing than average you'll need a larger-than-average metadata device.

As a guide, we suggest you calculate the number of bytes to use in the
metadata device as 48 * $data_dev_size / $data_block_size but round it up
to 2MB if the answer is smaller. If you're creating large numbers of
snapshots which are recording large amounts of change, you may find you
need to increase this.

The largest size supported is 16GB: If the device is larger,
a warning will be issued and the excess space will not be used.
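
As a rough illustration of the sizing guidance above, the calculation can be
sketched in shell. The function name thin_metadata_bytes is hypothetical, both
sizes are assumed to be given in 512-byte sectors, and "2MB" and "16GB" are
assumed to be binary units; treat this as a sketch, not an official tool.

```shell
# Suggested metadata size in bytes: 48 * data_dev_size / data_block_size,
# raised to a 2MB floor and limited to the 16GB maximum.
# Assumption: both arguments are in 512-byte sectors; MB/GB are binary units.
thin_metadata_bytes() {
    local data_sectors=$1 block_sectors=$2
    local min_bytes=$((2 * 1024 * 1024))
    local max_bytes=$((16 * 1024 * 1024 * 1024))
    local bytes=$((48 * data_sectors / block_sectors))
    if [ "$bytes" -lt "$min_bytes" ]; then
        bytes=$min_bytes
    fi
    if [ "$bytes" -gt "$max_bytes" ]; then
        echo "warning: metadata space beyond 16GB would not be used" >&2
        bytes=$max_bytes
    fi
    echo "$bytes"
}

thin_metadata_bytes 20971520 128     # 10GiB data device, 64KB blocks: 7864320
thin_metadata_bytes 20971520 1024    # same device, 512KB blocks: 2097152 (floor)
```

Heavy snapshot use may call for more than this estimate, as noted above.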

Reloading a pool table
----------------------

You may reload a pool's table; indeed, this is how the pool is resized
if it runs out of space. (N.B. While specifying a different metadata
device when reloading is not forbidden at the moment, things will go
wrong if it does not route I/O to exactly the same on-disk location as
previously.)

Using an existing pool device
-----------------------------

    dmsetup create pool \
        --table "0 20971520 thin-pool $metadata_dev $data_dev \
                 $data_block_size $low_water_mark"

$data_block_size gives the smallest unit of disk space that can be
allocated at a time expressed in units of 512-byte sectors. People
primarily interested in thin provisioning may want to use a value such
as 1024 (512KB). People doing lots of snapshotting may want a smaller value
such as 128 (64KB). If you are not zeroing newly-allocated data,
a larger $data_block_size in the region of 256000 (128MB) is suggested.
$data_block_size must be the same for the lifetime of the
metadata device.
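
Since $data_block_size is given in 512-byte sectors, a small conversion helper
can make the values easier to check. The sketch below uses a hypothetical
function name, block_size_kb, and applies the 64KB to 1GB limits stated for the
thin-pool constructor in the reference section.

```shell
# Convert a data block size from 512-byte sectors to KB, rejecting values
# outside the 128..2097152 sector range (64KB..1GB).
block_size_kb() {
    local sectors=$1
    if [ "$sectors" -lt 128 ] || [ "$sectors" -gt 2097152 ]; then
        echo "data block size out of range: $sectors sectors" >&2
        return 1
    fi
    echo $((sectors * 512 / 1024))
}

block_size_kb 1024    # prints 512
block_size_kb 128     # prints 64
```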

$low_water_mark is expressed in blocks of size $data_block_size. If
free space on the data device drops below this level then a dm event
will be triggered which a userspace daemon should catch allowing it to
extend the pool device. Only one such event will be sent.
Resuming a device with a new table itself triggers an event so the
userspace daemon can use this to detect a situation where a new table
already exceeds the threshold.
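
Because $low_water_mark is counted in blocks rather than sectors, it can help
to derive it from the pool geometry. The sketch below assumes a policy of
"notify when a given percentage of the data device is left"; that policy and
the name low_water_blocks are illustrative choices, not something the target
prescribes.

```shell
# Low water mark in blocks, for a data device of data_sectors sectors,
# a block size of block_sectors sectors, and a free-space percentage.
low_water_blocks() {
    local data_sectors=$1 block_sectors=$2 percent=$3
    local total_blocks=$((data_sectors / block_sectors))
    echo $((total_blocks * percent / 100))
}

low_water_blocks 20971520 128 10    # 163840 blocks in total: prints 16384
```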

Thin provisioning
-----------------

i) Creating a new thinly-provisioned volume.

To create a new thinly-provisioned volume you must send a message to an
active pool device, /dev/mapper/pool in this example.

    dmsetup message /dev/mapper/pool 0 "create_thin 0"

Here '0' is an identifier for the volume, a 24-bit number. It's up
to the caller to allocate and manage these identifiers. If the
identifier is already in use, the message will fail with -EEXIST.

ii) Using a thinly-provisioned volume.

Thinly-provisioned volumes are activated using the 'thin' target:

    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"

The last parameter is the identifier for the thinp device.
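
Since the caller is responsible for managing device identifiers, a wrapper
might check the 24-bit limit before talking to the pool. The sketch below only
prints the two commands shown above rather than running them; the helper name
thin_create_cmds is hypothetical.

```shell
# Print (do not run) the commands that create and activate a thin volume.
# The only check performed is that the device id fits in 24 bits.
thin_create_cmds() {
    local pool=$1 dev_id=$2 name=$3 sectors=$4
    if [ "$dev_id" -lt 0 ] || [ "$dev_id" -gt 16777215 ]; then
        echo "device id must fit in 24 bits: $dev_id" >&2
        return 1
    fi
    echo "dmsetup message $pool 0 \"create_thin $dev_id\""
    echo "dmsetup create $name --table \"0 $sectors thin $pool $dev_id\""
}

thin_create_cmds /dev/mapper/pool 0 thin 2097152
```

Uniqueness of the identifier is still enforced by the pool itself, which
rejects a duplicate with -EEXIST.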

Internal snapshots
------------------

i) Creating an internal snapshot.

Snapshots are created with another message to the pool.

N.B. If the origin device that you wish to snapshot is active, you
must suspend it before creating the snapshot to avoid corruption.
This is NOT enforced at the moment, so please be careful!

    dmsetup suspend /dev/mapper/thin
    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
    dmsetup resume /dev/mapper/thin

Here '1' is the identifier for the volume, a 24-bit number. '0' is the
identifier for the origin device.
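
Because the suspend is not enforced, a script wrapping this sequence may want
to make sure the origin is resumed even when the create_snap message fails.
The following is a minimal sketch of that idea. The names run,
snapshot_active_origin and DRY_RUN are hypothetical, and by default the
commands are only printed, not executed.

```shell
# Run a command, or just print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Suspend the origin, create the snapshot, and always resume the origin.
snapshot_active_origin() {
    local pool=$1 origin_dev=$2 snap_id=$3 origin_id=$4 rc=0
    run dmsetup suspend "$origin_dev" || return 1
    run dmsetup message "$pool" 0 "create_snap $snap_id $origin_id" || rc=$?
    run dmsetup resume "$origin_dev" || rc=$?
    return "$rc"
}

snapshot_active_origin /dev/mapper/pool /dev/mapper/thin 1 0
```

Setting DRY_RUN=0 would execute the commands for real, which requires root
privileges and an existing pool.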

ii) Using an internal snapshot.

Once created, the user doesn't have to worry about any connection
between the origin and the snapshot. Indeed the snapshot is no
different from any other thinly-provisioned device and can be
snapshotted itself via the same method. It's perfectly legal to
have only one of them active, and there's no ordering requirement on
activating or removing them both. (This differs from conventional
device-mapper snapshots.)

Activate it exactly the same way as any other thinly-provisioned volume:

    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"

External snapshots
------------------

You can use an external _read only_ device as an origin for a
thinly-provisioned volume. Any read to an unprovisioned area of the
thin device will be passed through to the origin. Writes trigger
the allocation of new blocks as usual.

One use case for this is VM hosts that want to run guests on
thinly-provisioned volumes but have the base image on another device
(possibly shared between many VMs).

You must not write to the origin device if you use this technique!
Of course, you may write to the thin device and take internal snapshots
of the thin volume.

i) Creating a snapshot of an external device

This is the same as creating a thin device.
You don't mention the origin at this stage.

    dmsetup message /dev/mapper/pool 0 "create_thin 0"

ii) Using a snapshot of an external device.

Append an extra parameter to the thin target specifying the origin:

    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"

N.B. All descendants (internal snapshots) of this snapshot require the
same extra origin parameter.

Deactivation
------------

All devices using a pool must be deactivated before the pool itself
can be.

    dmsetup remove thin
    dmsetup remove snap
    dmsetup remove pool

Reference
=========

'thin-pool' target
------------------

i) Constructor

    thin-pool <metadata dev> <data dev> <data block size (sectors)> \
              <low water mark (blocks)> [<number of feature args> [<arg>]*]

    Optional feature arguments:

      skip_block_zeroing: Skip the zeroing of newly-provisioned blocks.

      ignore_discard: Disable discard support.

      no_discard_passdown: Don't pass discards down to the underlying
                           data device, but just remove the mapping.

      read_only: Don't allow any changes to be made to the pool
                 metadata.

    Data block size must be between 64KB (128 sectors) and 1GB
    (2097152 sectors) inclusive.

ii) Status

    <transaction id> <used metadata blocks>/<total metadata blocks>
    <used data blocks>/<total data blocks> <held metadata root>
    [no_]discard_passdown ro|rw

    transaction id:
        A 64-bit number used by userspace to help synchronise with metadata
        from volume managers.

    used data blocks / total data blocks
        If the number of free blocks drops below the pool's low water mark a
        dm event will be sent to userspace. This event is edge-triggered and
        it will occur only once after each resume so volume manager writers
        should register for the event and then check the target's status.

    held metadata root:
        The location, in sectors, of the metadata root that has been
        'held' for userspace read access. '-' indicates there is no
        held root. This feature is not yet implemented so '-' is
        always returned.

    discard_passdown|no_discard_passdown
        Whether or not discards are actually being passed down to the
        underlying device. When this is enabled when loading the table,
        it can get disabled if the underlying device doesn't support it.

    ro|rw
        If the pool encounters certain types of device failures it will
        drop into a read-only metadata mode in which no changes to
        the pool metadata (like allocating new blocks) are permitted.

        In serious cases where even a read-only mode is deemed unsafe
        no further I/O will be permitted and the status will just
        contain the string 'Fail'. The userspace recovery tools
        should then be used.
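
To show how the status fields above might be consumed, here is a small sketch
that extracts the number of free data blocks from a status line. It assumes
the line comes from "dmsetup status <pool>", which prefixes the target status
with "<start> <length> thin-pool". The function name pool_free_data_blocks and
the sample numbers are made up for illustration.

```shell
# Print the free data blocks from a thin-pool status line.
# Fields after splitting: $1 start, $2 length, $3 target name,
# $4 transaction id (or 'Fail'), $5 metadata usage, $6 data usage.
pool_free_data_blocks() {
    set -- $1
    if [ "$4" = "Fail" ]; then
        echo "pool has failed; use the userspace recovery tools" >&2
        return 1
    fi
    local data=$6
    local used=${data%/*}     # part before the slash
    local total=${data#*/}    # part after the slash
    echo $((total - used))
}

pool_free_data_blocks "0 20971520 thin-pool 0 52/4096 1000/163840 - discard_passdown rw"
# prints 162840
```

A monitoring daemon could compare this value with the low water mark after
receiving the dm event.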

iii) Messages

    create_thin <dev id>

        Create a new thinly-provisioned device.
        <dev id> is an arbitrary unique 24-bit identifier chosen by
        the caller.

    create_snap <dev id> <origin id>

        Create a new snapshot of another thinly-provisioned device.
        <dev id> is an arbitrary unique 24-bit identifier chosen by
        the caller.
        <origin id> is the identifier of the thinly-provisioned device
        of which the new device will be a snapshot.

    delete <dev id>

        Deletes a thin device. Irreversible.

    set_transaction_id <current id> <new id>

        Userland volume managers, such as LVM, need a way to
        synchronise their external metadata with the internal metadata of the
        pool target. The thin-pool target offers to store an
        arbitrary 64-bit transaction id and return it on the target's
        status line. To avoid races you must provide what you think
        the current transaction id is when you change it with this
        compare-and-swap message.
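
As an illustration of the compare-and-swap usage, a volume manager could read
the current transaction id from the pool status line and pass it back as the
expected value. The sketch below builds such a message. Incrementing the id by
one is an assumed policy, next_transaction_msg is a hypothetical name, and the
status line is sample data in the format printed by "dmsetup status".

```shell
# Build a set_transaction_id message from a thin-pool status line.
# After splitting, $4 is the current transaction id.
next_transaction_msg() {
    set -- $1
    local current=$4
    echo "set_transaction_id $current $((current + 1))"
}

next_transaction_msg "0 20971520 thin-pool 7 52/4096 1000/163840 - discard_passdown rw"
# prints: set_transaction_id 7 8
```

The resulting string would be sent with dmsetup message to the pool; if
another party changed the id in the meantime, the expected value no longer
matches and the caller should re-read the status.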

    reserve_metadata_snap

        Reserve a copy of the data mapping btree for use by userland.
        This allows userland to inspect the mappings as they were when
        this message was executed. Use the pool's status command to
        get the root block associated with the metadata snapshot.

    release_metadata_snap

        Release a previously reserved copy of the data mapping btree.

'thin' target
-------------

i) Constructor

    thin <pool dev> <dev id> [<external origin dev>]

    pool dev:
        the thin-pool device, e.g. /dev/mapper/my_pool or 253:0

    dev id:
        the internal device identifier of the device to be
        activated.

    external origin dev:
        an optional block device outside the pool to be treated as a
        read-only snapshot origin: reads to unprovisioned areas of the
        thin target will be mapped to this device.

The pool doesn't store any size against the thin devices. If you
load a thin target that is smaller than you've been using previously,
then you'll have no access to blocks mapped beyond the end. If you
load a target that is bigger than before, then extra blocks will be
provisioned as and when needed.

If you wish to reduce the size of your thin device and potentially
regain some space then send the 'trim' message to the pool.

ii) Status

    <nr mapped sectors> <highest mapped sector>

    If the pool has encountered device errors and failed, the status
    will just contain the string 'Fail'. The userspace recovery
    tools should then be used.
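
The mapped-sector count in the thin status can be turned into a space usage
figure. The sketch below assumes 512-byte sectors and a status line in the
form printed by "dmsetup status", that is, prefixed with
"<start> <length> thin". The function name thin_mapped_mib and the sample
values are illustrative only.

```shell
# Print the space mapped by a thin device in MiB, given its status line.
# After splitting, $4 is <nr mapped sectors> (or 'Fail').
thin_mapped_mib() {
    set -- $1
    if [ "$4" = "Fail" ]; then
        echo "pool has failed; use the userspace recovery tools" >&2
        return 1
    fi
    echo $(($4 * 512 / 1048576))
}

thin_mapped_mib "0 2097152 thin 204800 2097151"    # prints 100
```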