Notes:
This article is based on Ceph 10.2.3.
For the multisite sync configuration, see http://blog.youkuaiyun.com/for_tech/article/details/68927956
In this article, the secondary site is the local site, so "local" and "secondary" mean the same thing; likewise "remote" and "source".
+=======================================================================+
|                       Remote Site (Source Site)                       |
+=======================================================================+
0.1 Pool: {source-zone}.rgw.data.root
Bucket instances;
0.2 Pool: {source-zone}.rgw.buckets.data
Objects;
0.3 Pool: {source-zone}.rgw.buckets.index
For example, we have 4 buckets, each bucket has 3 shards (rgw_override_bucket_index_max_shards=3); a small sketch that prints these names follows the list:
bucket-0 shard0 .dir.{key-of-bucket-0}.0
bucket-0 shard1 .dir.{key-of-bucket-0}.1
bucket-0 shard2 .dir.{key-of-bucket-0}.2
bucket-1 shard0 .dir.{key-of-bucket-1}.0
bucket-1 shard1 .dir.{key-of-bucket-1}.1
bucket-1 shard2 .dir.{key-of-bucket-1}.2
bucket-2 shard0 .dir.{key-of-bucket-2}.0
bucket-2 shard1 .dir.{key-of-bucket-2}.1
bucket-2 shard2 .dir.{key-of-bucket-2}.2
bucket-3 shard0 .dir.{key-of-bucket-3}.0
bucket-3 shard1 .dir.{key-of-bucket-3}.1
bucket-3 shard2 .dir.{key-of-bucket-3}.2
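A minimal sketch (plain C++, no Ceph dependencies) that just prints the index object names above; the {key-of-bucket-N} strings are placeholders standing in for the real bucket instance keys:

#include <iostream>
#include <string>

// Prints the bucket index object names ".dir.{key-of-bucket}.{shard}".
// The bucket keys here are placeholders; in a real cluster they are the
// bucket instance keys (e.g. "{zone-id}.{instance-id}").
int main() {
    const int rgw_override_bucket_index_max_shards = 3;
    for (int b = 0; b < 4; ++b) {
        std::string key = "key-of-bucket-" + std::to_string(b);  // placeholder
        for (int s = 0; s < rgw_override_bucket_index_max_shards; ++s)
            std::cout << ".dir." << key << "." << s << "\n";
    }
}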
0.4 Pool: {source-zone}.rgw.log
For example, rgw_data_log_num_shards=4:
data_log.0
data_log.1
data_log.2
data_log.3
0.5 Mapping from bucket shards to data log shards
There are {bucket-number} * rgw_override_bucket_index_max_shards bucket shards (12 in our example)
and rgw_data_log_num_shards data log shards (4 in our example).
Each bucket shard is mapped to one data log shard by the function choose_oid(); a sketch follows the diagram and note below:
bucket-0 shard0 -------- \
bucket-0 shard1 \
bucket-0 shard2 \
bucket-1 shard0 \
bucket-1 shard1 data_log.0
bucket-1 shard2 map to data_log.1
bucket-2 shard0 data_log.2
bucket-2 shard1 data_log.3
bucket-2 shard2 /
bucket-3 shard0 /
bucket-3 shard1 /
bucket-3 shard2 -------- /
Important: there are 2 kinds of shards:
data-log-shard: [0, rgw_data_log_num_shards)
bucket-shard : [0, rgw_override_bucket_index_max_shards)
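A simplified sketch of this mapping, assuming the jewel-era logic of RGWDataChangesLog::choose_oid(): hash the bucket name, add the bucket shard id, and take the result modulo rgw_data_log_num_shards. std::hash here is a stand-in for Ceph's ceph_str_hash_linux(), so the printed shard numbers will differ from a real cluster:

#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Simplified sketch of RGWDataChangesLog::choose_oid() (Ceph jewel).
// The real code hashes the bucket name with ceph_str_hash_linux();
// std::hash is used here only to keep the sketch dependency-free.
static int choose_data_log_shard(const std::string& bucket_name,
                                 int bucket_shard_id,
                                 int rgw_data_log_num_shards) {
    int shard_shift = (bucket_shard_id > 0 ? bucket_shard_id : 0);
    uint32_t h = static_cast<uint32_t>(std::hash<std::string>{}(bucket_name));
    return static_cast<int>((h + shard_shift) % rgw_data_log_num_shards);
}

int main() {
    const int rgw_data_log_num_shards = 4;             // as in our example
    const int rgw_override_bucket_index_max_shards = 3;
    for (int b = 0; b < 4; ++b) {
        std::string bucket = "bucket-" + std::to_string(b);
        for (int s = 0; s < rgw_override_bucket_index_max_shards; ++s)
            std::cout << bucket << " shard" << s << " -> data_log."
                      << choose_data_log_shard(bucket, s, rgw_data_log_num_shards)
                      << "\n";
    }
}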
0.6 Writing an object
When an object (OBJ_444) is put into bucket-B, shard S:
A. The object is written into pool {source-zone}.rgw.buckets.data.
B. Write logs into pool={source-zone}.rgw.buckets.index, obj=.dir.{key-of-bucket-B}.S as KV pairs:
OBJ_444 ==> info of OBJ_444, such as owner, content-type ...
.0_00000000001.4.2 ==> write OBJ_444 state=CLS_RGW_STATE_PENDING_MODIFY
.0_00000000002.5.3 ==> write OBJ_444 state=CLS_RGW_STATE_COMPLETE
update omap header to
.0_00000000002.5.3
Important: these logs are written by "Ceph classes", so the markers (the omap keys, such as
.0_00000000001.4.2 and .0_00000000002.5.3) are guaranteed to be ordered. This ordering is the key to
syncing data between sites (the secondary site must process the logs in the same order).
"Ceph Classes": Ceph loads .so classes stored in the osd class dir directory dynamically
(i.e., $libdir/rados-classes by default). When you implement a class, you can create new
object methods that have the ability to call the native methods in the Ceph Object Store,
or other class methods you incorporate via libraries or create yourself. On writes, Ceph
Classes can call native or class methods, perform any series of operations on the inbound
data and generate a resulting write transaction that Ceph will apply atomically. On reads,
Ceph Classes can call native or class methods, perform any series of operations on the
outbound data and return the data to the client.
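Since the markers are plain omap keys, they can be inspected with librados directly. A minimal C++ sketch, assuming a reachable cluster, an admin keyring, and the placeholder pool/object names from this article:

#include <rados/librados.hpp>
#include <iostream>
#include <map>
#include <string>

// Lists the omap keys (the ordered markers) of one bucket index shard
// object. Pool and object names are the placeholders from this article
// and must be replaced with real values from your cluster.
int main() {
    librados::Rados cluster;
    cluster.init("admin");                         // client.admin, an assumption
    cluster.conf_read_file("/etc/ceph/ceph.conf");
    if (cluster.connect() < 0) { std::cerr << "connect failed\n"; return 1; }

    librados::IoCtx ioctx;
    if (cluster.ioctx_create("{source-zone}.rgw.buckets.index", ioctx) < 0) {
        std::cerr << "pool not found\n"; return 1;
    }

    // Read up to 100 omap keys of shard 0 of bucket-B; keys like
    // ".0_00000000001.4.2" sort in write order.
    std::map<std::string, librados::bufferlist> vals;
    ioctx.omap_get_vals(".dir.{key-of-bucket-B}.0", "", 100, &vals);
    for (const auto& kv : vals)
        std::cout << kv.first << "\n";

    cluster.shutdown();
}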
C. Write a log entry into pool={source-zone}.rgw.log, obj=data_log.X as a KV pair:
1_1489979397.374156_23.1 ==> info such as: bucket-B shard S has a modification at timestamp T
update omap header to
1_1489979397.374156_23.1
Important: these logs are written by "Ceph classes", so the markers (the omap keys, such as
1_1489979397.374156_23.1) are guaranteed to be ordered. This ordering is the key to syncing data
between sites (the secondary site must process the logs in the same order).
modifications are logged as KV pairs;
the latest marker is recorded in the omap header;
the markers are guaranteed to be ordered.
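The latest marker lives in the omap header and can likewise be read with librados; a minimal sketch under the same assumptions as above (the header payload is RGW-encoded, so it is only hex-dumped, not decoded):

#include <rados/librados.hpp>
#include <iostream>

// Reads the omap header of one data log shard; RGW records the latest
// marker (e.g. "1_1489979397.374156_23.1") there. The pool name is the
// placeholder from this article.
int main() {
    librados::Rados cluster;
    cluster.init("admin");                         // client.admin, an assumption
    cluster.conf_read_file("/etc/ceph/ceph.conf");
    if (cluster.connect() < 0) { std::cerr << "connect failed\n"; return 1; }

    librados::IoCtx ioctx;
    if (cluster.ioctx_create("{source-zone}.rgw.log", ioctx) < 0) {
        std::cerr << "pool not found\n"; return 1;
    }

    librados::bufferlist hdr;
    if (ioctx.omap_get_header("data_log.0", &hdr) == 0)
        hdr.hexdump(std::cout);                    // encoded header, not plain text

    cluster.shutdown();
}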