ClickHouse Source Code Reading (0000 0101) —— The ReplicatedMergeTree Storage Engine

/** The engine that uses the merge tree (see MergeTreeData) and replicated through ZooKeeper.
The ReplicatedMergeTree engine stores data using MergeTree and coordinates the replicas through ZooKeeper.

  * ZooKeeper is used for the following things:
  * - the structure of the table (/ metadata, /columns)
  * - action log with data (/log/log-...,/replicas/replica_name/queue/queue-...);
  * - a replica list (/replicas), and replica activity tag (/replicas/replica_name/is_active), replica addresses (/replicas/replica_name/host);
  * - select the leader replica (/leader_election) - this is the replica that assigns the merge;
  * - a set of parts of data on each replica (/replicas/replica_name/parts);
  * - list of the last N blocks of data with checksum, for deduplication (/blocks);
  * - the list of incremental block numbers (/block_numbers) that we are about to insert,
  *   to ensure the linear order of data insertion and data merge only on the intervals in this sequence;
  * - coordinates writes with quorum (/quorum).
  * - Storage of mutation entries (ALTER DELETE, ALTER UPDATE etc.) to execute (/mutations).
  *   See comments in StorageReplicatedMergeTree::mutate() for details.
  */
ZooKeeper is used to store the following:
1. The table structure (under /metadata and /columns).
2. The log of operations on the data (/log/log-..., /replicas/replica_name/queue/queue-...).
3. The replica list (/replicas), the replica activity flag (/replicas/replica_name/is_active), and the replica addresses (/replicas/replica_name/host).
4. Leader election among the replicas (/leader_election); the leader replica is the one that assigns merges.
5. The set of data parts on each replica (/replicas/replica_name/parts).
6. The list of the last N inserted blocks with their checksums, used for deduplication (/blocks).
7. The list of incremental block numbers about to be inserted (/block_numbers), which ensures a linear order of inserts and that merges only cover contiguous intervals of that sequence.
8. Coordination of quorum writes (/quorum).
9. Mutation entries to be executed, such as ALTER DELETE and ALTER UPDATE (/mutations). See the comments in StorageReplicatedMergeTree::mutate() for details.

/** The replicated tables have a common log (/log/log-...).
  * Log - a sequence of entries (LogEntry) about what to do.
  * Each entry is one of:
  * - normal data insertion (GET),
  * - merge (MERGE),
  * - delete the partition (DROP).
  *
Every replicated table has a shared log directory holding a sequence of entries describing what to do; each entry is a data fetch (GET), a merge (MERGE), or a partition drop (DROP).

  * Each replica copies (queueUpdatingTask, pullLogsToQueue) entries from the log to its queue (/replicas/replica_name/queue/queue-...)
  *  and then executes them (queueTask).
  * Despite the name of the "queue", execution can be reordered, if necessary (shouldExecuteLogEntry, executeLogEntry).
  * In addition, the records in the queue can be generated independently (not from the log), in the following cases:
  * - when creating a new replica, actions are put on GET from other replicas (createReplica);
  * - if the part is corrupt (removePartAndEnqueueFetch) or absent during the check (at start - checkParts, while running - searchForMissingPart),
  *   actions are put on GET from other replicas;
  *
  * The replica to which INSERT was made in the queue will also have an entry of the GET of this data.
  * Such an entry is considered to be executed as soon as the queue handler sees it.
  *
  * The log entry has a creation time. This time is generated by the clock of server that created entry
  * - the one on which the corresponding INSERT or ALTER query came.
  *
  * For the entries in the queue that the replica made for itself,
  * as the time will take the time of creation the appropriate part on any of the replicas.
  */
Each replica copies entries from the log into its own queue (queueUpdatingTask, pullLogsToQueue) and then executes them (queueTask).
Note that despite the name "queue", entries are not necessarily executed in order; they may be reordered when necessary (shouldExecuteLogEntry, executeLogEntry).
In the following two cases, queue entries are also generated locally rather than pulled from the log:
1. When a new replica is created, GET entries are enqueued to fetch parts from the other replicas (createReplica).
2. When a part is corrupt (removePartAndEnqueueFetch) or found to be missing during a check (checkParts at startup, searchForMissingPart while running), GET entries are enqueued to fetch it from the other replicas.

The replica on which the INSERT was performed also gets a GET entry for that data in its queue; such an entry is considered executed as soon as the queue handler sees it.

Every log entry has a creation time, generated by the clock of the server that created the entry (the one that received the corresponding INSERT or ALTER query).

 
