/** The engine that uses the merge tree (see MergeTreeData) and is replicated through ZooKeeper.
 * The ReplicatedMergeTree engine stores data with MergeTree and coordinates the replicas' data through ZooKeeper.
* ZooKeeper is used for the following things:
 * - the structure of the table (/metadata, /columns);
 * - the action log with data (/log/log-..., /replicas/replica_name/queue/queue-...);
* - a replica list (/replicas), and replica activity tag (/replicas/replica_name/is_active), replica addresses (/replicas/replica_name/host);
* - select the leader replica (/leader_election) - this is the replica that assigns the merge;
* - a set of parts of data on each replica (/replicas/replica_name/parts);
* - list of the last N blocks of data with checksum, for deduplication (/blocks);
* - the list of incremental block numbers (/block_numbers) that we are about to insert,
* to ensure the linear order of data insertion and data merge only on the intervals in this sequence;
 * - coordination of writes with quorum (/quorum);
 * - storage of mutation entries (ALTER DELETE, ALTER UPDATE, etc.) to execute (/mutations).
* See comments in StorageReplicatedMergeTree::mutate() for details.
*/
The role of ZooKeeper: it mainly stores the following:
1. The table structure (under /metadata and /columns);
2. The action log for the data (/log/log-..., /replicas/replica_name/queue/queue-...);
3. The replica list (/replicas), the replica activity flag (/replicas/replica_name/is_active), and the replica addresses (/replicas/replica_name/host);
4. Election of the leader among all replicas (/leader_election); the leader replica is the one that assigns merges;
5. The set of data parts on each replica (/replicas/replica_name/parts);
6. The list of the last N data blocks with checksums, used for deduplication (/blocks);
7. The list of incremental block numbers (/block_numbers) about to be inserted, to ensure a linear order of insertion and that merges happen only on intervals of this sequence;
8. Coordination of quorum writes (/quorum);
9. Mutation entries to execute, such as ALTER DELETE and ALTER UPDATE (/mutations).
See the comments in StorageReplicatedMergeTree::mutate() for details.
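To make this layout concrete, here is a minimal sketch that composes the main znode paths for one table and one replica. It is not taken from the ClickHouse sources; the zookeeper_path and replica_name values are made up for illustration, and only plain std::string is used.

#include <iostream>
#include <string>
#include <vector>

int main()
{
    // Hypothetical values; the real path comes from the table's ENGINE arguments.
    std::string zookeeper_path = "/clickhouse/tables/01/visits";
    std::string replica_name   = "replica_1";

    std::string replica_path = zookeeper_path + "/replicas/" + replica_name;

    // Znodes shared by all replicas of the table.
    std::vector<std::string> shared = {
        zookeeper_path + "/metadata",        // table structure
        zookeeper_path + "/columns",         // column list
        zookeeper_path + "/log",             // common action log (log-...)
        zookeeper_path + "/leader_election", // leader replica election
        zookeeper_path + "/blocks",          // checksums of the last N blocks, for deduplication
        zookeeper_path + "/block_numbers",   // incremental block numbers
        zookeeper_path + "/quorum",          // coordination of quorum writes
        zookeeper_path + "/mutations",       // ALTER DELETE / ALTER UPDATE entries
    };

    // Znodes private to one replica.
    std::vector<std::string> per_replica = {
        replica_path + "/is_active",         // replica activity flag
        replica_path + "/host",              // replica address
        replica_path + "/queue",             // this replica's copy of the log (queue-...)
        replica_path + "/parts",             // set of data parts on this replica
    };

    for (const auto & path : shared)
        std::cout << path << '\n';
    for (const auto & path : per_replica)
        std::cout << path << '\n';
}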
/** The replicated tables have a common log (/log/log-...).
* Log - a sequence of entries (LogEntry) about what to do.
* Each entry is one of:
* - normal data insertion (GET),
* - merge (MERGE),
* - delete the partition (DROP).
*
Every replicated table has a common log directory (/log) that stores the sequence of operations to perform: GET, MERGE and DROP (a simplified sketch of such an entry follows this comment block).
* Each replica copies (queueUpdatingTask, pullLogsToQueue) entries from the log to its queue (/replicas/replica_name/queue/queue-...)
* and then executes them (queueTask).
* Despite the name of the "queue", execution can be reordered, if necessary (shouldExecuteLogEntry, executeLogEntry).
* In addition, the records in the queue can be generated independently (not from the log), in the following cases:
* - when creating a new replica, actions are put on GET from other replicas (createReplica);
* - if the part is corrupt (removePartAndEnqueueFetch) or absent during the check (at start - checkParts, while running - searchForMissingPart),
* actions are put on GET from other replicas;
*
 * The replica on which the INSERT was performed will also have a GET entry for this data in its queue.
 * Such an entry is considered executed as soon as the queue handler sees it.
*
 * The log entry has a creation time. This time is generated by the clock of the server that created the entry
 * - the one on which the corresponding INSERT or ALTER query arrived.
*
 * For the entries that a replica put into its queue by itself,
 * the creation time of the corresponding part on any of the replicas is used as the time.
*/
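A rough idea of what one such log entry carries is given by the following simplified stand-in. The type and field names are hypothetical; they only mirror the entry kinds and fields described in the comment above (the real entry type in the ClickHouse sources is richer).

#include <ctime>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for one replication log entry.
struct LogEntrySketch
{
    enum class Type
    {
        GET,    // fetch a freshly inserted part from another replica
        MERGE,  // merge a set of parts into a new part
        DROP,   // drop a partition (a range of parts)
    };

    Type type = Type::GET;
    std::time_t create_time = 0;             // set by the server that received the INSERT/ALTER
    std::string new_part_name;               // part produced by a GET or MERGE
    std::vector<std::string> source_parts;   // parts consumed by a MERGE
};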
Each replica copies the log entries into its own queue (via queueUpdatingTask and pullLogsToQueue) and then executes them (queueTask).
Note that these operations are not necessarily executed in order; if necessary, execution can be reordered (shouldExecuteLogEntry, executeLogEntry).
In the following two cases, queue entries can also be generated independently of the log:
1. When a new replica is created, GET actions to fetch data from other replicas are queued (createReplica);
2. When a part is corrupt (removePartAndEnqueueFetch) or found missing during a check (at startup - checkParts, at runtime - searchForMissingPart), GET actions from other replicas are queued.
The replica on which the data was inserted also gets a GET entry for that data; the queue handler considers it executed as soon as it sees it.
Each log entry has a creation time, generated by the clock of the server that created the entry.
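As a conceptual illustration of the pull/execute cycle described above, here is a sketch with hypothetical types; it is not the actual StorageReplicatedMergeTree code, and the ZooKeeper interaction is only indicated in comments.

#include <deque>
#include <string>

// Hypothetical, simplified queue entry.
struct Entry
{
    std::string znode_name;   // name of the queue-... znode
    // type (GET / MERGE / DROP), part names, create_time, ...
};

struct ReplicaQueueSketch
{
    std::deque<Entry> queue;              // in-memory mirror of /replicas/<name>/queue
    std::string last_pulled_log_entry;    // position in /log, persisted in ZooKeeper

    // queueUpdatingTask / pullLogsToQueue: copy entries newer than
    // last_pulled_log_entry from the common /log into this replica's queue.
    void pullLogsToQueue()
    {
        // for each /log/log-N newer than last_pulled_log_entry:
        //     create /replicas/<name>/queue/queue-M with the same payload,
        //     queue.push_back(entry), advance last_pulled_log_entry.
    }

    // queueTask: pick an entry that may run now and execute it.
    // Execution is not strictly FIFO: an entry can be postponed
    // (e.g. a merge whose source parts are not on this replica yet).
    void queueTask()
    {
        for (auto it = queue.begin(); it != queue.end(); ++it)
        {
            if (shouldExecute(*it))
            {
                execute(*it);      // GET: fetch part; MERGE: merge parts; DROP: remove range
                queue.erase(it);   // in the real engine, the queue-... znode is removed too
                return;
            }
        }
    }

    bool shouldExecute(const Entry &) const { return true; }  // placeholder predicate
    void execute(const Entry &) {}                            // placeholder action
};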