A Brief Walkthrough of the ClickHouse Source Code

This article walks through the ClickHouse source code starting from the main function: how the service starts up, and how the HTTP and TCP interfaces and the internal (interserver) HTTP interface used for replication are wired. It then goes into the main function in server.cpp, looks at how TCPHandler handles client connections, in particular how TCPHandler::runImpl() receives data and executes queries, and finally focuses on the core logic of executeQuery() and executeQueryImpl() for SELECT, INSERT, ALTER and MERGE operations: writing data, executing queries, changing table structure and merging data.


ClickHouse source code flow

Entry point: the main function in dbms/programs/main.cpp.

int main(int argc_, char ** argv_)
{
    /// Print a basic help if nothing was matched
    MainFunc main_func = printHelp; // which entry point runs is decided by the arguments passed at startup; for the server it is mainEntryClickHouseServer
    for (auto & application : clickhouse_applications)
    {
        if (isClickhouseApp(application.first, argv))
        {
            main_func = application.second;
            break;
        }
    }
    return main_func(static_cast<int>(argv.size()), argv.data()); // for the server this calls mainEntryClickHouseServer, which takes us to dbms/programs/server/server.cpp
}
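This is the classic multi-call-binary dispatch: one executable, several entry points, chosen by the name or first argument it was invoked with. A minimal self-contained sketch of the same pattern (the tools table, mainEntryEcho and the printHelp body are illustrative, not the real ClickHouse code):

#include <cstring>
#include <iostream>
#include <utility>
#include <vector>

using MainFunc = int (*)(int, char **);

static int mainEntryEcho(int argc, char ** argv)   // stands in for mainEntryClickHouseServer etc.
{
    for (int i = 1; i < argc; ++i)
        std::cout << argv[i] << '\n';
    return 0;
}

static int printHelp(int, char **)
{
    std::cout << "usage: tool <echo> [args...]\n";
    return 1;
}

int main(int argc, char ** argv)
{
    std::vector<std::pair<const char *, MainFunc>> tools = {{"echo", mainEntryEcho}};

    MainFunc main_func = printHelp;                // fallback, exactly like the real main.cpp
    for (const auto & tool : tools)
        if (argc > 1 && strcmp(argv[1], tool.first) == 0)   // the real code also checks the argv[0] suffix
            main_func = tool.second;

    return main_func(argc - 1, argv + 1);          // forward the remaining arguments
}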

dbms/programs/server/server.cpp exposes three kinds of interfaces, described in the source comments as follows:

/** Server provides three interfaces:
  * 1. HTTP - simple interface for any applications.
  * 2. TCP - interface for native clickhouse-client and for server to server internal communications.
  *    More rich and efficient, but less compatible
  *     - data is transferred by columns;
  *     - data is transferred compressed;
  *    Allows to get more information in response.
  * 3. Interserver HTTP - for replication.
  */

The main function in dbms/programs/server/server.cpp parses the arguments and configuration, initializes the server, and starts listening on the configured ports.

int mainEntryClickHouseServer(int argc, char ** argv)
{
    DB::Server app;
    try
    {
        return app.run(argc, argv); // run() is called here
    }
    catch (...)
    {
        std::cerr << DB::getCurrentExceptionMessage(true) << "\n";
        auto code = DB::getCurrentExceptionCode();
        return code ? code : 1;
    }
}
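DB::Server ultimately derives from Poco::Util::ServerApplication, so app.run(argc, argv) follows Poco's standard application lifecycle: option parsing, initialize(), the overridden main(), then uninitialize(). A minimal sketch of that lifecycle (MiniServer is an invented name; the real Server::main() is of course far bigger):

#include <Poco/Util/ServerApplication.h>
#include <iostream>
#include <string>
#include <vector>

class MiniServer : public Poco::Util::ServerApplication
{
protected:
    int main(const std::vector<std::string> & /*args*/) override
    {
        // The real Server::main() loads the config, creates the global Context
        // and starts the HTTP/TCP/interserver listeners at this point.
        std::cout << "listening (pretend)..." << std::endl;
        waitForTerminationRequest();   // block until SIGINT/SIGTERM
        return Application::EXIT_OK;
    }
};

int main(int argc, char ** argv)
{
    MiniServer app;
    return app.run(argc, argv);        // run() drives initialize() -> main(args) -> uninitialize()
}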

ClickHouse uses the Poco networking library to handle network requests. The handling of each client connection lives in the run() method in dbms/programs/server/TCPHandler.cpp.

void TCPHandler::run()
{
    try
    {
        runImpl(); // runImpl() is called here
        LOG_INFO(log, "Done processing connection.");
    }
    catch (Poco::Exception & e)
    {
        /// Timeout - not an error.
        if (!strcmp(e.what(), "Timeout"))
        {
            LOG_DEBUG(log, "Poco::Exception. Code: " << ErrorCodes::POCO_EXCEPTION << ", e.code() = " << e.code()
                << ", e.displayText() = " << e.displayText() << ", e.what() = " << e.what());
        }
        else
            throw;
    }
}
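For reference, this is the Poco shape that TCPHandler plugs into: a TCPServerConnection subclass whose run() is executed on a pooled thread for every accepted socket, registered with a Poco::Net::TCPServer. A tiny self-contained echo server as a sketch (EchoConnection and port 9099 are made up for illustration):

#include <Poco/Net/ServerSocket.h>
#include <Poco/Net/StreamSocket.h>
#include <Poco/Net/TCPServer.h>
#include <Poco/Net/TCPServerConnection.h>
#include <Poco/Net/TCPServerConnectionFactory.h>
#include <iostream>

class EchoConnection : public Poco::Net::TCPServerConnection
{
public:
    using Poco::Net::TCPServerConnection::TCPServerConnection;

    void run() override                       // plays the same role as TCPHandler::run()
    {
        char buf[256];
        int n;
        while ((n = socket().receiveBytes(buf, sizeof(buf))) > 0)
            socket().sendBytes(buf, n);       // echo back whatever the client sent
    }
};

int main()
{
    Poco::Net::ServerSocket listener(9099);   // illustrative port
    Poco::Net::TCPServer server(
        new Poco::Net::TCPServerConnectionFactoryImpl<EchoConnection>(), listener);
    server.start();
    std::cout << "echo server on port 9099, press Enter to stop" << std::endl;
    std::cin.get();
    server.stop();
    return 0;
}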

In TCPHandler::runImpl(), leaving aside the handshake, context initialization and exception handling, the main logic is as follows:

void TCPHandler::runImpl()
{
    receivePacket(); // receive the request
    executeQuery(state.query, *query_context, false, state.stage, may_have_embedded_data); // process the query

    /// Does the request require receive data from client?
    if (state.need_receive_data_for_insert)
        processInsertQuery(connection_settings); // sends the result back to the client
    else if (state.need_receive_data_for_input)
    {
        /// It is special case for input(), all works for reading data from client will be done in callbacks.
        /// state.io.in is NullAndDoCopyBlockInputStream so read it once.
        state.io.in->read();
        state.io.onFinish();
    }
    else if (state.io.pipeline.initialized())
        processOrdinaryQueryWithProcessors(query_context->getSettingsRef().max_threads); // sends the result back to the client
    else
        processOrdinaryQuery(); // sends the result back to the client
}

Next, let's follow how executeQuery processes the request, in dbms/src/Interpreters/executeQuery.cpp. The main logic:

BlockIO executeQuery(
    const String & query,
    Context & context,
    bool internal,
    QueryProcessingStage::Enum stage,
    bool may_have_embedded_data,
    bool allow_processors)
{
    std::tie(ast, streams) = executeQueryImpl(query.data(), query.data() + query.size(), context,
        internal, stage, !may_have_embedded_data, nullptr, allow_processors); // executeQueryImpl is called here
}

Now the main processing logic of executeQueryImpl:

static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
    const char * begin,
    const char * end,
    Context & context,
    bool internal,
    QueryProcessingStage::Enum stage,
    bool has_query_tail,
    ReadBuffer * istr,
    bool allow_processors)
{
    ast = parseQuery(parser, begin, end, "", max_query_size, settings.max_parser_depth); // parse the query text

    if (use_processors) // use the processors pipeline
        pipeline = interpreter->executeWithProcessors();
    else // without the processors pipeline
        res = interpreter->execute(); // dispatch to the execute() of the concrete interpreter type
}
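Which execute() actually runs was decided earlier, when the concrete interpreter was built from the parsed AST (InterpreterSelectQuery for SELECT, InterpreterInsertQuery for INSERT, InterpreterAlterQuery for ALTER, and so on), so the call above is plain virtual dispatch through a common interface. A stripped-down sketch of that shape, with invented Mini* names standing in for the real classes:

#include <iostream>
#include <memory>
#include <string>

struct MiniBlockIO { std::string description; };       // stands in for BlockIO

struct IMiniInterpreter                                 // stands in for IInterpreter
{
    virtual ~IMiniInterpreter() = default;
    virtual MiniBlockIO execute() = 0;
};

struct MiniSelectInterpreter : IMiniInterpreter
{
    MiniBlockIO execute() override { return {"pipeline that pulls blocks from storage"}; }
};

struct MiniInsertInterpreter : IMiniInterpreter
{
    MiniBlockIO execute() override { return {"output stream that writes blocks to storage"}; }
};

enum class QueryKind { Select, Insert };

// Plays the role of the interpreter factory: choose the interpreter from the query kind.
std::unique_ptr<IMiniInterpreter> makeInterpreter(QueryKind kind)
{
    if (kind == QueryKind::Select)
        return std::make_unique<MiniSelectInterpreter>();
    return std::make_unique<MiniInsertInterpreter>();
}

int main()
{
    auto interpreter = makeInterpreter(QueryKind::Select);
    std::cout << interpreter->execute().description << std::endl;   // dispatch via the base class
    return 0;
}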

【Insert】

executeQuery() // called in TCPHandler::runImpl()
    executeQueryImpl()
        parseQuery()
        interpreter->execute() // InterpreterInsertQuery
            BlockIO InterpreterInsertQuery::execute() // builds a MergeTreeBlockOutputStream and stores it in state.io
processInsertQuery() // called in TCPHandler::runImpl(); drives the outermost write()
    void MergeTreeBlockOutputStream::write(const Block & block) // write the data
        MergeTreeDataWriter::splitBlockIntoParts // split the block into per-partition chunks
        MergeTreeDataWriter::writeTempPart // write a temporary part
            MergedBlockOutputStream::writePrefix // MergedBlockOutputStream writes the data to disk
            MergedBlockOutputStream::writeWithPermutation
            MergedBlockOutputStream::calculateAndSerializeSkipIndices // write the skip indices
            MergedBlockOutputStream::writeSuffixAndFinalizePart // finalize the part on disk
        StorageMergeTree::renameTempPartAndAdd // make the part effective
            MergeTreeData::renameTempPartAndReplace // update the in-memory part set and make the part visible
                part->renameTo(part_name, true); // rename the part directory to its final name
                auto part_it = data_parts_indexes.insert(part).first; // insert the part into the in-memory index
                modifyPartState(part_it, DataPartState::Committed); // change the part state so it becomes visible to SELECT
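splitBlockIntoParts boils down to grouping the inserted rows by the value of the partition expression so that every group becomes its own part. A simplified stand-alone sketch of that grouping (Row and the toYYYYMM-style key are assumptions for illustration, not the real Block machinery):

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Row { std::string date; int value; };             // toy row; the real code works on Blocks

// Group rows by partition key; each bucket would become one new part.
std::map<std::string, std::vector<Row>> splitByPartition(const std::vector<Row> & rows)
{
    std::map<std::string, std::vector<Row>> parts;
    for (const auto & row : rows)
        parts[row.date.substr(0, 7)].push_back(row);      // e.g. PARTITION BY toYYYYMM(date)
    return parts;
}

int main()
{
    std::vector<Row> rows = {{"2020-01-05", 1}, {"2020-01-09", 2}, {"2020-02-01", 3}};
    for (const auto & [partition, bucket] : splitByPartition(rows))
        std::cout << partition << " -> " << bucket.size() << " rows\n";
    return 0;
}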

【Query】

executeQuery() // called in TCPHandler::runImpl()
    executeQueryImpl()
        parseQuery()
        interpreter->execute() // InterpreterSelectQuery
            BlockIO InterpreterSelectQuery::execute()
                InterpreterSelectQuery::executeImpl()
                    executeFetchColumns()
                        StorageMergeTree::read()
                            MergeTreeDataSelectExecutor::read
                                data.getDataPartsVector() // collect the parts that are already in the Committed state from memory before reading
                                MergeTreeDataSelectExecutor::readFromParts // apply a series of pruning filters to those parts
processOrdinaryQuery() // return the result: pull the query execution result, if it exists, and send it to the network
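readFromParts prunes with metadata before it touches any data: a part whose min/max values for the filtered column cannot intersect the WHERE range is skipped entirely. A simplified sketch of that range check (PartMeta and the date encoding are illustrative):

#include <iostream>
#include <string>
#include <vector>

struct PartMeta
{
    std::string name;
    int min_date;      // min/max of the partition column, stored in the part's minmax index
    int max_date;
};

// Keep only parts whose [min, max] range can intersect the requested [from, to] range.
std::vector<PartMeta> pruneParts(const std::vector<PartMeta> & parts, int from, int to)
{
    std::vector<PartMeta> selected;
    for (const auto & part : parts)
        if (part.max_date >= from && part.min_date <= to)
            selected.push_back(part);
    return selected;
}

int main()
{
    std::vector<PartMeta> parts = {{"202001_1_5_1", 20200101, 20200131},
                                   {"202002_6_9_1", 20200201, 20200229}};
    for (const auto & part : pruneParts(parts, 20200210, 20200220))
        std::cout << part.name << " survives pruning\n";
    return 0;
}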

【Alter】

executeQuery() // called in TCPHandler::runImpl()
    executeQueryImpl()
        parseQuery()
        interpreter->execute() // InterpreterAlterQuery
            BlockIO InterpreterAlterQuery::execute()
                table->alter(alter_commands, context, alter_lock)
                    void StorageMergeTree::alter() // the core logic of ALTER

1. Update the in-memory metadata:
    changeSettings(new_metadata.settings_changes, table_lock_holder);
    checkTTLExpressions(new_metadata, old_metadata);
    /// Reinitialize primary key because primary key column types might have changed.
    setProperties(new_metadata, old_metadata);

2. Update the table name and metadata on disk:
    DatabaseCatalog::instance().getDatabase(table_id.database_name)->alterTable(context, table_id, new_metadata);

3. If the ALTER also carries mutation commands, start a mutation (see the versioning sketch after this list):
    // instantiate a MergeTreeMutationEntry, create tmp_mutation_{xxx}.txt, and write the commands, format version and create time into it
    // rename tmp_mutation_{xxx}.txt to mutation_{version}.txt
    // put the MergeTreeMutationEntry into current_mutations_by_id and current_mutations_by_version
    // wake up the background thread that runs the mutation task, mergeMutateTask
    if (!maybe_mutation_commands.empty())
        mutation_version = startMutation(maybe_mutation_commands, mutation_file_name);
    The mutation is put into the mutation queue and an asynchronous thread is woken up.

4. Wait for the task to finish.
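Each part carries a data version and every mutation_{version}.txt gets a monotonically increasing version; a pending mutation has to be applied to exactly those parts whose data version is smaller than the mutation's version, which is what current_mutations_by_version (an ordered map) makes cheap to answer. A simplified sketch of that selection (the struct and function names are invented, not the real classes):

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct PartInfo { std::string name; long data_version; };   // illustrative, not MergeTreePartInfo

// current_mutations_by_version in the real code: version -> mutation entry (here just the command text).
using MutationsByVersion = std::map<long, std::string>;

// A part still needs mutating if any pending mutation is newer than the part's data version.
std::vector<std::string> partsToMutate(const std::vector<PartInfo> & parts, const MutationsByVersion & mutations)
{
    std::vector<std::string> result;
    for (const auto & part : parts)
        if (mutations.upper_bound(part.data_version) != mutations.end())
            result.push_back(part.name);
    return result;
}

int main()
{
    std::vector<PartInfo> parts = {{"202001_1_5_1", 5}, {"202001_6_6_0_7", 7}};
    MutationsByVersion mutations = {{7, "DELETE WHERE value = 0"}};
    for (const auto & name : partsToMutate(parts, mutations))
        std::cout << name << " needs the mutation applied\n";
    return 0;
}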
 

BackgroundProcessingPoolTaskResult StorageMergeTree::mergeMutateTask() // the background merge/mutation task
{
    auto share_lock = lockForShare(RWLockImpl::NO_QUERY, getSettings()->lock_acquire_timeout_for_background_operations);
    /// All use relative_data_path which changes during rename
    /// so execute under share lock.
    clearOldPartsFromFilesystem();
    // find parts in the Outdated state, remove them from the filesystem and from memory (data_parts_indexes), and write the part log
    clearOldTemporaryDirectories();
    clearOldWriteAheadLogs();
    // clean up old temporary directories and write-ahead logs
    clearOldMutations(); // clean up mutations that have already finished
}
 

bool StorageMergeTree::tryMutatePart() // actually executes the mutation on a part
    MergeList::EntryPtr merge_entry = global_context.getMergeList().insert(table_id.database_name, table_id.table_name, future_part); // register the operation in the merge list (this is what system.merges shows)
    new_part = merger_mutator.mutatePartToTemporaryPart() // produce the mutated part in a temporary directory
    renameTempPartAndReplace(new_part); // persist the part, reusing the same logic as the insert path

【Merge】

// Inserts and mutations also wake up the merge task, and a manual OPTIMIZE goes through merge as well; the core logic is in bool StorageMergeTree::merge().

MergeTreeDataMergerMutator::selectPartsToMerge // pick, within one partition, the set of parts that is most suitable to merge

FutureMergedMutatedPart::assign // combine the info of all selected parts and initialize the new part that will be produced
    part_info.partition_id = parts.front()->info.partition_id;
    part_info.min_block = parts.front()->info.min_block; // the merged part covers the whole block range of its sources
    part_info.max_block = parts.back()->info.max_block;
    part_info.level = max_level + 1; // level + 1
    part_info.mutation = max_mutation; // take the highest mutation version

 

// The next two steps are the same as in the insert and mutation paths:
merger_mutator.mergePartsToTemporaryPart() // write the merged part to a temporary directory
merger_mutator.renameMergedTemporaryPart(new_part, future_part.parts, nullptr); // move it to the final directory
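The merged part's name encodes exactly those fields (partition_id, min_block, max_block, level, plus the mutation version for mutated parts), which is why one merged part can replace all of its sources. A small sketch that derives the new name from the parts being merged (MiniPartInfo is an invented type; the naming follows the on-disk partition_min_max_level convention):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct MiniPartInfo { std::string partition_id; long min_block; long max_block; unsigned level; };

// Mirrors the bookkeeping of FutureMergedMutatedPart::assign shown above: the merged part
// spans the whole block range of its sources and sits one level above the deepest one.
MiniPartInfo mergedInfo(const std::vector<MiniPartInfo> & parts)   // parts sorted by block range
{
    MiniPartInfo result;
    result.partition_id = parts.front().partition_id;              // merges never cross partitions
    result.min_block = parts.front().min_block;
    result.max_block = parts.back().max_block;
    unsigned max_level = 0;
    for (const auto & part : parts)
        max_level = std::max(max_level, part.level);
    result.level = max_level + 1;
    return result;
}

std::string partName(const MiniPartInfo & info)
{
    return info.partition_id + "_" + std::to_string(info.min_block) + "_"
        + std::to_string(info.max_block) + "_" + std::to_string(info.level);
}

int main()
{
    std::vector<MiniPartInfo> parts = {{"202001", 1, 1, 0}, {"202001", 2, 2, 0}, {"202001", 3, 3, 0}};
    std::cout << partName(mergedInfo(parts)) << std::endl;          // prints 202001_1_3_1
    return 0;
}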

【Create / Attach】

executeQuery() // called in TCPHandler::runImpl()
    executeQueryImpl()
        parseQuery()
        interpreter->execute() // InterpreterCreateQuery
            BlockIO InterpreterCreateQuery::execute() // its body branches as follows

 

if (!create.cluster.empty()) // ON CLUSTER DDL
{
    prepareOnClusterQuery(create, context, create.cluster);
    return executeDDLQueryOnCluster(query_ptr, context, getRequiredAccess());
}

/// CREATE|ATTACH DATABASE
if (!create.database.empty() && create.table.empty()) // create a database
    return createDatabase(create);
else if (!create.is_dictionary) // create a table
    return createTable(create);
else
    return createDictionary(create); // create a dictionary

BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create) // create the table
    bool created = doCreateTable(create, properties);
        res = StorageFactory::instance().get(create,
            data_path,
            context,
            context.getGlobalContext(),
            properties.columns,
            properties.constraints,
            false); // construct the storage object; for Replicated* engines this is where the table's ZooKeeper nodes get created
        database->createTable(context, create.table, res, query_ptr); // persist the table definition on disk

 

void DatabaseOnDisk::createTable() // for ATTACH, only the in-memory structures are updated; otherwise the sequence is:
    - create the .sql.tmp file;
    - add the table to `tables`;
    - rename .sql.tmp to .sql.
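The .sql.tmp / .sql dance is the usual atomic-metadata trick: write the complete definition to a temporary file first, then rename it into place, so a crash can never leave a half-written table definition behind. A minimal sketch of the pattern (the path and the query text are illustrative):

#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

// Write `content` to `path` atomically: fill a .tmp sibling, then rename it over the target.
bool writeMetadataAtomically(const std::string & path, const std::string & content)
{
    const std::string tmp_path = path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::trunc);
        if (!out)
            return false;
        out << content;
        out.flush();                                   // the real code can also fsync when configured to
        if (!out)
            return false;
    }
    return std::rename(tmp_path.c_str(), path.c_str()) == 0;   // POSIX rename is atomic
}

int main()
{
    const std::string definition = "ATTACH TABLE t (d Date, x UInt32) ENGINE = MergeTree ORDER BY x";
    if (writeMetadataAtomically("t.sql", definition))
        std::cout << "t.sql written\n";
    return 0;
}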
     