DuckDB:QueryProfiler(查询分析器)

在这里插入图片描述


在 DuckDB 中, QueryProfiler是一个用于查询性能分析的工具,帮助用户和开发者理解查询的执行过程、性能瓶颈以及优化方向。以下是对 QueryProfiler的功能、作用以及使用方法的详细解析。


1.QueryProfiler的作用

QueryProfiler的主要作用是提供详细的查询执行性能数据,帮助用户分析和优化查询。它能够记录以下信息:

• 查询计划的执行时间。

• 每个操作符的执行时间、数据量(Cardinality)和性能指标。

• 查询优化器和计划生成器的性能开销。

• 查询的总体延迟和资源使用情况。

通过这些数据,用户可以快速定位性能瓶颈,优化查询逻辑或数据库配置。


2.QueryProfiler的功能

QueryProfiler提供了多种功能,用于全面分析查询性能:

2.1查询计划分析

通过EXPLAINEXPLAIN ANALYZE语句,用户可以查看查询计划的详细信息:

EXPLAIN:显示查询的逻辑计划和物理计划,但不执行查询。

EXPLAIN ANALYZE:执行查询并显示实际的性能数据,包括每个操作符的执行时间、数据量等。

2.2性能指标

QueryProfiler提供以下性能指标:

CPU_TIME:操作符的累计执行时间。

LATENCY:查询的总体延迟。

OPERATOR_CARDINALITY:每个操作符处理的数据量。

OPERATOR_TIMING:每个操作符的执行时间。

2.3详细模式

通过设置profiling_mode = 'detailed',用户可以获取更多性能指标,例如:

• 查询优化器的执行时间(如OPTIMIZER_JOIN_ORDER)。

• 查询计划生成器的性能开销(如PLANNERPHYSICAL_PLANNER)。

2.4自定义指标

用户可以通过custom_profiling_settings动态启用或禁用特定的性能指标。例如:

PRAGMA custom_profiling_settings = '{"CPU_TIME": "false", "OPERATOR_CARDINALITY": "true"}';

2.5输出格式

QueryProfiler支持多种输出格式,包括:

• 文本格式:直接在控制台输出。

• JSON 格式:便于后续处理和可视化。


3.使用方法

QueryProfiler的使用方法非常灵活,可以通过 SQL 命令或 API 进行配置。

3.1启用和配置

通过以下 SQL 命令启用和配置QueryProfiler

PRAGMA enable_profiling = 'json'; -- 启用性能分析并设置输出格式为 JSON
PRAGMA profiling_output = '/path/to/output.json'; -- 设置输出文件路径
PRAGMA profiling_mode = 'detailed'; -- 启用详细模式

3.2执行查询

使用EXPLAIN ANALYZE执行查询并获取性能数据:

EXPLAIN ANALYZE SELECT name FROM students JOIN exams USING (sid) WHERE name LIKE 'Ma%';

3.3查看结果

查询执行完成后,性能数据会输出到指定的文件中。例如,JSON 格式的输出可能如下:

{
    "query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
    "cpu_time": 0.000095,
    "latency": 0.003708,
    "children": [
        {
            "operator_type": "PROJECTION",
            "operator_timing": 0.000001,
            "operator_cardinality": 2,
            "extra_info": {
                "Projections": "name",
                "Estimated Cardinality": "1"
            }
        },
        {
            "operator_type": "HASH_JOIN",
            "operator_timing": 0.000238,
            "operator_cardinality": 2,
            "extra_info": {
                "Join Type": "INNER",
                "Conditions": "sid = sid"
            }
        }
    ]
}

3.4可视化

DuckDB 提供了一个 Python 脚本,可以将 JSON 格式的性能数据渲染为查询图(Query Graph),便于直观分析:

python -m duckdb.query_graph /path/to/output.json

4.QueryProfiler的优势

QueryProfiler提供了以下优势:

• 全面的性能数据:能够记录查询的每个阶段(如解析、优化、执行)的性能开销。

• 灵活的配置:用户可以根据需要启用或禁用特定的性能指标。

• 多种输出格式:支持文本和 JSON 格式,便于后续处理和可视化。

• 易于使用:通过简单的 SQL 命令即可启用和配置。


5.总结

QueryProfiler是 DuckDB 中用于查询性能分析的强大工具。它通过EXPLAIN ANALYZE和多种性能指标,帮助用户深入理解查询的执行过程和性能瓶颈。通过灵活的配置和多种输出格式,用户可以轻松地分析和优化查询性能。

如果你需要更深入地了解QueryProfiler的实现细节,可以参考 DuckDB 的源码文件:

src/main/query_profiler.cpp
src/include/duckdb/main/query_profiler.hpp

这些文件中包含了QueryProfiler的具体实现和使用方式。
query_profiler.hpp

//===----------------------------------------------------------------------===//
//                         DuckDB
//
// duckdb/main/query_profiler.hpp
//
//
//===----------------------------------------------------------------------===//

#pragma once

#include "duckdb/common/common.hpp"
#include "duckdb/common/deque.hpp"
#include "duckdb/common/enums/profiler_format.hpp"
#include "duckdb/common/enums/explain_format.hpp"
#include "duckdb/common/pair.hpp"
#include "duckdb/common/profiler.hpp"
#include "duckdb/common/reference_map.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/common/types/data_chunk.hpp"
#include "duckdb/common/unordered_map.hpp"
#include "duckdb/common/winapi.hpp"
#include "duckdb/execution/expression_executor_state.hpp"
#include "duckdb/execution/physical_operator.hpp"
#include "duckdb/main/profiling_info.hpp"
#include "duckdb/main/profiling_node.hpp"

#include <stack>

namespace duckdb {
class ClientContext;
class ExpressionExecutor;
class ProfilingNode;
class PhysicalOperator;
class SQLStatement;

struct OperatorInformation {
	explicit OperatorInformation(double time_p = 0, idx_t elements_returned_p = 0, idx_t elements_scanned_p = 0,
	                             idx_t result_set_size_p = 0)
	    : time(time_p), elements_returned(elements_returned_p), result_set_size(result_set_size_p) {
	}

	double time;
	idx_t elements_returned;
	idx_t result_set_size;
	string name;
	InsertionOrderPreservingMap<string> extra_info;

	void AddTime(double n_time) {
		time += n_time;
	}

	void AddReturnedElements(idx_t n_elements) {
		elements_returned += n_elements;
	}

	void AddResultSetSize(idx_t n_result_set_size) {
		result_set_size += n_result_set_size;
	}
};

//! The OperatorProfiler measures timings of individual operators
//! This class exists once for all operators and collects `OperatorInfo` for each operator
class OperatorProfiler {
	friend class QueryProfiler;

public:
	DUCKDB_API explicit OperatorProfiler(ClientContext &context);
	~OperatorProfiler() {
	}

public:
	DUCKDB_API void StartOperator(optional_ptr<const PhysicalOperator> phys_op);
	DUCKDB_API void EndOperator(optional_ptr<DataChunk> chunk);

	//! Adds the timings in the OperatorProfiler (tree) to the QueryProfiler (tree).
	DUCKDB_API void Flush(const PhysicalOperator &phys_op);
	DUCKDB_API OperatorInformation &GetOperatorInfo(const PhysicalOperator &phys_op);

public:
	ClientContext &context;

private:
	//! Whether or not the profiler is enabled
	bool enabled;
	//! Sub-settings for the operator profiler
	profiler_settings_t settings;

	//! The timer used to time the execution time of the individual Physical Operators
	Profiler op;
	//! The stack of Physical Operators that are currently active
	optional_ptr<const PhysicalOperator> active_operator;
	//! A mapping of physical operators to profiled operator information.
	reference_map_t<const PhysicalOperator, OperatorInformation> operator_infos;
};

struct QueryInfo {
	QueryInfo() : blocked_thread_time(0) {};
	string query_name;
	double blocked_thread_time;
};

//! The QueryProfiler can be used to measure timings of queries
class QueryProfiler {
public:
	DUCKDB_API explicit QueryProfiler(ClientContext &context);

public:
	// Propagate save_location, enabled, detailed_enabled and automatic_print_format.
	void Propagate(QueryProfiler &qp);

	using TreeMap = reference_map_t<const PhysicalOperator, reference<ProfilingNode>>;

private:
	unique_ptr<ProfilingNode> CreateTree(const PhysicalOperator &root, const profiler_settings_t &settings,
	                                     const idx_t depth = 0);
	void Render(const ProfilingNode &node, std::ostream &str) const;
	string RenderDisabledMessage(ProfilerPrintFormat format) const;

public:
	DUCKDB_API bool IsEnabled() const;
	DUCKDB_API bool IsDetailedEnabled() const;
	DUCKDB_API ProfilerPrintFormat GetPrintFormat(ExplainFormat format = ExplainFormat::DEFAULT) const;
	DUCKDB_API bool PrintOptimizerOutput() const;
	DUCKDB_API string GetSaveLocation() const;

	DUCKDB_API static QueryProfiler &Get(ClientContext &context);

	DUCKDB_API void StartQuery(string query, bool is_explain_analyze = false, bool start_at_optimizer = false);
	DUCKDB_API void EndQuery();

	DUCKDB_API void StartExplainAnalyze();

	//! Adds the timings gathered by an OperatorProfiler to this query profiler
	DUCKDB_API void Flush(OperatorProfiler &profiler);
	//! Adds the top level query information to the global profiler.
	DUCKDB_API void SetInfo(const double &blocked_thread_time);

	DUCKDB_API void StartPhase(MetricsType phase_metric);
	DUCKDB_API void EndPhase();

	DUCKDB_API void Initialize(const PhysicalOperator &root);

	DUCKDB_API string QueryTreeToString() const;
	DUCKDB_API void QueryTreeToStream(std::ostream &str) const;
	DUCKDB_API void Print();

	//! return the printed as a string. Unlike ToString, which is always formatted as a string,
	//! the return value is formatted based on the current print format (see GetPrintFormat()).
	DUCKDB_API string ToString(ExplainFormat format = ExplainFormat::DEFAULT) const;
	DUCKDB_API string ToString(ProfilerPrintFormat format) const;

	static InsertionOrderPreservingMap<string> JSONSanitize(const InsertionOrderPreservingMap<string> &input);
	static string JSONSanitize(const string &text);
	static string DrawPadded(const string &str, idx_t width);
	DUCKDB_API string ToJSON() const;
	DUCKDB_API void WriteToFile(const char *path, string &info) const;

	idx_t OperatorSize() {
		return tree_map.size();
	}

	void Finalize(ProfilingNode &node);

	//! Return the root of the query tree
	optional_ptr<ProfilingNode> GetRoot() {
		return root.get();
	}

private:
	ClientContext &context;

	//! Whether or not the query profiler is running
	bool running;
	//! The lock used for accessing the global query profiler or flushing information to it from a thread
	mutable std::mutex lock;

	//! Whether or not the query requires profiling
	bool query_requires_profiling;

	//! The root of the query tree
	unique_ptr<ProfilingNode> root;

	//! Top level query information.
	QueryInfo query_info;
	//! The timer used to time the execution time of the entire query
	Profiler main_query;
	//! A map of a Physical Operator pointer to a tree node
	TreeMap tree_map;
	//! Whether or not we are running as part of a explain_analyze query
	bool is_explain_analyze;

public:
	const TreeMap &GetTreeMap() const {
		return tree_map;
	}

private:
	//! The timer used to time the individual phases of the planning process
	Profiler phase_profiler;
	//! A mapping of the phase names to the timings
	using PhaseTimingStorage = unordered_map<MetricsType, double, MetricsTypeHashFunction>;
	PhaseTimingStorage phase_timings;
	using PhaseTimingItem = PhaseTimingStorage::value_type;
	//! The stack of currently active phases
	vector<MetricsType> phase_stack;

private:
	void MoveOptimizerPhasesToRoot();

	//! Check whether or not an operator type requires query profiling. If none of the ops in a query require profiling
	//! no profiling information is output.
	bool OperatorRequiresProfiling(PhysicalOperatorType op_type);
	ExplainFormat GetExplainFormat(ProfilerPrintFormat format) const;
};

} // namespace duckdb

query_profiler.cpp

// query_profiler.cpp
#include "duckdb/main/query_profiler.hpp"

#include "duckdb/common/fstream.hpp"
#include "duckdb/common/limits.hpp"
#include "duckdb/common/numeric_utils.hpp"
#include "duckdb/common/printer.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/common/tree_renderer/text_tree_renderer.hpp"
#include "duckdb/execution/expression_executor.hpp"
#include "duckdb/execution/operator/helper/physical_execute.hpp"
#include "duckdb/execution/operator/scan/physical_table_scan.hpp"
#include "duckdb/execution/physical_operator.hpp"
#include "duckdb/main/client_config.hpp"
#include "duckdb/main/client_context.hpp"
#include "duckdb/main/client_data.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
#include "yyjson.hpp"

#include <algorithm>
#include <utility>

using namespace duckdb_yyjson; // NOLINT

namespace duckdb {

QueryProfiler::QueryProfiler(ClientContext &context_p)
    : context(context_p), running(false), query_requires_profiling(false), is_explain_analyze(false) {
}

bool QueryProfiler::IsEnabled() const {
	return is_explain_analyze || ClientConfig::GetConfig(context).enable_profiler;
}

bool QueryProfiler::IsDetailedEnabled() const {
	return !is_explain_analyze && ClientConfig::GetConfig(context).enable_detailed_profiling;
}

ProfilerPrintFormat QueryProfiler::GetPrintFormat(ExplainFormat format) const {
	auto print_format = ClientConfig::GetConfig(context).profiler_print_format;
	if (format == ExplainFormat::DEFAULT) {
		return print_format;
	}
	switch (format) {
	case ExplainFormat::TEXT:
		return ProfilerPrintFormat::QUERY_TREE;
	case ExplainFormat::JSON:
		return ProfilerPrintFormat::JSON;
	case ExplainFormat::HTML:
		return ProfilerPrintFormat::HTML;
	case ExplainFormat::GRAPHVIZ:
		return ProfilerPrintFormat::GRAPHVIZ;
	default:
		throw NotImplementedException("No mapping from ExplainFormat::%s to ProfilerPrintFormat",
		                              EnumUtil::ToString(format));
	}
}

ExplainFormat QueryProfiler::GetExplainFormat(ProfilerPrintFormat format) const {
	switch (format) {
	case ProfilerPrintFormat::QUERY_TREE:
	case ProfilerPrintFormat::QUERY_TREE_OPTIMIZER:
		return ExplainFormat::TEXT;
	case ProfilerPrintFormat::JSON:
		return ExplainFormat::JSON;
	case ProfilerPrintFormat::HTML:
		return ExplainFormat::HTML;
	case ProfilerPrintFormat::GRAPHVIZ:
		return ExplainFormat::GRAPHVIZ;
	case ProfilerPrintFormat::NO_OUTPUT:
		throw InternalException("Should not attempt to get ExplainFormat for ProfilerPrintFormat::NO_OUTPUT");
	default:
		throw NotImplementedException("No mapping from ProfilePrintFormat::%s to ExplainFormat",
		                              EnumUtil::ToString(format));
	}
}

bool QueryProfiler::PrintOptimizerOutput() const {
	return GetPrintFormat() == ProfilerPrintFormat::QUERY_TREE_OPTIMIZER || IsDetailedEnabled();
}

string QueryProfiler::GetSaveLocation() const {
	return is_explain_analyze ? string() : ClientConfig::GetConfig(context).profiler_save_location;
}

QueryProfiler &QueryProfiler::Get(ClientContext &context) {
	return *ClientData::Get(context).profiler;
}

void QueryProfiler::StartQuery(string query, bool is_explain_analyze_p, bool start_at_optimizer) {
	lock_guard<std::mutex> guard(lock);
	if (is_explain_analyze_p) {
		StartExplainAnalyze();
	}
	if (!IsEnabled()) {
		return;
	}
	if (start_at_optimizer && !PrintOptimizerOutput()) {
		// This is the StartQuery call before the optimizer, but we don't have to print optimizer output
		return;
	}
	if (running) {
		// Called while already running: this should only happen when we print optimizer output
		D_ASSERT(PrintOptimizerOutput());
		return;
	}

	running = true;
	query_info.query_name = std::move(query);
	tree_map.clear();
	root = nullptr;
	phase_timings.clear();
	phase_stack.clear();
	main_query.Start();
}

bool QueryProfiler::OperatorRequiresProfiling(PhysicalOperatorType op_type) {
	switch (op_type) {
	case PhysicalOperatorType::ORDER_BY:
	case PhysicalOperatorType::RESERVOIR_SAMPLE:
	case PhysicalOperatorType::STREAMING_SAMPLE:
	case PhysicalOperatorType::LIMIT:
	case PhysicalOperatorType::LIMIT_PERCENT:
	case PhysicalOperatorType::STREAMING_LIMIT:
	case PhysicalOperatorType::TOP_N:
	case PhysicalOperatorType::WINDOW:
	case PhysicalOperatorType::UNNEST:
	case PhysicalOperatorType::UNGROUPED_AGGREGATE:
	case PhysicalOperatorType::HASH_GROUP_BY:
	case PhysicalOperatorType::FILTER:
	case PhysicalOperatorType::PROJECTION:
	case PhysicalOperatorType::COPY_TO_FILE:
	case PhysicalOperatorType::TABLE_SCAN:
	case PhysicalOperatorType::CHUNK_SCAN:
	case PhysicalOperatorType::DELIM_SCAN:
	case PhysicalOperatorType::EXPRESSION_SCAN:
	case PhysicalOperatorType::BLOCKWISE_NL_JOIN:
	case PhysicalOperatorType::NESTED_LOOP_JOIN:
	case PhysicalOperatorType::HASH_JOIN:
	case PhysicalOperatorType::CROSS_PRODUCT:
	case PhysicalOperatorType::PIECEWISE_MERGE_JOIN:
	case PhysicalOperatorType::IE_JOIN:
	case PhysicalOperatorType::LEFT_DELIM_JOIN:
	case PhysicalOperatorType::RIGHT_DELIM_JOIN:
	case PhysicalOperatorType::UNION:
	case PhysicalOperatorType::RECURSIVE_CTE:
	case PhysicalOperatorType::EMPTY_RESULT:
	case PhysicalOperatorType::EXTENSION:
		return true;
	default:
		return false;
	}
}

void QueryProfiler::Finalize(ProfilingNode &node) {
	for (idx_t i = 0; i < node.GetChildCount(); i++) {
		auto child = node.GetChild(i);
		Finalize(*child);

		auto &info = node.GetProfilingInfo();
		auto type = PhysicalOperatorType(info.GetMetricValue<uint8_t>(MetricsType::OPERATOR_TYPE));
		if (type == PhysicalOperatorType::UNION &&
		    info.Enabled(info.expanded_settings, MetricsType::OPERATOR_CARDINALITY)) {

			auto &child_info = child->GetProfilingInfo();
			auto value = child_info.metrics[MetricsType::OPERATOR_CARDINALITY].GetValue<idx_t>();
			info.AddToMetric(MetricsType::OPERATOR_CARDINALITY, value);
		}
	}
}

void QueryProfiler::StartExplainAnalyze() {
	is_explain_analyze = true;
}

template <class METRIC_TYPE>
static void GetCumulativeMetric(ProfilingNode &node, MetricsType cumulative_metric, MetricsType child_metric) {
	auto &info = node.GetProfilingInfo();
	info.metrics[cumulative_metric] = info.metrics[child_metric];

	for (idx_t i = 0; i < node.GetChildCount(); i++) {
		auto child = node.GetChild(i);
		GetCumulativeMetric<METRIC_TYPE>(*child, cumulative_metric, child_metric);
		auto value = child->GetProfilingInfo().metrics[cumulative_metric].GetValue<METRIC_TYPE>();
		info.AddToMetric(cumulative_metric, value);
	}
}

Value GetCumulativeOptimizers(ProfilingNode &node) {
	auto &metrics = node.GetProfilingInfo().metrics;
	double count = 0;
	for (auto &metric : metrics) {
		if (MetricsUtils::IsOptimizerMetric(metric.first)) {
			count += metric.second.GetValue<double>();
		}
	}
	return Value::CreateValue(count);
}

void QueryProfiler::EndQuery() {
	unique_lock<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}

	main_query.End();
	if (root) {
		auto &info = root->GetProfilingInfo();
		if (info.Enabled(info.expanded_settings, MetricsType::OPERATOR_CARDINALITY)) {
			Finalize(*root->GetChild(0));
		}
	}
	running = false;

	bool emit_output = false;

	// Print or output the query profiling after query termination.
	// EXPLAIN ANALYZE output is not written by the profiler.
	if (IsEnabled() && !is_explain_analyze) {
		if (root) {
			auto &info = root->GetProfilingInfo();
			info = ProfilingInfo(ClientConfig::GetConfig(context).profiler_settings);
			auto &child_info = root->children[0]->GetProfilingInfo();
			info.metrics[MetricsType::QUERY_NAME] = query_info.query_name;

			auto &settings = info.expanded_settings;
			if (info.Enabled(settings, MetricsType::BLOCKED_THREAD_TIME)) {
				info.metrics[MetricsType::BLOCKED_THREAD_TIME] = query_info.blocked_thread_time;
			}
			if (info.Enabled(settings, MetricsType::LATENCY)) {
				info.metrics[MetricsType::LATENCY] = main_query.Elapsed();
			}
			if (info.Enabled(settings, MetricsType::ROWS_RETURNED)) {
				info.metrics[MetricsType::ROWS_RETURNED] = child_info.metrics[MetricsType::OPERATOR_CARDINALITY];
			}
			if (info.Enabled(settings, MetricsType::CPU_TIME)) {
				GetCumulativeMetric<double>(*root, MetricsType::CPU_TIME, MetricsType::OPERATOR_TIMING);
			}
			if (info.Enabled(settings, MetricsType::CUMULATIVE_CARDINALITY)) {
				GetCumulativeMetric<idx_t>(*root, MetricsType::CUMULATIVE_CARDINALITY,
				                           MetricsType::OPERATOR_CARDINALITY);
			}
			if (info.Enabled(settings, MetricsType::CUMULATIVE_ROWS_SCANNED)) {
				GetCumulativeMetric<idx_t>(*root, MetricsType::CUMULATIVE_ROWS_SCANNED,
				                           MetricsType::OPERATOR_ROWS_SCANNED);
			}
			if (info.Enabled(settings, MetricsType::RESULT_SET_SIZE)) {
				info.metrics[MetricsType::RESULT_SET_SIZE] = child_info.metrics[MetricsType::RESULT_SET_SIZE];
			}

			MoveOptimizerPhasesToRoot();
			if (info.Enabled(settings, MetricsType::CUMULATIVE_OPTIMIZER_TIMING)) {
				info.metrics.at(MetricsType::CUMULATIVE_OPTIMIZER_TIMING) = GetCumulativeOptimizers(*root);
			}
		}

		if (ClientConfig::GetConfig(context).emit_profiler_output) {
			emit_output = true;
		}
	}

	is_explain_analyze = false;

	guard.unlock();

	if (emit_output) {
		string tree = ToString();
		auto save_location = GetSaveLocation();

		if (save_location.empty()) {
			Printer::Print(tree);
			Printer::Print("\n");
		} else {
			WriteToFile(save_location.c_str(), tree);
		}
	}
}

string QueryProfiler::ToString(ExplainFormat explain_format) const {
	return ToString(GetPrintFormat(explain_format));
}

string QueryProfiler::ToString(ProfilerPrintFormat format) const {
	if (!IsEnabled()) {
		return RenderDisabledMessage(format);
	}
	switch (format) {
	case ProfilerPrintFormat::QUERY_TREE:
	case ProfilerPrintFormat::QUERY_TREE_OPTIMIZER:
		return QueryTreeToString();
	case ProfilerPrintFormat::JSON:
		return ToJSON();
	case ProfilerPrintFormat::NO_OUTPUT:
		return "";
	case ProfilerPrintFormat::HTML:
	case ProfilerPrintFormat::GRAPHVIZ: {
		lock_guard<std::mutex> guard(lock);
		// checking the tree to ensure the query is really empty
		// the query string is empty when a logical plan is deserialized
		if (query_info.query_name.empty() && !root) {
			return "";
		}
		auto renderer = TreeRenderer::CreateRenderer(GetExplainFormat(format));
		std::stringstream str;
		auto &info = root->GetProfilingInfo();
		if (info.Enabled(info.expanded_settings, MetricsType::OPERATOR_TIMING)) {
			info.metrics[MetricsType::OPERATOR_TIMING] = main_query.Elapsed();
		}
		renderer->Render(*root, str);
		return str.str();
	}
	default:
		throw InternalException("Unknown ProfilerPrintFormat \"%s\"", EnumUtil::ToString(format));
	}
}

void QueryProfiler::StartPhase(MetricsType phase_metric) {
	lock_guard<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}

	// start a new phase
	phase_stack.push_back(phase_metric);
	// restart the timer
	phase_profiler.Start();
}

void QueryProfiler::EndPhase() {
	lock_guard<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}
	D_ASSERT(!phase_stack.empty());

	// end the timer
	phase_profiler.End();
	// add the timing to all currently active phases
	for (auto &phase : phase_stack) {
		phase_timings[phase] += phase_profiler.Elapsed();
	}
	// now remove the last added phase
	phase_stack.pop_back();

	if (!phase_stack.empty()) {
		phase_profiler.Start();
	}
}

OperatorProfiler::OperatorProfiler(ClientContext &context) : context(context) {
	enabled = QueryProfiler::Get(context).IsEnabled();
	auto &context_metrics = ClientConfig::GetConfig(context).profiler_settings;

	// Expand.
	for (const auto metric : context_metrics) {
		settings.insert(metric);
		ProfilingInfo::Expand(settings, metric);
	}

	// Reduce.
	auto root_metrics = ProfilingInfo::DefaultRootSettings();
	for (const auto metric : root_metrics) {
		settings.erase(metric);
	}
}

void OperatorProfiler::StartOperator(optional_ptr<const PhysicalOperator> phys_op) {
	if (!enabled) {
		return;
	}
	if (active_operator) {
		throw InternalException("OperatorProfiler: Attempting to call StartOperator while another operator is active");
	}
	active_operator = phys_op;

	if (!settings.empty()) {
		if (ProfilingInfo::Enabled(settings, MetricsType::EXTRA_INFO)) {
			auto &info = GetOperatorInfo(*active_operator);
			auto params = active_operator->ParamsToString();
			info.extra_info = params;
		}

		// Start the timing of the current operator.
		if (ProfilingInfo::Enabled(settings, MetricsType::OPERATOR_TIMING)) {
			op.Start();
		}
	}
}

void OperatorProfiler::EndOperator(optional_ptr<DataChunk> chunk) {
	if (!enabled) {
		return;
	}
	if (!active_operator) {
		throw InternalException("OperatorProfiler: Attempting to call EndOperator while another operator is active");
	}

	if (!settings.empty()) {
		auto &info = GetOperatorInfo(*active_operator);
		if (ProfilingInfo::Enabled(settings, MetricsType::OPERATOR_TIMING)) {
			op.End();
			info.AddTime(op.Elapsed());
		}
		if (ProfilingInfo::Enabled(settings, MetricsType::OPERATOR_CARDINALITY) && chunk) {
			info.AddReturnedElements(chunk->size());
		}
		if (ProfilingInfo::Enabled(settings, MetricsType::RESULT_SET_SIZE) && chunk) {
			auto result_set_size = chunk->GetAllocationSize();
			info.AddResultSetSize(result_set_size);
		}
	}
	active_operator = nullptr;
}

OperatorInformation &OperatorProfiler::GetOperatorInfo(const PhysicalOperator &phys_op) {
	auto entry = operator_infos.find(phys_op);
	if (entry != operator_infos.end()) {
		return entry->second;
	}

	// Add a new entry.
	operator_infos[phys_op] = OperatorInformation();
	return operator_infos[phys_op];
}

void OperatorProfiler::Flush(const PhysicalOperator &phys_op) {
	auto entry = operator_infos.find(phys_op);
	if (entry == operator_infos.end()) {
		return;
	}

	auto &info = operator_infos.find(phys_op)->second;
	info.name = phys_op.GetName();
}

void QueryProfiler::Flush(OperatorProfiler &profiler) {
	lock_guard<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}
	for (auto &node : profiler.operator_infos) {
		auto &op = node.first.get();
		auto entry = tree_map.find(op);
		D_ASSERT(entry != tree_map.end());

		auto &tree_node = entry->second.get();
		auto &info = tree_node.GetProfilingInfo();

		if (ProfilingInfo::Enabled(profiler.settings, MetricsType::OPERATOR_TIMING)) {
			info.AddToMetric<double>(MetricsType::OPERATOR_TIMING, node.second.time);
		}
		if (ProfilingInfo::Enabled(profiler.settings, MetricsType::OPERATOR_CARDINALITY)) {
			info.AddToMetric<idx_t>(MetricsType::OPERATOR_CARDINALITY, node.second.elements_returned);
		}
		if (ProfilingInfo::Enabled(profiler.settings, MetricsType::OPERATOR_ROWS_SCANNED)) {
			if (op.type == PhysicalOperatorType::TABLE_SCAN) {
				auto &scan_op = op.Cast<PhysicalTableScan>();
				auto &bind_data = scan_op.bind_data;

				if (bind_data && scan_op.function.cardinality) {
					auto cardinality = scan_op.function.cardinality(context, &(*bind_data));
					if (cardinality && cardinality->has_estimated_cardinality) {
						info.AddToMetric<idx_t>(MetricsType::OPERATOR_ROWS_SCANNED, cardinality->estimated_cardinality);
					}
				}
			}
		}
		if (ProfilingInfo::Enabled(profiler.settings, MetricsType::RESULT_SET_SIZE)) {
			info.AddToMetric<idx_t>(MetricsType::RESULT_SET_SIZE, node.second.result_set_size);
		}
		if (ProfilingInfo::Enabled(profiler.settings, MetricsType::EXTRA_INFO)) {
			info.extra_info = node.second.extra_info;
		}
	}
	profiler.operator_infos.clear();
}

void QueryProfiler::SetInfo(const double &blocked_thread_time) {
	lock_guard<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}

	auto &info = root->GetProfilingInfo();
	auto metric_enabled = info.Enabled(info.expanded_settings, MetricsType::BLOCKED_THREAD_TIME);
	if (!metric_enabled) {
		return;
	}
	query_info.blocked_thread_time = blocked_thread_time;
}

string QueryProfiler::DrawPadded(const string &str, idx_t width) {
	if (str.size() > width) {
		return str.substr(0, width);
	} else {
		width -= str.size();
		auto half_spaces = width / 2;
		auto extra_left_space = NumericCast<idx_t>(width % 2 != 0 ? 1 : 0);
		return string(half_spaces + extra_left_space, ' ') + str + string(half_spaces, ' ');
	}
}

static string RenderTitleCase(string str) {
	str = StringUtil::Lower(str);
	str[0] = NumericCast<char>(toupper(str[0]));
	for (idx_t i = 0; i < str.size(); i++) {
		if (str[i] == '_') {
			str[i] = ' ';
			if (i + 1 < str.size()) {
				str[i + 1] = NumericCast<char>(toupper(str[i + 1]));
			}
		}
	}
	return str;
}

static string RenderTiming(double timing) {
	string timing_s;
	if (timing >= 1) {
		timing_s = StringUtil::Format("%.2f", timing);
	} else if (timing >= 0.1) {
		timing_s = StringUtil::Format("%.3f", timing);
	} else {
		timing_s = StringUtil::Format("%.4f", timing);
	}
	return timing_s + "s";
}

string QueryProfiler::QueryTreeToString() const {
	std::stringstream str;
	QueryTreeToStream(str);
	return str.str();
}

void RenderPhaseTimings(std::ostream &ss, const pair<string, double> &head, map<string, double> &timings, idx_t width) {
	ss << "┌────────────────────────────────────────────────┐\n";
	ss << "│" + QueryProfiler::DrawPadded(RenderTitleCase(head.first) + ": " + RenderTiming(head.second), width - 2) +
	          "│\n";
	ss << "│┌──────────────────────────────────────────────┐│\n";

	for (const auto &entry : timings) {
		ss << "││" +
		          QueryProfiler::DrawPadded(RenderTitleCase(entry.first) + ": " + RenderTiming(entry.second),
		                                    width - 4) +
		          "││\n";
	}
	ss << "│└──────────────────────────────────────────────┘│\n";
	ss << "└────────────────────────────────────────────────┘\n";
}

void PrintPhaseTimingsToStream(std::ostream &ss, const ProfilingInfo &info, idx_t width) {
	map<string, double> optimizer_timings;
	map<string, double> planner_timings;
	map<string, double> physical_planner_timings;

	pair<string, double> optimizer_head;
	pair<string, double> planner_head;
	pair<string, double> physical_planner_head;

	for (const auto &entry : info.metrics) {
		if (MetricsUtils::IsOptimizerMetric(entry.first)) {
			optimizer_timings[EnumUtil::ToString(entry.first).substr(10)] = entry.second.GetValue<double>();
		} else if (MetricsUtils::IsPhaseTimingMetric(entry.first)) {
			switch (entry.first) {
			case MetricsType::CUMULATIVE_OPTIMIZER_TIMING:
				continue;
			case MetricsType::ALL_OPTIMIZERS:
				optimizer_head = {"Optimizer", entry.second.GetValue<double>()};
				break;
			case MetricsType::PHYSICAL_PLANNER:
				physical_planner_head = {"Physical Planner", entry.second.GetValue<double>()};
				break;
			case MetricsType::PLANNER:
				planner_head = {"Planner", entry.second.GetValue<double>()};
				break;
			default:
				break;
			}

			auto metric = EnumUtil::ToString(entry.first);
			if (StringUtil::StartsWith(metric, "PHYSICAL_PLANNER") && entry.first != MetricsType::PHYSICAL_PLANNER) {
				physical_planner_timings[metric.substr(17)] = entry.second.GetValue<double>();
			} else if (StringUtil::StartsWith(metric, "PLANNER") && entry.first != MetricsType::PLANNER) {
				planner_timings[metric.substr(8)] = entry.second.GetValue<double>();
			}
		}
	}

	RenderPhaseTimings(ss, optimizer_head, optimizer_timings, width);
	RenderPhaseTimings(ss, physical_planner_head, physical_planner_timings, width);
	RenderPhaseTimings(ss, planner_head, planner_timings, width);
}

void QueryProfiler::QueryTreeToStream(std::ostream &ss) const {
	lock_guard<std::mutex> guard(lock);
	ss << "┌─────────────────────────────────────┐\n";
	ss << "│┌───────────────────────────────────┐│\n";
	ss << "││    Query Profiling Information    ││\n";
	ss << "│└───────────────────────────────────┘│\n";
	ss << "└─────────────────────────────────────┘\n";
	ss << StringUtil::Replace(query_info.query_name, "\n", " ") + "\n";

	// checking the tree to ensure the query is really empty
	// the query string is empty when a logical plan is deserialized
	if (query_info.query_name.empty() && !root) {
		return;
	}

	for (auto &state : context.registered_state->States()) {
		state->WriteProfilingInformation(ss);
	}

	constexpr idx_t TOTAL_BOX_WIDTH = 50;
	ss << "┌────────────────────────────────────────────────┐\n";
	ss << "│┌──────────────────────────────────────────────┐│\n";
	string total_time = "Total Time: " + RenderTiming(main_query.Elapsed());
	ss << "││" + DrawPadded(total_time, TOTAL_BOX_WIDTH - 4) + "││\n";
	ss << "│└──────────────────────────────────────────────┘│\n";
	ss << "└────────────────────────────────────────────────┘\n";
	// render the main operator tree
	if (root) {
		// print phase timings
		if (PrintOptimizerOutput()) {
			PrintPhaseTimingsToStream(ss, root->GetProfilingInfo(), TOTAL_BOX_WIDTH);
		}
		Render(*root, ss);
	}
}

InsertionOrderPreservingMap<string> QueryProfiler::JSONSanitize(const InsertionOrderPreservingMap<string> &input) {
	InsertionOrderPreservingMap<string> result;
	for (auto &it : input) {
		auto key = it.first;
		if (StringUtil::StartsWith(key, "__")) {
			key = StringUtil::Replace(key, "__", "");
			key = StringUtil::Replace(key, "_", " ");
			key = StringUtil::Title(key);
		}
		result[key] = it.second;
	}
	return result;
}

string QueryProfiler::JSONSanitize(const std::string &text) {
	string result;
	result.reserve(text.size());
	for (char i : text) {
		switch (i) {
		case '\b':
			result += "\\b";
			break;
		case '\f':
			result += "\\f";
			break;
		case '\n':
			result += "\\n";
			break;
		case '\r':
			result += "\\r";
			break;
		case '\t':
			result += "\\t";
			break;
		case '"':
			result += "\\\"";
			break;
		case '\\':
			result += "\\\\";
			break;
		default:
			result += i;
			break;
		}
	}
	return result;
}

static yyjson_mut_val *ToJSONRecursive(yyjson_mut_doc *doc, ProfilingNode &node) {
	auto result_obj = yyjson_mut_obj(doc);
	auto &profiling_info = node.GetProfilingInfo();
	profiling_info.extra_info = QueryProfiler::JSONSanitize(profiling_info.extra_info);
	profiling_info.WriteMetricsToJSON(doc, result_obj);

	auto children_list = yyjson_mut_arr(doc);
	for (idx_t i = 0; i < node.GetChildCount(); i++) {
		auto child = ToJSONRecursive(doc, *node.GetChild(i));
		yyjson_mut_arr_add_val(children_list, child);
	}
	yyjson_mut_obj_add_val(doc, result_obj, "children", children_list);
	return result_obj;
}

static string StringifyAndFree(yyjson_mut_doc *doc, yyjson_mut_val *object) {
	auto data = yyjson_mut_val_write_opts(object, YYJSON_WRITE_ALLOW_INF_AND_NAN | YYJSON_WRITE_PRETTY, nullptr,
	                                      nullptr, nullptr);
	if (!data) {
		yyjson_mut_doc_free(doc);
		throw InternalException("The plan could not be rendered as JSON, yyjson failed");
	}
	auto result = string(data);
	free(data);
	yyjson_mut_doc_free(doc);
	return result;
}

string QueryProfiler::ToJSON() const {
	lock_guard<std::mutex> guard(lock);
	auto doc = yyjson_mut_doc_new(nullptr);
	auto result_obj = yyjson_mut_obj(doc);
	yyjson_mut_doc_set_root(doc, result_obj);

	if (query_info.query_name.empty() && !root) {
		yyjson_mut_obj_add_str(doc, result_obj, "result", "empty");
		return StringifyAndFree(doc, result_obj);
	}
	if (!root) {
		yyjson_mut_obj_add_str(doc, result_obj, "result", "error");
		return StringifyAndFree(doc, result_obj);
	}

	auto &settings = root->GetProfilingInfo();

	settings.WriteMetricsToJSON(doc, result_obj);

	// recursively print the physical operator tree
	auto children_list = yyjson_mut_arr(doc);
	yyjson_mut_obj_add_val(doc, result_obj, "children", children_list);
	auto child = ToJSONRecursive(doc, *root->GetChild(0));
	yyjson_mut_arr_add_val(children_list, child);
	return StringifyAndFree(doc, result_obj);
}

void QueryProfiler::WriteToFile(const char *path, string &info) const {
	ofstream out(path);
	out << info;
	out.close();
	// throw an IO exception if it fails to write the file
	if (out.fail()) {
		throw IOException(strerror(errno));
	}
}

profiler_settings_t EraseQueryRootSettings(profiler_settings_t settings) {
	profiler_settings_t phase_timing_settings_to_erase;

	for (auto &setting : settings) {
		if (MetricsUtils::IsOptimizerMetric(setting) || MetricsUtils::IsPhaseTimingMetric(setting) ||
		    setting == MetricsType::BLOCKED_THREAD_TIME) {
			phase_timing_settings_to_erase.insert(setting);
		}
	}

	for (auto &setting : phase_timing_settings_to_erase) {
		settings.erase(setting);
	}

	return settings;
}

unique_ptr<ProfilingNode> QueryProfiler::CreateTree(const PhysicalOperator &root_p, const profiler_settings_t &settings,
                                                    const idx_t depth) {
	if (OperatorRequiresProfiling(root_p.type)) {
		query_requires_profiling = true;
	}

	unique_ptr<ProfilingNode> node = make_uniq<ProfilingNode>();
	auto &info = node->GetProfilingInfo();
	info = ProfilingInfo(settings, depth);
	auto child_settings = settings;
	if (depth == 0) {
		child_settings = EraseQueryRootSettings(child_settings);
	}
	node->depth = depth;

	if (depth != 0) {
		info.metrics[MetricsType::OPERATOR_NAME] = root_p.GetName();
		info.AddToMetric<uint8_t>(MetricsType::OPERATOR_TYPE, static_cast<uint8_t>(root_p.type));
	}
	if (info.Enabled(info.settings, MetricsType::EXTRA_INFO)) {
		info.extra_info = root_p.ParamsToString();
	}

	tree_map.insert(make_pair(reference<const PhysicalOperator>(root_p), reference<ProfilingNode>(*node)));
	auto children = root_p.GetChildren();
	for (auto &child : children) {
		auto child_node = CreateTree(child.get(), child_settings, depth + 1);
		node->AddChild(std::move(child_node));
	}
	return node;
}

string QueryProfiler::RenderDisabledMessage(ProfilerPrintFormat format) const {
	switch (format) {
	case ProfilerPrintFormat::NO_OUTPUT:
		return "";
	case ProfilerPrintFormat::QUERY_TREE:
	case ProfilerPrintFormat::QUERY_TREE_OPTIMIZER:
		return "Query profiling is disabled. Use 'PRAGMA enable_profiling;' to enable profiling!";
	case ProfilerPrintFormat::HTML:
		return R"(
				<!DOCTYPE html>
                <html lang="en"><head/><body>
                  Query profiling is disabled. Use 'PRAGMA enable_profiling;' to enable profiling!
                </body></html>
			)";
	case ProfilerPrintFormat::GRAPHVIZ:
		return R"(
				digraph G {
				    node [shape=box, style=rounded, fontname="Courier New", fontsize=10];
				    node_0_0 [label="Query profiling is disabled. Use 'PRAGMA enable_profiling;' to enable profiling!"];
				}
			)";
	case ProfilerPrintFormat::JSON: {
		auto doc = yyjson_mut_doc_new(nullptr);
		auto result_obj = yyjson_mut_obj(doc);
		yyjson_mut_doc_set_root(doc, result_obj);

		yyjson_mut_obj_add_str(doc, result_obj, "result", "disabled");
		return StringifyAndFree(doc, result_obj);
	}
	default:
		throw InternalException("Unknown ProfilerPrintFormat \"%s\"", EnumUtil::ToString(format));
	}
}

void QueryProfiler::Initialize(const PhysicalOperator &root_op) {
	lock_guard<std::mutex> guard(lock);
	if (!IsEnabled() || !running) {
		return;
	}
	query_requires_profiling = false;
	ClientConfig &config = ClientConfig::GetConfig(context);
	root = CreateTree(root_op, config.profiler_settings, 0);
	if (!query_requires_profiling) {
		// query does not require profiling: disable profiling for this query
		this->running = false;
		tree_map.clear();
		root = nullptr;
		phase_timings.clear();
		phase_stack.clear();
	}
}

void QueryProfiler::Render(const ProfilingNode &node, std::ostream &ss) const {
	TextTreeRenderer renderer;
	if (IsDetailedEnabled()) {
		renderer.EnableDetailed();
	} else {
		renderer.EnableStandard();
	}
	renderer.Render(node, ss);
}

void QueryProfiler::Print() {
	Printer::Print(QueryTreeToString());
}

void QueryProfiler::MoveOptimizerPhasesToRoot() {
	auto &root_info = root->GetProfilingInfo();
	auto &root_metrics = root_info.metrics;

	for (auto &entry : phase_timings) {
		auto &phase = entry.first;
		auto &timing = entry.second;
		if (root_info.Enabled(root_info.expanded_settings, phase)) {
			root_metrics[phase] = Value::CreateValue(timing);
		}
	}
}

void QueryProfiler::Propagate(QueryProfiler &) {
}

} // namespace duckdb
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值