StarRocks 中 CURRENT_TIMESTAMP 和 CURRENT_TIME 分区过滤问题

背景

本文基于Starrocks 3.3.5
最近在进行Starrocks 跑数据的时候,发现了一个SQL 扫描了所有分区的数据,简化后的SQL如下:

select date_created from tableA where date_created=date_format(current_time(), '%Y-%m-%d %H:%i:%S') limit 20

其中建表语句如下:

CREATE TABLE `tableA` (
  ...
  `date_created` datetime NOT NULL DEFAULT "1970-01-01 00:00:00" COMMENT "",
  ...
) ENGINE=OLAP
PRIMARY KEY(id,date_created)
PARTITION BY date_trunc("day",date_created)
DISTRIBUTED BY HASH(id) BUCKETS 50
PROPERTIES( 
  "compression" = "ZSTD"
  );

但是如果用CURRENT_TIMESTAMP 替换的话,就能实现分区下推,具体的SQL如下:

select date_created from tableA where date_created=CURRENT_TIMESTAMP() limit 20

结论

current_time() 函数不支持常量折叠,也就是不支持在计划解析和优化阶段来计算结果。而CURRENT_TIMESTAMP 在计划优化阶段就可以计算出结果。
具体的explain对应的SQL如下:
在这里插入图片描述
在这里插入图片描述

可以看到 用 current_time 函数的 扫描了全部的分区
CURRENT_TIMESTAMP函数的 只选择了一个分区的数据

分析

先执行两个命令从感官上来感受一下:

TRACE LOGS OPTIMIZER  SELECT CURRENT_TIMESTAMP()\G;
TRACE LOGS OPTIMIZER  SELECT (date_format(current_time(), '%Y-%m-%d %H:%i:%S'))\G;

TRACE LOGS OPTIMIZER SELECT CURRENT_TIMESTAMP()\G;`的结果如下:

...
*************************** 34. row ***************************
Explain String: 0ms|    [MV TRACE] [PREPARE GLOBAL] There are no valid related mvs for the query plan
*************************** 35. row ***************************
Explain String: 0ms|    [MV TRACE] [PREPARE GLOBAL] MV rewrite strategy: MvRewriteStrategy{enableMaterializedViewRewrite=false, enableForceRBORewrite=false, enableViewBasedRewrite=false, enableSingleTableRewrite=false, enableMultiTableRewrite=false, mvStrategy=DEFAULT}
*************************** 36. row ***************************
Explain String: 0ms|    origin logicOperatorTree:
*************************** 37. row ***************************
Explain String: LogicalProjectOperator {projection=[2025-03-28 10:35:00]}
*************************** 38. row ***************************
Explain String: ->  LOGICAL_VALUES
*************************** 39. row ***************************
Explain String: 0ms|    [TRACE QUERY 3efdbff0-0b7d-11f0-8f6c-00163e164034] APPLY RULE TF_PRUNE_PROJECT_COLUMNS 58
*************************** 40. row ***************************
Explain String: Original Expression:
*************************** 41. row ***************************
Explain String: LogicalProjectOperator {projection=[2025-03-28 10:35:00]}
*************************** 42. row ***************************
Explain String: ->  LOGICAL_VALUES
*************************** 43. row ***************************
Explain String: 0ms|
*************************** 44. row ***************************
Explain String: New Expression:
*************************** 45. row ***************************
Explain String: 0:LogicalProjectOperator {projection=[2025-03-28 10:35:00]}
...

TRACE LOGS OPTIMIZER SELECT (date_format(current_time(), '%Y-%m-%d %H:%i:%S'))\G的结果如下:

...
*************************** 34. row ***************************
Explain String: 0ms|    [MV TRACE] [PREPARE GLOBAL] MV rewrite strategy: MvRewriteStrategy{enableMaterializedViewRewrite=false, enableForceRBORewrite=false, enableViewBasedRewrite=false, enableSingleTableRewrite=false, enableMultiTableRewrite=false, mvStrategy=DEFAULT}
*************************** 35. row ***************************
Explain String: 0ms|    origin logicOperatorTree:
*************************** 36. row ***************************
Explain String: LogicalProjectOperator {projection=[date_format(cast(current_time() as datetime), %Y-%m-%d %H:%i:%S)]}
*************************** 37. row ***************************
Explain String: ->  LOGICAL_VALUES
*************************** 38. row ***************************
Explain String: 0ms|    [TRACE QUERY 7af2e9bb-0b7e-11f0-8f6c-00163e164034] APPLY RULE TF_PRUNE_PROJECT_COLUMNS 58
*************************** 39. row ***************************
Explain String: Original Expression:
*************************** 40. row ***************************
Explain String: LogicalProjectOperator {projection=[date_format(cast(current_time() as datetime), %Y-%m-%d %H:%i:%S)]}
*************************** 41. row ***************************
Explain String: ->  LOGICAL_VALUES
*************************** 42. row ***************************
Explain String: 0ms|
*************************** 43. row ***************************
Explain String: New Expression:
*************************** 44. row ***************************
Explain String: 0:LogicalProjectOperator {projection=[date_format(cast(current_time() as datetime), %Y-%m-%d %H:%i:%S)]}
*************************** 45. row ***************************
Explain String: ->  LOGICAL_VALUES
...

可以看到 CURRENT_TIMESTAMP 在优化算子阶段就已经计算出来了,为LogicalProjectOperator {projection=[2025-03-28 10:35:00]}
(date_format(current_time(), '%Y-%m-%d %H:%i:%S')) 并没有计算出来,为LogicalProjectOperator {projection=[date_format(cast(current_time() as datetime), %Y-%m-%d %H:%i:%S)]}

在这个案例中,主要涉及到的规则主要是:

FoldConstantsRule
PartitionPruneRule

我们分析一下简单的SQL语句的数据流:SELECT CURRENT_TIMESTAMP()

g4文件中querySpecification
    ||
    \/
ConnectProcessor.handleQuery
    ||
    \/
com.starrocks.sql.parser.SqlParser.parse
    ||
    \/
// 同时.g4 文件中 specialDateTimeExpression
// AstBuilder.visitSpecialDateTimeExpression 会构造 new FunctionCallExpr
// 这里最终会构建 SelectRelation(SelectList(FunctionCallExpr),ValuesRelation.newDualRelation)
AstBuilder.visitQuerySpecification 
    ||
    \/
StatementPlanner.plan
    ||
    \/
createQueryPlan
    ||
    \/
new RelationTransformer(transformerContext).transformWithSelectLimit(query) 
    ||
    \/
transform
    ||
    \/
visit(relation);
    ||
    \/
RelationTransformer.visitSelect
    ||
    \/
QueryTransformer.plan
    ||
    \/
SqlToScalarOperatorTranslator.translate => Visitor.visit => visitFunctionCall // 此时的逻辑计划为 SelectRelation(SelectList(CallOperator(CURRENT_TIMESTAMP)),ValuesRelation.newDualRelation)
                                        ||
                                        \/
                                        => scalarRewriter.rewrite(result, ScalarOperatorRewriter.DEFAULT_REWRITE_RULES) // 这里有ImplicitCastRule和FoldConstantsRule
    ||
    \/
projectForOrder  // 此时的的逻辑计划为 LogicalPlan(OptExprBuilder(LogicalProjectOperator(CallOperator(CURRENT_TIMESTAMP)))

现在来重点关注 DEFAULT_REWRITE_RULES 中涉及到的 ImplicitCastRule FoldConstantsRule 规则:
首先是 ImplicitCastRule 规则(这里主要是visitCall方法):

这个规则主做:
1. 如果表达式需要的类型和该表达式对应的子表达式的参数输出的类型如果不一致的话,则会给表达式的子表达式加上一个CastOperator操作
2. 对每一个子表达式都递归一遍1步骤

date_format(current_time(), '%Y-%m-%d %H:%i:%S')就会命中这个规则
再次 是 FoldConstantsRule 规则(这里主要是visitCall/visitCastOperator方法):

这个主要是做:
1. 主要是计算表达式为常量,即把CallOperator变成 ConstantOperator
2. 根据ScalarOperatorFunctions 和 MetaFunctions 函数中标注为 ConstantFunction 的函数,来看是否能够计算为常量
在这里能够找到  CURRENT_TIMESTAMP() 函数,但是找不到 current_time() 函数

CURRENT_TIMESTAMP() 就会命中这个规则

以上的 都在 “Transformer” 阶段完成的。
至于 PartitionPruneRule 则会在“Optimizer” 阶段完成 ,也就是optimizer.optimize方法中, 具体的实现,可以细看 PartitionPruneRule对应的方法,也就是在这个规则里会对涉及到的谓词来过滤出对应的分区,很显然因为CURRENT_TIMESTAMP是常量,所以能够裁剪到对应的分区中去,而date_format(current_time(), '%Y-%m-%d %H:%i:%S')不能计算出来,所以扫描了全表。

其他

trace输出信息的怎么回事

首先在g4文件中

queryStatement
    : (explainDesc | optimizerTrace) ? queryRelation outfile?;

有对应的optimze语句 也就是 TRACE LOGS
这个在解析的时候 AstBuilder.visitQueryStatement 中会调用 queryStatement.setIsTrace 方法:

public void setIsTrace(Tracers.Mode mode, String module) {
        this.isExplain = true;
        this.traceMode = mode;
        this.traceModule = module;
    }

此时 isExplain 设置为了true
之后在 StmtExecutor.execute方法中:

 } else if (parsedStmt.isExplain()) {
            String explainString = buildExplainString(execPlan, ResourceGroupClassifier.QueryType.SELECT,
                    parsedStmt.getExplainLevel());
            if (executeInFe) {
                explainString = "EXECUTE IN FE\n" + explainString;
            }

这里的方法buildExplainString就会组装对应的explain信息:

 if (parsedStmt.getTraceMode() == Tracers.Mode.TIMER) {
                explainString += Tracers.printScopeTimer();
            } else if (parsedStmt.getTraceMode() == Tracers.Mode.VARS) {
                explainString += Tracers.printVars();
            } else if (parsedStmt.getTraceMode() == Tracers.Mode.TIMING) {
                explainString += Tracers.printTiming();
            } else if (parsedStmt.getTraceMode() == Tracers.Mode.LOGS) {
                explainString += Tracers.printLogs();
            } else if (parsedStmt.getTraceMode() == Tracers.Mode.REASON) {
                explainString += Tracers.printReasons();
            } else {
                explainString += execPlan.getExplainString(explainLevel);
            }

所以在执行trace LOGS命令的时候会输出对应信息

订阅规则表 create table td_subscription_rule ( subscribe_id varchar(64) primary key, 订阅id topic varchar(64) not null(任务主题), user_id varchar(64), 租户id chatbot_id varchar(50) NOT NULL,(机器人id) keyword TEXT, -- 保存用户输入的规则表达式,如:人工智能 AND (教育 OR 医疗) cron_expression varchar(50), message_details timestamp, massage消息从表向主表拷贝时间 period varchar(50), 推送周期 moment varchar(50), 推送时刻 knowledge_source integer,(知识源:网页知识web、2本地知识库知识local、3本地知识库+网页知识both) status integer ,(任务配置启用状态:启用中1,未启用0) create_time timestamp default CURRENT_TIMESTAMP, updata_time timestamp default CURRENT_TIMESTAMP ); 消息任务主表 create table td_massage_task ( massage_id varchar(64) primary key, task_id varchar(64),关联td_subscription_rule表的主键 chatbot_id varchar(64),机器人id user_id varchar(64),用户id network_text JSON DEFAULT NULL, 联网检索 knowledge_text JSON DEFAULT NULL, 本地知识召回内容 think_content JSON DEFAULT NULL, 思考过程 content text, 文档检索内容 status integer,(0未读,1已读) create_time timestamp default CURRENT_TIMESTAMP ); 消息任务从表 create table td_massage_task_details ( massage_details_id varchar(64) primary key, task_id varchar(64), chatbot_id varchar(64),机器人id user_id varchar(64),用户id network_text JSON DEFAULT NULL, 联网检索 knowledge_text JSON DEFAULT NULL, 本地知识召回内容 think_content JSON DEFAULT NULL, 思考过程 content text, 文档检索内容 create_time timestamp default CURRENT_TIMESTAMP );进行优化?
09-12
with online_base as ( select a.session_id as xma_session_id , k.uniqueid as unique_id, a.customer_id as dz_customer_id, a.user_id , case when date(a.create_time)<'2025-05-22' then from_unixtime(unix_timestamp(a.create_time) - 3600) else a.create_time end as xma_begin_time , ---会话开始时间 case when date(a.end_time)<'2025-05-22' then from_unixtime(unix_timestamp(a.end_time) - 3600) else a.create_time end as end_time , --会话结束时间 a.total_time , --通话总时长 a.is_im , --是否转人工的意愿 是/否 a.is_closed , --是否关闭 是/否 a.is_artificial_first, --是否人工优先 是/否 a.to_user , --坐席用户号 g.name , --坐席名字 a.im_status , --im会话状态 a.distribute_status , --im分配类型 正式分配客服 接通看这个 a.offline_status, --im会话结束类型 是否自动结束 case when a.distribute_status ='正式分配客服' then a.im_start_time else null end ima_begin_time , --ima开始时间 case when a.distribute_status ='正式分配客服' then a.end_time else null end ima_end_time ,--ima结束时间 a.first_response_time /1000 as first_response_time, --首次响应时间 这是客户发了第一条消息到客服回复时间差值 单位毫秒 a.first_reply_time/1000 as first_reply_time , --人工首次回复时长 接通人工之后,客服发的第一条消息时间 单位毫秒 b.avg_response_time/1000 as avg_reply_dur , a.end_type , --结束类型 手动结束还是超时未回复结束 a.channel_type , a.first brief_sum1 , a.second brief_sum2 , a.third brief_sum3, a.ima_product_name , --来源渠道 k.app_code , a.dt from dm_opr.dmd_opr_lia_online_df a left join ( SELECT name from ods.ods_smartxma_basic_ai_basic_user_global_df where dt=replace (date_sub (current_date(), 1), '-', '') ) g on a.to_user=g.id left join ( select session_id, avg_response_time from ods.ods_smartxma_im_chat_im_chat_kpi_df where dt=replace (date_sub (current_date(), 1), '-', '') ) b on a.session_id=b.session_id left join report_csc_ana.xma_testuu x on x.customer_id=a.customer_id left join (select * from ods.ods_smartxma_basic_ai_basic_customer_identity_df where dt=replace (date_sub (current_date(), 1), '-', '') -- and app_code='pulsar' ) k on k.customer_id=a.customer_id left join (selECT distinct SUBSTRING(CAST (phone AS STRING), GREATEST(LENGTH(CAST(phone AS STRING)) - 9, 1), 10) as phone from report_csc_ana.testphone)pp on k.phone=pp.phone where a.dt=replace (date_sub (current_date(), 1), '-', '') and x.customer_id is null --剔除测试uuid and pp.phone is null ), t_xma as ( SELECT DISTINCT xma_begin_time, xma_session_id, unique_id, dz_customer_id, ima_product_name as xma_prod_name , app_code ,distribute_status from online_base where dt=replace(date_sub (current_date(), 1), '-', '') and to_date (xma_begin_time) >= '2025-05-08' ), t_xma_flow as ( --distribute_status 为空 表示排队没成功(crm系统显示的未转人工) distribute_status为非正常分配表示排队成功但是分配失败 select xma_session_id, 'xma未分流' as is_flow from online_base where dt=replace(date_sub (current_date(), 1), '-', '') and is_im='是' ), t_next_call as ( select *, lead (xma_begin_time) over ( partition by case when unique_id is null then dz_customer_id else unique_id end order by xma_begin_time ) as next_xma_time from ( select * from t_xma ) a ), t_ima as ( SELECT DISTINCT a.xma_session_id, a.xma_session_id as ima_session_id, a.name as to_cust_svc, a.ima_begin_time, a.ima_end_time, a.first_reply_time as 1st_resp_dur, avg_reply_dur, ---逻辑检查 first_value (a.brief_sum1) over ( PARTITION BY a.xma_session_id order by ima_begin_time ) as brief_sum1, first_value (a.brief_sum2) over ( PARTITION BY a.xma_session_id order by ima_begin_time ) as brief_sum2, first_value (a.brief_sum3) over ( PARTITION BY a.xma_session_id order by ima_begin_time ) as brief_sum3 from online_base a where a.dt=replace(date_sub (current_date(), 1), '-', '') and to_date (a.xma_begin_time) >= '2025-05-08' and a.distribute_status ='正式分配客服' and a.ima_begin_time is not null ), t_workorder as ( --提单判定 先模糊匹配 selECT distinct order_id as work_order_number, customer_id as uniqueid, session_id, --关联在线会话号 call_record_id, ---关联callid 这个目前关联不到有问题 rela_order_id , --关联工单号 order_desc ---描述 from dm_opr.dmd_opr_bso_wo_info_df where dt=replace (date_sub (current_date(), 1), '-', '') and order_status_desc !='草稿' ), t_tmp as ( select DISTINCT to_date (t_xma.xma_begin_time) as dt, t_xma.xma_begin_time, t_xma.xma_session_id, t_xma.xma_prod_name, t_xma.unique_id, t_xma.dz_customer_id, t_xma.app_code , t_xma.distribute_status if (t_xma_flow.is_flow is null,'xma分流',t_xma_flow.is_flow) is_flow, t_ima.ima_session_id, t_ima.ima_begin_time, t_ima.ima_end_time, t_ima.to_cust_svc, t_ima.1st_resp_dur, t_ima.avg_reply_dur, t_ima.brief_sum1, t_ima.brief_sum2, t_ima.brief_sum3, t_next_call.next_xma_time, t_workorder.work_order_number, first_value (a.begin_call_dt) over ( PARTITION BY t_xma.xma_session_id order by a.begin_call_dt ) as next_ivr_call from t_xma left join t_xma_flow on t_xma.xma_session_id = t_xma_flow.xma_session_id left join t_next_call on t_xma.xma_session_id = t_next_call.xma_session_id left join (selECT disTINCT case when uniqueid is null then participant_role_id else uniqueid end customer_id , case when date(a.room_start_time)<'2025-05-22' then from_unixtime(unix_timestamp(a.room_start_time) - 3600) else a.room_start_time end as begin_call_dt ,k.app_code from dm_opr.dmd_opr_lia_call_info_df a left join (select * from ods.ods_smartxma_basic_ai_basic_customer_identity_df where dt=replace (date_sub (current_date(), 1), '-', '') ) k on k.customer_id=a.participant_role_id where a.dt =replace (date_sub (current_date(), 1), '-', '') and a.participant_role='CUSTOMER' and a.call_type='呼入' ) a on (case when t_xma.unique_id is null then t_xma.dz_customer_id else t_xma.unique_id end) = a.customer_id and unix_timestamp (t_xma.xma_begin_time) < unix_timestamp (a.begin_call_dt) and t_xma.app_code=a.app_code left join t_ima on t_ima.xma_session_id = t_xma.xma_session_id left join t_workorder on t_xma.xma_session_id= t_workorder.session_id ) insert OVERWRITE dm_opr.online_base_df SELECT *, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 2 ) then '48h重复进线' else '48h未重复进线' end is_repeat_48h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 3 ) then '72h重复进线' else '72h未重复进线' end is_repeat_72h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 4 ) then '96h重复进线' else '96h未重复进线' end is_repeat_96h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 7 ) then '7D重复进线' else '7D未重复进线' end is_repeat_7d, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 2 ) or ( unix_timestamp (next_ivr_call) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 2 ) or work_order_number is not null then '48h未解决' else '48h解决' end is_solve_48h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 3 ) or ( unix_timestamp (next_ivr_call) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 3 ) or work_order_number is not null then '72h未解决' else '72h解决' end is_solve_72h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 4 ) or ( unix_timestamp (next_ivr_call) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 4 ) or work_order_number is not null then '96h未解决' else '96h解决' end is_solve_96h, case when ( unix_timestamp (next_xma_time) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 7 ) or ( unix_timestamp (next_ivr_call) - unix_timestamp (xma_begin_time) BETWEEN 0 and 3600 * 24 * 7 ) or work_order_number is not null then '7D未解决' else '7D解决' end is_solve_7d from t_tmp; 这个sql,有什么问题
07-10
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值