1. Overview
When building low-code tooling you frequently need to turn form input into SQL, or turn an existing SQL statement back into a form. So how do you parse the select list, the WHERE clause and so on out of a SQL statement?
2. Flink SQL Parsing
Step through the following example in a debugger and inspect sqlNode; the structure of the parsed statement becomes clear very quickly.
package org.example;

import org.apache.calcite.avatica.util.Casing;
import org.apache.calcite.sql.*;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.util.SqlVisitor;
import org.apache.flink.sql.parser.impl.FlinkSqlParserImpl;
import org.apache.flink.sql.parser.validate.FlinkSqlConformance;

import static org.apache.calcite.avatica.util.Quoting.BACK_TICK;

public class FlinkSqlParserExample {

    public static void main(String[] args) throws SqlParseException {
        String sql = "SELECT a.id, b.name FROM tableA a JOIN tableB b ON a.id = b.aId";

        // Lookup join against a temporal (dimension) table.
        String joinSql =
                " select " +
                " jiaoyije, \n" +
                " jiaoyils, \n" +
                " jiaoyima, \n" +
                " yeziduan, \n" +
                " jiaoyirq, \n" +
                " SHIJCHUO,\n" +
                " BALANCEVALUE\n" +
                " from source left join dim FOR SYSTEM_TIME AS OF source.procTime" +
                " on cast(DATE_FORMAT(TO_TIMESTAMP_LTZ((UNIX_TIMESTAMP(jiaoyirq,'yyyyMMdd') - 86400),0),'yyyyMMdd') as INT)=dim.ACTUALDAY";
        sql = joinSql;

        // CUMULATE window TVF aggregation.
        String cumulateSql = "SELECT\n" +
                " date_format(window_start, 'yyyy-MM-dd') cal_day,\n" +
                " behavior,\n" +
                " count(user_id) pv,\n" +
                " count(distinct user_id) uv\n" +
                "FROM TABLE(\n" +
                " CUMULATE(TABLE user_log, DESCRIPTOR(proc_time), INTERVAL '10' SECOND, INTERVAL '1' DAY)\n" +
                ") GROUP BY window_start, window_end, behavior\n";
        sql = cumulateSql; // the statement actually parsed; comment this out to parse joinSql instead

        // Parse the SQL with the Flink dialect.
        // SqlParser parser = SqlParser.create(sql, SqlParser.Config.DEFAULT.withConformance(FlinkSqlConformance.DEFAULT));
        SqlParser parser = SqlParser.create(sql, SqlParser.configBuilder()
                .setParserFactory(FlinkSqlParserImpl.FACTORY)
                .setQuoting(BACK_TICK)
                .setUnquotedCasing(Casing.TO_LOWER) // normalize unquoted identifiers to lower case
                .setQuotedCasing(Casing.UNCHANGED)
                .setConformance(FlinkSqlConformance.DEFAULT)
                .build());
        SqlNode sqlNode = parser.parseStmt();

        // Each visit method is a hook: set a breakpoint in them to inspect the node types
        // (identifiers, calls, literals, ...) that make up the parsed statement.
        sqlNode.accept(new SqlVisitor<Void>() {
            @Override
            public Void visit(SqlLiteral sqlLiteral) {
                return null;
            }

            @Override
            public Void visit(SqlCall sqlCall) {
                return null;
            }

            @Override
            public Void visit(SqlNodeList sqlNodeList) {
                return null;
            }

            @Override
            public Void visit(SqlIdentifier sqlIdentifier) {
                return null;
            }

            @Override
            public Void visit(SqlDataTypeSpec sqlDataTypeSpec) {
                return null;
            }

            @Override
            public Void visit(SqlDynamicParam sqlDynamicParam) {
                return null;
            }

            @Override
            public Void visit(SqlIntervalQualifier sqlIntervalQualifier) {
                return null;
            }
        });

        System.out.println(sqlNode);
    }
}
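To answer the question from the overview, here is how the select list, FROM clause and WHERE condition can be read off the parsed tree. This is a minimal sketch assuming the root node is a plain SELECT (a statement with ORDER BY parses to SqlOrderBy instead); the class name and the query are my own illustration, while getSelectList/getFrom/getWhere are standard Calcite SqlSelect accessors.

package org.example;

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.flink.sql.parser.impl.FlinkSqlParserImpl;

public class SelectPartsExample {

    public static void main(String[] args) throws SqlParseException {
        String sql = "SELECT id, name FROM users WHERE age > 18";
        SqlParser parser = SqlParser.create(sql, SqlParser.configBuilder()
                .setParserFactory(FlinkSqlParserImpl.FACTORY)
                .build());
        SqlNode sqlNode = parser.parseStmt();

        // For a plain query the root node is a SqlSelect; joins, window TVFs etc.
        // show up inside the FROM clause as nested SqlJoin/SqlCall nodes.
        if (sqlNode instanceof SqlSelect) {
            SqlSelect select = (SqlSelect) sqlNode;
            System.out.println("select list: " + select.getSelectList()); // id, name
            System.out.println("from       : " + select.getFrom());       // users
            System.out.println("where      : " + select.getWhere());      // age > 18
        }
    }
}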
Maven dependencies:
<properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
    <version>1.15.0</version>
</dependency>
For Flink 1.18.1 the dependency is:
<!-- Flink Table Planner -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
    <version>1.18.1</version>
</dependency>
3. DB SQL Parsing
Again, step through the following example in a debugger and inspect sqlStatement; the structure of Druid's AST becomes clear quickly.
package org.example;

import com.alibaba.druid.DbType;
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;

public class DruidOracleSqlParserExample {

    public static void main(String[] args) {
        String joinSql =
                " select " +
                " jiaoyije, \n" +
                " jiaoyils, \n" +
                " jiaoyima, \n" +
                " yeziduan, \n" +
                " jiaoyirq, \n" +
                " SHIJCHUO,\n" +
                " BALANCEVALUE\n" +
                " from source left join dim " +
                " on cast(DATE_FORMAT(TO_TIMESTAMP_LTZ((UNIX_TIMESTAMP(jiaoyirq,'yyyyMMdd') - 86400),0),'yyyyMMdd') as INT)=dim.ACTUALDAY";
        String sql = joinSql;

        // Parse a single statement with the Oracle dialect; debug into sqlStatement
        // to see the AST nodes (select items, join table source, ON condition, ...).
        SQLStatement sqlStatement = SQLUtils.parseSingleStatement(sql, DbType.oracle);
        System.out.println(sqlStatement);
    }
}
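Once Druid has produced the SQLStatement, the select items, join source and WHERE condition can be read off it as well. Below is a minimal sketch assuming the statement is a plain SELECT; the class name and the query are my own illustration, while getSelect/getQueryBlock/getSelectList/getFrom/getWhere are part of Druid's AST API.

package org.example;

import com.alibaba.druid.DbType;
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.ast.statement.SQLSelectQueryBlock;
import com.alibaba.druid.sql.ast.statement.SQLSelectStatement;

public class DruidSelectPartsExample {

    public static void main(String[] args) {
        String sql = "select u.id, u.name from users u left join orders o on u.id = o.user_id where u.age > 18";
        SQLStatement stmt = SQLUtils.parseSingleStatement(sql, DbType.oracle);

        if (stmt instanceof SQLSelectStatement) {
            // getQueryBlock() returns null for set queries (UNION etc.), so guard it in real code.
            SQLSelectQueryBlock block = ((SQLSelectStatement) stmt).getSelect().getQueryBlock();
            System.out.println("select list: " + block.getSelectList()); // u.id, u.name
            System.out.println("from       : " + block.getFrom());       // the join table source
            System.out.println("where      : " + block.getWhere());      // u.age > 18
        }
    }
}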
Maven dependency:
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.2.16</version>
</dependency>
4. Appendix
Flink SQL parsing goes through several stages. The following is an overview of the parse flow and the source-level logic behind it:
- Parse the SQL statement into a syntax tree (SQL -> SqlNode):
  - Flink uses Apache Calcite, an open-source SQL parser. Calcite parses the SQL statement with JavaCC and turns it into a structure that Java/Scala code can work with.
- Validate the SqlNode (SqlNode -> Operation):
  - The generated SqlNode is validated against the Flink Catalog and converted into an Operation tree.
- Semantic analysis (Operation -> RelNode):
  - The Blink planner converts the Operation tree into a RelNode tree and then optimizes it.
- Optimization (RelNode -> optimize -> Transformation):
  - A series of optimization rules produces the final physical execution plan.
- Generate the ExecutionPlan and execute:
  - The final execution plan is translated into a StreamGraph or JobGraph and executed on the Flink cluster.
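You can observe the output of these phases from user code without digging into the planner internals. The sketch below is a minimal example of my own: it registers a throwaway datagen table (assuming the datagen connector and the table planner are on the classpath) and uses the standard TableEnvironment.explainSql API, whose output includes the abstract syntax tree and the optimized plans produced by the stages above.

package org.example;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ExplainPhasesExample {

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // A throwaway datagen table so the query can be validated against the catalog.
        tEnv.executeSql(
                "CREATE TABLE user_log (" +
                "  user_id STRING," +
                "  behavior STRING," +
                "  proc_time AS PROCTIME()" +
                ") WITH ('connector' = 'datagen')");

        // explainSql runs the parse -> validate -> optimize pipeline described above
        // and prints the abstract syntax tree plus the optimized physical/execution plans.
        System.out.println(tEnv.explainSql(
                "SELECT behavior, count(user_id) AS pv FROM user_log GROUP BY behavior"));
    }
}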
At the source level, some of the key steps and classes are:

- SqlToRelConverter: converts a SqlNode into a RelNode tree, which is the basis for the later optimization and execution phases.
- SqlToOperationConverter: converts the parsed SqlNode into an Operation, i.e. a logical plan that Flink can work with.
- FlinkPlannerImpl: the core class Flink uses to plan a SQL query; it drives the translation of the SQL statement towards Flink's physical execution plan.
- FactoryUtil: dynamically discovers and loads source (DynamicTableSourceFactory) and sink (DynamicTableSinkFactory) factories; this is the key step that maps the connector declared in the SQL to a concrete implementation (a lookup sketch follows below).
- CatalogSourceTable: contains the logic that maps a connector to its implementation, for example mapping connector = kafka to FlinkKafkaConsumer.

Together these steps and classes form the complete Flink SQL parse-and-execute pipeline: the SQL statement is parsed, validated, semantically analyzed and optimized, and finally an execution plan is generated and run. Each stage has a corresponding implementation that can be found in the Flink source tree.
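As an illustration of the FactoryUtil step, the sketch below (my own example, not Flink source) looks up the table source factory registered for a given connector identifier via the public FactoryUtil.discoverFactory method, which is roughly the mechanism the planner relies on when resolving the connector option of a table. It assumes a connector with that identifier (here "datagen") is on the classpath; for "kafka" the Kafka connector jar would have to be present.

package org.example;

import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;

public class ConnectorFactoryLookupExample {

    public static void main(String[] args) {
        // Looks up the DynamicTableSourceFactory whose factoryIdentifier() is "datagen".
        DynamicTableSourceFactory factory = FactoryUtil.discoverFactory(
                Thread.currentThread().getContextClassLoader(),
                DynamicTableSourceFactory.class,
                "datagen");
        System.out.println(factory.getClass().getName());
    }
}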