[Calcite] Column-level SQL data lineage queries with Calcite

Article link: https://blog.youkuaiyun.com/lisacumt/article/details/138530814

1. Background

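Data lineage for big-data workloads is complex to implement from scratch, so it usually makes sense to build on an existing framework. Calcite is a top-level Apache project in the Java ecosystem and is used by many other projects, such as Flink, Hive and Drill. Calcite handles MySQL, Oracle, PostgreSQL and other big-data platforms fairly well; its SQL Server support appears weaker, and I did not come across SQL Server-specific code.
As an aside, for the Python ecosystem sqlglot is the recommended choice; DataHub uses it.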

Calcite official documentation

2. Implementation

Add the dependency in Gradle:

dependencies {
   
   
    testImplementation('org.apache.calcite:calcite-core:1.32.0')
}
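
With the dependency on the classpath, a quick way to confirm that it resolves is to parse one statement with Calcite's SQL parser. The following is only a minimal sketch, not the lineage implementation itself: the object name `ParseSmokeTest` and the helper `parse` are made up for illustration, and the choice of `Lex.MYSQL` (backtick-quoted identifiers) is an assumption based on the MySQL test setup used in this article. Because the dependency is declared as `testImplementation`, the code would have to live under the test source set.

```scala
import org.apache.calcite.config.Lex
import org.apache.calcite.sql.{SqlKind, SqlNode}
import org.apache.calcite.sql.parser.SqlParser

// Hypothetical helper object, used only to check that calcite-core is usable.
object ParseSmokeTest {
  // MySQL-style lexing so that backtick-quoted identifiers such as `st01` are accepted.
  private val parserConfig: SqlParser.Config =
    SqlParser.config().withLex(Lex.MYSQL)

  /** Parses a single SQL statement and returns the root of Calcite's syntax tree. */
  def parse(sql: String): SqlNode =
    SqlParser.create(sql, parserConfig).parseStmt()

  def main(args: Array[String]): Unit = {
    val node = parse("select * from `st01` where 1=1")
    println(node.getKind) // the statement kind reported by the parser
    println(node)         // the statement as re-rendered by Calcite
  }
}
```

Parsing alone only yields a syntax tree; resolving columns back to their source tables additionally requires schema information, which this sketch deliberately leaves out.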

Everything below is implemented in Scala and was tested against MySQL 5.7:

drop table if exists test.st01;
CREATE TABLE test.st01(
s_id BIGINT comment 'primary key',
s_name VARCHAR(20)  comment 'name',
s_age INT comment 'age',
s_sex VARCHAR(10) comment 'sex',
s_part  VARCHAR(10) comment 'partition field',
ts TIMESTAMP comment 'creation time'
);
insert into test.st01 values(1,'zhangsan',10,'male','student','2020-01-01 18:01:01.666');
insert into test.st01 values(2,'lisi',66,'female','teacher','2020-01-01 10:01:01.666');
insert into test.st01 values(3,'sunlirong',50,'male','student','2020-01-01 10:01:01.666');
insert into test.st01 values(4,'laoliu',38,'female','teacher','2020-01-01 10:01:01.666');

create table test.st02 like test.st01;
insert into test.st02 values(2,'wangwu',66,'male','teacher','2020-01-01 10:01:01.666');
insert into test.st02 values(3,'zhaoliu',66,'female','student','2020-01-01 10:01:01.666');

create table test.st03 like test.st01;

First, define two SQL statements:

  /**
   * Simple test
   */
  val MYSQL_SQL1 =
    """
      |select * from `st01` where 1=1
      |""".stripMargin

  /**
   * What this tests: 1. insert into 2. the non-standard MySQL function CONCAT 3. join 4. where
   */
  val MYSQL_SQL2 =
    """
      |insert into `test`.`st03`
      |select s_id,combined_name s_name,s_age,s_sex,s_part,ts from (
      |select
      |a.s_id as s_id
      |,CONCAT(a.s_name,'-',b.s_name) as combined_name
      |,a.s_age+b.s_age as s_age
      |,a.s_sex as s_sex
      |,'none' as s_part
      |,current_timestamp as ts
      |from `test`.`st01` a inner join `test`.`st02` b on a.s_id=b.s_id

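The select list of the second statement, as far as it is shown above, already determines the column-level lineage that a tool should report for `test`.`st03`. As a hedged sketch, the expectation can be written down in plain Scala and later compared against whatever a Calcite-based implementation returns. Nothing here calls Calcite; `ColumnRef`, `ExpectedLineage` and `expected` are hypothetical names introduced for illustration, and the mapping assumes that the INSERT assigns columns by position and that join and filter conditions are not counted as lineage sources.

```scala
// Hypothetical oracle for the lineage of the INSERT ... SELECT statement above.
object ExpectedLineage {
  /** A fully qualified column. */
  final case class ColumnRef(schema: String, table: String, column: String)

  private def st01(c: String) = ColumnRef("test", "st01", c)
  private def st02(c: String) = ColumnRef("test", "st02", c)
  private def st03(c: String) = ColumnRef("test", "st03", c)

  /** Target column of the INSERT -> source columns it is derived from. */
  val expected: Map[ColumnRef, Set[ColumnRef]] = Map(
    st03("s_id")   -> Set(st01("s_id")),
    st03("s_name") -> Set(st01("s_name"), st02("s_name")), // CONCAT(a.s_name,'-',b.s_name)
    st03("s_age")  -> Set(st01("s_age"), st02("s_age")),   // a.s_age+b.s_age
    st03("s_sex")  -> Set(st01("s_sex")),
    st03("s_part") -> Set.empty[ColumnRef],                // literal 'none'
    st03("ts")     -> Set.empty[ColumnRef]                 // current_timestamp
  )

  def main(args: Array[String]): Unit =
    expected.foreach { case (target, sources) =>
      val from =
        if (sources.isEmpty) "(no source column)"
        else sources.map(s => s"${s.table}.${s.column}").mkString(", ")
      println(s"${target.table}.${target.column} <- $from")
    }
}
```

Columns fed by a literal or by `current_timestamp` map to an empty set, which is a useful edge case to check in any lineage implementation.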