Hudi Basics -- Spark SQL DDL

Original article · last modified 2023-02-09 23:30:47

License: CC 4.0 BY-SA

Tags: #Hudi #DataLake

First published 2023-02-09 23:29:34

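This article shows how to create different kinds of Hudi tables in Spark: cow tables with and without a primary key, and a mor table with a primary key and a preCombineField for updates. It also shows how to create a partitioned table and how to use CTAS to create a non-partitioned table and a partitioned table with a primary key.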

Spark Create Table

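Key parameters:

Parameter	Description	Optional : default value
primaryKey	Name of the table's primary key; for a composite key, separate the fields with commas.	(Optional) : id
type	Table type: 'cow' (copy-on-write) or 'mor' (merge-on-read); the default is cow.	(Optional) : cow
preCombineField	When records share the same primary key, this field is used to decide whether the record for that key is updated. If it is not specified, the most recently written record is kept.	(Optional) : ts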

1. Create a cow table without a primary key

-- create a non-primary key table
create table if not exists hudi_table2(
  id int,
  name string,
  price double
) using hudi
options (
  type = 'cow'
);

2. Create a cow table with id as the primary key

-- create a managed cow table
create table if not exists hudi_table0 (
  id int,
  name string,
  price double
) using hudi
options (
  type = 'cow',
  primaryKey = 'id'
);
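
After the table is created, it can help to write a couple of rows and inspect the table metadata. The statements below are a minimal sketch, assuming the Hudi Spark bundle is on the classpath and the session was started with spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension; the row values are made up for illustration.

```sql
-- Assumption: the session has the Hudi SQL extension enabled (see the note above).
-- The values are illustrative only.
insert into hudi_table0 select 1 as id, 'a1' as name, 20.0 as price;
insert into hudi_table0 select 2 as id, 'a2' as name, 30.0 as price;

-- Read the rows back.
select id, name, price from hudi_table0 order by id;

-- Show the schema, provider and table properties (including type and primaryKey).
describe table extended hudi_table0;
```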

3. Create a mor table with a primary key and a preCombineField for updates

create table if not exists hudi_table1 (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
options (
  type = 'mor',
  primaryKey = 'id,name',
  preCombineField = 'ts'
);
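
To see how the composite key and ts fit together, the sketch below inserts one row and then updates it. It assumes the same Hudi-enabled Spark SQL session as the rest of the article; the values are hypothetical, and the exact behavior of repeated inserts on the same key depends on the Hudi version and write configuration, so only an explicit update is shown.

```sql
-- Illustrative values; (id, name) together form the record key.
insert into hudi_table1 select 1 as id, 'a1' as name, 20.0 as price, 1000 as ts;

-- Update the record identified by the composite key and bump ts,
-- since ts is the preCombineField used to order versions of a record.
update hudi_table1
set price = 25.0, ts = 1001
where id = 1 and name = 'a1';

select id, name, price, ts from hudi_table1;
```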

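4. Create a partitioned table

create table if not exists hudi_table_p0 (
  id bigint,
  name string,
  -- ts is declared so that preCombineField = 'ts' refers to an existing column
  ts bigint,
  dt string,
  hh string
) using hudi
options (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (dt, hh);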
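
Writing into a partitioned table can use either dynamic or static partition values. The following is a sketch under stated assumptions: hudi_table_p_demo is a hypothetical table defined here so the example is self-contained, it declares a ts column to back preCombineField, and the inserted values are invented.

```sql
-- Hypothetical demo table, partitioned by dt and hh.
create table if not exists hudi_table_p_demo (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
options (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (dt, hh);

-- Dynamic partitioning: dt and hh are taken from the last two select columns.
insert into hudi_table_p_demo partition (dt, hh)
select 1 as id, 'a1' as name, 1000 as ts, '2023-02-09' as dt, '10' as hh;

-- Static partitioning: dt and hh are fixed in the partition clause,
-- so the select supplies only the data columns.
insert into hudi_table_p_demo partition (dt = '2023-02-09', hh = '11')
select 2, 'a2', 1000;

-- List the partitions that now exist.
show partitions hudi_table_p_demo;
```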

5. Create a table with CTAS (Create Table As Select)

Hudi supports creating tables with CTAS (Create Table As Select).

5.1 Create a non-partitioned table with CTAS

create table h3 using hudi
as
select 1 as id, 'a1' as name, 10 as price;

5.2 Create a partitioned table with a primary key using CTAS

create table h2 using hudi
options (type = 'cow', primaryKey = 'id')
partitioned by (dt)
as
select 1 as id, 'a1' as name, 10 as price, 1000 as dt;
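
In practice the select usually reads from existing data rather than literals. The sketch below shows the same CTAS pattern fed from a temporary view; src_rows and h2_from_view are hypothetical names introduced for illustration, and the rows are invented.

```sql
-- Hypothetical source data exposed as a temporary view.
create or replace temporary view src_rows as
select * from values
  (1, 'a1', 10.0, '2023-02-09'),
  (2, 'a2', 20.0, '2023-02-10')
as t(id, name, price, dt);

-- CTAS into a partitioned Hudi table with id as the primary key.
-- The partition column dt is listed last in the select.
create table h2_from_view using hudi
options (type = 'cow', primaryKey = 'id')
partitioned by (dt)
as
select id, name, price, dt from src_rows;

select id, name, price, dt from h2_from_view order by id;
```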

For more DDL, see SQL DDL | Apache Hudi.