CREATE TABLE db_name.mysql_tb_mapping
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:mysql://xxx.xxx.xxx.xxx:3306/db_name?zeroDateTimeBehavior=convertToNull",
dbtable "mysql_db_name.mysql_tb_name",
driver "com.mysql.jdbc.Driver",
user "xxxxxxx",
password "xxxxxx"
)
spark在2.3以后可以使用 using语法创建多数据源的映射表,这样在spark-sql 客户端可以直接查询异构数据源,实现跨数据源的联邦查询。
需要注意的是如果spark-sql 配置hive数据源,这个表会写入hiveMetadata中。但是表的映射信息是写在TBLPROPERTIES属性里,所以建议使用spark-sql客户端查询具体的列信息。
spark-sql 启动时需要配置 jdbc的jar包,直接通过 --jars 配置即可。本方法也适用于其他jdbc数据源
建表语法规则如下
CREATE TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
USING data_source
[ OPTIONS ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name3, col_name4, ... )
[ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
[ AS select_statement ]