In a previous article I found that Spark 2.4.0 and later could not write to ClickHouse through the native JDBC interface, so here I try the official JDBC driver instead.
Background
- ClickHouse cluster with two shards and no replicas
- Hive partitions are read one at a time and written to the two shards alternately (e.g. 20200516 goes to shard 1, 20200517 to shard 2, 20200518 back to shard 1, and so on)
Implementation
import java.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{DoubleType, LongType, StringType}
import org.apache.spark.sql.{SaveMode, SparkSession}
import ru.yandex.clickhouse.ClickHouseDataSource

object OfficialJDBCDriver {
  val chDriver = "ru.yandex.clickhouse.ClickHouseDriver"
  // one JDBC URL per ClickHouse shard; Hive partitions are written to them alternately
  val chUrls = Array(
    "jdbc:clickhouse://1.1.1.1:8123/default",
    "jdbc:clickhouse://2.2.2.2:8123/default")

  def main(args: Array[String]): Unit = {
    if (args.length < 3) {
      System.err.println("Usage: OfficialJDBCDriver <tableName> <partitions> <batchSize>\n" +
        " <tableName> is the hive table name\n" +
        " <partitions> is the partitions to insert into clickhouse, like 20200516,20200517\n" +
        " <batchSize> is the JDBC batch size, e.g. 1000\n\n")
      System.exit(1)
    }
    // the actual write logic is sketched below
  }
}
