In 《spark sql 写入hive较慢原因分析》 ("Analysis of why Spark SQL writes to Hive slowly") we examined why Spark SQL is slow when writing Hive partition files. Here the author offers several optimization approaches for reference:
(1) Have Spark write the data files directly into the Hive table's underlying partition directories, then register the partitions with an ALTER TABLE ... ADD PARTITION statement:
spark.sql(s"alter table legend.test_log_hive_text add partition (name_par='${dirName}')")
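Putting the pieces together, here is a minimal sketch of this approach. The partition value dirName and the warehouse path below are hypothetical placeholders; only the table name and the ADD PARTITION statement come from this post, so adjust the path to your cluster's actual warehouse location:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DirectPartitionFiles")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

val dirName = "test1" // hypothetical partition value
val partitionPath =
  s"/user/hive/warehouse/legend.db/test_log_hive_text/name_par=$dirName"

// Stand-in data; .text() requires a single string column.
val df = Seq("log line 1", "log line 2").toDF("value")

// Write the files straight into the partition directory, avoiding the
// per-partition metastore round-trips of a normal insertInto write.
df.write.mode("overwrite").text(partitionPath)

// Register the new partition with the metastore in a single DDL call.
spark.sql(
  s"alter table legend.test_log_hive_text add partition (name_par='$dirName')")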
(2) Have Spark write the files to a staging directory on HDFS, then use the hive CLI to LOAD DATA them into the table:
hive -e "load data inpath '/test/test_log_hive/name_par=test$i' overwrite into table legend.test_log_hive_text partition(name_par='test$i') "
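The Spark side of this approach only needs to produce files under the staging layout that the load command above expects, one directory per partition value under /test/test_log_hive. A minimal sketch, with stand-in data:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("StageForHiveLoad")
  .getOrCreate()
import spark.implicits._

val i = 1
val stagingPath = s"/test/test_log_hive/name_par=test$i"

val df = Seq("log line 1", "log line 2").toDF("value")
df.write.mode("overwrite").text(stagingPath)

// hive -e "load data inpath ..." (as above) then moves these files into the
// table's own partition directory and registers the partition, so no
// metastore calls happen during the Spark write itself.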
(3) Modify the Spark configuration to use a newer Hive metastore client: set spark.sql.hive.metastore.version to your metastore's version and spark.sql.hive.metastore.jars to the location of the matching Hive jars. The Spark source shows that the supported Hive versions range from 0.12.0 through 2.3.3:
private[spark] object HiveUtils extends Logging {

  def withHiveExternalCatalog(sc: SparkContext): SparkContext = {
    sc.conf.set(CATALOG_IMPLEMENTATION.key, "hive")
    sc
  }

  /** The version of hive used internally by Spark SQL. */
  val builtinHiveVersion: String = "1.2.1"

  val HIVE_METASTORE_VERSION = buildConf("spark.sql.hive.metastore.version")
    .doc("Version of the Hive metastore. Available options are " +
      "0.12.0 through 2.3.3.")
    .stringConf
    .createWithDefault(builtinHiveVersion)

  // A fake config which is only here for backward compatibility reasons. This config has no effect
  // to Spark, just for reporting the builtin Hive version of Spark to existing applications that
  // already rely on this config.
  val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
    .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.")
    .stringConf
    .createWithDefault(builtinHiveVersion)

  val HIVE_METASTORE_JARS = buildConf("spark.sql.hive.metastore.jars")
    .doc(s"""
      | Location of the jars that should be used to instantiate the HiveMetastoreClient.
      | This property can be one of three options: "
      | 1. "builtin"
      |   Use Hive ${builtinHiveVersion}, which is bundled with the Spark assembly when
      |   -Phive is enabled. When this option is chosen,
      |   spark.sql.hive.metastore.version must be either
      |   ${builtinHiveVersion} or not defined.
      | 2. "maven"
      |   Use Hive jars of specified version downloaded from Maven repositories.
      | 3. A classpath in the standard format for both Hive and Hadoop.
      """.stripMargin)
    .stringConf
    .createWithDefault("builtin")
  // ...
}
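For example, to point the metastore client at Hive 2.3.3, both properties can be set on the session builder. A minimal sketch; the jar path is hypothetical and must contain the Hive 2.3.3 jars plus their Hadoop dependencies, in standard Java classpath format:

import org.apache.spark.sql.SparkSession

// Pin the Hive metastore client to 2.3.3 and load the matching jars from a
// local directory (hypothetical path; a classpath wildcard picks up all jars).
val spark = SparkSession.builder()
  .appName("NewerMetastoreClient")
  .config("spark.sql.hive.metastore.version", "2.3.3")
  .config("spark.sql.hive.metastore.jars", "/opt/hive-2.3.3/lib/*")
  .enableHiveSupport()
  .getOrCreate()

Both properties are read when the Hive client is first instantiated, so they must be set before the first Hive-backed query runs, either on the builder as above or in spark-defaults.conf.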