Running Spark from IDEA and saving to Hive: without any Hive configuration, Spark falls back to its built-in (embedded) Hive...

This post records how I resolved an exception Spark threw when connecting to Hive. The root cause was missing Hive configuration; after adjusting the configuration and making sure the Hive scratch directory had the correct permissions, the IllegalArgumentException at runtime went away.

Cause: when Spark runs from IDEA and saves to Hive without any Hive configuration on the classpath, it defaults to its own embedded Hive instead of the cluster's Hive.
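
For reference, a minimal sketch of pointing an IDEA-run job at the real Hive metastore instead of the embedded one; the metastore URI and warehouse path below are placeholders, not values from this post. Dropping the cluster's hive-site.xml (and core-site.xml/hdfs-site.xml) into src/main/resources achieves the same thing:

```scala
import org.apache.spark.sql.SparkSession

object HiveAwareSession {
  def main(args: Array[String]): Unit = {
    // Placeholder host names: replace with your metastore and namenode,
    // or put hive-site.xml on the classpath and drop these two .config() calls.
    val spark = SparkSession.builder()
      .appName("HiveAwareSession")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://your-metastore-host:9083")
      .config("spark.sql.warehouse.dir", "hdfs://your-namenode:8020/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // Quick check that the external metastore is being used
    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}
```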

Reference links (I tried them and they resolved the problem):

https://blog.youkuaiyun.com/zgjdzwhy/article/details/71056801

http://mangocool.com/1473838702533.html

Error output: Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at MockDataGenerate$.main(MockDataGenerate.scala:167)
    at MockDataGenerate.main(MockDataGenerate.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 12 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 17 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
    ... 30 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    ... 38 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 39 more
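
The root cause at the bottom of the trace is the scratch directory permissions: /tmp/hive must be writable (rwxrwxrwx). The usual fix is `winutils.exe chmod 777 \tmp\hive` when running locally on Windows, or `hadoop fs -chmod -R 777 /tmp/hive` on a cluster. An equivalent programmatic sketch using the Hadoop FileSystem API (it uses whatever fs.defaultFS your classpath configuration points at) would be:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object FixHiveScratchDir {
  def main(args: Array[String]): Unit = {
    // Resolves to the local FS or HDFS depending on the Hadoop config on the classpath
    val fs = FileSystem.get(new Configuration())
    val scratchDir = new Path("/tmp/hive")

    if (!fs.exists(scratchDir)) {
      fs.mkdirs(scratchDir)
    }
    // 777 octal -> rwxrwxrwx, which is what SessionState.createRootHDFSDir checks for
    fs.setPermission(scratchDir, new FsPermission(Integer.parseInt("777", 8).toShort))

    println(fs.getFileStatus(scratchDir).getPermission)
    fs.close()
  }
}
```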

Reposted from: https://my.oschina.net/u/3962987/blog/3086514

To read data from MySQL with Spark in IntelliJ IDEA and write it to both Hive and Hudi, you can proceed as follows:

### Environment setup

Make sure IntelliJ IDEA, Spark, MySQL, Hive and Hudi are installed and the corresponding environment variables are configured. Also add the necessary dependencies to the project's `pom.xml`:

```xml
<dependencies>
    <!-- Spark SQL -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.3.2</version>
    </dependency>
    <!-- Spark Hive support, required for enableHiveSupport()/saveAsTable into Hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.12</artifactId>
        <version>3.3.2</version>
    </dependency>
    <!-- MySQL Connector -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.26</version>
    </dependency>
    <!-- Hudi -->
    <dependency>
        <groupId>org.apache.hudi</groupId>
        <artifactId>hudi-spark3-bundle_2.12</artifactId>
        <version>0.12.1</version>
    </dependency>
</dependencies>
```

### Code

```scala
import org.apache.spark.sql.SparkSession
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

object SparkMySQLToHiveHudi {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession with Hive support enabled
    val spark = SparkSession.builder()
      .appName("SparkMySQLToHiveHudi")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Read data from MySQL over JDBC
    val mysqlDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/your_database")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "your_table")
      .option("user", "your_username")
      .option("password", "your_password")
      .load()

    // Write the data to a Hive table
    mysqlDF.write
      .mode("overwrite")
      .saveAsTable("hive_database.hive_table")

    // Write the data to a Hudi table
    val hudiOptions = Map(
      HoodieWriteConfig.TABLE_NAME -> "hudi_table",
      DataSourceWriteOptions.OPERATION_OPT_KEY -> "upsert",
      DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
      DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_col",
      DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "timestamp"
    )
    mysqlDF.write
      .format("org.apache.hudi")
      .options(hudiOptions)
      .mode("overwrite")
      .save("hdfs://localhost:9000/path/to/hudi_table")

    // Stop the SparkSession
    spark.stop()
  }
}
```

### How the code works

1. **Create the SparkSession**: `SparkSession.builder()` builds a SparkSession with Hive support enabled.
2. **Read from MySQL**: `spark.read.format("jdbc")` reads from MySQL; the MySQL URL, driver, table name, user and password must be supplied.
3. **Write to Hive**: `mysqlDF.write.mode("overwrite").saveAsTable("hive_database.hive_table")` writes the data into a Hive table.
4. **Write to Hudi**: `mysqlDF.write.format("org.apache.hudi")` writes the data into a Hudi table; the relevant Hudi options must be provided.
5. **Stop the SparkSession**: `spark.stop()` shuts the session down.

### Notes

- Replace the MySQL URL, username and password, as well as the Hive and Hudi table names and paths, with values for your environment.
- Make sure Hive and Hudi are configured correctly and that the HDFS service is running.
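
To sanity-check the two writes, a minimal read-back sketch (reusing the hypothetical table name and HDFS path from the example above) could look like this:

```scala
import org.apache.spark.sql.SparkSession

object VerifyWrites {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("VerifyWrites")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Read back the Hive table written above
    spark.sql("SELECT COUNT(*) AS cnt FROM hive_database.hive_table").show()

    // Read back the Hudi table from its storage path
    spark.read
      .format("hudi")
      .load("hdfs://localhost:9000/path/to/hudi_table")
      .show(10)

    spark.stop()
  }
}
```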