Hive: cannot find the spark-assembly JAR

This article explains how to fix the error Hive reports at startup about a missing spark-assembly JAR. Starting with Spark 2, the single large assembly JAR was split into many smaller JARs, so spark-assembly-*.jar no longer exists. The fix is a small edit to the hive startup script.

Versions:

apache-hive-1.2.1-bin.tar.gz
spark-2.1.1-bin-hadoop2.7.tgz

1. Problem statement

When Hive starts, it complains that it cannot find the spark-assembly JAR:

cannot access /usr/local/spark/lib/spark-assembly-*.jar: No such file or directory

2. Cause

Since Spark 2, the single large JAR that used to live under the lib directory has been split into many smaller JARs. The original spark-assembly-*.jar no longer exists, so Hive has no way to find it.
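You can confirm the layout change yourself with a quick check (a minimal sketch, assuming SPARK_HOME points at your Spark install, e.g. /usr/local/spark as in the error above):

```bash
# Spark 1.x layout: a single fat assembly JAR under lib/ (gone in Spark 2+)
ls ${SPARK_HOME}/lib/spark-assembly-*.jar

# Spark 2.x layout: many smaller JARs under jars/
ls ${SPARK_HOME}/jars/ | head
```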

3. Fix

Go into the bin directory of your Hive installation and edit the hive script.
My path is /usr/local/apache-hive-1.2.1-bin/bin.
Find the following line of shell script:

sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`  

Change it to:

sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
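If the same edit has to keep working on machines that still run Spark 1.x, a slightly more defensive variant (just a sketch, not part of the stock Hive 1.2.1 script) can pick whichever layout exists:

```bash
# Prefer the Spark 2.x jars/ directory; fall back to the old Spark 1.x assembly JAR
if [ -d "${SPARK_HOME}/jars" ]; then
  sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
else
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
fi
```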
---

I want to run this file with Spark on my virtual machine; the whole project is packaged into a JAR at /srv/untitled3-1.0-SNAPSHOT.jar. Here is the class:

```scala
package ads

import common.PortraitCommon.{ck_dim, ckdriver, ckpassword, ckurl, ckuser, dim_user_info_tmp}
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.functions.col
import utils.{ClickhouseUtils, SparkUtils}

object ads_car_brand_cut {
  def main(args: Array[String]): Unit = {
    // TODO: build the Spark session
    val spark: SparkSession = SparkUtils.getSpark("ads_car_brand_cut")

    // TODO: read the source data
    val dwdreadData = spark.read.format("jdbc")
      .option("url", ckurl)
      .option("user", ckuser)
      .option("password", ckpassword)
      .option("dbtable", "tieta_v2_dim.dim_user_info")
      .option("driver", ckdriver)
      .load()

    // val mysqlurl = "jdbc:mysql://192.168.3.117:3309/lol"
    // val mysqlusername = "root"
    // val mysqlpassword = "123456"

    // TODO: filter out unknown brands
    val filterDF: Dataset[Row] = dwdreadData.filter(col("car_brand_model") =!= "未知")

    // TODO: count by car brand/model
    val resDF: DataFrame = filterDF.groupBy("car_brand_model").count()
      .withColumnRenamed("count", "brand_count")

    resDF.show()

    // // TODO: write the result to ClickHouse
    // ClickhouseUtils.writeClickHouse(resDF, "tieta_v2_ads", "ads_car_brand_cut")
    // println("数据成功写入 ClickHouse")

    // resDF.write
    //   .format("jdbc")
    //   .option("url", mysqlurl)
    //   .option("dbtable", "ads_car_brand_cut")
    //   .option("user", mysqlusername)
    //   .option("password", mysqlpassword)
    //   .option("driver", "com.mysql.cj.jdbc.Driver")
    //   .mode("append")
    //   .save()
    //
    // println("Data successfully written to mysql")
  }
}
```

And this is my pom file — it should be fine, right?

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>untitled3</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- centralized version management -->
    <spark.version>3.1.1</spark.version>
    <hive.version>2.3.7</hive.version> <!-- Hive version compatible with Spark 3.1.1 -->
  </properties>

  <dependencies>
    <!-- Spark Core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Spark SQL -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Spark Hive Support -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Spark MLlib -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Hadoop Client -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>3.2.0</version>
    </dependency>
    <!-- Hudi Spark Bundle -->
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-spark3.1-bundle_2.12</artifactId>
      <version>0.12.0</version>
    </dependency>
    <!-- ClickHouse JDBC -->
    <dependency>
      <groupId>ru.yandex.clickhouse</groupId>
      <artifactId>clickhouse-jdbc</artifactId>
      <version>0.3.2</version>
    </dependency>
    <!-- Jackson Databind -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.10.0</version>
    </dependency>
    <!-- Jackson Core -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.10.0</version>
    </dependency>
    <!-- Akka Actor -->
    <dependency>
      <groupId>com.typesafe.akka</groupId>
      <artifactId>akka-actor_2.12</artifactId>
      <version>2.6.16</version>
    </dependency>
    <!-- JSch -->
    <dependency>
      <groupId>com.jcraft</groupId>
      <artifactId>jsch</artifactId>
      <version>0.1.51</version>
    </dependency>
    <!-- Typesafe Config -->
    <dependency>
      <groupId>com.typesafe</groupId>
      <artifactId>config</artifactId>
      <version>1.4.2</version>
    </dependency>
    <!-- MySQL Connector -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.29</version>
    </dependency>
    <!-- DOM4J -->
    <dependency>
      <groupId>org.dom4j</groupId>
      <artifactId>dom4j</artifactId>
      <version>2.1.4</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Maven Shade Plugin: builds a fat JAR containing all dependencies -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <!-- specify the main class here if needed -->
                  <!-- <mainClass>your.main.Class</mainClass> -->
                </transformer>
              </transformers>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```
### Packaging and run configuration

To run the Scala class `ads.ads_car_brand_cut` with Spark on the virtual machine, and to make sure the Maven project builds an executable JAR containing all dependencies, complete the following configuration and steps.

#### 1. Maven project structure and dependencies

Make sure `pom.xml` declares the Spark and Scala dependencies correctly and that the Scala version matches the Spark build. For example, with Spark 3.3.0 you should use Scala 2.12; keep the versions consistent, otherwise class-loading problems will occur[^2].

```xml
<properties>
    <!-- scala.version is the full compiler version; artifact suffixes use the binary version -->
    <scala.version>2.12.15</scala.version>
    <scala.binary.version>2.12</scala.binary.version>
    <spark.version>3.3.0</spark.version>
</properties>

<dependencies>
    <!-- Spark Core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Spark SQL -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Hive support -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- ClickHouse JDBC -->
    <dependency>
        <groupId>ru.yandex.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.3.2</version>
    </dependency>
    <!-- MySQL JDBC -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.28</version>
    </dependency>
</dependencies>
```

#### 2. Build a dependency-bundled JAR with the Maven Shade Plugin

To make sure the JAR contains all dependencies, package with `maven-shade-plugin` and set the main class to `ads.ads_car_brand_cut` so the class can be invoked directly at run time[^3].

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals><goal>shade</goal></goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>ads.ads_car_brand_cut</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

Run the build with:

```bash
mvn clean package
```

The generated JAR is written to the `target/` directory with a name like `spark02-1.0-SNAPSHOT.jar` and contains all dependencies[^3].

#### 3. Run the Spark job on the virtual machine

Upload the packaged JAR to the virtual machine and run the class with `spark-submit`:

```bash
spark-submit --class ads.ads_car_brand_cut \
--master yarn \
--deploy-mode cluster \
target/spark02-1.0-SNAPSHOT.jar
```

If the job takes arguments, append them to the end of the command, for example:

```bash
spark-submit --class ads.ads_car_brand_cut \
--master yarn \
--deploy-mode cluster \
target/spark02-1.0-SNAPSHOT.jar \
param1 param2
```

#### 4. Make sure utility classes and configuration files are packaged

All utility classes (such as `sparkunits`, `clickhouseunits`, `MySQLunits`, `tableunits`) should live under `src/main/scala` and be referenced correctly from the code. Configuration files (such as database connection settings) should live under `src/main/resources` so they are bundled into the JAR at build time.

#### 5. Classpath and compatibility checks

If you hit classpath problems or "class not found" errors at run time, check the following:

- Whether the Spark version is compatible with the dependency libraries, e.g. Hudi and ClickHouse JDBC[^4].
- Whether the required dependency JARs are present in the Spark environment on the virtual machine, or pass external dependencies with the `--jars` option:

```bash
spark-submit --class ads.ads_car_brand_cut \
--master yarn \
--jars /path/to/hudi-spark-bundle.jar,/path/to/clickhouse-jdbc.jar \
target/spark02-1.0-SNAPSHOT.jar
```
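Before submitting, it can also help to confirm that the shaded JAR really contains the entry class and the ClickHouse driver. A quick sanity check (the JAR path is the one mentioned in the question above; adjust it to your own build output):

```bash
# List the fat JAR's contents and look for the main class and the ClickHouse JDBC driver
jar tf /srv/untitled3-1.0-SNAPSHOT.jar | grep -E 'ads/ads_car_brand_cut|ru/yandex/clickhouse'
```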