#!/bin/bash
set -e
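# Note: spark.storage.memoryFraction and spark.shuffle.memoryFraction are
# legacy (pre-Spark-1.6) settings; on Spark 2.x they only take effect when
# spark.memory.useLegacyMode=true is also set.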
/opt/app/spark-2.2.0/bin/spark-submit \
--master yarn \
--deploy-mode client \
--executor-memory 18G \
--num-executors 50 \
--executor-cores 5 \
--driver-memory 2G \
--conf spark.default.parallelism=1000 \
--conf spark.storage.memoryFraction=0.5 \
--conf spark.shuffle.memoryFraction=0.3 \
--class org.apache.spark.examples.ml.Hello spark-gbtlr-2.4.0-jar-with-dependencies.jar
Two problems came up when submitting the job for execution:
1. The submit failed with the error below. The cause: stray whitespace after the trailing backslash on a `spark-submit \` continuation line; remove it. Note: on Linux, an overly long command can be split across multiple lines by ending each line with a backslash (`\`) followed directly by a newline; nothing at all (not even a space or tab) may follow the backslash. The `%09` in the error is the URL-encoded tab character that got passed to Spark as the jar path:
Error: Cannot load main class from JAR file:/data/zeus/job_dir/2019-07-26/manual-176529114/%09
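A quick way to spot the offending line (a minimal sketch; `submit.sh` is a placeholder for your script name, and `-P` requires GNU grep):

# Find continuation lines where whitespace follows the backslash:
grep -nP '\\[ \t]+$' submit.sh
# Or strip all trailing whitespace in place (GNU sed):
sed -i 's/[ \t]*$//' submit.sh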
2. Solution: the signature files `META-INF/*.RSA`, `META-INF/*.DSA`, and `META-INF/*.SF` must be removed from the assembled jar when the project is packaged; otherwise signature verification fails at startup with the exception below.
Reference: https://blog.youkuaiyun.com/dai451954706/article/details/50086295
19/07/26 15:01:07 [main] WARN KafkaProducer: metadata.fetch.timeout.ms config is deprecated and will be removed soon. Please use max.block.ms
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:314)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:268)
at java.util.jar.JarVerifier.processEntry(JarVerifier.java:316)
at java.util.jar.JarVerifier.update(JarVerifier.java:228)
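If you don't want to rebuild right away, a one-off workaround is to delete the signature entries from the already-built jar with Info-ZIP's `zip -d` (the quotes keep the shell from expanding the wildcards):

# Strip signature files from the existing jar in place:
zip -d spark-gbtlr-2.4.0-jar-with-dependencies.jar 'META-INF/*.SF' 'META-INF/*.DSA' 'META-INF/*.RSA'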
The permanent fix is to strip these files at package time, so add the following to the pom:
<!-- Packaging: these plugins go inside <build><plugins> in the pom -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>2.5.1</version>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
    <encoding>utf8</encoding>
  </configuration>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- The key part: ComponentsXmlResourceTransformer merges the components.xml files that conflict on the classpath -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ComponentsXmlResourceTransformer"/>
        </transformers>
        <!--<minimizeJar>true</minimizeJar>-->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>jar-with-dependencies</shadedClassifierName>
        <artifactSet>
          <excludes>
            <exclude>org.apache.hadoop:*</exclude>
          </excludes>
        </artifactSet>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <!-- Strip signature files so the shaded jar passes verification -->
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
<plugin>
  <artifactId>maven-source-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sources</id>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <version>2.6</version>
  <executions>
    <execution>
      <goals>
        <goal>test-jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
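After `mvn package`, you can confirm the shaded jar no longer carries signature files (the `target/` path assumes Maven's default layout):

jar tf target/spark-gbtlr-2.4.0-jar-with-dependencies.jar | grep -E 'META-INF/.*\.(SF|DSA|RSA)$' \
  && echo "signature files still present" || echo "clean"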
3. A runtime error:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0
The root cause is a Hadoop version mismatch: the Hadoop version the project is built against does not match the Hadoop version of the runtime environment (this particular `NativeIO$Windows` symbol typically means a local run on Windows with a `hadoop.dll` older than the Hadoop jars). The best fix is to change the Hadoop version in the pom file so the versions line up.
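A minimal sketch of that fix, assuming the cluster runs Hadoop 2.7.x (adjust `hadoop.version` to whatever your environment actually ships); combined with the `org.apache.hadoop:*` exclude in the shade plugin above, the `provided` scope keeps a conflicting Hadoop copy out of the jar:

<properties>
  <hadoop.version>2.7.3</hadoop.version>
</properties>
<!-- inside <dependencies> -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>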

To sum up: running `spark-submit` on Spark 2.2.0 hit two submit-time problems. First, an error caused by whitespace after a line-continuation backslash; the fix is to make sure nothing follows the backslash. Second, the `META-INF/*.RSA` and related signature files inside the packaged jar broke verification and had to be stripped at package time. In addition, a runtime `UnsatisfiedLinkError` came from a Hadoop version mismatch, fixed by aligning the Hadoop version in the pom file.