wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
Changes to the Spark 2.4 pom.xml (add these entries inside the <repositories> element):
<!-- Add vendor maven repositories -->
<!-- Cloudera -->
<repository>
  <id>cloudera-releases</id>
  <url>http://repository.cloudera.com/artifactory/cloudera-repos</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<!-- Hortonworks -->
<repository>
  <id>HDPReleases</id>
  <name>HDP Releases</name>
  <url>http://repo.hortonworks.com/content/repositories/releases/</url>
  <snapshots><enabled>false</enabled></snapshots>
  <releases><enabled>true</enabled></releases>
</repository>
<repository>
  <id>HortonworksJettyHadoop</id>
  <name>HDP Jetty</name>
  <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
  <snapshots><enabled>false</enabled></snapshots>
  <releases><enabled>true</enabled></releases>
</repository>
<!-- MapR -->
<repository>
  <id>mapr-releases</id>
  <url>https://repository.mapr.com/maven/</url>
  <snapshots><enabled>false</enabled></snapshots>
  <releases><enabled>true</enabled></releases>
</repository>
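Before starting a long build, it can be worth confirming that the repository entries really ended up in pom.xml. The following is a minimal sketch, not part of the Spark tooling: the function name missing_repo_ids is hypothetical, and it only searches for the four <id> values shown above without validating the XML itself.

```shell
# Hypothetical helper: print each expected repository id that is NOT
# found in the given pom.xml. Prints nothing when all four are present.
missing_repo_ids() {
    pom="$1"
    for id in cloudera-releases HDPReleases HortonworksJettyHadoop mapr-releases; do
        # -F: treat the pattern as a fixed string, -q: no output
        grep -Fq "<id>${id}</id>" "$pom" || echo "$id"
    done
}
```

Run from the source root, missing_repo_ids pom.xml should print nothing once all four entries are in place.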
[root@hdp2 spark-2.4.2]# pwd
/root/spark-2.4.2
[root@hdp2 spark-2.4.2]#
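Run the build command
Parameter details
-Phadoop-x.y: Maven profile for the Hadoop major version, for example -Phadoop-3.1
-Dhadoop.version=2.6.0-cdh5.7.0: exact Hadoop version to build against
--pip: also build the pip-installable Python package (make-distribution.sh option)
--r: also build the R package (make-distribution.sh option)
-Psparkr: enable SparkR support
-Pkubernetes: enable Kubernetes (k8s) support
-Phive-thriftserver: enable the Hive Thrift server
-Phive: enable Hive support
--tgz: package the distribution as a .tgz archive (make-distribution.sh option)
--name: name suffix of the generated package (make-distribution.sh option)
-Phive -Phive-thriftserver: required for connecting to Hive
-Pyarn: required for running on Hadoop YARN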
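To make the mapping from these options to a concrete command line easier to see, the sketch below assembles the Maven arguments from a Hadoop profile and a Hadoop version. The function name spark_mvn_args and its two parameters are hypothetical; the flags themselves are the ones listed above.

```shell
# Hypothetical helper: print the Maven arguments for a given Hadoop
# profile (e.g. hadoop-3.1) and exact Hadoop version (e.g. 3.1.1).
spark_mvn_args() {
    hadoop_profile="$1"
    hadoop_version="$2"
    # YARN and Hive support are always enabled; tests are skipped
    echo "-Pyarn -Phive -Phive-thriftserver -P${hadoop_profile} -Dhadoop.version=${hadoop_version} -DskipTests clean package"
}
```

For example, ./build/mvn $(spark_mvn_args hadoop-3.1 3.1.1) expands to the same flags as the compile command used in this walkthrough.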
# Compile the source only; after the build, the project can be imported into IDEA
[root@hdp2 spark-2.4.2]# ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.1 -Dhadoop.version=3.1.1 -DskipTests clean package
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 2.4.2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.764 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 7.418 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 8.415 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 4.486 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 8.177 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 4.475 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 8.950 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 6.291 s]
[INFO] Spark Project Core ................................. SUCCESS [03:21 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.382 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 14.782 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 43.863 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:34 min]
[INFO] Spark Project SQL .................................. SUCCESS [04:22 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:39 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 1.203 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:06 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 6.720 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 10.094 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 16.335 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 18.117 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 5.043 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 10.603 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 16.859 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 23.006 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 10.769 s]
[INFO] Spark Avro ......................................... SUCCESS [ 9.505 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:15 min
[INFO] Finished at: 2020-07-08T14:54:06+08:00
[INFO] ------------------------------------------------------------------------
[root@hdp2 spark-2.4.2]#
# Compile and package; the resulting archive can be deployed to a production environment
[root@hdp2 spark-2.4.2]# ./dev/make-distribution.sh --name 3.1.1 --tgz -Phadoop-3.1 -Dhadoop.version=3.1.1 -Phive -Phive-thriftserver -Pyarn
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 2.4.2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.506 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 8.693 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 9.827 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 8.492 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 13.254 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 5.543 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 14.309 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 11.668 s]
[INFO] Spark Project Core ................................. SUCCESS [03:21 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 25.752 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 20.328 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 48.529 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:39 min]
[INFO] Spark Project SQL .................................. SUCCESS [04:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [03:03 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 8.685 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:05 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 7.194 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 10.414 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 19.560 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 20.583 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 4.442 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 10.688 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 19.058 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 24.540 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 10.721 s]
[INFO] Spark Avro ......................................... SUCCESS [ 11.792 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:57 min (Wall Clock)
[INFO] Finished at: 2020-07-08T16:15:56+08:00
[INFO] ------------------------------------------------------------------------
spark-2.4.2-bin-3.1.1.tgz in the listing below is the resulting installation package:
[root@hdp2 spark-2.4.2]# ls
appveyor.yml data launcher python sql
assembly dev LICENSE R streaming
bin dist licenses README.md target
build docs mllib repl tools
common examples mllib-local resource-managers
conf external NOTICE sbin
CONTRIBUTING.md graphx pom.xml scalastyle-config.xml
core hadoop-cloud project spark-2.4.2-bin-3.1.1.tgz
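The archive name is derived from the Spark version and the value given to --name: make-distribution.sh writes spark-<version>-bin-<name>.tgz into the source root. A small sketch of that naming rule follows; the function name spark_dist_tarball is hypothetical.

```shell
# Hypothetical helper: expected tarball name for a Spark version ($1)
# and the value passed to --name ($2).
spark_dist_tarball() {
    echo "spark-$1-bin-$2.tgz"
}
```

With the values from this build, spark_dist_tarball 2.4.2 3.1.1 gives spark-2.4.2-bin-3.1.1.tgz, which could then be unpacked with something like tar -xzf "$(spark_dist_tarball 2.4.2 3.1.1)" -C /mnt/vdb (the target directory is taken from the transcript that follows and may differ on your machine).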
After extracting spark-2.4.2-bin-3.1.1.tgz, the bundled Hadoop dependency jars and their versions can be checked in the extracted directory:
[root@hdp2 vdb]# cd spark-2.4.2-bin-3.1.1
[root@hdp2 spark-2.4.2-bin-3.1.1]# pwd
/mnt/vdb/spark-2.4.2-bin-3.1.1
[root@hdp2 spark-2.4.2-bin-3.1.1]# ls
bin conf data examples jars python README.md RELEASE sbin yarn
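One way to inspect the Hadoop jar versions is to list the hadoop-*.jar files under jars. This is a minimal sketch with a hypothetical function name, hadoop_jars. With -Dhadoop.version=3.1.1 the file names would be expected to end in -3.1.1.jar; a different suffix would suggest the profile or version flag did not take effect.

```shell
# Hypothetical helper: list the Hadoop jars bundled in a Spark
# distribution directory, one file name per line.
hadoop_jars() {
    dist_dir="$1"
    for jar in "$dist_dir"/jars/hadoop-*.jar; do
        # Skip the unexpanded pattern when nothing matches
        if [ -e "$jar" ]; then
            basename "$jar"
        fi
    done
}
```

For the directory shown above, the call would be hadoop_jars /mnt/vdb/spark-2.4.2-bin-3.1.1.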
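Directory layout after extraction
bin: client-side scripts such as beeline; the files ending in .cmd are Windows launchers and can be deleted on Linux
conf: configuration file templates; copy and edit them when needed
data: sample test data
examples: example code; it is very well written and strongly recommended reading
jars: all jars live together in this one directory. Unlike 1.x, which shipped only a few large jars, 2.x ships them as separate jars (best practice)
LICENSE, licenses, NOTICE, python, README.md, RELEASE: these files and folders can all be deleted (keep python if you use PySpark)
sbin: server-side scripts, such as the cluster start and stop commands
yarn: YARN-related jars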
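The cleanup of the .cmd files mentioned for bin can be sketched as follows. The function name prune_cmd_scripts and its "delete" argument are hypothetical; by default it only lists the candidates so they can be reviewed first.

```shell
# Hypothetical helper: list the Windows .cmd launchers under bin/ of a
# Spark distribution; pass "delete" as the second argument to remove them.
prune_cmd_scripts() {
    dist_dir="$1"
    mode="${2:-list}"
    if [ "$mode" = "delete" ]; then
        # Remove only regular files whose names end in .cmd
        find "$dist_dir/bin" -type f -name '*.cmd' -exec rm -f {} +
    else
        find "$dist_dir/bin" -type f -name '*.cmd'
    fi
}
```

A cautious sequence would be prune_cmd_scripts /mnt/vdb/spark-2.4.2-bin-3.1.1 to review the list, then the same call with delete appended.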
Other notes:
# Give Maven 2 GB of heap memory
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
# Before building, install the compression libraries and tools
yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop openssl openssl-devel