Building Spark 2.4.2 against Hadoop 3

wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
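
The download step can be parameterised by version. The following is a minimal bash sketch, not part of the original post: the function names spark_src_url and fetch_spark_src are hypothetical, and the URL pattern is simply the one from the wget line above.

```shell
# Hypothetical helper: build the archive.apache.org URL for a Spark source release.
spark_src_url() {
  local version="$1"
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s.tgz\n' "$version" "$version"
}

# Hypothetical helper: download the source tarball if it is not already
# present, then unpack it into the current directory.
fetch_spark_src() {
  local version="$1"
  local tarball="spark-${version}.tgz"
  [ -f "$tarball" ] || wget "$(spark_src_url "$version")"
  tar -xzf "$tarball"
}
```

Usage would look like `fetch_spark_src 2.4.2 && cd spark-2.4.2`.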

 

 

Edit pom.xml in the Spark 2.4 source tree and add the following vendor repositories inside the <repositories> element:

 

     <!-- Add vendor maven repositories -->

               <!-- Cloudera -->

               <repository>

                       <id>cloudera-releases</id>

                       <url>http://repository.cloudera.com/artifactory/cloudera-repos</url>

                       <releases>

                               <enabled>true</enabled>

                       </releases>

                       <snapshots>

                               <enabled>false</enabled>

                       </snapshots>

               </repository>

               <!-- Hortonworks -->

               <repository>

                       <id>HDPReleases</id>

                       <name>HDP Releases</name>

                       <url>http://repo.hortonworks.com/content/repositories/releases/</url>

                       <snapshots><enabled>false</enabled></snapshots>

                       <releases><enabled>true</enabled></releases>

               </repository>

               <repository>

                       <id>HortonworksJettyHadoop</id>

                       <name>HDP Jetty</name>

                       <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop</url>

                       <snapshots><enabled>false</enabled></snapshots>

                       <releases><enabled>true</enabled></releases>

               </repository>

               <!-- MapR -->

               <repository>

                       <id>mapr-releases</id>

                       <url>https://repository.mapr.com/maven/</url>

                       <snapshots><enabled>false</enabled></snapshots>

                       <releases><enabled>true</enabled></releases>

               </repository>
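
Before starting a long build it can help to confirm that the edit actually landed in pom.xml. The sketch below is an assumption-laden convenience check, not part of the original procedure: missing_repos is a hypothetical function name, and it only greps for the `<id>` values shown in the snippet above rather than parsing the XML.

```shell
# Hypothetical helper: print every expected repository id that does NOT
# appear as <id>...</id> in the given pom.xml. Prints nothing if all are present.
missing_repos() {
  local pom="$1"
  shift
  local id
  for id in "$@"; do
    grep -q "<id>${id}</id>" "$pom" || echo "$id"
  done
}
```

For example, `missing_repos pom.xml cloudera-releases HDPReleases HortonworksJettyHadoop mapr-releases` prints nothing when all four repositories are in place.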

 

 

[root@hdp2 spark-2.4.2]# pwd

/root/spark-2.4.2

[root@hdp2 spark-2.4.2]#

 

Run the build command

 

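Parameter reference

-Phadoop-<major.minor>: profile for the Hadoop major version, for example -Phadoop-3.1

-Dhadoop.version=2.6.0-cdh5.7.0: the exact Hadoop version (a CDH version is shown here as an example)

--pip: make-distribution.sh option, builds the pip-installable Python (PySpark) package

--r: make-distribution.sh option, builds the R (SparkR) package

-Psparkr: SparkR support (this profile is for R, not PySpark)

-Pkubernetes: Kubernetes support

-Phive-thriftserver: Hive Thrift server (JDBC/ODBC) support

-Phive: Hive support

--tgz: make-distribution.sh option, packages the distribution as a .tgz archive

--name: make-distribution.sh option, the name used in the generated package file name

-Phive -Phive-thriftserver: used together for Hive integration

-Pyarn: YARN support, for running on Hadoop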
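
The profile and property flags described above can be assembled from a single Hadoop version string. This is a minimal bash sketch under stated assumptions: spark_build_flags is a hypothetical function name, and it assumes a -Phadoop-<major.minor> profile exists for the given version, as it does for the 3.1 / 3.1.1 pairing used in this post.

```shell
# Hypothetical helper: print the Maven flags for a given Hadoop version.
# Assumption: the profile is named after the first two version components,
# e.g. 3.1.1 -> -Phadoop-3.1. Check pom.xml for the profiles that exist.
spark_build_flags() {
  local hadoop_version="$1"
  local major_minor
  major_minor="$(printf '%s' "$hadoop_version" | cut -d. -f1,2)"
  echo "-Pyarn -Phive -Phive-thriftserver -Phadoop-${major_minor} -Dhadoop.version=${hadoop_version}"
}
```

It could then be used as `./build/mvn $(spark_build_flags 3.1.1) -DskipTests clean package`; the substitution is left unquoted on purpose so that the flags are split into separate arguments.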

# Compile the sources only; after the build, the project can be imported into IntelliJ IDEA

 

[root@hdp2 spark-2.4.2]# ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.1 -Dhadoop.version=3.1.1 -DskipTests clean package

 

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary for Spark Project Parent POM 2.4.2:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [  3.764 s]

[INFO] Spark Project Tags ................................. SUCCESS [  7.418 s]

[INFO] Spark Project Sketch ............................... SUCCESS [  8.415 s]

[INFO] Spark Project Local DB ............................. SUCCESS [  4.486 s]

[INFO] Spark Project Networking ........................... SUCCESS [  8.177 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.475 s]

[INFO] Spark Project Unsafe ............................... SUCCESS [  8.950 s]

[INFO] Spark Project Launcher ............................. SUCCESS [  6.291 s]

[INFO] Spark Project Core ................................. SUCCESS [03:21 min]

[INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.382 s]

[INFO] Spark Project GraphX ............................... SUCCESS [ 14.782 s]

[INFO] Spark Project Streaming ............................ SUCCESS [ 43.863 s]

[INFO] Spark Project Catalyst ............................. SUCCESS [02:34 min]

[INFO] Spark Project SQL .................................. SUCCESS [04:22 min]

[INFO] Spark Project ML Library ........................... SUCCESS [02:39 min]

[INFO] Spark Project Tools ................................ SUCCESS [  1.203 s]

[INFO] Spark Project Hive ................................. SUCCESS [01:06 min]

[INFO] Spark Project REPL ................................. SUCCESS [  6.720 s]

[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 10.094 s]

[INFO] Spark Project YARN ................................. SUCCESS [ 16.335 s]

[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 18.117 s]

[INFO] Spark Project Assembly ............................. SUCCESS [  5.043 s]

[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 10.603 s]

[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 16.859 s]

[INFO] Spark Project Examples ............................. SUCCESS [ 23.006 s]

[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 10.769 s]

[INFO] Spark Avro ......................................... SUCCESS [  9.505 s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time:  18:15 min

[INFO] Finished at: 2020-07-08T14:54:06+08:00

[INFO] ------------------------------------------------------------------------

[root@hdp2 spark-2.4.2]#

 

 

# Compile and package; the resulting archive can be deployed to a production environment

 

[root@hdp2 spark-2.4.2]# ./dev/make-distribution.sh --name 3.1.1  --tgz -Phadoop-3.1 -Dhadoop.version=3.1.1 -Phive -Phive-thriftserver -Pyarn

 

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary for Spark Project Parent POM 2.4.2:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [  3.506 s]

[INFO] Spark Project Tags ................................. SUCCESS [  8.693 s]

[INFO] Spark Project Sketch ............................... SUCCESS [  9.827 s]

[INFO] Spark Project Local DB ............................. SUCCESS [  8.492 s]

[INFO] Spark Project Networking ........................... SUCCESS [ 13.254 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.543 s]

[INFO] Spark Project Unsafe ............................... SUCCESS [ 14.309 s]

[INFO] Spark Project Launcher ............................. SUCCESS [ 11.668 s]

[INFO] Spark Project Core ................................. SUCCESS [03:21 min]

[INFO] Spark Project ML Local Library ..................... SUCCESS [ 25.752 s]

[INFO] Spark Project GraphX ............................... SUCCESS [ 20.328 s]

[INFO] Spark Project Streaming ............................ SUCCESS [ 48.529 s]

[INFO] Spark Project Catalyst ............................. SUCCESS [02:39 min]

[INFO] Spark Project SQL .................................. SUCCESS [04:56 min]

[INFO] Spark Project ML Library ........................... SUCCESS [03:03 min]

[INFO] Spark Project Tools ................................ SUCCESS [  8.685 s]

[INFO] Spark Project Hive ................................. SUCCESS [01:05 min]

[INFO] Spark Project REPL ................................. SUCCESS [  7.194 s]

[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 10.414 s]

[INFO] Spark Project YARN ................................. SUCCESS [ 19.560 s]

[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 20.583 s]

[INFO] Spark Project Assembly ............................. SUCCESS [  4.442 s]

[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 10.688 s]

[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 19.058 s]

[INFO] Spark Project Examples ............................. SUCCESS [ 24.540 s]

[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 10.721 s]

[INFO] Spark Avro ......................................... SUCCESS [ 11.792 s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time:  14:57 min (Wall Clock)

[INFO] Finished at: 2020-07-08T16:15:56+08:00

[INFO] ------------------------------------------------------------------------

 

The file spark-2.4.2-bin-3.1.1.tgz is the resulting installation package:

[root@hdp2 spark-2.4.2]# ls

appveyor.yml     data          launcher     python                     sql

assembly         dev           LICENSE      R                          streaming

bin              dist          licenses     README.md                  target

build            docs          mllib        repl                       tools

common           examples      mllib-local  resource-managers

conf             external      NOTICE       sbin

CONTRIBUTING.md  graphx        pom.xml      scalastyle-config.xml

core             hadoop-cloud  project      spark-2.4.2-bin-3.1.1.tgz
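
The archive name in the listing above follows from the Spark version and the value passed to --name. A minimal sketch of that naming rule, with dist_tarball_name as a hypothetical helper name:

```shell
# Hypothetical helper: compute the archive name that make-distribution.sh
# produces for a given Spark version and --name value.
dist_tarball_name() {
  local spark_version="$1"
  local name="$2"
  printf 'spark-%s-bin-%s.tgz\n' "$spark_version" "$name"
}
```

For instance, `[ -f "$(dist_tarball_name 2.4.2 3.1.1)" ] && echo "package built"` checks for the file produced by the command in this post.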

 

After extracting spark-2.4.2-bin-3.1.1.tgz, the versions of the bundled Hadoop dependency jars can be checked in the extracted directory:

 

 

[root@hdp2 vdb]# cd spark-2.4.2-bin-3.1.1

[root@hdp2 spark-2.4.2-bin-3.1.1]# pwd

/mnt/vdb/spark-2.4.2-bin-3.1.1

[root@hdp2 spark-2.4.2-bin-3.1.1]# ls

bin  conf  data  examples  jars  python  README.md  RELEASE  sbin  yarn
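
One way to confirm which Hadoop version ended up in the package is to read it off the jar file names. The sketch below is an illustration, not output from the original build: hadoop_jar_versions is a hypothetical function name, and it assumes the jars follow the usual hadoop-<module>-<version>.jar naming.

```shell
# Hypothetical helper: print the distinct version strings found in the
# hadoop-*.jar file names of a jars directory, one per line.
hadoop_jar_versions() {
  local jars_dir="$1"
  local f
  for f in "$jars_dir"/hadoop-*.jar; do
    basename "$f"
  done | sed -n 's/^hadoop-[a-z-]*-\([0-9].*\)\.jar$/\1/p' | sort -u
}
```

Running `hadoop_jar_versions /mnt/vdb/spark-2.4.2-bin-3.1.1/jars` should print a single version if the build picked up one consistent Hadoop release; more than one line would suggest mixed dependencies.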

 

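Directory layout after extraction

bin: client-side scripts such as beeline; the files ending in .cmd (Windows launchers) can be deleted

conf: configuration file templates; copy and edit them when needed

data: some test data

examples: example code; it is very well written and strongly recommended for study

jars: all the jar files in one place; unlike 1.x, which shipped only a few jars, 2.x splits them into many separate jars (best practice)

LICENSE, licenses, NOTICE, python, README.md, RELEASE: these files and directories can all be deleted (python only if PySpark is not needed)

sbin: server-side scripts, such as the cluster start and stop commands

yarn: YARN-related jars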
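
The cleanup of the Windows launchers mentioned for the bin directory can be scripted. This is a minimal sketch under the assumption of a Linux host with a find that supports -maxdepth and -delete (GNU find does); remove_cmd_scripts is a hypothetical function name.

```shell
# Hypothetical helper: delete the Windows *.cmd launchers from the bin
# directory of an extracted Spark distribution. Other files are left alone.
remove_cmd_scripts() {
  local spark_home="$1"
  find "$spark_home/bin" -maxdepth 1 -type f -name '*.cmd' -delete
}
```

For the layout in this post the call would be `remove_cmd_scripts /mnt/vdb/spark-2.4.2-bin-3.1.1`.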

 

 

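Other notes:

# Give Maven a 2 GB heap (on Java 8, -XX:MaxPermSize is ignored with a warning)

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Install compression libraries and tools before building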

yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop openssl openssl-devel

 

 
