Compiling Spark: [thriftServer.sh is still experimental; hive-0.13.1]

This post walks through compiling Spark from source with support for the Parquet file format: setting Maven's memory usage, the build flags, and generating a deployment package, along with the commands for starting the cluster and its services.



Note: Spark 1.2 has since been released, so this post now serves only as an installation reference; there is no longer any need to compile from source yourself.


vi sql/hive/pom.xml — add the following dependency to enable reading Parquet:
<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>parquet-hive-bundle</artifactId>
    <version>1.5.0</version>
</dependency>
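
With this dependency in, Hive tables stored as Parquet become readable from Spark SQL. A minimal sketch of a spark-shell session (the table name and columns are made up for illustration):

// Inside spark-shell; sc is the SparkContext the shell provides.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Create a Parquet-backed Hive table and query it (hypothetical table).
hiveContext.sql("CREATE TABLE IF NOT EXISTS logs (id INT, msg STRING) STORED AS PARQUET")
hiveContext.sql("SELECT COUNT(*) FROM logs").collect().foreach(println)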


Setting up Maven’s Memory Usage
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"


Compile:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive-0.13.1 -Dhive.version=0.13.1 -Phbase-0.98.7 -Dhbase.version=0.98.7 -DskipTests clean package


Generate a deployment package:


[In make-distribution.sh, edit the SPARK_HIVE block and change <id>hive</id> to the Hive version you need; otherwise the generated package will not support Hive.]
https://github.com/apache/spark/blob/master/make-distribution.sh 


./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive-0.13.1 -Dhive.version=0.13.1 -Phbase-0.98.7 -Dhbase.version=0.98.7
[equivalent to ==> mvn clean package -DskipTests -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive-0.13.1 -Dhive.version=0.13.1 -Phbase-0.98.7 -Dhbase.version=0.98.7]


install:
Edit spark-env.sh and add the following:
export JAVA_HOME=/usr/local/jdk1.7.0_45
export SCALA_HOME=/usr/local/scala
export HIVE_CONF_DIR=/usr/local/hive-0.12/conf
export CLASSPATH=$CLASSPATH:/usr/local/hive-0.12/lib
export HADOOP_CONF_DIR=/usr/local/hadoop-2.5.1/etc/hadoop
export SPARK_MASTER_IP=hadoop0
export SPARK_WORKER_MEMORY=2g


start/stop cluster:
start-all.sh
stop-all.sh


JVM heap flags (maximum heap, minimum/initial heap, young-generation size, GC logging):
-Xmx60m -Xms20m -Xmn7m -XX:+PrintGCDetails
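
To see what these flags do, a throwaway Scala program (hypothetical, purely for illustration) can be run under them; with a 60 MB heap cap and a 7 MB young generation, the allocation loop below triggers frequent minor collections that -XX:+PrintGCDetails logs to stdout:

// GcDemo.scala — compile with scalac, then run:
//   scala -J-Xmx60m -J-Xms20m -J-Xmn7m -J-XX:+PrintGCDetails GcDemo
object GcDemo {
  def main(args: Array[String]): Unit = {
    val live = scala.collection.mutable.ArrayBuffer.empty[Array[Byte]]
    for (i <- 1 to 50) {
      live += new Array[Byte](1024 * 1024) // allocate 1 MB per iteration
      if (i % 10 == 0) live.clear()        // drop references so GC can reclaim them
    }
  }
}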




spark-submit:
https://spark.apache.org/docs/latest/submitting-applications.html
./bin/spark-submit \
--name SparkAnalysis \
--class com.itweet.spark.Analysis \
--master spark://itr-mastertest01:7077 \
--executor-memory 20G \
--total-executor-cores 20 \
/program/spark-1.0-project-1.0.jar \
hdfs://itr-mastertest01:9000/labs/docword \
hdfs://itr-mastertest01:9000/wc
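
The post never shows com.itweet.spark.Analysis itself; judging from the two HDFS arguments it is a word-count-style job, so a minimal sketch under that assumption might look like:

package com.itweet.spark

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction: read the docword input, count words,
// and write the results to the output path. Both paths arrive as args.
object Analysis {
  def main(args: Array[String]): Unit = {
    val Array(input, output) = args
    val sc = new SparkContext(new SparkConf().setAppName("SparkAnalysis"))
    sc.textFile(input)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(output)
    sc.stop()
  }
}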


spark-sql:
/usr/local/spark-1.2/bin/spark-sql --master spark://itr-mastertest01:7077 --executor-memory 512M --total-executor-cores 2
spark-shell:
MASTER=spark://itr-mastertest01:7077 /usr/local/spark-1.2/bin/spark-shell
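
A quick smoke test once the shell is up (nothing is assumed beyond the sc the shell provides):

val rdd = sc.parallelize(1 to 1000)
println(rdd.reduce(_ + _))  // 500500 if the workers are reachable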
spark-thriftserver:
/usr/local/spark-1.2/sbin/start-thriftserver.sh --master spark://itr-mastertest01:7077
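
Once the thrift server is listening (port 10000 by default), any HiveServer2 JDBC client can talk to it. A minimal Scala sketch, assuming hive-jdbc and its dependencies are on the classpath:

import java.sql.DriverManager

object ThriftServerClient {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Connect to the thrift server started above (empty user/password).
    val conn = DriverManager.getConnection("jdbc:hive2://itr-mastertest01:10000", "", "")
    val rs = conn.createStatement().executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    conn.close()
  }
}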