Scala Installation and Environment Variable Configuration
Environment variables (added to /etc/profile)
JAVA_HOME=/root/app/jdk
SCALA_HOME=/root/app/scala
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:/root/tools:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH SCALA_HOME
Verify the installation
[root@cdh1 ~]# source /etc/profile
[root@cdh1 ~]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@cdh1 ~]# runRemoteCmd.sh "jps" all
*******************cdh1***********************
6914 Jps
1592 QuorumPeerMain
*******************cdh2***********************
1728 QuorumPeerMain
8464 Jps
*******************cdh3***********************
1649 QuorumPeerMain
7269 Jps
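runRemoteCmd.sh is a custom helper script for running a command on every node over SSH; it is not part of Hadoop, Spark, or Scala. A minimal sketch of what such a script might look like, assuming passwordless SSH and the host groups used in this post (the exact host-group handling is an assumption, not taken from the original):

#!/bin/bash
# runRemoteCmd.sh (hypothetical sketch): run a shell command on a group of hosts over SSH.
# Usage: runRemoteCmd.sh "<command>" <all|slave>
cmd=$1
group=$2

# Host groups are assumptions; adjust to the actual cluster.
case $group in
  all)   hosts="cdh1 cdh2 cdh3" ;;
  slave) hosts="cdh2 cdh3" ;;
  *)     echo "usage: $0 \"<command>\" <all|slave>"; exit 1 ;;
esac

for host in $hosts; do
  echo "*******************${host}***********************"
  # source /etc/profile so PATH (jps, scala, etc.) is available in the non-interactive shell
  ssh "$host" "source /etc/profile; $cmd"
done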
Distribute Scala to the slave nodes and create a symlink
Master:
deploy.sh /root/app/scala-2.11.8 /root/app slave
Slave:
ln -s scala-2.11.8 scala
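deploy.sh is likewise a custom distribution script rather than a stock tool. A minimal sketch of one possible implementation, assuming it takes a source path, a destination directory, and a host group (the group names mirror the runRemoteCmd.sh sketch above and are assumptions):

#!/bin/bash
# deploy.sh (hypothetical sketch): copy a file or directory to a group of hosts.
# Usage: deploy.sh <src> <dest_dir> <all|slave>
src=$1
dest=$2
group=$3

case $group in
  all)   hosts="cdh1 cdh2 cdh3" ;;
  slave) hosts="cdh2 cdh3" ;;
  *)     echo "usage: $0 <src> <dest_dir> <all|slave>"; exit 1 ;;
esac

for host in $hosts; do
  echo "deploying $src to ${host}:${dest}"
  # scp -r handles both files and directories; rsync -a would work as well.
  scp -r "$src" "${host}:${dest}"
done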
Spark standalone
Create the Spark log and PID directories on all nodes via the remote script
[root@cdh1 conf]# runRemoteCmd.sh "mkdir -p /root/data/spark/logs" all
*******************cdh1***********************
*******************cdh2***********************
*******************cdh3***********************
Copy the HDFS configuration files into the Spark conf directory (so Spark can read from and write to HDFS)
[root@cdh1 ~]# cp /root/app/hadoop/etc/hadoop/core-site.xml /root/app/spark/conf/
[root@cdh1 ~]# cp /root/app/hadoop/etc/hadoop/hdfs-site.xml /root/app/spark/conf/
spark-env.sh configuration
vi spark-env.sh
# JDK
export JAVA_HOME=/root/app/jdk
# Hadoop configuration directory
export HADOOP_CONF_DIR=/root/app/hadoop/etc/hadoop
# Hadoop home directory
export HADOOP_HOME=/root/app/hadoop
# Spark master web UI port; the default 8080 conflicts with Tomcat
SPARK_MASTER_WEBUI_PORT=8888
# Spark HA: master recovery via ZooKeeper
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=cdh1:2181,cdh2:2181,cdh3:2181 -Dspark.deploy.zookeeper.dir=/myspark"
# Spark configuration directory
SPARK_CONF_DIR=/root/app/spark/conf
# Spark log directory
SPARK_LOG_DIR=/root/data/spark/logs
# Spark PID file directory
SPARK_PID_DIR=/root/data/spark/logs
slaves file (conf/slaves), listing the worker hosts
cdh1
cdh2
cdh3
Sync the Spark installation directory to the slave nodes
[root@cdh1 ~]# deploy.sh /root/app/spark-2.3.0-bin-hadoop2.6 /root/app/ slave
Slave:
ln -s spark-2.3.0-bin-hadoop2.6 spark
Start the cluster
cdh1:
[root@cdh1 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master,Master-1-cdh1.out
cdh1: starting org.apache.spark.deploy.worker.Worker,Worker-1-cdh1.out
cdh3: starting org.apache.spark.deploy.worker.Worker,Worker-1-cdh3.out
cdh2: starting org.apache.spark.deploy.worker.Worker,Worker-1-cdh2.out
cdh2:
[root@cdh2 sbin]# ./start-master.sh
starting org.apache.spark.deploy.master.Master,master.Master-1-cdh2.out
Check the Spark Web UI
(Screenshot omitted: the master UI is served on port 8888 as configured above, e.g. http://cdh1:8888 and http://cdh2:8888.)
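If no browser is available, the master status can also be checked from the shell. This assumes the web UI port 8888 set in spark-env.sh and that the UI page contains the master status string (ALIVE on the active master, STANDBY on the other):

# Quick status check of both masters; with HA one should report ALIVE, the other STANDBY.
curl -s http://cdh1:8888 | grep -oE 'ALIVE|STANDBY' | head -1
curl -s http://cdh2:8888 | grep -oE 'ALIVE|STANDBY' | head -1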
Submit a Spark job
Start HDFS and upload the test txt file
[root@cdh1 sbin]# ./start-dfs.sh
[root@cdh1 bin]# ./hdfs dfs -put /root/app/sparktest/test2.txt /test
Submit the job
[root@cdh1 bin]# ./spark-submit --master spark://cdh1:7077,cdh2:7077 --class com.pcitc.scala.sparkscalatest.MyScalaWordCout /root/app/sparktest/sparkscalatest-0.0.1-SNAPSHOT.jar /test/test2.txt /test/output1
Check the result
[root@cdh1 bin]# ./hdfs dfs -cat /test/output1/*
(scala,1)
(hive,2)
(python,1)
(java,3)
(spark,6)
(hadoop,4)
(hbase,2)
Spark on YARN
Configuration file
vi spark-env.sh
HADOOP_CONF_DIR=/root/app/hadoop/etc/hadoop
Run in cluster mode
./spark-submit --class com.pcitc.scala.sparkscalatest.MyScalaWordCout --master yarn --deploy-mode cluster /root/app/sparktest/sparkscalatest-0.0.1-SNAPSHOT.jar /test/test2.txt /test/output5
./hdfs dfs -cat /test/output5/*
(scala,1)
(hive,2)
(python,1)
(java,3)
(spark,6)
(hadoop,4)
(hbase,2)
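In cluster mode the driver runs inside a YARN container, so any console output from the application ends up in the container logs rather than in the shell that ran spark-submit. Provided YARN log aggregation is enabled, those logs can be retrieved after the job finishes with the YARN CLI (the application ID below is a placeholder; use the one printed by spark-submit or shown in the ResourceManager UI):

# Fetch the aggregated container logs for a finished application.
yarn logs -applicationId application_xxx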
Client mode
./spark-submit --class com.pcitc.scala.sparkscalatest.MyScalaWordCout --master yarn --deploy-mode client /root/app/sparktest/sparkscalatest-0.0.1-SNAPSHOT.jar /test/test2.txt /test/output6
./hdfs dfs -cat /test/output6/*
(scala,1)
(hive,2)
(python,1)
(java,3)
(spark,6)
(hadoop,4)
(hbase,2)