spark-submit
Deploying a Java program to the cluster
Create a script named spark-submit.sh in the Spark directory:
#!/bin/sh
# $1 = fully qualified name of the main class to run
/opt/spark/bin/spark-submit \
--class $1 \
--num-executors 1 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
/opt/jars/spark/spark-wc.jar
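After saving the script, make it executable so it can be invoked directly:
chmod +x spark-submit.sh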
One thing to note when running the jar on the cluster: Spark does not pick up Hadoop's client configuration (in particular the HDFS settings) on its own, so we have to point it at those files in Spark's configuration:
cd /opt/spark/conf
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
spark.files file:///opt/hadoop/etc/hadoop/hdfs-site.xml,file:///opt/hadoop/etc/hadoop/core-site.xml
Once this is configured, restart Spark, because the configuration file is only loaded at startup.
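As an aside, not part of the original setup: a common alternative to listing the files one by one is to point Spark at the Hadoop client configuration directory in spark-env.sh, so that hdfs-site.xml and core-site.xml are picked up automatically:
cd /opt/spark/conf
cp spark-env.sh.template spark-env.sh
echo 'export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop' >> spark-env.sh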
Now submit the job:
./spark-submit.sh com.sanmao.spark.wc.JavaWordCountRemote
Once the application finishes, check the output on HDFS:
hdfs dfs -text hdfs://master:9000/out/part*
The word-count results are all there, so the run succeeded.
Deploying a Scala program to the cluster
Rewrite spark-submit.sh:
#!/bin/sh
# $1 = fully qualified main class, $2 = deploy mode (client or cluster)
/opt/spark/bin/spark-submit \
--class $1 \
--master yarn \
--deploy-mode $2 \
--num-executors 1 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
/opt/jars/spark/spark-wc.jar
Note when running the Scala program: the deploy mode is passed in as $2. In this setup, client mode throws an error, while cluster mode runs normally.
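For reference, here is a minimal sketch of the kind of Scala program spark-wc.jar might contain; the object name and the output path are assumptions on my part, while the input path matches the spark-shell example below:
package com.sanmao.spark.wc

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical main class; pass its fully qualified name as $1 to the script above
object ScalaWordCountRemote {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master (yarn) is supplied by spark-submit
    val conf = new SparkConf().setAppName("ScalaWordCountRemote")
    val sc = new SparkContext(conf)
    sc.textFile("hdfs://ns1/nihao")              // input, as in the spark-shell example
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs://master:9000/out")  // output location assumed from the check above
    sc.stop()
  }
}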
Spark shell operations
- val linesRDD = sc.textFile("hdfs://ns1/nihao")
- val wordsRDD = linesRDD.flatMap(_.split(" "))
- val wordRDD = wordsRDD.map((_, 1))
- val wcRDD = wordRDD.reduceByKey(_ + _)
- wcRDD.foreach(tuple => println(tuple._1 + " " + tuple._2))
As a one-liner:
sc.textFile("hdfs://ns1/nihao").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println)
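One caveat worth adding: foreach(println) runs on the executors, so on a real cluster the printed lines end up in the executor logs, not in the shell. To see the result at the driver, collect it first (fine here, since a word count is small):
sc.textFile("hdfs://ns1/nihao").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)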
Packaging the Scala program with Maven and shipping it to the Spark cluster
Copy the Maven packaging plugins below into your pom, and change mainClass to match your own code:
<!-- compiler plugin: set the JDK version -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>2.3.2</version>
  <configuration>
    <encoding>UTF-8</encoding>
    <source>1.8</source>
    <target>1.8</target>
    <showWarnings>true</showWarnings>
  </configuration>
</plugin>
<!-- assembly plugin: build a fat jar with all dependencies -->
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <!-- replace with the main class of your own program -->
        <mainClass>com.uplooking.bigdata.storm.kafka.KafkaLocalWCTopology</mainClass>
      </manifest>
    </archive>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
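One more caveat from me: the two plugins above only compile Java sources and assemble the fat jar; they do not compile Scala code. For a Scala program a Scala compiler plugin is also needed. A commonly used option is scala-maven-plugin, roughly like this (version illustrative, check for a current one):
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.2.2</version>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>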
Set up the Maven package command in the IDE:
Run --> Edit Configurations... --> add a new Maven run configuration
Command line: clean package -DskipTests
One thing to watch out for when packaging: in this project the HBase dependency needs its <type> set to pom, because it only manages other artifacts and has no concrete jar of its own.
The build is fairly slow... just wait it out.
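When the build finishes, the assembly plugin leaves the fat jar under target/ with a -jar-with-dependencies suffix. Copy it to the path the submit script expects and run it; the artifact name and class name below are illustrative:
cp target/spark-wc-1.0-jar-with-dependencies.jar /opt/jars/spark/spark-wc.jar
./spark-submit.sh com.sanmao.spark.wc.ScalaWordCountRemote cluster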