1. Start HDFS
https://blog.youkuaiyun.com/ssllkkyyaa/article/details/86735817
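The linked post covers the details; as a minimal sketch, assuming Hadoop's sbin scripts are on the PATH and the HA nameservice is mycluster as used below:
start-dfs.sh             # start the NameNode/DataNode daemons defined in the cluster config
jps                      # check that the HDFS daemons are running
hdfs dfsadmin -report    # confirm the DataNodes have registered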
2. Start Hive
https://blog.youkuaiyun.com/ssllkkyyaa/article/details/86527365
Start Hive on s200:
$HIVE_HOME/bin/hive
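A quick sanity check that Hive starts and can reach its metastore (a sketch; "show databases" should list at least the default database):
$HIVE_HOME/bin/hive -e "show databases;"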
3. Start Spark
https://blog.youkuaiyun.com/ssllkkyyaa/article/details/89703266
On s200:
start-all.sh
------------------
start-master.sh                  //RPC port 7077
start-slave.sh spark://s200:7077 //worker registers with the master (the linked post uses s201; this cluster's master is s200, matching the spark-shell command below)
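To verify the Spark cluster is up (a sketch, assuming the default master web UI port 8080 and s200 as the master, as in the spark-shell command further down):
jps                      # expect Master on s200 and Worker on each worker node
curl http://s200:8080    # the master web UI should list the registered workers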
------ Create a test file and put it on DFS
touch test.txt
vi test.txt
hello world
hello world1
hello world2
hello world3
hello world4
------
hadoop fs -ls -R /
hadoop fs -mkdir -p /mycluster/user/centos
hadoop fs -put test.txt /mycluster/user/centos
hadoop fs -chmod 777 /mycluster/user/centos/test.txt
hadoop fs -cat /mycluster/user/centos/test.txt
hdfs dfs -mkdir -p /user/centos
hdfs dfs -cp /mycluster/user/centos/test.txt /user/centos
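To confirm the file landed where the spark-shell example below expects it (paths as in the commands above):
hdfs dfs -ls -R /user/centos
hdfs dfs -cat hdfs://mycluster/user/centos/test.txt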
-----
Integrate Spark with the Hadoop HA cluster
-------------------------
1. Copy core-site.xml + hdfs-site.xml into the spark/conf directory
2. Distribute the files to all Spark worker nodes (a command sketch for steps 1-2.2 follows the note below)
2.2 Configure the Hive environment variables for Spark
3. Start the Spark cluster
4. Start spark-shell and connect it to the Spark cluster
$>spark-shell --master spark://s200:7077
$scala>sc.textFile("hdfs://mycluster/user/centos/test.txt").collect();
Exit: Ctrl+D
(Note: the hdfs dfs -cp in the "Create a test file and put it on DFS" section above is required: hdfs://mycluster/user/centos/test.txt resolves to /user/centos/test.txt on the mycluster nameservice, so without the copy the read fails with a file-not-found error for mycluster/user/centos/test.txt.)
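A shell sketch of steps 1-2.2 (the worker host names s201 s202 s203, the centos user, and copying hive-site.xml for the Hive integration are assumptions; adjust to the actual cluster layout):
cp $HADOOP_HOME/etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf/
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/    # assumed way to expose the Hive config to Spark (step 2.2)
for host in s201 s202 s203; do                        # assumed worker hosts
  scp $SPARK_HOME/conf/core-site.xml $SPARK_HOME/conf/hdfs-site.xml $SPARK_HOME/conf/hive-site.xml centos@$host:$SPARK_HOME/conf/
done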
A salted (two-stage) word count over the same file, run in spark-shell:
$scala> sc.textFile("hdfs://mycluster/user/centos/test.txt").
          flatMap(_.split(" ")).
          map((_, 1)).
          map(t => {import scala.util.Random; val par = Random.nextInt(100); (t._1 + "_" + par, 1)}).  // salt each key with a random 0-99 suffix
          reduceByKey(_ + _).                                     // first aggregation on the salted keys
          map(t => {val arr = t._1.split("_"); (arr(0), t._2)}).  // strip the salt
          reduceByKey(_ + _).                                     // merge the partial counts per word
          collect()
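The random "_<n>" suffix spreads each word across up to 100 distinct keys, so the first reduceByKey does partial counting without any single hot key landing on one task; the second map/reduceByKey removes the suffix and merges the partial counts into the final per-word totals. For a small test file a plain map((_,1)).reduceByKey(_+_) gives the same result; the salting pattern only pays off when the key distribution is heavily skewed.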