Installing Scala
Download: https://www.scala-lang.org/download/
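Extract the archive (-C requires a target directory; <install-dir> is a placeholder for the directory to install into), then rename the extracted directory from inside <install-dir>:
tar -zxvf scala-2.12.8.tgz -C <install-dir>
mv scala-2.12.8 scala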
Verify the installation:
scala -version
Start the Scala REPL:
scala
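For the scala command to be found from any directory, the installation's bin directory has to be on PATH. A minimal sketch, assuming the renamed scala directory ended up under /opt/module (an assumed location, picked only because the Spark commands later in these notes use /opt/module; adjust it to wherever the archive was actually extracted):

```shell
# Assumption: the renamed "scala" directory lives under /opt/module.
export SCALA_HOME=/opt/module/scala
# Prepend Scala's bin directory so `scala` and `scalac` resolve from any directory.
export PATH="$SCALA_HOME/bin:$PATH"
```

Adding these two lines to ~/.bashrc (or the equivalent startup file of the shell in use) keeps the setting across sessions.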
Installing Spark
Download: https://www.apache.org/dyn/closer.lua/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz
Extract the archive:
tar -zxvf spark-2.4.2-bin-hadoop2.7.tgz
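Start the Spark environment (the command below assumes the extracted directory has been moved to /opt/module/spark):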
/opt/module/spark/sbin/start-all.sh
Check the running processes: jps
Open the Spark web UI: http://ip:8080/ (replace ip with the master node's address)
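The jps check can be scripted. On a single machine with the default configuration, start-all.sh launches one Master and one Worker process, so both names should appear in the jps output. The sketch below is only an illustration: check_spark_daemons is a hypothetical helper name, not something Spark ships, and on a multi-node cluster the two daemons would normally run on different hosts.

```shell
# Read `jps` output on stdin and report whether both standalone daemons are listed.
# check_spark_daemons is a hypothetical helper name, not part of Spark.
check_spark_daemons() {
  out=$(cat)
  for daemon in Master Worker; do
    # jps prints "<pid> <main class>", so compare the class name in the second field.
    if ! printf '%s\n' "$out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      return 1
    fi
  done
  echo "Master and Worker are running"
}
```

Usage: jps | check_spark_daemons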
Start the Spark shell (run from the Spark installation directory):
./bin/spark-shell
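Started without options, spark-shell runs in local mode unless a master is configured elsewhere (for example in conf/spark-defaults.conf), so it does not use the standalone cluster started earlier. Passing --master points it at the cluster. The sketch below only assembles and prints the command; MASTER_HOST and MASTER_PORT are illustrative variable names, "ip" is the same placeholder used in the web UI address, and 7077 is the standalone master's default port, which a cluster may override.

```shell
# Assumptions: MASTER_HOST is a placeholder for the master's address ("ip" in these notes),
# and 7077 is the standalone master's default port (change it if the cluster overrides it).
MASTER_HOST=ip
MASTER_PORT=7077
MASTER_URL="spark://${MASTER_HOST}:${MASTER_PORT}"
# Print the command instead of running it, so the URL can be checked first.
echo "./bin/spark-shell --master ${MASTER_URL}"
```

The exact master URL is also shown at the top of the web UI page on port 8080.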
WordCount
Load a local file:
val textFile = sc.textFile("file:///bigdata/spark/code/wordcount/word.txt")
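Load a file from HDFS (upload it first: hadoop fs -put ./word.txt /user/hadoop/). The three forms below refer to the same file when the default file system is hdfs://ip:9000 and the shell runs as user hadoop, because a relative path resolves against /user/hadoop:
scala> val textFile = sc.textFile("hdfs://ip:9000/user/hadoop/word.txt")
scala> val textFile = sc.textFile("/user/hadoop/word.txt")
scala> val textFile = sc.textFile("word.txt")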
Print the first line of the file:
textFile.first()
Count word frequencies:
val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCount.collect()
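For a small local file, the result of collect() can be cross-checked outside Spark. The sketch below is an independent shell approximation, not the Spark job itself: count_words is a hypothetical helper that splits each line on single spaces and counts the tokens. It skips empty tokens, whereas the Spark version would count an empty string as a word when a line is blank or contains consecutive spaces.

```shell
# count_words is a hypothetical helper: it puts every space-separated token on its
# own line, counts identical tokens, and prints "word count" pairs sorted by word.
count_words() {
  tr ' ' '\n' < "$1" |
    awk 'NF { n[$0]++ } END { for (w in n) print w, n[w] }' |
    sort
}
```

Usage: count_words /bigdata/spark/code/wordcount/word.txt. For a file containing the two lines "hello spark" and "hello scala", this prints hello 2, scala 1 and spark 1, one pair per line, matching the pairs the Spark job returns for the same input.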
These notes are brief. Where a step is not described in enough detail, please refer to tutorials by other authors.