spark-env.sh:
SPARK_MASTER_HOST=hadoop001
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
master:
hadoop1
slaves:
hadoop2
hadoop3
hadoop4
....
hadoop10
==> start-all.sh starts the master process on the hadoop1 machine and a worker process on every host listed in the slaves file.
Spark WordCount (run in spark-shell, where spark is the built-in SparkSession):
val file = spark.sparkContext.textFile("file:///home/hadoop/data/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect
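The same WordCount can also be packaged as a standalone application instead of being typed into spark-shell; below is a minimal sketch (the object name WordCountApp and the input path are illustrative, not from the original listing):

package org.example

import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // Master is intentionally not set here; supply it via spark-submit --master
    // or a -Dspark.master VM option (see the IDE note below).
    val sparkConf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(sparkConf)

    // Illustrative input path; adjust to your environment
    val file = sc.textFile("file:///home/hadoop/data/wc.txt")
    val wordCounts = file
      .flatMap(line => line.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    wordCounts.collect().foreach(println)
    sc.stop()
  }
}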
Running in a local IDE
Running the program from the IDE at this point fails with: A master URL must be set in your configuration
Click Edit Configurations, select the project on the left, and enter "-Dspark.master=local" under VM options on the right. This tells the program to run locally with a single thread; run it again and the error goes away.
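Alternatively, for local debugging the master can be set in code rather than through VM options; a snippet as it would be typed in the Scala REPL (remove the setMaster call before submitting to a cluster):

import org.apache.spark.SparkConf

// Alternative to the -Dspark.master VM option: set the master on the SparkConf.
// local = 1 thread, local[2] = 2 threads, local[*] = one thread per core.
val sparkConf = new SparkConf()
  .setAppName("SQLContextAPP")
  .setMaster("local[2]")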
package org.example

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SQLContextAPP {
  def main(args: Array[String]): Unit = {
    // 1. Create the contexts
    val path = args(0) // path to the JSON file, passed as the first command-line argument
    val sparkConf = new SparkConf()
    sparkConf.setAppName("SQLContextAPP")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // 2. Process the data
    val people = sqlContext.read.format("json").load(path)
    people.printSchema()
    people.show()

    // 3. Release resources
    sc.stop()
  }
}
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
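For reference, Spark 2.x replaces SQLContext with the unified SparkSession entry point; the same program could be written as in the sketch below (SparkSessionApp is an illustrative name):

package org.example

import org.apache.spark.sql.SparkSession

object SparkSessionApp {
  def main(args: Array[String]): Unit = {
    // SparkSession is the unified entry point in Spark 2.x
    val spark = SparkSession.builder()
      .appName("SparkSessionApp")
      .getOrCreate()

    val people = spark.read.format("json").load(args(0))
    people.printSchema()
    people.show()

    spark.stop()
  }
}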
Configuring the Maven environment variables (Windows). If the cmd console reports "mvn is not recognized as an internal or external command, operable program or batch file":
First set the Maven environment variables:
Variable name: MAVEN_HOME
Variable value: E:\apache-maven-3.2.3
Variable name: Path
Variable value (append): ;%MAVEN_HOME%\bin
Then run the build directly from the project directory:
C:\Users\jacksun\IdeaProjects\SqarkSQL> mvn clean package -DskipTests
Testing on the cluster:
spark-submit \
--name SQLContextApp \
--class org.example.SQLContextAPP \
--master local[2] \
/home/hadoop/lib/sql-1.0.jar \
/home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json
HiveContextAPP
Note:
1) To use a HiveContext, you do not need an existing Hive setup.
2) To access an existing Hive deployment, copy its hive-site.xml into Spark's conf/ directory so that Spark can locate the Hive metastore.
package org.example

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextAPP {
  def main(args: Array[String]): Unit = {
    // 1. Create the contexts
    val path = args(0) // assumption: the first argument names the Hive table to read
    val sparkConf = new SparkConf()
    // In test and production the AppName and Master are supplied by the submit script
    // sparkConf.setAppName("HiveContextAPP").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)

    // 2. Process the data (the original listing is truncated here; reconstructed)
    hiveContext.table(path).show()

    // 3. Release resources
    sc.stop()
  }
}
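For completeness, in Spark 2.x Hive access goes through SparkSession with Hive support enabled; a minimal sketch (HiveSupportApp and the table name emp are illustrative):

package org.example

import org.apache.spark.sql.SparkSession

object HiveSupportApp {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes Spark use the metastore described by hive-site.xml
    val spark = SparkSession.builder()
      .appName("HiveSupportApp")
      .enableHiveSupport()
      .getOrCreate()

    spark.table("emp").show() // "emp" is a hypothetical Hive table

    spark.stop()
  }
}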