Creating a Spark Project with Maven
1. Create the Maven project
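- Create a Maven project from the quickstart archetype.
- Enter the group name (group ID) and the project name.
- Set the Maven home directory to your local Maven installation, then click Finish.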
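2. Add dependencies
- Add the dependencies below to pom.xml, then choose Import Changes when the IDE prompts.
Adjust the version numbers to match the JDK and Spark versions you are using. The Scala library version must match the Scala suffix of the Spark artifacts (2.11 here).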
<!-- Set the JDK version -->
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.21</version>
  </dependency>
</dependencies>
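Since the Scala and Spark versions have to be changed together, one option is to keep them in Maven properties and reference them from the dependencies. The fragment below is only a sketch of that idea: the property names `scala.version`, `scala.binary.version` and `spark.version` are assumptions chosen for illustration, not names the original pom.xml uses.

```xml
<properties>
  <!-- Hypothetical property names; values copied from the dependencies above -->
  <scala.version>2.11.8</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
  <spark.version>2.1.1</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

With this layout, moving to another Spark release means editing the properties in one place rather than each dependency.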
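- In the src/main folder, create a scala folder, then right-click it and mark it as Sources Root.
Add the Scala SDK to the project, then click Apply and OK.
- Create a resource folder at the same level as src and mark it as Resources Root.
- Copy the log4j-defaults.properties file from the Maven: org.apache.spark:spark-core library into the resource folder.
- Edit the copied log4j file and change INFO to ERROR, so that only error-level messages are printed. If the change has no effect, renaming the copy to log4j.properties may help, since that is the file name log4j looks for on the classpath.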
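As a rough illustration of the log4j change described above, the relevant line in the copied file would look something like this after editing. This assumes the file shipped with Spark 2.1.1, where the root logger is declared through `log4j.rootCategory`; other versions may differ.

```properties
# Before: log4j.rootCategory=INFO, console
# After: only ERROR-level messages reach the console
log4j.rootCategory=ERROR, console
```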
3. Write the Spark source code
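import org.apache.spark.{SparkConf, SparkContext}

object wordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with 2 worker threads
    val conf = new SparkConf().setMaster("local[2]").setAppName("wordCount")
    val sc = SparkContext.getOrCreate(conf)

    // Create an RDD from a local collection
    val rdd1 = sc.parallelize(List("hello world", "hello java", "hello scala"))
    // makeRDD also builds an RDD from a local collection; the result is not used here
    sc.makeRDD(List(1, 2, 3, 4, 5, 6))

    // Word count: split lines into words, pair each word with 1, sum the counts per word
    rdd1.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect.foreach(println)

    val partitions = rdd1.partitions
    println("Number of RDD partitions: " + partitions.length)
    println("----------------------")

    // Absolute path
    println("Absolute path")
    val lines1 = sc.textFile("F:\\IdeaProjects\\kb09\\sparkDemo\\in\\word.txt")
    lines1.collect.foreach(println)

    // Relative path, resolved against the working directory
    println("Relative path")
    val lines2 = sc.textFile("in/word.txt")
    lines2.collect.foreach(println)

    // Read files that were uploaded to HDFS, using their HDFS path.
    // The virtual machine's IP must be added to the Windows hosts file; see the note below.
    println("----------hdfs------------")
    val linesHDFS = sc.textFile("hdfs://hadoop001:9000/kb09space/*.txt")
    linesHDFS.collect.foreach(println)

    // Release the SparkContext when done
    sc.stop()
  }
}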
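To see what the flatMap / map / reduceByKey chain computes without starting Spark, the same steps can be sketched on a plain Scala List. This is only an illustration: the object `WordCountSketch` and the function `countWords` are hypothetical names, and `groupBy` followed by a sum stands in for Spark's `reduceByKey`.

```scala
object WordCountSketch {
  // Mirrors flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) on a local List.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .groupBy(_._1)           // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    // Same input as rdd1 in the Spark program
    val counts = countWords(List("hello world", "hello java", "hello scala"))
    counts.foreach(println)
  }
}
```

For the three sample lines, hello is counted 3 times and world, java and scala once each; the order in which the pairs are printed is not guaranteed.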
Note:
Adding the virtual machine's IP on Windows
Open C:\Windows\System32\drivers\etc\hosts
Edit the file and add the virtual machine's IP address and hostname.
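A hosts entry has the form "IP address, whitespace, hostname". In the sketch below, hadoop001 is the hostname used in the HDFS URL in the code, while the IP address is a made-up placeholder that must be replaced with the actual address of your virtual machine.

```
# Placeholder IP; replace with the virtual machine's real address
192.168.56.101 hadoop001
```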