This post uses Ubuntu 16.04 + Java 8 + Spark 2.2.0 + MongoDB 3.4.7. If you have not set up these environments yet, see my earlier posts:
Java environment setup: http://blog.youkuaiyun.com/lengconglin/article/details/77016911
MongoDB installation: http://blog.youkuaiyun.com/lengconglin/article/details/77072842
Setting up a Spark development environment: http://blog.youkuaiyun.com/lengconglin/article/details/77847623
Before creating the project, make sure Spark and MongoDB are running:
Start Spark: run ./start-all.sh in the spark/sbin directory.
Start MongoDB: run ./mongod in the bin directory of the MongoDB installation.
Create a Scala project with IntelliJ IDEA and add the following dependencies to the build.sbt file; the commented-out lines can be enabled as needed:
libraryDependencies ++= Seq("org.apache.spark" % "spark-core_2.11" % "2.2.0")
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0"
//libraryDependencies +="org.apache.spark"%"spark-graphx_2.11"%"2.2.0"
//libraryDependencies +="org.apache.spark"%"spark-streaming_2.11"%"2.2.0"
//libraryDependencies +="org.apache.spark"%"spark-streaming-flume_2.11"%"2.2.0"
Then, under src -> main -> scala, create a new Scala Class, set Kind to Object, and name it mongoTest.scala:
import com.mongodb.spark._
import org.apache.spark.{SparkConf, SparkContext}
import org.bson._
object mongoTest {
  def main(args: Array[String]): Unit = {
    // Replace "mongodb://127.0.0.1/forest.lengconglin" with your own address and database.collection
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("Mingdao-Score")
      .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/forest.lengconglin?readPreference=primaryPreferred")
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/forest.lengconglin")
    val sc = new SparkContext(conf)

    // Write data to MongoDB
    val documents = sc.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))
    MongoSpark.save(documents)

    // Read data back from MongoDB
    val rdd = MongoSpark.load(sc)
    rdd.foreach(println)
    println("Count: " + rdd.count)

    sc.stop()
  }
}
Compile and run the test; you should see the inserted documents printed along with the count.
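Since spark-sql is already on the classpath, the same collection can also be read as a DataFrame through SparkSession. The following is a minimal sketch under the same URI and collection as above; the object name mongoSqlTest and the appName are just illustrative:

import com.mongodb.spark._
import org.apache.spark.sql.SparkSession

object mongoSqlTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("Mingdao-Score-SQL")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/forest.lengconglin")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/forest.lengconglin")
      .getOrCreate()

    // Load the collection as a DataFrame; the connector infers the schema by sampling documents
    val df = MongoSpark.load(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}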