1-Configuration
First, I set up a standalone Spark 2.4.1 (without Hadoop) on a virtual machine. Then, from IDEA on my local machine, I ran Spark against it remotely with a small SVM example.
The sbt file (build.sbt):
name := "spark_ml_examples"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.1"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.4.1"
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.4.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.4.1"
libraryDependencies += "org.json4s" %% "json4s-jackson" % "{latestVersion}"
The kafka and streaming dependencies above are not used here; they are left over from earlier code and I simply did not remove them. As for where to find the jars, here are a few sites you can search; the links follow directly.
https://www.mvnjar.com/org.apache.spark/list.html
h