1. Runtime Environment
- Linux
- Scala 2.11.12
- Spark 2.4.0
- sbt 1.2.8 (the version can be checked with /usr/local/sbt/bin/sbt sbtVersion)
- Scala IDE for Eclipse
For sbt installation, see: https://blog.youkuaiyun.com/wangkai_123456/article/details/88928953
2. Creating the Scala Eclipse Project
2.1 Create the project directory
In the Scala IDE for Eclipse workspace, create the project directory UseRest:
mkdir UseRest
Then enter the UseRest directory and create the directory structure shown below:
UseRest
|__ src
| |__ main
| | |__ scala
|
|__ project
| |__ build.properties
| |__ plugins.sbt
|
|__ build.sbt
A more detailed project layout is described at: https://blog.youkuaiyun.com/wangkai_123456/article/details/88929459
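For example, the layout above can be created from inside UseRest with a couple of commands:
mkdir -p src/main/scala project
touch project/build.properties project/plugins.sbt build.sbt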
The build.properties file contains:
sbt.version=1.2.8
The plugins.sbt file contains:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
The build.sbt file contains:
ThisBuild / scalaVersion := "2.11.12"
ThisBuild / organization := "org.kaidy"

// Dependencies for Spark.
// JARs that are already part of the container (such as Spark itself) should be
// scoped to the "provided" configuration so they are excluded from the assembly jar.
val sparkCore = "org.apache.spark" %% "spark-core" % "2.4.0"
val sparkSql = "org.apache.spark" %% "spark-sql" % "2.4.0"
// https://mvnrepository.com/artifact/org.scalaj/scalaj-http
val scalaJson = "org.scalaj" %% "scalaj-http" % "2.4.1"

lazy val root = (project in file("."))
  .settings(
    name := "UseRest",
    version := "1.0",
    libraryDependencies ++= Seq(
      sparkCore % "provided",
      sparkSql % "provided"
    ),
    libraryDependencies += scalaJson
  )
2.2 Generate the Eclipse project files
Enter the project root directory UseRest and run the following command to generate the Eclipse project files:
sbt eclipse
When the command finishes, you should see a message confirming that the Eclipse project files were created successfully:
..... (some output omitted)
[info] About to create Eclipse project files for your project(s).
[info] Successfully created Eclipse project files for project(s):
[info] UseRest
sbt:UseRest>
2.3 Import into Eclipse
With the project files generated, open Eclipse and import the project via File -> Import -> Existing Projects into Workspace.
3. Development in Eclipse
After importing the project, create the package kaidy.spark under src/main/scala, then create a Scala Object named UseRest (file UseRest.scala) in that package and enter the following code:
package kaidy.spark

import scalaj.http.Http
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object UseRest {
  def main(args: Array[String]) {
    // Use setMaster("local") when running inside Eclipse; drop it (and let
    // spark-submit supply the master) when submitting to a cluster.
    // val sparkConf = new SparkConf().setAppName("UseRest")
    val sparkConf = new SparkConf().setMaster("local").setAppName("UseRest")
    val spark = SparkSession.builder().config(sparkConf).getOrCreate()

    // Read a CSV file from HDFS and print its row count.
    val csvFilePath = "hdfs://10.62.124.41:8020/tomasdata/ems/mobile/lte/tdd/sdr/bandwidthresource/2019-01-01/beijingshi/BandwidthResource_tmpfile_215120022913.csv.utf-8"
    val df = spark.read.format("CSV").option("header", "true").load(csvFilePath)
    println(df.count())

    // Call an Elasticsearch REST endpoint via scalaj-http and print the response.
    val response = Http("http://10.62.124.25:9200/_cat/health?v").asString
    println(response.code)
    println(response.body)

    spark.stop()
  }
}
Finally, run the program via Run As -> Scala Application. The Console shows output like the following:
4233
200
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1554023173 09:06:13 ES-cluster green 2 2 272 220 0 0 0 0 - 100.0%
4. Packaging for Deployment (assembly)
sbt's built-in package command does not bundle third-party dependencies into a single jar; to build a fat jar that includes its dependencies, the sbt-assembly plugin is the mature choice.
sbt-assembly documentation: https://github.com/sbt/sbt-assembly#excluding-jars-and-files
Assembly ships as a plugin, so it is configured in plugins.sbt under the project directory, which therefore contains:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
Enter the project root directory UseRest and run the following command to build the fat jar with its dependencies:
sbt assembly
When the command finishes, you should see output confirming that the jar was built successfully:
[info] Loading settings for project userest-build from plugins.sbt ...
[info] Loading project definition from /home/***/workspace/scala_workspace/UseRest/project
[info] Loading settings for project root from build.sbt ...
[info] Set current project to UseRest (in build file:/***/workspace/scala_workspace/UseRest/)
[info] Strategy 'discard' was applied to a file (Run the task at debug level to see details)
[info] Assembly up to date: /home/***/workspace/scala_workspace/UseRest/target/scala-2.11/UseRest-assembly-1.0.jar
[success] Total time: 7 s, completed 2019-3-31 19:23:41
As shown above, the resulting fat jar is UseRest-assembly-1.0.jar, which can be submitted to the cluster with spark-submit.
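For reference, a submission might look like the following; the --master value is an assumption that depends on your cluster setup, and remember to drop setMaster("local") from the code before submitting:
spark-submit --class kaidy.spark.UseRest --master yarn target/scala-2.11/UseRest-assembly-1.0.jar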
This concludes the tutorial on building a Scala project with sbt and importing it into Eclipse for development.