Recently I had a requirement to write Spark data into ES. After some searching online and a round of testing, I got the job done, so I'm writing it down here.
Straight to the code.
The pom file:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.3</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.3</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.36</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-20_2.11</artifactId>
        <version>6.2.4</version>
    </dependency>
    <!-- for reading config files -->
    <dependency>
        <groupId>commons-configuration</groupId>
        <artifactId>commons-configuration</artifactId>
        <version>1.5</version>
    </dependency>
    <dependency>
        <groupId>commons-codec</groupId>
        <artifactId>commons-codec</artifactId>
        <version>1.10</version>
    </dependency>
</dependencies>
The code:
package cn.demo

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.spark.sql.EsSparkSQL

/**
 * author: Administrator
 * name:   ESDemo
 */
object ESDemo {
  def main(args: Array[String]): Unit = {
    // Configure the ES connection on the SparkConf
    val sparkConf = new SparkConf().setAppName(ESDemo.getClass.getName).setMaster("local")
    sparkConf.set("es.nodes", "192.168.0.61")      // ES node address
    sparkConf.set("es.port", "9200")               // ES HTTP port
    sparkConf.set("es.index.auto.create", "true")  // auto-create the index if it does not exist
    sparkConf.set("es.write.operation", "index")   // write mode: index (add, or replace by id)

    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

    // Read the source table from MySQL via JDBC
    val url: String = "jdbc:mysql://localhost:3306/testdb"
    val table: String = "courses"
    val properties: Properties = new Properties()
    properties.put("user", "root")
    properties.put("password", "123456")
    properties.put("driver", "com.mysql.jdbc.Driver")
    val course: DataFrame = sparkSession.read.jdbc(url, table, properties)
    course.show()

    // Write the DataFrame to the "course" resource in ES
    EsSparkSQL.saveToEs(course, "course")

    sparkSession.stop()
  }
}
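If you need the job to be re-runnable without creating duplicate documents, saveToEs also takes a per-call config map where you can pin a column as the document _id. A minimal sketch, assuming the courses table has an id column (the column name here is an assumption):

```scala
// Sketch: use a column as the ES document _id so re-running the job
// overwrites existing documents instead of duplicating them.
// Assumes the DataFrame "course" has an "id" column.
import org.elasticsearch.spark.sql.EsSparkSQL

EsSparkSQL.saveToEs(course, "course", Map(
  "es.mapping.id"      -> "id",     // column to use as the document _id
  "es.write.operation" -> "upsert"  // update the document if the id already exists
))
```

Any option from the ES-Hadoop configuration (es.nodes, es.port, etc.) can be passed in this map as well; per-call settings override what was set on the SparkConf.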
In this demo the index is created automatically when the data is written to ES, and the mapping is auto-generated as well. In real production you would normally create the ES index and define its mapping up front.
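Pre-creating the index could look like the following sketch, using the ES 6.x create-index API via curl (the type name and field definitions here are assumptions and should match your courses table):

```shell
# Sketch: create the "course" index with an explicit mapping before running
# the Spark job (ES 6.x syntax; the "_doc" type and fields are assumptions).
curl -X PUT "http://192.168.0.61:9200/course" \
  -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "_doc": {
      "properties": {
        "id":   { "type": "long" },
        "name": { "type": "keyword" }
      }
    }
  }
}'
```

With the mapping created up front you can also set es.index.auto.create to false so a typo in the resource name fails fast instead of silently creating a new index.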
ES has support for Hadoop, Hive, Spark and other big-data projects (via the ES-Hadoop project); see the official ES documentation for details.
Reference: the official Elasticsearch documentation