http://blog.youkuaiyun.com/dabokele/article/details/52802150
1. Batch-inserting data via an RDD and foreachPartition
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}

object RDDtoMysql {
  def myFun(iterator: Iterator[(String, Int)]): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "insert into sparktomysql(name, age) values (?, ?)"
    try {
      conn = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/test_dw", "test_dw", "123456")
      ps = conn.prepareStatement(sql) // prepare once and reuse it for every row
      iterator.foreach { data =>
        ps.setString(1, data._1)
        ps.setInt(2, data._2)
        ps.executeUpdate()
      }
    } catch {
      case e: Exception => println("Mysql Exception: " + e.getMessage)
    } finally {
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDToMysql").setMaster("local")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(List(("www", 10), ("iteblog", 20), ("com", 30)))
    data.foreachPartition(myFun) // one connection per partition, batch import
    sc.stop()
  }
}
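Note that the version above still issues one executeUpdate round trip per record. For a genuinely batched write, JDBC's addBatch/executeBatch can queue all rows of a partition and send them in one go. A minimal sketch of such a partition function (myBatchFun is a hypothetical name; it assumes the same sparktomysql table and credentials as above):

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

// Hypothetical batched variant of myFun: queues every row of the
// partition with addBatch, then flushes them with a single executeBatch.
def myBatchFun(iterator: Iterator[(String, Int)]): Unit = {
  val conn: Connection =
    DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/test_dw", "test_dw", "123456")
  val ps: PreparedStatement =
    conn.prepareStatement("insert into sparktomysql(name, age) values (?, ?)")
  try {
    conn.setAutoCommit(false)          // commit once per partition
    iterator.foreach { case (name, age) =>
      ps.setString(1, name)
      ps.setInt(2, age)
      ps.addBatch()                    // queue the row instead of executing it
    }
    ps.executeBatch()                  // send the whole batch at once
    conn.commit()
  } finally {
    ps.close()
    conn.close()
  }
}
```

Pass this to data.foreachPartition exactly as with myFun; with the MySQL driver, adding rewriteBatchedStatements=true to the JDBC URL lets the driver collapse the batch into multi-row INSERT statements.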
2. Writing to MySQL through the DataFrame API (suitable for a new table, or for replacing the existing data)
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DataFrameToMysql {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://localhost:3306/spark?user=iteblog&password=iteblog"
    val conf = new SparkConf().setAppName("DataFrameToMysql").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val schema = StructType(
      StructField("name", StringType) ::
      StructField("age", IntegerType) :: Nil)
    val data = sc.parallelize(
        List(("iteblog", 30), ("iteblog", 29), ("com", 40), ("bt", 33), ("www", 23)))
      .map(item => Row(item._1, item._2))
    val df = sqlContext.createDataFrame(data, schema)
    df.insertIntoJDBC(url, "sparktomysql", true) // true: truncate the existing rows before inserting
    sc.stop()
  }
}
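Since insertIntoJDBC is deprecated as of Spark 1.4 (see the API note at the end of this post), the same write can be expressed with the DataFrameWriter API. A sketch of the equivalent call, reusing the df, url, user, and password from the example above:

```scala
import java.util.Properties
import org.apache.spark.sql.SaveMode

// Connection properties matching the example URL above.
val props = new Properties()
props.setProperty("user", "iteblog")
props.setProperty("password", "iteblog")

df.write
  .mode(SaveMode.Overwrite) // replace existing data, like overwrite = true
  .jdbc("jdbc:mysql://localhost:3306/spark", "sparktomysql", props)
```

One difference to be aware of: SaveMode.Overwrite drops and recreates the table rather than truncating it, so table-level settings such as indexes may be lost unless the truncate option (available in later Spark versions) is enabled.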
This article covers two ways to batch-write data from Spark into a MySQL database: using an RDD with foreachPartition for batch inserts, and using the DataFrame write.jdbc method, which also supports replacing the existing data.
For reference, the Scala API documentation for insertIntoJDBC reads:

Save this DataFrame to a JDBC database at `url` under the table name `table`. Assumes the table already exists and has a compatible schema. If you pass `true` for `overwrite`, it will TRUNCATE the table before performing the INSERTs. The table must already exist on the database. It must have a schema that is compatible with the schema of this RDD; inserting the rows of the RDD in order via the simple statement `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.

Deprecated (since version 1.4.0): Use write.jdbc(). This will be removed in Spark 2.0.