Saving RDD and DataFrame Data to MySQL

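This article shows two ways to bulk-write data from Spark into a MySQL database: through an RDD and through a DataFrame. The RDD approach uses foreachPartition to insert the rows of each partition over a single connection. The DataFrame approach uses the JDBC writer (insertIntoJDBC in old versions, write.jdbc in newer ones), which can also clear the existing data before writing.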

Reposted from: http://blog.youkuaiyun.com/dabokele/article/details/52802150

1. Batch-inserting data through an RDD function

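import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}

object RDDtoMysql {
  // Called once per partition, so each partition opens a single connection.
  def myFun(iterator: Iterator[(String, Int)]): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "insert into sparktomysql(name, age) values (?, ?)"
    try {
      conn = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/test_dw", "test_dw", "123456")
      // Prepare the statement once and reuse it for every row.
      // Preparing it inside the loop left all but the last statement unclosed.
      ps = conn.prepareStatement(sql)
      iterator.foreach { data =>
        ps.setString(1, data._1)
        ps.setInt(2, data._2)
        ps.executeUpdate()
      }
    } catch {
      // Include the cause so that failures can be diagnosed.
      case e: Exception => println("Mysql Exception: " + e.getMessage)
    } finally {
      if (ps != null) {
        ps.close()
      }
      if (conn != null) {
        conn.close()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDToMysql").setMaster("local")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(List(("www", 10), ("iteblog", 20), ("com", 30)))
    data.foreachPartition(myFun) // bulk import: myFun runs once per partition
    sc.stop()
  }
}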
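
The per-partition function above sends one INSERT per row. A possible variant, sketched here and not part of the original post, groups the rows of a partition into JDBC batches with addBatch and executeBatch. The names BatchInsertSketch, chunks, insertPartition and the default batch size of 500 are assumptions made for illustration; the table and columns are the ones used above.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

object BatchInsertSketch {
  // Hypothetical helper: split a partition's rows into chunks of at most batchSize.
  def chunks[T](rows: Iterator[T], batchSize: Int): Iterator[Seq[T]] =
    rows.grouped(batchSize)

  // Writes one partition using JDBC batching; url, user and password are supplied by the caller.
  def insertPartition(rows: Iterator[(String, Int)], url: String, user: String,
                      password: String, batchSize: Int = 500): Unit = {
    val conn: Connection = DriverManager.getConnection(url, user, password)
    try {
      conn.setAutoCommit(false) // commit once per chunk instead of once per row
      val ps: PreparedStatement =
        conn.prepareStatement("insert into sparktomysql(name, age) values (?, ?)")
      try {
        chunks(rows, batchSize).foreach { chunk =>
          chunk.foreach { case (name, age) =>
            ps.setString(1, name)
            ps.setInt(2, age)
            ps.addBatch() // queue the row; nothing is sent yet
          }
          ps.executeBatch() // send the queued rows together
          conn.commit()
        }
      } finally {
        ps.close()
      }
    } finally {
      conn.close()
    }
  }
}
```

With the RDD from the example above, it could be called as data.foreachPartition(rows => BatchInsertSketch.insertPartition(rows, "jdbc:mysql://127.0.0.1:3306/test_dw", "test_dw", "123456")). How much this helps depends on the driver configuration; with MySQL Connector/J, batches are typically only rewritten into multi-row inserts when rewriteBatchedStatements=true is set on the JDBC URL.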

2. Writing to MySQL through the DataFrame class (suitable for a newly created table or for clearing the existing data)

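import org.apache.spark.SparkContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DFtoMysql { // object name chosen for illustration; the original shows only main
  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://localhost:3306/spark?user=iteblog&password=iteblog"
    val sc = new SparkContext // master and app name are expected to come from spark-submit
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // Schema matching the (name, age) columns of the target table
    val schema = StructType(
      StructField("name", StringType) ::
      StructField("age", IntegerType) :: Nil)
    val data = sc.parallelize(List(("iteblog", 30), ("iteblog", 29), ("com", 40), ("bt", 33), ("www", 23)))
      .map(item => Row.apply(item._1, item._2))
    val df = sqlContext.createDataFrame(data, schema)
    // true: clear the existing rows before inserting
    df.insertIntoJDBC(url, "sparktomysql", true)
    sc.stop()
  }
}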

This method is deprecated in newer versions of Spark and has been replaced by .write.jdbc(). Its API documentation reads:

def insertIntoJDBC(url: String, table: String, overwrite: Boolean): Unit

Save this DataFrame to a JDBC database at url under the table name table. Assumes the table already exists and has a compatible schema. If you pass true for overwrite, it will TRUNCATE the table before performing the INSERTs.

The table must already exist on the database. It must have a schema that is compatible with the schema of this RDD; inserting the rows of the RDD in order via the simple statement INSERT INTO table VALUES (?, ?, ..., ?) should not fail.

Annotations: @deprecated

Deprecated: (Since version 1.4.0) Use write.jdbc(). This will be removed in Spark 2.0.
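
The deprecation note points to write.jdbc() as the replacement. A minimal sketch of that call follows; it is an illustration, not code from the original post. The names WriteJdbcSketch, connectionProps and save are assumptions, and the credentials are passed as connection properties instead of in the URL.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object WriteJdbcSketch {
  // Hypothetical helper: JDBC credentials as connection properties.
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props
  }

  // overwrite = true plays the role of insertIntoJDBC(url, table, true): existing rows are replaced.
  // overwrite = false appends to the existing table.
  def save(df: DataFrame, url: String, table: String, props: Properties, overwrite: Boolean): Unit = {
    val mode = if (overwrite) SaveMode.Overwrite else SaveMode.Append
    df.write.mode(mode).jdbc(url, table, props)
  }
}
```

For the DataFrame built in section 2, a call might look like WriteJdbcSketch.save(df, "jdbc:mysql://localhost:3306/spark", "sparktomysql", WriteJdbcSketch.connectionProps("iteblog", "iteblog"), overwrite = true). One difference from insertIntoJDBC is worth noting: SaveMode.Overwrite by default drops and recreates the table instead of truncating it. From Spark 2.1 on, adding .option("truncate", "true") to the writer asks Spark to truncate the existing table and keep its definition.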