Writing a Spark DataFrame to a MySQL database - 优快云 Blog

Article link: https://blog.youkuaiyun.com/u012400305/article/details/74691457

package spark

import java.util.Properties

import org.apache.spark.SparkContext
import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

/**
  * Created by sunfei on 2017/7/7. OK
  */
object sparkTOsql {
  def main(args: Array[String]): Unit = {

    // JDBC URL with the credentials embedded; only the deprecated calls commented out below use it
    val url = "jdbc:mysql://10.10.10.158:3306/spark?user=root&password=123456"

    val sc = new SparkContext
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val schema = StructType(
      StructField("name", StringType) ::
        StructField("count", IntegerType)
        :: Nil)

    // Map each (name, count) tuple to a Row whose fields line up with the schema
    val data = sc.parallelize(List(("www", 30), ("baidu", 29),
      ("com", 40), ("bt", 33), ("cn", 23))).
      map(item => Row.apply(item._1, item._2))

    // Apply the schema to the RDD[Row] to get a DataFrame
    val df = sqlContext.createDataFrame(data, schema)
//    Spark 1.3-era API, deprecated since Spark 1.4 in favour of df.write.jdbc(...):
//    df.createJDBCTable(url, "blog", false)
//    df.insertIntoJDBC(url, "blog", false)

    val url2 = "jdbc:mysql://10.10.10.158:3306/spark"
    val connectionProperties2 = new Properties()
    connectionProperties2.setProperty("user", "root")         // database user name
    connectionProperties2.setProperty("password", "123456")   // database password
    // Append the DataFrame's rows to the "blog" table
    df.write.mode(SaveMode.Append).jdbc(url2, "blog", connectionProperties2)
    sc.stop()
  }
}
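On Spark 2.x and later, SparkSession replaces SQLContext and the deprecated createJDBCTable/insertIntoJDBC calls are no longer available. The same append can be sketched as below. This is a minimal sketch under several assumptions: Spark 2.x+, MySQL Connector/J on the classpath, and the host, database, table and credentials reused from the code above. The object name JdbcWriteSketch and the helpers jdbcProperties and appendToTable are hypothetical. The driver class name depends on the Connector/J version: 5.x uses com.mysql.jdbc.Driver, 8.x uses com.mysql.cj.jdbc.Driver.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object JdbcWriteSketch {
  // Hypothetical helper: builds the connection properties passed to DataFrameWriter.jdbc
  def jdbcProperties(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    // Assumption: Connector/J 5.x driver class; use com.mysql.cj.jdbc.Driver for 8.x
    props.setProperty("driver", "com.mysql.jdbc.Driver")
    props
  }

  // Hypothetical helper: appends all rows of df to the given table
  def appendToTable(df: DataFrame, url: String, table: String, props: Properties): Unit =
    df.write.mode(SaveMode.Append).jdbc(url, table, props)

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit
    val spark = SparkSession.builder().appName("sparkToMySQL").getOrCreate()
    import spark.implicits._

    // With the implicits in scope, a local Seq of tuples converts directly to a DataFrame
    val df = Seq(("www", 30), ("baidu", 29), ("com", 40), ("bt", 33), ("cn", 23))
      .toDF("name", "count")

    val url = "jdbc:mysql://10.10.10.158:3306/spark"
    appendToTable(df, url, "blog", jdbcProperties("root", "123456"))
    spark.stop()
  }
}
```

With SaveMode.Append, Spark inserts into the table if it exists and creates it first if it does not. SaveMode.Overwrite would instead drop and recreate the table by default.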

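The first part of the code converts an RDD into a DataFrame. It creates a schema, then maps each RDD element to a Row that matches that schema.

Spark SQL provides two ways to convert an RDD into a DataFrame.

The first uses reflection, which requires knowing the schema in advance. The second builds the schema programmatically through the provided interface.

Using reflection:

Scala supports converting an RDD into a DataFrame through a case class. The case class defines the table structure: its constructor parameters are read via reflection and become the columns. Case classes can also be nested or contain complex types such as Seqs or Arrays. The RDD can then be implicitly converted into a DataFrame. The DataFrame can be registered as an in-memory temporary table and queried with SQL.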

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

// Define the schema with a case class.
// Note: in Scala 2.10, case classes support at most 22 fields. Custom classes that implement the Product interface can work around this limit.
case class Person(name: String, age: Int)

// Create an RDD of Person objects and register it as a table.
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")

// Run SQL against the in-memory table through sqlContext.
val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")

// The results of SQL queries are DataFrames.

// Columns of a result row can be accessed by index:
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)

// or by field name:
teenagers.map(t => "Name: " + t.getAs[String]("name")).collect().foreach(println)

// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
teenagers.map(_.getValuesMap[Any](List("name", "age"))).collect().foreach(println)
// Map("name" -> "Justin", "age" -> 19)
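The reflection paragraph says case classes may be nested or contain sequences, but the Person example shows only flat fields. This is a minimal sketch of the nested case, assuming Spark 2.x+ with SparkSession. The classes Address and Contact, the object NestedCaseClassSketch, its contactsDF helper and the sample values are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case classes, declared top-level (not inside a method) so Spark can derive encoders for them
case class Address(city: String, zip: String)
case class Contact(name: String, age: Int, phones: Seq[String], address: Address)

object NestedCaseClassSketch {
  // Reflection turns each field into a column: phones becomes an array column, address a struct column
  def contactsDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.sparkContext
      .parallelize(Seq(
        Contact("Justin", 19, Seq("555-0100"), Address("Beijing", "100000")),
        Contact("Andy", 30, Seq.empty, Address("Shanghai", "200000"))))
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nestedCaseClass").getOrCreate()
    val contacts = contactsDF(spark)
    contacts.printSchema()
    contacts.createOrReplaceTempView("contacts")
    // Nested struct fields are addressed with dot notation in SQL
    spark.sql("SELECT name, address.city FROM contacts WHERE age >= 13 AND age <= 19").show()
    spark.stop()
  }
}
```

createOrReplaceTempView is the Spark 2.x counterpart of the registerTempTable call used above.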

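Specifying the schema programmatically:

Use this approach when a case class cannot be defined ahead of time. It generally takes three steps:

1. Create an RDD of Rows from the original RDD.

2. Create a schema, represented by a StructType, that matches the structure of those Rows.

3. Apply the schema to the RDD of Rows with the createDataFrame(rowRDD, schema) method provided by SQLContext.

Example (Scala):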


// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Create an RDD
val people = sc.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Import Row.
import org.apache.spark.sql.Row

// Import Spark SQL data types
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Build the schema programmatically: every field is a nullable string
val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

// Convert the records of the RDD (people) to Rows.
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

// Apply the schema to the RDD.
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)

// Register the DataFrame as an in-memory table.
peopleDataFrame.registerTempTable("people")

// Run SQL through sqlContext.
val results = sqlContext.sql("SELECT name FROM people")

// The results of SQL queries are DataFrames.
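The example above types every column as StringType, so age is stored as text. The same three steps with a mixed-type schema can be sketched as follows. This is a minimal sketch, assuming Spark 2.x+, where SparkSession.createDataFrame(rowRDD, schema) plays the role of SQLContext.createDataFrame. The object ProgrammaticSchemaSketch and its helpers are hypothetical, and the input lines only mimic the "name, age" format of people.txt.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ProgrammaticSchemaSketch {
  // Step 2: the schema, built at runtime; "age" is typed as an integer rather than a string
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)))

  // Step 1: turn one raw "name, age" line into a Row whose value types match the schema
  def toRow(line: String): Row = {
    val p = line.split(",")
    Row(p(0).trim, p(1).trim.toInt)
  }

  // Step 3: apply the schema to the RDD[Row]
  def peopleDF(spark: SparkSession, lines: Seq[String]): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(lines.map(toRow)), schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("programmaticSchema").getOrCreate()
    val df = peopleDF(spark, Seq("Michael, 29", "Andy, 30", "Justin, 19"))
    df.createOrReplaceTempView("people")
    // With age as IntegerType, the range filter compares numbers, not strings
    spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19").show()
    spark.stop()
  }
}
```

If a Row's values do not match the declared types (for example, a String where IntegerType is declared), Spark typically reports the error only when the DataFrame is evaluated, not when createDataFrame is called.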