RDD -> DataFrame
There are two ways to do this.
1. Inferring the Schema Using Reflection
Map each record of the RDD to a case class object (e.g. case class Person(name: String, age: Int) defined at the top level), then call toDF():
import spark.implicits._   // required for the implicit .toDF() conversion

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()
An RDD can also be converted directly to a Dataset; this likewise requires importing the implicit conversions with import spark.implicits._ (where spark is the SparkSession instance).
If the objects being converted are tuples instead of case classes, the resulting columns are named _1, _2, and so on; see the sketch below.
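A minimal sketch of the tuple case; the pairsRDD name and the sample data are made up purely for illustration:

import spark.implicits._

// An RDD of tuples (illustrative data)
val pairsRDD = spark.sparkContext.parallelize(Seq(("Alice", 29), ("Bob", 31)))

// Column names default to _1, _2, ...
val pairsDF = pairsRDD.toDF()

// Explicit column names can be supplied instead
val namedDF = pairsRDD.toDF("name", "age")

// toDS() produces a Dataset[(String, Int)] rather than a DataFrame
val pairsDS = pairsRDD.toDS()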
2. Programmatically Specifying the Schema
Build the column metadata as a StructType schema and combine it with an RDD[Row] via createDataFrame:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Convert records of the RDD to Rows
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
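With the temporary view registered, the DataFrame can be queried through Spark SQL. A minimal sketch; the teenagersDF name matches the variable used in the next snippet, and the age filter is just an illustrative condition (both columns are strings in this schema, so age is cast before comparing):

val teenagersDF = spark.sql(
  "SELECT name, age FROM people WHERE CAST(age AS INT) BETWEEN 13 AND 19")
teenagersDF.show()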
DataFrame -> RDD
val tt = teenagersDF.rdd   // the result is an RDD[org.apache.spark.sql.Row]
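Because the resulting RDD contains Row objects, individual fields have to be pulled out by name or by position. A small sketch, assuming the teenagersDF from the query above:

// Extract a column by name ...
val namesRDD = teenagersDF.rdd.map(row => row.getAs[String]("name"))
// ... or by position
val namesByIndex = teenagersDF.rdd.map(row => row.getString(0))
namesRDD.collect().foreach(println)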
This post covered two ways to convert a Spark RDD into a DataFrame: inferring the schema using reflection, and programmatically specifying the schema. It also touched on converting a DataFrame back into an RDD.