Converting Between Spark RDDs and DataFrames

Latest recommended article published on 2023-08-01 07:45:12

Licensed under CC 4.0 BY-SA

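This article covers two ways to convert a Spark RDD into a DataFrame: inferring the schema using reflection, and programmatically specifying the schema. It walks through each conversion and also shows how to go from a DataFrame back to an RDD.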

 

RDD -> DataFrame

There are two ways to do this.

1. Inferring the Schema Using Reflection

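Map the RDD[T] to an RDD of case class objects, then call toDF(). Spark reads the column names and types from the case class fields by reflection.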

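// Case class whose fields become the DataFrame columns
case class Person(name: String, age: Int)
// Brings the implicit RDD -> DataFrame conversion (toDF) into scope
import spark.implicits._

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()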

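An RDD can also be converted directly to a Dataset. Either conversion requires importing the implicit conversions with import spark.implicits._ (here spark is the SparkSession instance, not a package name).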

If the RDD elements are tuples, the resulting columns are named _1, _2, and so on.
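
The tuple and Dataset cases can be sketched as follows. This is a minimal illustration, not the original author's code: the in-memory data, the app name, the local master, and the Person case class are assumptions made so the snippet is self-contained (it can be pasted into spark-shell or run as a script with Spark on the classpath).

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; appName and master are assumptions.
val spark = SparkSession.builder()
  .appName("rdd-to-df-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical case class, defined before it is used in the Dataset conversion.
case class Person(name: String, age: Int)

// Hypothetical in-memory data standing in for people.txt.
val tupleRDD = spark.sparkContext.parallelize(Seq(("Michael", 29), ("Andy", 30)))

// With no arguments, toDF names the tuple columns _1 and _2.
val defaultDF = tupleRDD.toDF()

// Passing names to toDF replaces the positional column names.
val namedDF = tupleRDD.toDF("name", "age")

// An RDD of case class objects can also become a typed Dataset via toDS.
val peopleDS = tupleRDD.map { case (n, a) => Person(n, a) }.toDS()
```

Naming the columns in toDF is usually more readable downstream than renaming _1 and _2 afterwards.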

2. Programmatically Specifying the Schema

Build the column metadata (a StructType) yourself and combine it with an RDD of Row objects by calling createDataFrame.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
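
The example above declares every column as StringType. As a hedged variation, the schema can also carry real types per column. The sketch below assumes in-memory sample lines in place of people.txt, a local session, and an IntegerType age column; none of these come from the original post.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Local session for illustration only; appName and master are assumptions.
val spark = SparkSession.builder()
  .appName("explicit-schema-sketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample lines in the same "name, age" layout as people.txt.
val lines = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

// Explicit schema: name stays a string, age is declared as an integer.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Each Row must match the schema, so age is parsed to Int here.
val rowRDD = lines
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim.toInt))

val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.createOrReplaceTempView("people")

// With a numeric age column, range filters work without casting.
val teenagersDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
```

The values placed in each Row have to agree with the declared types; a mismatch typically surfaces only when the DataFrame is evaluated, not when createDataFrame is called.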

DataFrame -> RDD

// teenagersDF is any existing DataFrame; .rdd returns an RDD[Row]
val tt = teenagersDF.rdd
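
Because .rdd on a DataFrame yields an RDD of generic Row objects, fields usually need to be extracted by name or position afterwards. The following sketch shows two ways to do that; the sample data, column names, and local session are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Local session for illustration only; appName and master are assumptions.
val spark = SparkSession.builder()
  .appName("df-to-rdd-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical DataFrame standing in for teenagersDF.
val df = Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")

// A DataFrame's rdd is an RDD[Row]; fields are read by name or index.
val rowRDD: RDD[Row] = df.rdd
val names: RDD[String] = rowRDD.map(row => row.getAs[String]("name"))

// Converting to a typed Dataset first gives an RDD of tuples instead of Rows.
val typedRDD: RDD[(String, Int)] = df.as[(String, Int)].rdd
```

Going through a typed Dataset avoids repeated getAs calls when the column types are known up front.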