First, define a case class:
case class people(id:Int, name:String, age:Int)
Read the text file and convert it into an RDD:
val rddpeople = sc.textFile("source path")
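The mapping code below assumes each line of the input file holds one space-separated `id name age` record (the actual file contents are not shown in the original, so this is an illustrative example):

```text
1 zhangsan 20
2 lisi 25
3 wangwu 30
```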
Use the case class to give the RDD a schema:
val peopleSchema = rddpeople.map(row => row.split(" ")).map(field => people(field(0).toInt,field(1),field(2).toInt))
Convert the RDD to a DataFrame (this `toDF()` call requires `import spark.implicits._` to be in scope, as in the full code below):
val peopleDf = peopleSchema.toDF()
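As an aside, when no case class is available, the same conversion can be done by specifying the schema programmatically with a `StructType` and `createDataFrame`. A sketch, reusing `rddpeople` and `spark` from the surrounding code:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Explicit schema mirroring the people case class
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = false)
))

// Map each line to a generic Row instead of a case class instance
val rowRDD = rddpeople.map(_.split(" ")).map(f => Row(f(0).toInt, f(1), f(2).toInt))
val peopleDf2 = spark.createDataFrame(rowRDD, schema)
```

This approach is useful when the schema is only known at runtime; otherwise the case-class route shown above is simpler.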
Full code:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

case class people(id: Int, name: String, age: Int)

object test {
  def main(args: Array[String]): Unit = {
    // Reduce log noise
    Logger.getLogger("org").setLevel(Level.ERROR)
    // A single SparkSession is enough; its SparkContext replaces a separately built one
    val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._
    val rddpeople = sc.textFile("src/test/people.txt")
    val peopleSchema = rddpeople.map(row => row.split(" ")).map(field => people(field(0).toInt, field(1), field(2).toInt))
    val peopleDf = peopleSchema.toDF()
    peopleDf.show()
    spark.stop()
  }
}
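Once converted, the DataFrame can also be queried with SQL by registering it as a temporary view. A short sketch continuing from `peopleDf` above (the view name `people` is chosen here for illustration):

```scala
// Register the DataFrame as a temp view and query it with Spark SQL
peopleDf.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 20").show()
```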
Spark RDD to DataFrame Conversion
This article shows how to load a text file into an RDD with Apache Spark and convert it into a DataFrame by using a case class to define the schema. It covers basic Spark configuration, log-level settings, SparkSession creation, text-file reading, and RDD schema definition and conversion.
