1. import org.apache.spark.sql.functions._
Spark SQL's built-in functions require this import.
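A quick sketch of what the import enables (df and its name column are hypothetical):
// col, upper, lit, avg, min, ... all come from the functions object
val df2 = df.select(col("name"), upper(col("name")).as("name_upper"), lit(1).as("flag"))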
2. Avoid join operations where possible; window functions are preferred, since a window computes per-group aggregates in a single pass with no separate aggregate-then-join step. Note that Window comes from org.apache.spark.sql.expressions.Window; a self-contained sketch follows the snippet below.
val resdf = sdf.withColumn("weight_avg", avg("weight").over(Window.partitionBy("sex")))
  .withColumn("weight_min", min("weight").over(Window.partitionBy("sex")))
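A minimal runnable sketch of the same idea, assuming the spark session from point 5 is in scope (the sample data is hypothetical; import spark.implicits._ enables toDF on a local Seq):
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
// Hypothetical sample data: (name, sex, weight)
val sdf = Seq(("a", "M", 70.0), ("b", "F", 55.0), ("c", "M", 80.0))
  .toDF("name", "sex", "weight")
// Attach the per-sex average and minimum weight to every row in one pass
val w = Window.partitionBy("sex")
val resdf = sdf
  .withColumn("weight_avg", avg("weight").over(w))
  .withColumn("weight_min", min("weight").over(w))
resdf.show()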
3. Reading a CSV file in Spark 2.x; the header option controls whether the first line is treated as column names. Note the escaped backslashes in the Windows path, and that csv already returns a DataFrame, so a trailing .toDF is unnecessary:
val sdf1 = spark.read.option("header", "true").csv("E:\\traindata\\ml-100k\\test.csv")
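By default every column is read as string; Spark's inferSchema option samples the data and picks column types instead. A sketch with the same path:
val sdf1 = spark.read
  .option("header", "true")        // treat the first line as column names
  .option("inferSchema", "true")   // infer column types from the data
  .csv("E:\\traindata\\ml-100k\\test.csv")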
4. printSchema() (or printSchema; Scala lets you drop the empty parentheses) prints the schema.
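For example, on the CSV DataFrame from point 3 (the column names are an assumption; without inferSchema every column is string):
sdf1.printSchema()
// root
//  |-- name: string (nullable = true)
//  |-- age: string (nullable = true)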
5. Starting a SparkSession (requires import org.apache.spark.sql.SparkSession):
val spark = SparkSession
  .builder()
  .appName(this.getClass.getName)
  .master("local[2]")  // run locally with 2 worker threads
  .getOrCreate()
6. Row.fieldIndex resolves a DataFrame field name to its position, from which the value can be read; see the sketch below.
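A minimal sketch (the DataFrame and column name are hypothetical):
val row = someDF.first()           // an org.apache.spark.sql.Row
val idx = row.fieldIndex("name")   // resolve the column name to its index
val name = row.getString(idx)      // read the value at that position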
7. spark.close() (equivalent to spark.stop()) shuts down the SparkSession when you are done.
8. Building a DataFrame from a programmatically constructed schema:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import spark.implicits._  // needed for the Encoder used by map below
// Create an RDD
val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")
// The schema is encoded in a string
val schemaString = "name age"
// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
.map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
// Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD
.map(_.split(","))
.map(attributes => Row(attributes(0), attributes(1).trim))
// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)
// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
// SQL can be run over a temporary view created using DataFrames
val results = spark.sql("SELECT name FROM people")
// The results of SQL queries are DataFrames and support all the normal RDD operations
// The columns of a row in the result can be accessed by field index or by field name
results.map(attributes => "Name: " + attributes(0)).show()
// +-------------+
// | value|
// +-------------+
// |Name: Michael|
// | Name: Andy|
// | Name: Justin|
// +-------------+
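As the comment above notes, a column can also be read by field name; getAs is a standard Row method:
results.map(row => "Name: " + row.getAs[String]("name")).show()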
This article covered key points of Spark SQL programming: importing the functions package, preferring window functions over joins, reading data from CSV files, starting a SparkSession, reading DataFrame field values by index, and closing the Spark connection.