pivot和lateral view
参考:
pivot用于列转行
lateral view用于行转列
- 准备数据
val df = spark.createDataFrame(Array(("数学", "张三", 88), ("语文", "张三", 92), ("英语", "张三", 77),
("数学", "王五", 65), ("语文", "王五", 87), ("英语", "王五", 90),
("数学", "李雷", 67), ("语文", "李雷", 33), ("英语", "李雷", 24),
("数学", "宫九", 77), ("语文", "宫九", 87), ("英语", "宫九", 90)))
.toDF("type", "name", "score")
df.createOrReplaceTempView("scores")

2. pivot列转行
groupby() 值为结果行索引
pivot(“name”) name为结果列索引
for 列索引 in (values)
sum(score) 表示值
//没有行索引
//hql内置
val te=spark.sql(
"""
|select * from
|(select name,score from scores) a
|pivot (sum(score) for name in ('张三','王五','李雷','宫九'))
|""".stripMargin)
te.createOrReplaceTempView("t")
//函数
df.groupBy().pivot("name").sum("score")
=================================
//有行索引
//hql内置
val te=spark.sql(
"""
|select * from scores
|pivot (sum(score) for name in ('张三','王五','李雷','宫九'))
|""".stripMargin)
te.createOrReplaceTempView("t")
//函数
df.groupBy("type").pivot("name").sum("score")
没有行索引

有行索引
- lateral view
//数据准备
spark.createDataFrame(
Array((1,"苹果",1000,2000,3300,5000),
(2,"橘子",3000,3000,3200,1500),
(3,"香蕉",2500,3500,2200,2500),
(4,"葡萄",1500,2500,1200,3500)))
.toDF("ID","NAME","q1","q2","q3","q4")
.createOrReplaceTempView("fruit1")
spark.sql("""select * from fruit1""").show()

//lateral view实现unpivot
spark.sql(
"""
|select id , name, ts from
|(
|select id,name,concat_ws(',',q1,q2,q3,q4) as qs from fruit1
|) tt
|lateral view explode(split(qs,',')) tmp as ts
|""".stripMargin
)
.show()

- 试试把第二步的转回去
map(k,v,k,v,k,v…) :字典
str_to_map(字符串,分隔符1默认",“,分隔符2默认”=")
str_to_map(‘k:v,k:v’,‘,’,‘:’)
str_to_map(concat_ws)
除了explode,还可以
- explode([array,map])
- posexplode(array) //加上了index列
- inline([array,structs])
- stack(n,value) //n表示结果多少行,values为值
df.groupBy("type").pivot("name")
.sum("score")
.createOrReplaceTempView("testScore")
//lateral view实现unpivot
spark.sql(
s"""
|select type,name,score from
|(
|select type,map("宫九",`宫九`,"张三",`张三`,"李雷",`李雷`,"王五",`王五`) as ns from testScore
|) a
|lateral view explode(ns) tmp1 as name,score
|""".stripMargin).show()

本文介绍了使用pivot进行列转行及lateralview实现行转列的技术。通过具体示例展示了如何利用这些技术处理数据,包括数据准备、转换操作以及常见函数的使用。
1451

被折叠的 条评论
为什么被折叠?



