java dataframe map: Converting Spark DataFrame columns to a Map type and a list of Map type

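This post shows how to use a Spark DataFrame to convert input data into two different formats, grouped by customerId: a map from header to lines, and a list of header-to-line maps. It walks through a UDF-based implementation and shows the expected sample output.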


I have a DataFrame like the one below, and I would appreciate help getting the output in the two formats shown.

Input:

+----------+-----------+---------+
|customerId|transHeader|transLine|
+----------+-----------+---------+
|1001      |1001aa     |1001aa1  |
|1001      |1001aa     |1001aa2  |
|1001      |1001aa     |1001aa3  |
|1001      |1001aa     |1001aa4  |
|1002      |1002bb     |1002bb1  |
|1002      |1002bb     |1002bb2  |
|1002      |1002bb     |1002bb3  |
|1002      |1002bb     |1002bb4  |
|1003      |1003cc     |1003cc1  |
|1003      |1003cc     |1003cc2  |
|1003      |1003cc     |1003cc3  |
+----------+-----------+---------+

Expected output set 1:

customerId  headerLineMapGroup
1001        Map(1001aa -> (1001aa1, 1001aa2, 1001aa3, 1001aa4))
1002        Map(1002bb -> (1002bb1, 1002bb2, 1002bb3, 1002bb4))
1003        Map(1003cc -> (1003cc1, 1003cc2, 1003cc3))

Expected output set 2:

customerId  headerLineListOfMapGroup
1001        List[ Map(1001aa -> 1001aa1), Map(1001aa -> 1001aa2), Map(1001aa -> 1001aa3), Map(1001aa -> 1001aa4) ]
1002        List[ Map(1002bb -> 1002bb1), Map(1002bb -> 1002bb2), Map(1002bb -> 1002bb3), Map(1002bb -> 1002bb4) ]
1003        List[ Map(1003cc -> 1003cc1), Map(1003cc -> 1003cc2), Map(1003cc -> 1003cc3) ]
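To make the two target shapes concrete, here is a minimal plain-Scala sketch (no Spark involved) that builds both from an in-memory list of rows. The `TransRow` case class and the shortened `rows` sample are assumptions made for illustration only; the question itself shows just the three columns.

```scala
// Hypothetical row type for illustration; not part of the original question.
final case class TransRow(customerId: Int, transHeader: String, transLine: String)

val rows = List(
  TransRow(1001, "1001aa", "1001aa1"),
  TransRow(1001, "1001aa", "1001aa2"),
  TransRow(1002, "1002bb", "1002bb1")
)

// Output set 1: customerId -> Map(header -> all lines under that header)
val headerLineMapGroup: Map[Int, Map[String, List[String]]] =
  rows.groupBy(_.customerId).map { case (id, rs) =>
    id -> rs.groupBy(_.transHeader).map { case (h, hs) => h -> hs.map(_.transLine) }
  }

// Output set 2: customerId -> one single-entry Map per input row
val headerLineListOfMapGroup: Map[Int, List[Map[String, String]]] =
  rows.groupBy(_.customerId).map { case (id, rs) =>
    id -> rs.map(r => Map(r.transHeader -> r.transLine))
  }
```

The difference between the two sets is where the grouping stops: set 1 keeps one map entry per header with all of its lines, while set 2 keeps one single-entry map per original row.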

Solution

Here is a solution using UDFs.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, udf}

val spark = SparkSession
  .builder()
  .master("local")
  .appName("ParquetAppendMode")
  .getOrCreate()

import spark.implicits._

// Sample input (a subset of the rows from the question)
val data = spark.sparkContext.parallelize(Seq(
  (1001, "1001aa", "1001aa1"),
  (1001, "1001aa", "1001aa2"),
  (1001, "1001aa", "1001aa3")
)).toDF("customerId", "transHeader", "transLine")

// Builds a single-entry map: header -> all of its lines
val toMap = udf((header: String, line: Seq[String]) => {
  Map(header -> line)
})

// Builds one single-entry map per line: List(Map(header -> line1), ...)
val toMapList = udf((header: String, line: Seq[String]) => {
  line.map(l => Map(header -> l)).toList
})

// Collect all lines for each (customerId, transHeader) pair
val grouped = data.groupBy("customerId", "transHeader").agg(collect_list("transLine").alias("transLine"))

// Output set 1
grouped.withColumn("headerLineMapGroup", toMap($"transHeader", $"transLine"))
  .drop("transHeader", "transLine")
  .show(false)

// Output set 2 (column named as in the question)
grouped.withColumn("headerLineListOfMapGroup", toMapList($"transHeader", $"transLine"))
  .drop("transHeader", "transLine")
  .show(false)
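The same two results can likely be produced without a UDF, using Spark's built-in `map` and `collect_list` column functions. This is a sketch of an alternative, not the approach from the answer above; the app name and the variable names `set1` and `set2` are assumptions for illustration. As with any `collect_list`, the order of elements within each collected list is not guaranteed after a shuffle.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, map}

// App name is an arbitrary placeholder for this sketch.
val spark = SparkSession.builder().master("local[*]").appName("MapColumnsSketch").getOrCreate()
import spark.implicits._

val data = Seq(
  (1001, "1001aa", "1001aa1"),
  (1001, "1001aa", "1001aa2"),
  (1002, "1002bb", "1002bb1")
).toDF("customerId", "transHeader", "transLine")

// Output set 1: Map(header -> array of lines), one row per (customerId, transHeader)
val set1 = data
  .groupBy("customerId", "transHeader")
  .agg(map(col("transHeader"), collect_list(col("transLine"))).alias("headerLineMapGroup"))
  .drop("transHeader")

// Output set 2: array of single-entry maps, one row per customerId
val set2 = data
  .groupBy("customerId")
  .agg(collect_list(map(col("transHeader"), col("transLine"))).alias("headerLineListOfMapGroup"))

set1.show(false)
set2.show(false)
```

Built-in functions keep the work inside Spark's optimizer, which is often preferable to a UDF when an equivalent expression exists. Note that set 2 here groups by `customerId` only, so a customer with several headers would get a single list covering all of them.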

Hope this helps!
