Rename All Column Names in Spark (Scala)

This post shows how to create a DataFrame in Scala/Spark and then rename its columns. First, a DataFrame is built from a list of records and a matching schema. Next, a single column is renamed with the `withColumnRenamed` method, changing 'Category' to 'category_new'. Finally, all columns are renamed by iterating over the original column names, appending a '_new' suffix, and applying the new names to the DataFrame.

Construct a DataFrame:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))

// Create a schema for the dataframe
val schema =
  StructType(
    StructField("Category", StringType, true) ::
    StructField("Count", IntegerType, true) ::
    StructField("Description", StringType, true) :: Nil)

// Convert each record to a Row
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Create RDD
val rdd = spark.sparkContext.parallelize(rows)

// Create data frame
val df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+

Rename a single column:

val df2 = df.withColumnRenamed("Category", "category_new")
df2.show()
+------------+-----+------------------+
|category_new|Count|       Description|
+------------+-----+------------------+
|  Category A|  100|This is category A|
|  Category B|  120|This is category B|
|  Category C|  150|This is category C|
+------------+-----+------------------+
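When only a handful of columns need new names, the `withColumnRenamed` calls can also be chained by folding a map of old-to-new names over the DataFrame, instead of rebuilding the full column list. A minimal sketch, assuming the `df` created above (the `renames` map and `dfRenamed` name are illustrative):

```scala
// Sketch: fold a Map of old -> new names over the DataFrame,
// applying withColumnRenamed once per entry.
val renames = Map("Category" -> "category_new", "Count" -> "count_new")
val dfRenamed = renames.foldLeft(df) { case (acc, (oldName, newName)) =>
  acc.withColumnRenamed(oldName, newName)
}
dfRenamed.show()
```

Note that `withColumnRenamed` is a no-op (rather than an error) when the old name does not exist, so unmatched entries in the map are silently ignored.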

Rename all columns (lowercase each name and append `_new`):

// Rename all columns
val newColumnNames = df.columns.map(c => c.toLowerCase + "_new")
val df3 = df.toDF(newColumnNames: _*)
df3.show()
+------------+---------+------------------+
|category_new|count_new|   description_new|
+------------+---------+------------------+
|  Category A|      100|This is category A|
|  Category B|      120|This is category B|
|  Category C|      150|This is category C|
+------------+---------+------------------+

You can also replace `.` or other characters in the column names, e.g.:

c.toLowerCase().replaceAll("\\.", "_") + "_new"
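Applied across all columns, the same pattern sanitizes names (dots in column names otherwise require backtick-quoting in Spark SQL) while adding the suffix. A sketch building on the `df` above (the `cleanedNames` and `df4` names are illustrative):

```scala
// Sketch: replace "." with "_", lowercase, append "_new",
// then apply the whole list of names at once with toDF.
val cleanedNames = df.columns.map(c => c.toLowerCase.replaceAll("\\.", "_") + "_new")
val df4 = df.toDF(cleanedNames: _*)
df4.printSchema()
```

`replaceAll` takes a regular expression, so the dot must be escaped as `\\.`; an unescaped `.` would match every character and replace the entire name with underscores.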

Source: https://kontext.tech/column/spark/527/scala-change-data-frame-column-names-in-spark
