Construct a DataFrame
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))

// Create a schema for the DataFrame
val schema = StructType(
  StructField("Category", StringType, true) ::
  StructField("Count", IntegerType, true) ::
  StructField("Description", StringType, true) :: Nil)

// Convert each inner list to a Row
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Create an RDD of Rows
val rdd = spark.sparkContext.parallelize(rows)

// Create the DataFrame from the RDD and the schema
val df = spark.createDataFrame(rdd, schema)
println(df.schema)
df.show()
+----------+-----+------------------+
| Category|Count| Description|
+----------+-----+------------------+
|Category A| 100|This is category A|
|Category B| 120|This is category B|
|Category C| 150|This is category C|
+----------+-----+------------------+
Rename a single column:
val df2 = df.withColumnRenamed("Category", "category_new")
df2.show()
+------------+-----+------------------+
|category_new|Count| Description|
+------------+-----+------------------+
| Category A| 100|This is category A|
| Category B| 120|This is category B|
| Category C| 150|This is category C|
+------------+-----+------------------+
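`withColumnRenamed` handles one column at a time. If you need to rename only a subset of columns, one pattern (a sketch, not from the original post; the `renames` mapping and `applyRenames` helper are hypothetical names) is to compute the new name list in plain Scala and hand it to `toDF`:

```scala
object RenameSubset {
  // Hypothetical old -> new name mapping; columns absent from the map keep their name
  val renames = Map("Category" -> "category_new", "Count" -> "count_new")

  // Rewrite a column-name list according to the mapping
  def applyRenames(columns: Seq[String]): Seq[String] =
    columns.map(c => renames.getOrElse(c, c))

  def main(args: Array[String]): Unit = {
    val cols = Seq("Category", "Count", "Description")
    println(applyRenames(cols))  // List(category_new, count_new, Description)
    // Against a DataFrame you would then call: df.toDF(applyRenames(df.columns).toSeq: _*)
  }
}
```

Keeping the renaming logic as a pure function makes it easy to test without a SparkSession.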
Rename all columns (lowercase each column name and append _new):
// Rename all columns
val newColumnNames = df.columns.map(c => c.toLowerCase + "_new")
val df3 = df.toDF(newColumnNames: _*)
df3.show()
+------------+---------+------------------+
|category_new|count_new| description_new|
+------------+---------+------------------+
| Category A| 100|This is category A|
| Category B| 120|This is category B|
| Category C| 150|This is category C|
+------------+---------+------------------+
You can also replace dots or other characters in the column names, for example:
c.toLowerCase().replaceAll("\\.", "_") + "_new"
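Putting that cleanup rule into a standalone function makes it easy to verify before applying it to a DataFrame. A minimal sketch (the `sanitize` helper and the dotted column names below are hypothetical, not from the original post):

```scala
object SanitizeColumns {
  // Cleanup rule from the post: lowercase, dots -> underscores, "_new" suffix
  def sanitize(name: String): String =
    name.toLowerCase.replaceAll("\\.", "_") + "_new"

  def main(args: Array[String]): Unit = {
    // Hypothetical raw column names containing dots
    val raw = Array("Category.Id", "Count", "Description.Text")
    println(raw.map(sanitize).mkString(", "))
    // -> category_id_new, count_new, description_text_new
    // Against a DataFrame you would then call: df.toDF(df.columns.map(sanitize): _*)
  }
}
```

Dots in column names are worth removing because Spark interprets them as nested-field accessors unless the name is escaped with backticks.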
Source: https://kontext.tech/column/spark/527/scala-change-data-frame-column-names-in-spark
This post shows how to create a DataFrame in Scala/Spark and then rename its columns. First, a DataFrame is built from a list of data and a matching schema. Next, a single column is renamed with `withColumnRenamed`, changing 'Category' to 'category_new'. Finally, all columns are renamed by mapping over the original names to append the '_new' suffix and applying the new names with `toDF`.