Spark Operators: map and flatMap

Latest recommended article published on 2025-03-20 16:18:13

Plume_WZ

Article link: https://blog.youkuaiyun.com/wz272343078/article/details/94571628

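This article examines two core transformations on RDDs (Resilient Distributed Datasets): map and flatMap. Using a concrete example and the Spark source code, it explains how each operation works and where it applies in a data processing pipeline.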

map(func):
Source code

  /**
   * Return a new RDD by applying a function to all elements of this RDD.
   */
  def map[U: ClassTag](f: T => U): RDD[U] = withScope {
    val cleanF = sc.clean(f)
    new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
  }

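Purpose: returns a new RDD in which each element is the result of applying func to one element of the source RDD. The output has exactly one element per input element.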

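// Assumes `sc` is an existing SparkContext (for example, the one provided by spark-shell).
import org.apache.spark.rdd.RDD

val arr: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5))
// map is a lazy transformation; nothing is computed until an action such as collect() runs.
val arrMap: RDD[Int] = arr.map(_ * 2)
val arrColl: Array[Int] = arrMap.collect()
for (aa <- arrColl) {
  print(s"$aa ")
}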

Output: 2 4 6 8 10
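The source code above shows that map does little more than call iter.map on each partition's iterator. The following is a minimal sketch of that idea in plain Scala, without Spark. The `Partitions` type and the `mapLike` helper are hypothetical names introduced only for illustration; they model an RDD as a sequence of partitions and are not part of the Spark API.

```scala
object MapSketch {
  // Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
  type Partitions[A] = Seq[Seq[A]]

  // Mirrors the shape of RDD.map: run iter.map(f) over each partition's iterator.
  def mapLike[T, U](parts: Partitions[T])(f: T => U): Partitions[U] =
    parts.map(part => part.iterator.map(f).toList)

  def main(args: Array[String]): Unit = {
    val parts: Partitions[Int] = Seq(Seq(1, 2), Seq(3, 4, 5))
    val doubled = mapLike(parts)(_ * 2)
    // One output element per input element; partition boundaries are unchanged.
    println(doubled.flatten.mkString(" ")) // prints "2 4 6 8 10"
  }
}
```

In this model the number of partitions and the number of elements per partition stay the same; only the element values (and possibly their type) change.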

flatMap:
Source code

  /**
   *  Return a new RDD by first applying a function to all elements of this
   *  RDD, and then flattening the results.
   */
  def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = withScope {
    val cleanF = sc.clean(f)
    new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.flatMap(cleanF))
  }

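Purpose: returns a new RDD by first applying the function to every element of this RDD and then flattening the results. Because the function returns a collection, each input element can produce zero, one, or many output elements.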
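To make the difference from map concrete, here is a small sketch in plain Scala that applies the same function through Iterator.map and Iterator.flatMap, the two calls that the RDD source code delegates to. The object name `FlatMapSketch`, the helper names, and the sample sentences are assumptions made for illustration; no Spark dependency is needed to run it.

```scala
object FlatMapSketch {
  // The same function is used for both calls: split a line into words.
  val splitWords: String => Seq[String] = line => line.split(" ").toSeq

  // map keeps one output per input, so the result is a nested collection.
  def mapped(lines: Seq[String]): List[Seq[String]] =
    lines.iterator.map(splitWords).toList

  // flatMap concatenates the per-element results into one flat sequence.
  def flatMapped(lines: Seq[String]): List[String] =
    lines.iterator.flatMap(splitWords).toList

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello scala")
    // prints "[hello, spark] [hello, scala]"
    println(mapped(lines).map(_.mkString("[", ", ", "]")).mkString(" "))
    // prints "hello spark hello scala"
    println(flatMapped(lines).mkString(" "))
  }
}
```

On an RDD[String], the analogous call would be rdd.flatMap(_.split(" ")), which yields an RDD[String] of individual words rather than an RDD of arrays.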