Flink Code: Data Sources, Operators, Stream Partitioning, and Rich Functions (Part 1)

Article link: https://blog.youkuaiyun.com/YellowXiuHui/article/details/106823728

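This article introduces basic Apache Flink usage: creating data sources from element collections and files, processing data with operators such as Map, FlatMap, and Filter, a detailed look at the KeyBy partitioning operation, and the use of rich functions. Worked examples illustrate the basic Flink workflow.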

I. Getting Started: Hello-World Level

1. WordCount from a collection of elements

import org.apache.flink.streaming.api.scala._

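object WordCountFromBatch {

  def main(args: Array[String]): Unit = {

    // Get the execution environment, similar to SparkContext
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Global default parallelism: one parallel task per operator.
    // Operators on a DataStream can override it with their own setParallelism call.
    env.setParallelism(1)

    val stream = env
      .fromElements(
        "zuoyuan",
        "hello world",
        "zuoyuan",
        "zuoyuan"
      )
      .setParallelism(1)

    // Apply transformation operators to the stream
    val textStream = stream
      // Split each input string on whitespace
      .flatMap(r => r.split("\\s"))
      .setParallelism(2)
      // map: w => WordWithCount(w, 1)
      .map(w => WordWithCount(w, 1))
      .setParallelism(2)
      // Group by the word field, which triggers a shuffle.
      // (The position-based keyBy(0) is deprecated in newer Flink versions.)
      .keyBy(_.word)
      // Aggregate the count field (position 1), similar to reduce
      .sum(1)
      .setParallelism(2)

    // Print the stream to standard output.
    // The print operator's parallelism is set to 2, overriding the global parallelism of 1.
    textStream.print().setParallelism(2)

    // Don't forget to execute!
    env.execute()
  }

  case class WordWithCount(word: String, count: Int)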
}
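
The per-record behavior of the flatMap, map, keyBy, and sum chain above can be sketched in plain Scala, without any Flink dependency. This is only an illustration under stated assumptions: `RunningWordCount` and `runningCounts` are hypothetical names, the input is processed sequentially in arrival order, and a real Flink job with parallelism 2 may interleave results for different keys differently.

```scala
// Plain-Scala sketch (no Flink): mimics a keyed rolling sum, which emits
// an updated count for every incoming record instead of one final total.
object RunningWordCount {
  case class WordWithCount(word: String, count: Int)

  def runningCounts(lines: Seq[String]): Seq[WordWithCount] = {
    // flatMap step: split every line on whitespace
    val words = lines.flatMap(_.split("\\s"))
    // Fold over the words, carrying (count per key so far, results emitted so far)
    val (_, emitted) =
      words.foldLeft((Map.empty[String, Int], Vector.empty[WordWithCount])) {
        case ((state, out), w) =>
          val next = state.getOrElse(w, 0) + 1
          (state.updated(w, next), out :+ WordWithCount(w, next))
      }
    emitted
  }

  def main(args: Array[String]): Unit =
    runningCounts(Seq("zuoyuan", "hello world", "zuoyuan", "zuoyuan")).foreach(println)
}
```

For the four input elements used in the article, this sketch emits five results, and the count for "zuoyuan" grows from 1 to 3 as its occurrences arrive.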

2. WordCount from a socket

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

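object WordCountFromSocket {

  def main(args: Array[String]): Unit = {

    // Get the execution environment, similar to SparkContext
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the number of parallel tasks to 1
    env.setParallelism(1)

    // The data source is a socket.
    // Use localhost when running locally; on a virtual machine the host may be something like `hadoop102`.
    // Start `nc -lk 9999` on that host (locally, or in a terminal on hadoop102) before running the job.
    val stream = env.socketTextStream("localhost", 9999, '\n')

    // Apply transformation operators to the stream
    val textStream = stream
      // Split each input string on whitespace
      .flatMap(r => r.split("\\s"))
      // map: w => WordWithCount(w, 1)
      .map(w => WordWithCount(w, 1))
      // Group by the word field, which triggers a shuffle.
      // (The position-based keyBy(0) is deprecated in newer Flink versions.)
      .keyBy(_.word)
      // On each keyed stream, open a 5-second tumbling window
      .timeWindow(Time.seconds(5))
      // Aggregate the count field (position 1), similar to reduce
      .sum(1)

    // Print the stream to standard output
    textStream.print()

    // Don't forget to execute!
    env.execute()
  }

  case class WordWithCount(word: String, count: Int)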
}
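
To make the 5-second tumbling window more concrete, here is a minimal plain-Scala sketch of how events could be bucketed into fixed, non-overlapping windows and counted per key. It is an illustration only: `TumblingWindowSketch`, `windowStart`, and `countPerWindow` are hypothetical names, timestamps are supplied explicitly in milliseconds and assumed non-negative, and windows are assumed to be aligned to the epoch with no offset. The socket job above relies on Flink's own window assigner rather than on code like this.

```scala
// Plain-Scala sketch (no Flink) of tumbling-window bucketing.
object TumblingWindowSketch {
  // Start of the window containing timestampMs (assumes timestampMs >= 0)
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Counts (word, timestampMs) events per (word, window start) pair
  def countPerWindow(events: Seq[(String, Long)], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy { case (word, ts) => (word, windowStart(ts, sizeMs)) }
      .map { case (key, group) => key -> group.size }
}
```

With a 5000 ms window, events at 1000 ms and 4999 ms fall into the window starting at 0, while an event at 5000 ms opens the next window, so each window produces its own count per word.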

II. Data Sources

1. From a file

import org.apache.flink.streaming.api.scala._

object SourceFromFile {

  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val stream = env
      .readTextFile("/Users/yuanzuo/Desktop/flink-tutorial/FlinkSZ1128/src/main/resources/sensor.txt")
      .map(r => {
        // Split the line on commas: id, timestamp, temperature
        val arr = r.split(",")
        SensorReading(arr(0), arr(1).toLong, arr(2).toDouble)
      })

    stream.print()
    env.execute()
  }
}
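
The map step above assumes every line of the file has exactly three well-formed, comma-separated fields; a malformed line would throw an exception. A more defensive parser might look like the sketch below. The names `SensorLineParser` and `parse` are hypothetical, the case class is a local copy that mirrors the article's `SensorReading`, and the sample line format is an assumption, since the contents of sensor.txt are not shown.

```scala
import scala.util.Try

// Plain-Scala sketch (no Flink) of defensive parsing for one CSV line.
object SensorLineParser {
  // Local copy mirroring the article's SensorReading case class
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // Returns None unless the line has exactly three fields that parse cleanly
  def parse(line: String): Option[SensorReading] =
    line.split(",").map(_.trim) match {
      case Array(id, ts, temp) =>
        Try(SensorReading(id, ts.toLong, temp.toDouble)).toOption
      case _ => None
    }
}
```

Returning an Option lets the caller decide whether to drop, log, or count malformed lines instead of failing on the first bad record.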

2. From a custom data source

import org.apache.flink.streaming.api.scala._

object SourceFromCustomDataSource {

  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val stream = env
      // Attach the custom data source
      .addSource(new SensorSource)

    stream.print()

    env.execute()
  }
}

Case class

// (temperature sensor ID, timestamp, temperature value)
case class SensorReading(id: String,
                         timestamp: Long,
                         temperature: Double)

Generated fake data: temperature sensors

import java.util.Calendar

import org.apache.flink.streaming.api.functions.source