Apache Flink (4): Connecting Flink to Sinks

Apache Flink's data sinks write a DataStream out to external systems. This article covers file-based sinks (writeAsText, the Bucketing File Sink), print()/printToErr(), custom sinks, and the Redis and Kafka sinks, including the at-least-once semantics of the simple file-based sinks and how to implement and configure custom, Redis, and Kafka sinks.


Data Sink

Data sinks consume the records of a DataStream and write them out to external systems such as files, network services, NoSQL stores, relational databases, and message queues. Flink ships with a number of predefined sinks, and users can also implement their own data sinks by extending SinkFunction or RichSinkFunction.
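Flink 1.8.x is assumed throughout this article. As a minimal sketch of the second option (the socket source and job name are illustrative), a custom sink only needs to implement invoke(); the Custom Sink section below shows the fuller RichSinkFunction variant with open()/close():

    import org.apache.flink.streaming.api.functions.sink.SinkFunction
    import org.apache.flink.streaming.api.scala._

    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // illustrative input stream
    val lines: DataStream[String] = fsEnv.socketTextStream("Spark", 9999)

    // the simplest possible custom sink: print every record
    lines.addSink(new SinkFunction[String] {
      override def invoke(value: String): Unit = println(value)
    })

    fsEnv.execute("MinimalSinkSketch")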

File Based

  • writeAsText() | writeAsCsv(…) | writeUsingOutputFormat(), processing semantics: at-least-once (a simpler writeAsText() variant is sketched right after this example)
    // required imports (Flink 1.8.x)
    import org.apache.flink.api.java.io.CsvOutputFormat
    import org.apache.flink.api.java.tuple.Tuple2   // CsvOutputFormat requires Flink's Java Tuple2, not Scala's
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.scala._

    //1. Create the StreamExecutionEnvironment
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    
    val output = new CsvOutputFormat[Tuple2[String, Int]](new Path("file:///D:/fink-results"))
    //2. Create the DataStream
    val dataStream: DataStream[String] = fsEnv.socketTextStream("Spark",9999)
    //3. Transform the data
    dataStream.flatMap(_.split("\\s+"))
        .map((_,1))
        .keyBy(0)
        .sum(1)
        .map(t => new Tuple2(t._1, t._2))   // convert the Scala tuple to Flink's Java Tuple2
        .writeUsingOutputFormat(output)
    
    fsEnv.execute("FlinkWordCountsQuickStart")
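
For quick experiments, writeAsText() is the simpler variant of the same at-least-once file output. A minimal sketch, assuming the same socket source and an illustrative local output path:

    import org.apache.flink.core.fs.FileSystem.WriteMode
    import org.apache.flink.streaming.api.scala._

    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

    fsEnv.socketTextStream("Spark", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      // OVERWRITE replaces the file on restart; the path is illustrative
      .writeAsText("file:///D:/flink-text-results", WriteMode.OVERWRITE)

    fsEnv.execute("WriteAsTextSketch")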
  • Bucketing File Sink, processing semantics: exactly-once

Add the dependencies

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.11</artifactId>
        <version>1.8.1</version>
    </dependency>
    
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.2</version>
    </dependency>

Example code

    // required imports (Flink 1.8.x)
    import java.time.ZoneId
    import org.apache.flink.streaming.connectors.fs.bucketing.{BucketingSink, DateTimeBucketer}
    import org.apache.flink.streaming.api.scala._

    //1. Create the StreamExecutionEnvironment
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    
    // bucket the output files on HDFS by hour, using the Asia/Shanghai time zone
    val bucketSink = new BucketingSink[String]("hdfs://Spark:9000/bucketSink")
    bucketSink.setBucketer(new DateTimeBucketer("yyyy-MM-dd-HH", ZoneId.of("Asia/Shanghai")))
    
    //2. Create the DataStream
    val dataStream: DataStream[String] = fsEnv.socketTextStream("Spark",9999)
    //3. Transform the data
    dataStream.flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      .map(t => t._1 + "\t" + t._2)
      .addSink(bucketSink)
    
    fsEnv.execute("FlinkWordCountsQuickStart")
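
The exactly-once guarantee of the Bucketing File Sink depends on Flink checkpointing being enabled; the sink can also roll part files by size or time. A hedged sketch of settings that could be added to the job above (the interval and size values are illustrative assumptions):

    // enable checkpointing so the BucketingSink can finalize files exactly once
    fsEnv.enableCheckpointing(5000)                       // checkpoint every 5 s (illustrative)

    // optional tuning: roll part files by size or by time
    bucketSink.setBatchSize(128 * 1024 * 1024)            // roll after ~128 MB
    bucketSink.setBatchRolloverInterval(10 * 60 * 1000)   // or after 10 minutes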

print() / printToErr()

    //1. Create the StreamExecutionEnvironment
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    
    //2. Create the DataStream
    val dataStream: DataStream[String] = fsEnv.socketTextStream("Spark",9999)
    //3. Transform the data and print to stdout, prefixing every record with "error"
    dataStream.flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      .print("error")
    
    fsEnv.execute("FlinkWordCountsQuickStart")
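
print(prefix) writes each record to standard output, prefixed with the given string (here "error"); printToErr() is the variant that writes to standard error. A minimal sketch reusing the dataStream defined above:

    // same pipeline, routed to stderr instead of stdout
    dataStream.flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      .printToErr()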

Custom Sink

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
import org.apache.flink.streaming.api.scala._

class UserDefineSink extends RichSinkFunction[(String,Int)]{

  // called once per parallel instance before any record is processed,
  // e.g. to open a connection to the external system
  override def open(parameters: Configuration): Unit = {
    println("open connection")
  }

  // called once for every record
  override def invoke(value: (String, Int)): Unit = {
    println("insert record " + value)
  }

  // called when the sink shuts down, e.g. to release the connection
  override def close(): Unit = {
    println("close connection")
  }

}
object FlinkUserDefineSink {

  def main(args: Array[String]): Unit = {

    // 1. Create the StreamExecutionEnvironment
    val flinkEnv = StreamExecutionEnvironment.getExecutionEnvironment

    // Use the user-defined data source (UserDefineDataSource is a custom source not shown in this article)
    val dataStream : DataStream[String] = flinkEnv.addSource[String](
      new UserDefineDataSource
    )

    dataStream
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      // Use the user-defined sink
      .addSink(new UserDefineSink)

    // Run the job
    flinkEnv.execute("FlinkWordCount")

  }

}
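
The open()/close() hooks are where a real sink manages external connections. As a hedged sketch only (the JDBC URL, table name, and credentials are illustrative assumptions, not from this article), the same (word, count) stream could be written to a relational table like this:

import java.sql.{Connection, DriverManager, PreparedStatement}

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class JdbcWordCountSink extends RichSinkFunction[(String, Int)] {

  @transient private var conn: Connection = _
  @transient private var stmt: PreparedStatement = _

  // open the connection once per parallel sink instance
  override def open(parameters: Configuration): Unit = {
    // illustrative connection settings, replace with real ones
    conn = DriverManager.getConnection("jdbc:mysql://Spark:3306/test", "root", "root")
    stmt = conn.prepareStatement(
      "INSERT INTO word_count(word, cnt) VALUES (?, ?) ON DUPLICATE KEY UPDATE cnt = ?")
  }

  // upsert one record per invocation
  override def invoke(value: (String, Int)): Unit = {
    stmt.setString(1, value._1)
    stmt.setInt(2, value._2)
    stmt.setInt(3, value._2)
    stmt.executeUpdate()
  }

  // release resources when the job stops
  override def close(): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}

It would be plugged in exactly like UserDefineSink above, with .addSink(new JdbcWordCountSink).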

Redis Sink

Reference: https://bahir.apache.org/docs/flink/current/flink-streaming-redis/

Add the dependency

    <dependency>
        <groupId>org.apache.bahir</groupId>
        <artifactId>flink-connector-redis_2.11</artifactId>
        <version>1.0</version>
    </dependency>

Example code

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.redis.RedisSink
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig
import org.apache.flink.streaming.connectors.redis.common.mapper.{RedisCommand, RedisCommandDescription, RedisMapper}

object FlinkRedisSink {

  def main(args: Array[String]): Unit = {

    // 1. Create the StreamExecutionEnvironment
    val flinkEnv = StreamExecutionEnvironment.getExecutionEnvironment

    // Jedis pool configuration pointing at the Redis server
    val flinkJedis = new FlinkJedisPoolConfig.Builder().setHost("Spark").setPort(6379).build()

    // 2. Create the DataStream
    val dataStream : DataStream[String] = flinkEnv.socketTextStream("Spark", 6666)

    // 3. Transform the data
    dataStream
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      // Write the results to Redis
      .addSink(new RedisSink(flinkJedis, new UserDefineRedisMapper))

    // Run the job
    flinkEnv.execute("FlinkWordCount")

  }

}

class UserDefineRedisMapper extends RedisMapper[(String,Int)]{

  // write each record with HSET into the hash named "word-count"
  override def getCommandDescription: RedisCommandDescription = {
    new RedisCommandDescription(RedisCommand.HSET, "word-count")
  }

  // the word becomes the hash field
  override def getKeyFromData(t: (String, Int)): String = {
    t._1
  }

  // the count becomes the hash value
  override def getValueFromData(t: (String, Int)): String = {
    t._2.toString
  }

}

If the Redis server cannot be reached after installation, disable Redis protected mode by setting protected-mode no in redis.conf (or configure a bind address/password).

Kafka Sink

Add the dependency

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.11</artifactId>
        <version>1.8.1</version>
    </dependency>

Example code

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.ByteArraySerializer

object FlinkKafkaSink {

  def main(args: Array[String]): Unit = {

    // 1. Create the StreamExecutionEnvironment
    val flinkEnv = StreamExecutionEnvironment.getExecutionEnvironment

    // 2. Create the DataStream
    val dataStream : DataStream[String] = flinkEnv.socketTextStream("Spark", 6666)

    val prop = new Properties()
    prop.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "Spark:9092")
    // Overriding the serializers is not recommended: the Flink Kafka producer
    // already writes the byte arrays produced by the serialization schema
    prop.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer])
    prop.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer])

    prop.put(ProducerConfig.RETRIES_CONFIG, "3")
    prop.put(ProducerConfig.ACKS_CONFIG, "-1")
    prop.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    prop.put(ProducerConfig.BATCH_SIZE_CONFIG, "100")
    prop.put(ProducerConfig.LINGER_MS_CONFIG, "500")

    // 3. Transform the data
    dataStream
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      // Write the results to the "flink" topic with the Kafka sink
      .addSink(new FlinkKafkaProducer[(String, Int)]("flink", new UserDefineKafkaSchema, prop))

    // Run the job
    flinkEnv.execute("FlinkWordCount")

  }

}

Custom serialization schema

import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema

class UserDefineKafkaSchema extends KeyedSerializationSchema[(String, Int)]{

  // the word becomes the Kafka record key
  override def serializeKey(t: (String, Int)): Array[Byte] = {
    t._1.getBytes()
  }

  // the count becomes the Kafka record value
  override def serializeValue(t: (String, Int)): Array[Byte] = {
    t._2.toString.getBytes()
  }

  // per-record target topic; here it always matches the default topic passed to the producer
  override def getTargetTopic(t: (String, Int)): String = {
    "flink"
  }

}
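
With the constructor used above, the Kafka sink defaults to at-least-once delivery. The universal FlinkKafkaProducer can also write transactionally for exactly-once semantics once checkpointing is enabled. A hedged sketch (the checkpoint interval and transaction timeout are illustrative assumptions; the timeout must not exceed the broker's transaction.max.timeout.ms):

// enable checkpointing; Kafka transactions are committed on checkpoints
flinkEnv.enableCheckpointing(5000)

// keep the producer's transaction timeout within the broker's limit
prop.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "600000") // 10 minutes

val exactlyOnceSink = new FlinkKafkaProducer[(String, Int)](
  "flink",                                  // default topic
  new UserDefineKafkaSchema,                // serialization schema defined above
  prop,
  FlinkKafkaProducer.Semantic.EXACTLY_ONCE  // use Kafka transactions
)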