Notes on resolving an OOM in Structured Streaming watermark deduplication with spark-streaming-kafka-0-10_2.11, version 2.3.2

This post describes how to tune Spark SQL Streaming configuration parameters to reduce the memory consumed by the HDFS-backed state store and so stabilize a Structured Streaming job. It focuses on the roles of `spark.sql.streaming.minBatchesToRetain` and `spark.sql.streaming.maxBatchesToRetainInMemory` in memory management, and on how tuning them for the workload can cut the 10x to 80x memory overhead observed with the default settings.

Main part of the code:

    val df = kafkaReadStream(spark, KAFKA_INIT_OFFSETS, KAFKA_TOPIC)
      .option("maxOffsetsPerTrigger", 1000) // rate limit: maximum total offsets processed per trigger interval, split proportionally across topicPartitions of different volumes
      .option("fetchOffset.numRetries", 3) // number of retries when fetching offsets
      .option("failOnDataLoss", false) // do not fail the query on data loss, only log a warning
      .load()
      .selectExpr("cast (value as string) as json")
      .select(from_json($"json", schema = getKafkaDNSLogSchema()).as("data"))
      //      .select("data.time","data.host","data.content")
      .select("data.content")
      .filter($"content".isNotNull)
      .map(row => {
        val content = JsonDNSDataHandler(row.getString(0))
        val date1 = CommonUtils.timeStamp2Date(content.split("\t")(0).toLong, "yyyy-MM-dd HH:mm:ss.SSSSSS")
        val timestamp = java.sql.Timestamp.valueOf(date1)

        (timestamp,content)
      }).as[(Timestamp,String)].toDF("timestamp","content")
      .withWatermark("timestamp", "10 minutes") // 10-minute watermark bounds how long deduplication state is retained
      .dropDuplicates("content")

    val query = df.writeStream
      .outputMode(OutputMode.Update()) // emit only rows updated since the last trigger
      .trigger(Trigger.ProcessingTime("2 minutes")) // default is ProcessingTime(0), which runs micro-batches as fast as possible
      //      .trigger(Trigger.ProcessingTime(0))
      //      .format("console") // write to the console, useful for debugging
      .format("cn.pcl.csrc.spark.streaming.HiveSinkProvider") // custom HiveSinkProvider
      .option("checkpointLocation", KAFKA_CHECK_POINTS)
      .start()
    query.awaitTermination()
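
Not part of the original job, but a small sketch that helps correlate the failures below with state growth: a StreamingQueryListener (a standard Structured Streaming API) that logs the size of the deduplication state after every micro-batch. Register it on the same `spark` session before calling `start()`; `memoryUsedBytes` growing without bound is the symptom behind the executor losses in the log that follows.

    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // Print state-store metrics for every completed micro-batch.
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        event.progress.stateOperators.foreach { op =>
          println(s"state rows=${op.numRowsTotal} updated=${op.numRowsUpdated} memoryUsedBytes=${op.memoryUsedBytes}")
        }
      }
    })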

Error log:

21/09/09 09:24:35 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/09/09 11:02:17 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 120000 milliseconds, but spent 137183 milliseconds
21/09/09 11:06:04 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 120000 milliseconds, but spent 227294 milliseconds
21/09/09 11:12:35 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 120000 milliseconds, but spent 155791 milliseconds
21/09/09 11:18:41 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 120000 milliseconds, but spent 161555 milliseconds
21/09/09 11:19:29 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container marked as failed: container_e32_1631077447110_0028_01_000003 on host: hdp03.pcl-test.com. Exit status: 143. Diagnostics: [2021-09-09 11:17:45.087]Container killed on request. Exit code is 143
[2021-09-09 11:17:45.088]Container exited with a non-zero exit code 143.
[2021-09-09 11:17:45.090]Killed by external signal

21/09/09 11:19:29 ERROR YarnScheduler: Lost executor 2 on hdp03.pcl-test.com: Container marked as failed: container_e32_1631077447110_0028_01_000003 on host: hdp03.pcl-test.com. Exit status: 143. Diagnostics: [2021-09-09 11:17:45.087]Container killed on request. Exit code is 143
[2021-09-09 11:17:45.088]Container exited with a non-zero exit code 143.
[2021-09-09 11:17:45.090]Killed by external signal

21/09/09 11:19:29 WARN TaskSetManager: Lost task 0.0 in stage 112.0 (TID 11256, hdp03.pcl-test.com, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_e32_1631077447110_0028_01_000003 on host: hdp03.pcl-test.com. Exit status: 143. Diagnostics: [2021-09-09 11:17:45.087]Container killed on request. Exit code is 143
[2021-09-09 11:17:45.088]Container exited with a non-zero exit code 143.
[2021-09-09 11:17:45.090]Killed by external signal

21/09/09 11:19:29 ERROR TaskSetManager: Task 0 in stage 112.0 failed 1 times; aborting job
21/09/09 11:19:29 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveStreamingDataSourceWriter@7050cc3f is aborting.
21/09/09 11:19:29 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveStreamingDataSourceWriter@7050cc3f aborted.
21/09/09 11:19:29 ERROR MicroBatchExecution: Query [id = 60ca7eca-727c-494c-84a8-aa542340eb53, runId = 2987ae03-058c-4c86-bc68-421083b72fab] terminated with error
org.apache.spark.SparkException: Writing job aborted.
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:112)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:256)
        at cn.pcl.csrc.spark.streaming.HiveSink.addBatch(HiveSink.scala:39)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$3$$anonfun$apply$16.apply(MicroBatchExecution.scala:475)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$3.apply(MicroBatchExecution.scala:473)
        at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:472)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 112.0 failed 1 times, most recent failure: Lost task 0.0 in stage 112.0 (TID 11256, hdp03.pcl-test.com, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_e32_1631077447110_0028_01_000003 on host: hdp03.pcl-test.com. Exit status: 143. Diagnostics: [2021-09-09 11:17:45.087]Container killed on request. Exit code is 143
[2021-09-09 11:17:45.088]Container exited with a non-zero exit code 143.
[2021-09-09 11:17:45.090]Killed by external signal

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:82)
        ... 30 more
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
=== Streaming Query ===
Identifier: [id = 60ca7eca-727c-494c-84a8-aa542340eb53, runId = 2987ae03-058c-4c86-bc68-421083b72fab]
Current Committed Offsets: {KafkaSource[Subscribe[recursive-log]]: {"recursive-log":{"0":2504047}}}
Current Available Offsets: {KafkaSource[Subscribe[recursive-log]]: {"recursive-log":{"0":2504247}}}

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
Deduplicate [content#39]
+- EventTimeWatermark timestamp#38: timestamp, interval 10 minutes
   +- Project [_1#32 AS timestamp#38, _2#33 AS content#39]
      +- SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, fromJavaTimestamp, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, false) AS _1#32, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#33]
         +- MapElements <function1>, interface org.apache.spark.sql.Row, [StructField(content,StringType,true)], obj#31: scala.Tuple2
            +- DeserializeToObject createexternalrow(content#25.toString, StructField(content,StringType,true)), obj#30: org.apache.spark.sql.Row
               +- Filter isnotnull(content#25)
                  +- Project [data#23.content AS content#25]
                     +- Project [jsontostructs(StructField(time,StringType,true), StructField(host,StringType,true), StructField(content,StringType,true), json#21, Some(Asia/Shanghai), true) AS data#23]
                        +- Project [cast(value#8 as string) AS json#21]
                           +- StreamingExecutionRelation KafkaSource[Subscribe[recursive-log]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]

        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Writing job aborted.
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:112)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:256)
        at cn.pcl.csrc.spark.streaming.HiveSink.addBatch(HiveSink.scala:39)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$3$$anonfun$apply$16.apply(MicroBatchExecution.scala:475)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$3.apply(MicroBatchExecution.scala:473)
        at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:472)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
        at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
        ... 1 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 112.0 failed 1 times, most recent failure: Lost task 0.0 in stage 112.0 (TID 11256, hdp03.pcl-test.com, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_e32_1631077447110_0028_01_000003 on host: hdp03.pcl-test.com. Exit status: 143. Diagnostics: [2021-09-09 11:17:45.087]Container killed on request. Exit code is 143
[2021-09-09 11:17:45.088]Container exited with a non-zero exit code 143.
[2021-09-09 11:17:45.090]Killed by external signal

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:82)
        ... 30 more

Reference links that explain the problem:

Problem description

So far only this part of the references has been used, and it has already resolved the issue.

Spark 2.2 (38): Investigation of the high memory consumption of agg and dropDuplicates in Spark Structured Streaming versions before 2.4 (Memory issue with spark structured streaming)
The article "Memory usage of state in Spark Structured Streaming" explains how Spark allocates memory for streaming state and mentions the impact of HDFSBackedStateStoreProvider keeping multiple versions; on Stack Overflow, others have hit the same memory problem with Structured Streaming and analyzed it in "Memory issue with spark structured streaming"; in addition, the following can be found in Spark's official list of issue fixes:

1) Split out a separate setting for how many state versions HDFSBackedStateStoreProvider retains in memory (Split out min retain version of state for memory in HDFSBackedStateStoreProvider)
Problem description:

HDFSBackedStateStoreProvider has only one configuration for minimum versions to retain of state which applies to both memory cache and files. As default version of "spark.sql.streaming.minBatchesToRetain" is set to high (100), which doesn't require strictly 100x of memory, but I'm seeing 10x ~ 80x of memory consumption for various workloads. In addition, in some cases, requiring 2x of memory is even unacceptable, so we should split out configuration for memory and let users adjust to trade-off memory usage vs cache miss.

In normal case, default value '2' would cover both cases: success and restoring failure with less than or around 2x of memory usage, and '1' would only cover success case but no longer require more than 1x of memory. In extreme case, user can set the value to '0' to completely disable the map cache to maximize executor memory.

Fix status:

The upstream issue and PR are summarized in "[SPARK-24717][SS] Split out max retain version of state for memory in HDFSBackedStateStoreProvider #21700" and "Split out min retain version of state for memory in HDFSBackedStateStoreProvider".

Related background:

《Spark Structrued Streaming源码分析--(三)Aggreation聚合状态存储与更新》 (Spark Structured Streaming source-code analysis, part 3: aggregation state storage and update)

That article describes the directory structure HDFSBackedStateStoreProvider uses to store state, and it is part of a series that is worth reading in full. The original post reproduced a diagram of the state directory layout from that article; the image is not included here, but a rough sketch follows.
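
As a rough sketch (from memory of how HDFSBackedStateStoreProvider lays out its files, not taken from the linked article), the deduplication state for this query lives under the checkpoint directory roughly like this:

    <checkpointLocation>/state/<operatorId>/<partitionId>/
        1.delta
        2.delta
        ...
        <version>.snapshot

One `<version>.delta` file is written per micro-batch and periodic `<version>.snapshot` files compact them. `spark.sql.streaming.minBatchesToRetain` bounds how many of these versions are kept, and before SPARK-24717 the same setting also governed how many versions stayed in the executor-side in-memory map, which is where the memory pressure came from.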

Optimized configuration

Solution: when submitting the job, add the following two settings:

--conf spark.sql.streaming.minBatchesToRetain=3 --conf spark.sql.streaming.maxBatchesToRetainInMemory=0 
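
The same two settings can also be applied in code when the SparkSession is built (a minimal sketch, not from the original post; the app name is illustrative). Setting them at submit time with `--conf`, as above, is the safer option because the values are then in place before any streaming state is created.

    import org.apache.spark.sql.SparkSession

    // Mirror of the --conf flags above; tune the values for your workload.
    val spark = SparkSession.builder()
      .appName("dns-log-dedup") // hypothetical name, not from the original job
      .config("spark.sql.streaming.minBatchesToRetain", "3")
      .config("spark.sql.streaming.maxBatchesToRetainInMemory", "0")
      .getOrCreate()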

Current running status: with the two settings in place, the problem has been resolved and the job keeps running.
