Log messages
WARN metadata.Hive: No partition is generated by dynamic partitioning
WriteAheadLogBasedStoreResult
Futures timed out after
The listings below show the table's directory in HDFS. When data comes in, a .hive-staging_xxxx directory appears; when nothing comes in, there is none. The Streaming job keeps running either way, and Hive metadata then logs the WARN "No partition is generated by dynamic partitioning". Seeing this occasionally is understandable, but when every batch afterwards keeps producing it, something is wrong.
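For context, here is a minimal sketch of a write path that produces exactly these staging directories. It is an assumption-laden reconstruction, not the actual job code: the ApmLog type and its msg column are made up, since only the year/month/day partition columns are visible in the logs.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical row type: only the year/month/day partition columns appear in
// the logs; the msg data column is invented for this sketch.
case class ApmLog(msg: String, year: Int, month: Int, day: Int)

def writeBatch(spark: SparkSession, rdd: RDD[ApmLog]): Unit = {
  import spark.implicits._
  // insertInto on a partitioned Hive table needs dynamic partitioning enabled.
  spark.sql("SET hive.exec.dynamic.partition=true")
  spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
  // Spark first stages the batch under
  //   <table>/.hive-staging_hive_<timestamp>_.../-ext-10000/year=.../month=.../day=...
  // and Hive then moves each partition directory into the table. An empty
  // batch leaves nothing under -ext-10000, so Hive warns "No partition is
  // generated by dynamic partitioning" and loads 0 partitions.
  rdd.toDF().write.mode(SaveMode.Append).insertInto("tropic.apm_log")
}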
The list of dynamic-partition loading paths. First run:
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=6 with PartSpec {year=2020, month=4, day=6}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=7 with PartSpec {year=2020, month=4, day=7}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=5 with PartSpec {year=2020, month=4, day=5}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=4 with PartSpec {year=2020, month=4, day=4}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=1 with PartSpec {year=2020, month=4, day=1}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=2 with PartSpec {year=2020, month=4, day=2}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=3 with PartSpec {year=2020, month=4, day=3}
20/04/07 00:42:57 INFO metadata.Hive: Loaded 7 partitions
Second run:
20/04/07 01:02:54 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_01-00-00_640_8171968960099790643-1/-ext-10000/year=2020/month=4/day=7 with PartSpec {year=2020, month=4, day=7}
20/04/07 01:02:54 INFO metadata.Hive: Loaded 1 partitions
A run that obtains no partitions:
20/04/07 01:30:02 WARN metadata.Hive: No partition is generated by dynamic partitioning
20/04/07 01:30:02 INFO metadata.Hive: Loaded 0 partitions
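The WARN itself is harmless on a genuinely empty interval. A common mitigation, sketched here as my own suggestion rather than something the original job did, is to skip the Hive write entirely when the batch is empty, building on the writeBatch sketch above:

// stream is assumed to be a DStream[ApmLog]; spark is the active SparkSession.
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {   // cheap check: take(1) under the hood, no full count
    writeBatch(spark, rdd)
  }
}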
With data coming in:
[wangyoujun@etl1 shell]$ hadoop fs -ls -R /user/hive/warehouse/tropic.db/aliyun_survey_log/
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000/_temporary
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000/_temporary/0
drwxrwx--x+ - hive hive 0 2020-04-03 18:19 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020
drwxrwx--x+ - hive hive 0 2020-04-08 10:30 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020/idn_month=4
drwxrwx--x+ - hive hive 0 2020-04-08 10:28 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020/idn_month=4/idn_day=1
Spark executor log:
20/04/08 13:04:54 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@etl1.ecreditpal.local:34518)
20/04/08 13:04:54 INFO spark.MapOutputTrackerWorker: Got the output locations
20/04/08 13:04:54 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
20/04/08 13:04:54 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
20/04/08 13:04:54 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
20/04/08 13:04:54 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/04/08 13:04:54 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/04/08 13:04:54 INFO mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20200408130451_2438_m_000000_0
20/04/08 13:04:54 INFO executor.Executor: Finished task 0.0 in stage 2438.0 (TID 12237). 2728 bytes result sent to driver
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6172
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6174
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6173
20/04/08 13:05:02 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 12238
20/04/08 13:05:02 INFO executor.Executor: Running task 0.0 in stage 2440.0 (TID 12238)
20/04/08 13:05:02 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2435
20/04/08 13:05:02 INFO memory.MemoryStore: Block broadcast_2435_piece0 stored as bytes in memory (estimated size 166.6 KB, free 911.3 MB)
20/04/08 13:05:02 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2435 took 43 ms
20/04/08 13:05:02 INFO memory.MemoryStore: Block broadcast_2435 stored as values in memory (estimated size 440.3 KB, free 910.9 MB)
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 317, fetching them
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@etl1.ecreditpal.local:34518)
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Got the output locations
20/04/08 13:05:02 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
20/04/08 13:05:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
20/04/08 13:05:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
20/04/08 13:05:02 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/04/08 13:05:02 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/04/08 13:05:02 INFO mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20200408130500_2440_m_000000_0
20/04/08 13:05:02 INFO executor.Executor: Finished task 0.0 in stage 2440.0 (TID 12238). 2728 bytes result sent to driver
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6203
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6204
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6205
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6232
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6233
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6234
20/04/08 13:06:02 INFO storage.BlockManager: Removing RDD 6233
20/04/08 13:06:02 INFO storage.BlockManager: Removing RDD 6173
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6246
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6247
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6248
20/04/08 13:06:04 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
20/04/08 13:06:04 INFO storage.DiskBlockManager: Shutdown hook called
20/04/08 13:06:04 INFO util.ShutdownHookManager: Shutdown hook called
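Note that the executor did not crash on its own: it was shut down by an external SIGTERM. To check whether consumption of the Aliyun logstore had fallen behind, I pulled the shard's cursor with the Log CLI: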
aliyunlog log get_end_cursor --project_name="tropic" --logstore_name="apm-prod" --shard_id=0
MTU4NTY0MjU2NjExMjUwNDQ2NA==
aliyunlog log get_cursor_time --project_name="tropic" --logstore_name="apm-prod" --shard_id=0 --cursor="MTU4NTY0MjU2NjExMjUwNDQ2NA=="
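get_end_cursor returns the newest cursor of shard 0 in the apm-prod logstore (the cursor itself is just a base64-encoded offset), and get_cursor_time converts a cursor back to a server-side timestamp, so comparing that time against the consumer's current position shows how far consumption is lagging.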
At the beginning I followed the WARN message into the Hive and Spark source code and found that it is Hive's metadata layer that emits this WARN. For a while I was convinced it was a metadata refresh problem, so I added a line at the start of each batch to refresh the metadata; in the end the WARN was still there and consumption still kept getting interrupted.
It finally turned out to be insufficient resource allocation: not enough CPU, and therefore not enough threads. Aliyun presumably pushes data to the Streaming receiver, and the CPU was completely pinned.

I had not set the number of cores in spark-submit, so it used the default of 2 CPUs. On top of that, my batch_interval and block_interval were set very unreasonably. The job has low real-time requirements, so batch_interval was 5 minutes, but block_interval was still the default 200 ms; since Spark creates roughly batch_interval / block_interval blocks (and thus tasks) per batch, that is 300,000 ms / 200 ms = 1,500 tasks every batch, which is very CPU-hungry in the downstream parallel processing. In the end I set both parameters to 5 minutes and allocated 1 executor with 4 cores and 8 GB of memory.
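The original spark-submit command is not shown above, so here is a minimal sketch of the corrected settings expressed through SparkConf instead; the app name is invented and the executor settings assume a YARN-style deployment:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val conf = new SparkConf()
  .setAppName("apm-log-streaming")          // name is made up
  .set("spark.executor.instances", "1")     // 1 executor
  .set("spark.executor.cores", "4")         // 4 cores
  .set("spark.executor.memory", "8g")       // 8 GB
  // Block interval == batch interval: each receiver now produces a single
  // block, i.e. a single task, per 5-minute batch instead of ~1,500.
  .set("spark.streaming.blockInterval", "300s")

val ssc = new StreamingContext(conf, Minutes(5))  // batch interval = 5 minutes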
The previous configuration was 3 executors, each with 2 cores and 4 GB of memory.
My understanding of Spark is still shallow; I need to keep working at it.
Today, reading the Spark official site, I was reminded of the old line: what is learned on paper always feels shallow (纸上得来终觉浅).