Log messages
WARN metadata.Hive: No partition is generated by dynamic partitioning
WriteAheadLogBasedStoreResult
Futures timed out after
The listings below show the table's directory in HDFS. When data comes in, a .hive-staging_xxxx directory appears; when nothing comes in, there is none. The Streaming job keeps running either way, and Hive metadata then logs the WARN "No partition is generated by dynamic partitioning". Seeing this occasionally is understandable, but when every batch afterwards keeps producing it, something is wrong.
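For context, here is a minimal sketch of a write path that produces exactly these staging directories. It is an assumption-laden reconstruction, not the actual job code: the ApmLog type and its msg column are made up, since only the year/month/day partition columns are visible in the logs.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical row type: only the year/month/day partition columns appear in
// the logs; the msg data column is invented for this sketch.
case class ApmLog(msg: String, year: Int, month: Int, day: Int)

def writeBatch(spark: SparkSession, rdd: RDD[ApmLog]): Unit = {
  import spark.implicits._
  // insertInto on a partitioned Hive table needs dynamic partitioning enabled.
  spark.sql("SET hive.exec.dynamic.partition=true")
  spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
  // Spark first stages the batch under
  //   <table>/.hive-staging_hive_<timestamp>_.../-ext-10000/year=.../month=.../day=...
  // and Hive then moves each partition directory into the table. An empty
  // batch leaves nothing under -ext-10000, so Hive warns "No partition is
  // generated by dynamic partitioning" and loads 0 partitions.
  rdd.toDF().write.mode(SaveMode.Append).insertInto("tropic.apm_log")
}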
The list of dynamic-partition loading paths. First run:
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=6 with PartSpec {year=2020, month=4, day=6}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=7 with PartSpec {year=2020, month=4, day=7}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=5 with PartSpec {year=2020, month=4, day=5}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=4 with PartSpec {year=2020, month=4, day=4}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=1 with PartSpec {year=2020, month=4, day=1}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=2 with PartSpec {year=2020, month=4, day=2}
20/04/07 00:42:56 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_00-30-05_439_8394444377943465526-1/-ext-10000/year=2020/month=4/day=3 with PartSpec {year=2020, month=4, day=3}
20/04/07 00:42:57 INFO metadata.Hive: Loaded 7 partitions
Second run:
20/04/07 01:02:54 INFO metadata.Hive: New loading path = hdfs://ns1/user/hive/warehouse/tropic.db/apm_log/.hive-staging_hive_2020-04-07_01-00-00_640_8171968960099790643-1/-ext-10000/year=2020/month=4/day=7 with PartSpec {year=2020, month=4, day=7}
20/04/07 01:02:54 INFO metadata.Hive: Loaded 1 partitions
A run that obtains no partitions:
20/04/07 01:30:02 WARN metadata.Hive: No partition is generated by dynamic partitioning
20/04/07 01:30:02 INFO metadata.Hive: Loaded 0 partitions
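The WARN itself is harmless on a genuinely empty interval. A common mitigation, sketched here as my own suggestion rather than something the original job did, is to skip the Hive write entirely when the batch is empty, building on the writeBatch sketch above:

// stream is assumed to be a DStream[ApmLog]; spark is the active SparkSession.
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {   // cheap check: take(1) under the hood, no full count
    writeBatch(spark, rdd)
  }
}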
With data coming in:
[wangyoujun@etl1 shell]$ hadoop fs -ls -R /user/hive/warehouse/tropic.db/aliyun_survey_log/
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000/_temporary
drwxrwx--x+ - hive hive 0 2020-04-08 11:02 /user/hive/warehouse/tropic.db/aliyun_survey_log/.hive-staging_hive_2020-04-08_11-02-24_147_1325387100997424892-2/-ext-10000/_temporary/0
drwxrwx--x+ - hive hive 0 2020-04-03 18:19 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020
drwxrwx--x+ - hive hive 0 2020-04-08 10:30 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020/idn_month=4
drwxrwx--x+ - hive hive 0 2020-04-08 10:28 /user/hive/warehouse/tropic.db/aliyun_survey_log/idn_year=2020/idn_month=4/idn_day=1
Spark executor log:
20/04/08 13:04:54 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@etl1.ecreditpal.local:34518)
20/04/08 13:04:54 INFO spark.MapOutputTrackerWorker: Got the output locations
20/04/08 13:04:54 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
20/04/08 13:04:54 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
20/04/08 13:04:54 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
20/04/08 13:04:54 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/04/08 13:04:54 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/04/08 13:04:54 INFO mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20200408130451_2438_m_000000_0
20/04/08 13:04:54 INFO executor.Executor: Finished task 0.0 in stage 2438.0 (TID 12237). 2728 bytes result sent to driver
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6172
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6174
20/04/08 13:04:54 INFO storage.BlockManager: Removing RDD 6173
20/04/08 13:05:02 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 12238
20/04/08 13:05:02 INFO executor.Executor: Running task 0.0 in stage 2440.0 (TID 12238)
20/04/08 13:05:02 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2435
20/04/08 13:05:02 INFO memory.MemoryStore: Block broadcast_2435_piece0 stored as bytes in memory (estimated size 166.6 KB, free 911.3 MB)
20/04/08 13:05:02 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2435 took 43 ms
20/04/08 13:05:02 INFO memory.MemoryStore: Block broadcast_2435 stored as values in memory (estimated size 440.3 KB, free 910.9 MB)
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 317, fetching them
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@etl1.ecreditpal.local:34518)
20/04/08 13:05:02 INFO spark.MapOutputTrackerWorker: Got the output locations
20/04/08 13:05:02 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
20/04/08 13:05:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
20/04/08 13:05:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
20/04/08 13:05:02 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/04/08 13:05:02 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/04/08 13:05:02 INFO mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20200408130500_2440_m_000000_0
20/04/08 13:05:02 INFO executor.Executor: Finished task 0.0 in stage 2440.0 (TID 12238). 2728 bytes result sent to driver
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6203
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6204
20/04/08 13:05:03 INFO storage.BlockManager: Removing RDD 6205
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6232
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6233
20/04/08 13:05:33 INFO storage.BlockManager: Removing RDD 6234
20/04/08 13:06:02 INFO storage.BlockManager: Removing RDD 6233
20/04/08 13:06:02 INFO storage.BlockManager: Removing RDD 6173
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6246
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6247
20/04/08 13:06:03 INFO storage.BlockManager: Removing RDD 6248
20/04/08 13:06:04 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
20/04/08 13:06:04 INFO storage.DiskBlockManager: Shutdown hook called
20/04/08 13:06:04 INFO util.ShutdownHookManager: Shutdown hook called
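Note that the executor did not crash on its own: it was shut down by an external SIGTERM. To check whether consumption of the Aliyun logstore had fallen behind, I pulled the shard's cursor with the Log CLI: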
aliyunlog log get_end_cursor --project_name="tropic" --logstore_name="apm-prod" --shard_id=0
MTU4NTY0MjU2NjExMjUwNDQ2NA==
aliyunlog log get_cursor_time --project_name="tropic" --logstore_name="apm-prod" --shard_id=0 --cursor="MTU4NTY0MjU2NjExMjUwNDQ2NA=="
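get_end_cursor returns the newest cursor of shard 0 in the apm-prod logstore (the cursor itself is just a base64-encoded offset), and get_cursor_time converts a cursor back to a server-side timestamp, so comparing that time against the consumer's current position shows how far consumption is lagging.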
At the beginning I followed the WARN message into the Hive and Spark source code and found that it is Hive's metadata layer that emits this WARN. For a while I was convinced it was a metadata refresh problem, so I added a line at the start of each batch to refresh the metadata; in the end the WARN was still there and consumption still kept getting interrupted.
It finally turned out to be insufficient resource allocation: not enough CPU, and therefore not enough threads. Aliyun presumably pushes data to the Streaming receiver, and the CPU was completely pinned.

I had not set the number of cores in spark-submit, so it used the default of 2 CPUs. On top of that, my batch_interval and block_interval were set very unreasonably. The job has low real-time requirements, so batch_interval was 5 minutes, but block_interval was still the default 200 ms; since Spark creates roughly batch_interval / block_interval blocks (and thus tasks) per batch, that is 300,000 ms / 200 ms = 1,500 tasks every batch, which is very CPU-hungry in the downstream parallel processing. In the end I set both parameters to 5 minutes and allocated 1 executor with 4 cores and 8 GB of memory.
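The original spark-submit command is not shown above, so here is a minimal sketch of the corrected settings expressed through SparkConf instead; the app name is invented and the executor settings assume a YARN-style deployment:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val conf = new SparkConf()
  .setAppName("apm-log-streaming")          // name is made up
  .set("spark.executor.instances", "1")     // 1 executor
  .set("spark.executor.cores", "4")         // 4 cores
  .set("spark.executor.memory", "8g")       // 8 GB
  // Block interval == batch interval: each receiver now produces a single
  // block, i.e. a single task, per 5-minute batch instead of ~1,500.
  .set("spark.streaming.blockInterval", "300s")

val ssc = new StreamingContext(conf, Minutes(5))  // batch interval = 5 minutes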
The previous configuration was 3 executors, each with 2 cores and 4 GB of memory.
My understanding of Spark is still shallow; I need to keep working at it.
Today, reading the Spark official site, I was reminded of the old line: what is learned on paper always feels shallow (纸上得来终觉浅).