Sometimes, after a Spark job has started, you want to estimate how much longer it will take to finish. A rough estimate can be made in the Spark UI by comparing the shuffle write record count of one stage with the shuffle read record count of the stage that follows it.
For example, Stage 9:
Details for Stage 9 (Attempt 0)
Total Time Across All Tasks: 34 min
Locality Level Summary: Node local: 16; Rack local: 184
Shuffle Read: 354.1 MB / 499543
Shuffle Write: 30.1 MB / 14219
And the next stage, Stage 10:
Details for Stage 10 (Attempt 0)
Total Time Across All Tasks: 0 ms
Locality Level Summary: Process local: 5
Shuffle Read: 27.6 MB / 12770
Stage 9 wrote 14219 shuffle records, and Stage 10 has so far read 12770 of them, so 14219 - 12770 = 1449 records remain to be processed.
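The arithmetic above generalizes to any pair of adjacent stages. A minimal sketch of the calculation (the function name and structure are illustrative, not part of any Spark API):

```python
def shuffle_progress(upstream_write_records, downstream_read_records):
    """Estimate a downstream stage's progress from Spark UI shuffle counts.

    upstream_write_records:   "Shuffle Write" record count of the previous stage
    downstream_read_records:  "Shuffle Read" record count of the current stage
    Returns (records remaining, fraction of records already read).
    """
    remaining = upstream_write_records - downstream_read_records
    fraction = downstream_read_records / upstream_write_records
    return remaining, fraction

# Figures from the Stage 9 / Stage 10 excerpts above
remaining, fraction = shuffle_progress(14219, 12770)
print(remaining)                     # 1449 records still to be read
print(f"{fraction:.1%}")             # roughly 89.8% done
```

Note this is only a rough gauge: records are not uniform in cost, and a stage's tasks may be skewed, so the remaining 10% of records can take more (or less) than 10% of the time.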