The original article follows.
Original source: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sparkcontext-ConsoleProgressBar.html
ConsoleProgressBar
ConsoleProgressBar shows the progress of active stages on standard error (stderr). It uses SparkStatusTracker to poll the status of stages periodically and prints active stages that have more than one task. It keeps overwriting the same line, showing at most the first 3 concurrent stages at a time.
[Stage 0:====> (316 + 4) / 1000][Stage 1:> (0 + 0) / 1000][Stage 2:> (0 + 0) / 1000]
The progress line includes the stage id and the numbers of completed, active, and total tasks.
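For illustration, a single stage's segment can be rendered from exactly those four counters. The following is a hypothetical sketch (the function name, width, and layout are illustrative only, not Spark's actual code):

```scala
// Hypothetical helper: render one stage's progress segment from its
// stage id and (completed, active, total) task counters.
// The layout mimics, but does not reproduce, Spark's internal code.
def renderStage(stageId: Int, completed: Int, active: Int, total: Int, width: Int = 40): String = {
  val header = s"[Stage $stageId:"
  val tailer = s" ($completed + $active) / $total]"
  val barWidth = width - header.length - tailer.length // room left for the bar
  val filled = if (total > 0) barWidth * completed / total else 0
  header + "=" * filled + ">" + " " * (barWidth - filled - 1) + tailer
}

println(renderStage(0, 316, 4, 1000))
```

With 316 of 1000 tasks completed and 4 active, this yields a line of the same shape as `[Stage 0:====>         (316 + 4) / 1000]`.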
| Tip | ConsoleProgressBar may be useful when you ssh to workers and want to see the progress of active stages. |
ConsoleProgressBar is created when SparkContext starts with spark.ui.showConsoleProgress enabled and the logging level of the org.apache.spark.SparkContext logger set to WARN or higher (i.e. fewer messages are printed, so there is "space" for ConsoleProgressBar).
import org.apache.log4j._
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)
To print the progress nicely, ConsoleProgressBar uses the COLUMNS environment variable to determine the width of the terminal. If COLUMNS is not set, it assumes 80 columns.
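The lookup with its 80-column fallback can be sketched in plain Scala as follows (an approximation of the behavior described above, not Spark's internal code):

```scala
// Read the terminal width from the COLUMNS environment variable,
// falling back to 80 columns when it is unset or not a number.
val terminalWidth: Int = sys.env.get("COLUMNS")
  .flatMap(s => scala.util.Try(s.trim.toInt).toOption)
  .getOrElse(80)

println(terminalWidth)
```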
The progress bar prints the status every spark.ui.consoleProgress.update.interval milliseconds, once a stage has run for at least 500 milliseconds.
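The polling interval itself is configurable at launch time; for example, to refresh more often (the value is in milliseconds):

```shell
# Refresh the console progress bar every 100 ms
./bin/spark-shell --conf spark.ui.consoleProgress.update.interval=100
```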
| Note | The initial delay of 500 milliseconds before ConsoleProgressBar shows the progress is not configurable. |
See the progress bar in Spark shell with the following:
$ ./bin/spark-shell --conf spark.ui.showConsoleProgress=true (1)
scala> sc.setLogLevel("OFF") (2)
scala> import org.apache.log4j._
import org.apache.log4j._
scala> Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN) (3)
scala> sc.parallelize(1 to 4, 4).map { n => Thread.sleep(500 + 200 * n); n }.count (4)
[Stage 2:> (0 + 4) / 4]
[Stage 2:==============> (1 + 3) / 4]
[Stage 2:=============================> (2 + 2) / 4]
[Stage 2:============================================> (3 + 1) / 4]
1. Make sure spark.ui.showConsoleProgress is true. It is by default.
2. Disable (OFF) the root logger (which includes Spark's loggers).
3. Make sure the org.apache.spark.SparkContext logger is at least WARN.
4. Run a job with 4 tasks, with a 500ms initial sleep and 200ms sleep chunks, to see the progress bar.
In short:
1. If you write your code in an IDE such as IntelliJ IDEA or Eclipse, two steps are needed:
1) Before creating the SparkSession / SparkConf, add the following code:
import org.apache.log4j._
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)
2) When creating the SparkSession, set spark.ui.showConsoleProgress to true.
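Put together, the IDE setup can be sketched as follows. This is a configuration sketch that assumes a Spark SQL dependency on the classpath; the master and app name are illustrative:

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Step 1: raise SparkContext's log level BEFORE the session is created,
// so log output leaves room for the console progress bar.
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)

// Step 2: enable the progress bar when building the SparkSession.
val spark = SparkSession.builder()
  .master("local[*]")               // illustrative master setting
  .appName("console-progress-demo") // illustrative app name
  .config("spark.ui.showConsoleProgress", "true")
  .getOrCreate()
```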
2. If you use spark-shell, run the following:
$ ./bin/spark-shell --conf spark.ui.showConsoleProgress=true (1)
scala> sc.setLogLevel("OFF") (2)
scala> import org.apache.log4j._
import org.apache.log4j._
scala> Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN) (3)
scala> sc.parallelize(1 to 4, 4).map { n => Thread.sleep(500 + 200 * n); n }.count (4)
[Stage 2:> (0 + 4) / 4]
[Stage 2:==============> (1 + 3) / 4]
[Stage 2:=============================> (2 + 2) / 4]
[Stage 2:============================================> (3 + 1) / 4]
This article explains the function and configuration of ConsoleProgressBar in Apache Spark: how to enable the progress bar both in an IDE and in the Spark shell, how it works, and which environment variables affect it.