Spark: running apps in YARN mode

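This post walks through running the JavaWordCount example on Spark with YARN, covering configuration, the submit command, and how to inspect application information. It sets HADOOP_CONF_DIR, names the Spark application to run, and launches the job with spark-submit; the number of executors and their memory size can be adjusted. It also shows how to check the application's status from the YARN master and how to customize the execution environment with command-line parameters.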


  Running on a YARN cluster is straightforward:

  1. Set up HADOOP_CONF_DIR

   You can use the command export HADOOP_CONF_DIR=xx

   or add it to spark-env.sh

   2. Submit the application:

spark-submit  --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
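Step 1 can be sanity-checked before submitting. A minimal sketch, assuming the directory should contain the usual Hadoop client files `core-site.xml` and `yarn-site.xml`; the function name `check_hadoop_conf_dir` and the example path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: verify that a directory looks like a Hadoop client config dir.
# Assumption: the standard file names core-site.xml and yarn-site.xml are expected.
check_hadoop_conf_dir() {
  dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# Usage (the path is only an example; substitute your own):
#   export HADOOP_CONF_DIR=/path/to/hadoop/etc/hadoop
#   check_hadoop_conf_dir "$HADOOP_CONF_DIR"
```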

  You can go to the YARN master UI to check the app info.
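The same information is also available from the command line. The sketch below pulls the application ID out of the driver output (the `Submitted application ...` line that appears in the driver log later in this post) so it can be passed to `yarn application -status`. The function name `extract_app_id` and the file name `driver.log` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read driver output on stdin and print the first
# YARN application ID found (format: application_<clusterTimestamp>_<sequence>).
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Usage, assuming the driver output was saved to driver.log:
#   app_id=$(extract_app_id < driver.log)
#   yarn application -status "$app_id"
```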

  Also, if you want to specify the number of executors (each executor runs in its own YARN container), just add this option to the command above:

 --num-executors 2
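Executor memory and cores can be set the same way with `--executor-memory` and `--executor-cores`. The sketch below only assembles and prints the command (a dry run), so the flags can be reviewed before submitting. The function name `build_submit_cmd` is hypothetical, and the values 2g and 2 cores are illustrative, chosen to match the executor settings visible in the process listings below:

```shell
#!/bin/sh
# Hypothetical helper: print a spark-submit command line for YARN client mode.
# Arguments: number of executors, executor memory, executor cores, jar, then app args.
build_submit_cmd() {
  num="$1"; mem="$2"; cores="$3"; jar="$4"
  shift 4
  # Dry run: print the command instead of executing it.
  echo "spark-submit --master yarn --deploy-mode client" \
    "--class org.apache.spark.examples.JavaWordCount" \
    "--num-executors $num --executor-memory $mem --executor-cores $cores" \
    "$jar $*"
}

build_submit_cmd 2 2g 2 spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
```

Once the printed line looks right, it can be pasted into the shell to run it.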

 

-- AppMaster processes (ps output: the bash wrapper and the JVM it launches)

hadoop    2758 13206  0 16:52 ?        00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp '-Dspark.eventLog.enabled=true' '-Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4' '-Dspark.executor.memory=2g' '-Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar' '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.appUIAddress=http://192.168.100.4:7108' '-Dspark.master=yarn-client' '-Dspark.driver.allowMultipleContexts=true' '-Dspark.driver.port=52394' '-Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog' '-Dspark.executor.id=driver' '-Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5' '-Dspark.executor.cores=2' '-Dspark.driver.host=192.168.100.4' '-Dspark.driver.memory=6g' '-Dspark.storage.memoryFraction=0.5' '-Dspark.app.name=JavaWordCount' '-Dspark.fileserver.uri=http://192.168.100.4:48227' '-Dspark.cores.max=50' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.100.4:52394' --executor-memory 2048m --executor-cores 2 --num-executors  2 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stderr

hadoop    2763  2758 23 16:52 ?        00:00:06 /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp -Dspark.eventLog.enabled=true -Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4 -Dspark.executor.memory=2g -Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.appUIAddress=http://192.168.100.4:7108 -Dspark.master=yarn-client -Dspark.driver.allowMultipleContexts=true -Dspark.driver.port=52394 -Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog -Dspark.executor.id=driver -Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Dspark.executor.cores=2 -Dspark.driver.host=192.168.100.4 -Dspark.driver.memory=6g -Dspark.storage.memoryFraction=0.5 -Dspark.app.name=JavaWordCount -Dspark.fileserver.uri=http://192.168.100.4:48227 -Dspark.cores.max=50 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.100.4:52394 --executor-memory 2048m --executor-cores 2 --num-executors 2
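The AM JVM above runs with `-Xmx512m`, while the driver log further down reports `Will allocate AM container, with 896 MB memory including 384 MB overhead`. The arithmetic can be sketched as follows, assuming the overhead is the larger of 384 MB and 10% of the heap. That rule is an assumption about the Spark 1.4 defaults, so check the configuration docs for your version. The function name `container_mb` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: estimate the YARN container size (MB) requested for a JVM heap (MB).
# Assumption: overhead = max(384, 10% of heap), using integer arithmetic.
container_mb() {
  heap="$1"
  overhead=$((heap / 10))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $((heap + overhead))
}

container_mb 512   # AM heap from the listing above; prints 896, as in the driver log
```

YARN may still round the request up to its own allocation increment, so the container actually granted can be larger.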

  -- task (executor) container processes

hadoop   10382  1055  0 17:20 ?        00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms2048m -Xmx2048m '-Xloggc:~/spark-executor.gc' '-XX:+UseCMSCompactAtFullCollection' '-XX:CMSFullGCsBeforeCompaction=2' '-XX:CMSInitiatingOccupancyFraction=65' '-XX:+UseCMSInitiatingOccupancyOnly' '-XX:PermSize=64m' '-XX:MaxPermSize=256m' '-XX:NewRatio=5' '-XX:+UseParNewGC' '-XX:+UseConcMarkSweepGC' '-XX:+PrintGCDateStamps' '-XX:+PrintGCDetails' '-XX:ParallelGCThreads=5' -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.port=44382' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stderr
hadoop   10386 10382 99 17:20 ?        00:00:25 /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms2048m -Xmx2048m -Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.port=44382 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar

 

  corresponding figures:



 

 

   Logs from the driver (you will see that both tasks of the first stage run on gzsw-05, while the second stage runs one task on each of gzsw-05 and gzsw-06):

hadoop@GZsw04:~/spark/spark-1.4.1-bin-hadoop2.4$ spark-submit  --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
Using properties file: /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
Adding default property: spark.executor.extraJavaOptions=-Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=7106
Adding default property: spark.deploy.spreadOut=false
Adding default property: spark.worker.ui.port=7105
Adding default property: spark.master.ui.port=7102
Adding default property: spark.eventLog.dir=/home/hadoop/spark/spark-eventlog
Adding default property: spark.driver.allowMultipleContexts=true
Parsed arguments:
  master                  yarn
  deployMode              client
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
  driverMemory            1g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.JavaWordCount
  primaryResource         file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
  name                    org.apache.spark.examples.JavaWordCount
  childArgs               [RELEASE 2]
  jars                    null
  packages                null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf:
  spark.eventLog.enabled -> true
  spark.driver.allowMultipleContexts -> true
  spark.ui.port -> 7106
  spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
  spark.deploy.spreadOut -> false
  spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
  spark.worker.ui.port -> 7105
  spark.master.ui.port -> 7102

    
Main class:
org.apache.spark.examples.JavaWordCount
Arguments:
RELEASE
2
System properties:
spark.driver.memory -> 1g
spark.eventLog.enabled -> true
spark.driver.allowMultipleContexts -> true
SPARK_SUBMIT -> true
spark.ui.port -> 7106
spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
spark.deploy.spreadOut -> false
spark.app.name -> org.apache.spark.examples.JavaWordCount
spark.jars -> file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
spark.master -> yarn-client
spark.worker.ui.port -> 7105
spark.master.ui.port -> 7102
Classpath elements:
file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar


15/11/25 16:46:55 INFO spark.SparkContext: Running Spark version 1.4.1
15/11/25 16:46:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/25 16:46:55 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:46:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/25 16:46:56 INFO Remoting: Starting remoting
15/11/25 16:46:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.100.4:45880]
15/11/25 16:46:56 INFO util.Utils: Successfully started service 'sparkDriver' on port 45880.
15/11/25 16:46:56 INFO spark.SparkEnv: Registering MapOutputTracker
15/11/25 16:46:56 INFO spark.SparkEnv: Registering BlockManagerMaster
15/11/25 16:46:56 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187
15/11/25 16:46:56 INFO storage.MemoryStore: MemoryStore started with capacity 529.9 MB
15/11/25 16:46:57 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/httpd-8b586e36-69a3-46c1-880d-5f294a643833
15/11/25 16:46:57 INFO spark.HttpServer: Starting HTTP Server
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51033
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 51033.
15/11/25 16:46:57 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:7106
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'SparkUI' on port 7106.
15/11/25 16:46:57 INFO ui.SparkUI: Started SparkUI at http://192.168.100.4:7106
15/11/25 16:46:57 INFO spark.SparkContext: Added JAR file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar at http://192.168.100.4:51033/jars/spark-examples-1.4.1-hadoop2.4.0-my.jar with timestamp 1448441217526
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_MEMORY is deprecated. Use SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit instead.
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_CORES is deprecated. Use SPARK_EXECUTOR_CORES or --executor-cores through spark-submit instead.
15/11/25 16:46:57 INFO client.RMProxy: Connecting to ResourceManager at hd02/192.168.100.4:8032
15/11/25 16:46:57 INFO yarn.Client: Requesting a new application from cluster with 10 NodeManagers
15/11/25 16:46:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/11/25 16:46:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/11/25 16:46:57 INFO yarn.Client: Setting up container launch context for our AM
15/11/25 16:46:57 INFO yarn.Client: Preparing resources for our AM container
15/11/25 16:46:58 INFO yarn.Client: Uploading resource file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/spark-assembly-1.4.1-hadoop2.4.0.jar
15/11/25 16:47:00 INFO yarn.Client: Uploading resource file:/tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/__hadoop_conf__6446760494119929942.zip -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/__hadoop_conf__6446760494119929942.zip
15/11/25 16:47:00 INFO yarn.Client: Setting up the launch environment for our AM container
15/11/25 16:47:00 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:47:00 INFO yarn.Client: Submitting application 3 to ResourceManager
15/11/25 16:47:00 INFO impl.YarnClientImpl: Submitted application application_1441038159113_0003
15/11/25 16:47:01 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:01 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1448441220409
	 final status: UNDEFINED
	 tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
	 user: hadoop
15/11/25 16:47:02 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:03 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:04 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:05 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka.tcp://sparkYarnAM@192.168.100.14:46652/user/YarnAM#-1250321572])
15/11/25 16:47:06 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hd02, PROXY_URI_BASES -> http://hd02:7104/proxy/application_1441038159113_0003), /proxy/application_1441038159113_0003
15/11/25 16:47:06 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/11/25 16:47:07 INFO yarn.Client: Application report for application_1441038159113_0003 (state: RUNNING)
15/11/25 16:47:07 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.100.14
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1448441220409
	 final status: UNDEFINED
	 tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
	 user: hadoop
15/11/25 16:47:07 INFO cluster.YarnClientSchedulerBackend: Application application_1441038159113_0003 has started running.
15/11/25 16:47:07 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52047.
15/11/25 16:47:07 INFO netty.NettyBlockTransferService: Server created on 52047
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/25 16:47:07 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.100.4:52047 with 529.9 MB RAM, BlockManagerId(driver, 192.168.100.4, 52047)
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/25 16:47:07 INFO scheduler.EventLoggingListener: Logging events to file:/home/hadoop/spark/spark-eventlog/application_1441038159113_0003
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-05:55796/user/Executor#-2059071929]) with ID 1
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-05:52897 with 2.1 GB RAM, BlockManagerId(1, gzsw-05, 52897)
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-06:56733/user/Executor#261866940]) with ID 2
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-06:38994 with 2.1 GB RAM, BlockManagerId(2, gzsw-06, 38994)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(228640) called with curMem=0, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 223.3 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(18166) called with curMem=228640, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 17.7 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.4:52047 (size: 17.7 KB, free: 529.9 MB)
15/11/25 16:47:17 INFO spark.SparkContext: Created broadcast 0 from textFile at JavaWordCount.java:49
15/11/25 16:47:17 INFO mapred.FileInputFormat: Total input paths to process : 1
15/11/25 16:47:17 INFO spark.SparkContext: Starting job: collect at JavaWordCount.java:72
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Registering RDD 3 (mapToPair at JavaWordCount.java:58)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Got job 0 (collect at JavaWordCount.java:72) with 2 output partitions (allowLocal=false)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 1(collect at JavaWordCount.java:72)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58), which has no missing parents
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(4736) called with curMem=246806, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(2644) called with curMem=251542, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.4:52047 (size: 2.6 KB, free: 529.9 MB)
15/11/25 16:47:18 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:18 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58)
15/11/25 16:47:18 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on gzsw-05:52897 (size: 2.6 KB, free: 2.1 GB)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on gzsw-05:52897 (size: 17.7 KB, free: 2.1 GB)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2705 ms on gzsw-05 (1/2)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2725 ms on gzsw-05 (2/2)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/11/25 16:47:20 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (mapToPair at JavaWordCount.java:58) finished in 2.733 s
15/11/25 16:47:20 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/11/25 16:47:20 INFO scheduler.DAGScheduler: running: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
15/11/25 16:47:20 INFO scheduler.DAGScheduler: failed: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Missing parents for ResultStage 1: List()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65), which is now runnable
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(2408) called with curMem=254186, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.4 KB, free 529.7 MB)
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(1459) called with curMem=256594, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1459.0 B, free 529.7 MB)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.100.4:52047 (size: 1459.0 B, free: 529.9 MB)
15/11/25 16:47:20 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, gzsw-06, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, gzsw-05, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-05:52897 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:20 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-05:55796
15/11/25 16:47:20 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 147 bytes
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 98 ms on gzsw-05 (1/2)
15/11/25 16:47:22 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-06:38994 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-06:56733
15/11/25 16:47:22 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1748 ms on gzsw-06 (2/2)
15/11/25 16:47:22 INFO scheduler.DAGScheduler: ResultStage 1 (collect at JavaWordCount.java:72) finished in 1.749 s
15/11/25 16:47:22 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Job 0 finished: collect at JavaWordCount.java:72, took 4.603967 s
total items 14
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/11/25 16:47:22 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.100.4:7106
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Stopped
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/25 16:47:22 INFO util.Utils: path = /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187, already present as root for deletion.
15/11/25 16:47:22 INFO storage.MemoryStore: MemoryStore cleared
15/11/25 16:47:22 INFO storage.BlockManager: BlockManager stopped
15/11/25 16:47:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/11/25 16:47:22 INFO spark.SparkContext: Successfully stopped SparkContext
15/11/25 16:47:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/25 16:47:22 INFO util.Utils: Shutdown hook called
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/11/25 16:47:22 INFO util.Utils: Deleting directory /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
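To confirm the task placement without reading the whole log, the `Starting task` lines can be tallied per stage and host. A minimal sketch using awk; the function name `tasks_per_host` and the file name `driver.log` are hypothetical, and the field positions assume the exact line format shown in this log:

```shell
#!/bin/sh
# Hypothetical helper: read a driver log on stdin and count started tasks per stage and host.
# Assumes lines of the form:
#   <date> <time> INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, <host>, ...)
tasks_per_host() {
  awk '/TaskSetManager: Starting task/ {
         host = $13                # e.g. "gzsw-05,"
         sub(/,$/, "", host)       # drop the trailing comma
         count["stage " $10 " " host]++
       }
       END { for (k in count) print k, count[k] }' | sort
}

# Usage, assuming the driver output was saved to driver.log:
#   tasks_per_host < driver.log
```

For the log above this reports two tasks on gzsw-05 for stage 0.0, and one task each on gzsw-05 and gzsw-06 for stage 1.0.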

 
