The shell script log-yarn.sh is as follows:
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
$SPARK_HOME/bin/spark-submit \
--master yarn \
--class www.ruozedata.bigdata.SparkCore02.LocalServeApp \
--name LocalServeApp \
/home/hadoop/lib/g5-spark-1.0.jar_log \
hdfs://hadoop001:9000/data/logs/input/secondhomework.txt hdfs://hadoop001:9000/data/logs/output
The jar was built from the following source:
package www.ruozedata.bigdata.SparkCore02

/*
 * Input:  args(0)
 * Output: args(1)
 */
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object LocalServeApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val lines = sc.textFile(args(0))

    // Delete the output path up front so saveAsTextFile does not fail on reruns.
    // Note the hardcoded NameNode URI (192.168.2.65 is hadoop001); it must match
    // the cluster that the paths passed in args refer to.
    val configuration = new Configuration()
    val uri = new URI("hdfs://192.168.2.65:9000")
    val fileSystem = FileSystem.get(uri, configuration)
    if (fileSystem.exists(new Path(args(1)))) {
      fileSystem.delete(new Path(args(1)), true)
    }

    lines.map(x => {
      val temp = x.split("\t")
      val domain = temp(0)
      var response = 0L
      try {
        response = temp(2).toLong
      } catch {
        case e: Exception => println("...")
      }
      (domain, response)
    }).reduceByKey(_ + _).saveAsTextFile(args(1))

    sc.stop()          // always stop the SparkContext at the end
    fileSystem.close() // and always close the file system at the end as well
  }
}
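Incidentally, the same output cleanup can be done from the submit script instead of hardcoding a NameNode URI in the driver; a minimal shell sketch, using the same paths as log-yarn.sh above:

# Remove any previous output before submitting; -f suppresses the error if it is absent
hdfs dfs -rm -r -f hdfs://hadoop001:9000/data/logs/output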
Run the script:
[hadoop@hadoop001 shell]$ ./log-yarn.sh
15/01/11 14:27:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/11 14:27:24 INFO spark.SparkContext: Running Spark version 2.4.0
15/01/11 14:27:24 INFO spark.SparkContext: Submitted application: LocalServeApp
15/01/11 14:27:24 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/11 14:27:24 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/11 14:27:24 INFO spark.SecurityManager: Changing view acls groups to:
15/01/11 14:27:24 INFO spark.SecurityManager: Changing modify acls groups to:
15/01/11 14:27:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
15/01/11 14:27:24 INFO util.Utils: Successfully started service 'sparkDriver' on port 34021.
15/01/11 14:27:24 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/11 14:27:24 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/11 14:27:24 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
15/01/11 14:27:24 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
15/01/11 14:27:24 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-56bdc67f-95b8-43a5-a4c8-9a64a6071228
15/01/11 14:27:24 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
15/01/11 14:27:24 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/01/11 14:27:24 INFO util.log: Logging initialized @1653ms
15/01/11 14:27:24 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
15/01/11 14:27:24 INFO server.Server: Started @1721ms
15/01/11 14:27:24 INFO server.AbstractConnector: Started ServerConnector@619bd14c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
15/01/11 14:27:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/jobs,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b02e036{/jobs/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25243bc1{/jobs/job,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2e6ee0bc{/jobs/job/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4201a617{/stages,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@467f77a5{/stages/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bb9aa43{/stages/stage,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@308a6984{/stages/stage/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66b72664{/stages/pool,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a34b7b8{/stages/pool/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58cd06cb{/storage,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3be8821f{/storage/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/storage/rdd,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/storage/rdd/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bae47a0{/environment,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@74a9c4b0{/environment/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@85ec632{/executors,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/executors/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/executors/threadDump,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fd9b663{/executors/threadDump/json,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@214894fc{/static,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71652c98{/,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51bde877{/api,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fb68ec6{/jobs/job/kill,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d71adc2{/stages/stage/kill,null,AVAILABLE,@Spark}
15/01/11 14:27:24 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop001:4040
15/01/11 14:27:24 INFO spark.SparkContext: Added JAR file:/home/hadoop/lib/g5-spark-1.0.jar_log at spark://hadoop001:34021/jars/g5-spark-1.0.jar_log with timestamp 1420957644657
15/01/11 14:27:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/01/11 14:27:25 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/01/11 14:27:25 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/11 14:27:25 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/11 14:27:25 INFO yarn.Client: Setting up container launch context for our AM
15/01/11 14:27:25 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/11 14:27:25 INFO yarn.Client: Preparing resources for our AM container
15/01/11 14:27:25 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
15/01/11 14:27:27 INFO yarn.Client: Uploading resource file:/tmp/spark-328ba892-38b4-4988-82b2-55f1dce94035/__spark_libs__600224909650099919.zip -> hdfs://hadoop001:9000/user/hadoop/.sparkStaging/application_1420082274401_0009/__spark_libs__600224909650099919.zip
15/01/11 14:27:29 INFO yarn.Client: Uploading resource file:/tmp/spark-328ba892-38b4-4988-82b2-55f1dce94035/__spark_conf__9196421366520971915.zip -> hdfs://hadoop001:9000/user/hadoop/.sparkStaging/application_1420082274401_0009/__spark_conf__.zip
15/01/11 14:27:29 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/11 14:27:29 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/11 14:27:29 INFO spark.SecurityManager: Changing view acls groups to:
15/01/11 14:27:29 INFO spark.SecurityManager: Changing modify acls groups to:
15/01/11 14:27:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
15/01/11 14:27:30 INFO yarn.Client: Submitting application application_1420082274401_0009 to ResourceManager
15/01/11 14:27:30 INFO impl.YarnClientImpl: Submitted application application_1420082274401_0009
15/01/11 14:27:30 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1420082274401_0009 and attemptId None
15/01/11 14:27:31 INFO yarn.Client: Application report for application_1420082274401_0009 (state: ACCEPTED)
15/01/11 14:27:31 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1420957650096
final status: UNDEFINED
tracking URL: http://hadoop001:8088/proxy/application_1420082274401_0009/
user: hadoop
15/01/11 14:27:32 INFO yarn.Client: Application report for application_1420082274401_0009 (state: ACCEPTED)
15/01/11 14:27:33 INFO yarn.Client: Application report for application_1420082274401_0009 (state: ACCEPTED)
15/01/11 14:27:34 INFO yarn.Client: Application report for application_1420082274401_0009 (state: ACCEPTED)
15/01/11 14:27:35 INFO yarn.Client: Application report for application_1420082274401_0009 (state: FAILED)
15/01/11 14:27:35 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1420082274401_0009 failed 2 times due to AM Container for appattempt_1420082274401_0009_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1420082274401_0009/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1420082274401_0009_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1420957650096
final status: FAILED
tracking URL: http://hadoop001:8088/cluster/app/application_1420082274401_0009
user: hadoop
15/01/11 14:27:35 INFO yarn.Client: Deleted staging directory hdfs://hadoop001:9000/user/hadoop/.sparkStaging/application_1420082274401_0009
15/01/11 14:27:35 ERROR cluster.YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
15/01/11 14:27:35 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1420082274401_0009 failed 2 times due to AM Container for appattempt_1420082274401_0009_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1420082274401_0009/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1420082274401_0009_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:94)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp$.main(LocalServeApp.scala:16)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp.main(LocalServeApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/01/11 14:27:35 INFO server.AbstractConnector: Stopped Spark@619bd14c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
15/01/11 14:27:35 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop001:4040
15/01/11 14:27:35 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
15/01/11 14:27:35 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/11 14:27:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
15/01/11 14:27:35 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
15/01/11 14:27:35 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/11 14:27:35 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/01/11 14:27:35 INFO memory.MemoryStore: MemoryStore cleared
15/01/11 14:27:35 INFO storage.BlockManager: BlockManager stopped
15/01/11 14:27:35 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/01/11 14:27:35 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
15/01/11 14:27:35 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/01/11 14:27:35 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Application application_1420082274401_0009 failed 2 times due to AM Container for appattempt_1420082274401_0009_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1420082274401_0009/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1420082274401_0009_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:94)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp$.main(LocalServeApp.scala:16)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp.main(LocalServeApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/01/11 14:27:35 INFO util.ShutdownHookManager: Shutdown hook called
15/01/11 14:27:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-52270e76-a553-4d7e-9f1b-dc6f754accb2
15/01/11 14:27:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-328ba892-38b4-4988-82b2-55f1dce94035
Cause: Hadoop's Java dependency was wrong. Spark requires JDK 1.8, but I had previously configured JDK 1.7 in hadoop-env.sh. Even though the global profile and the hadoop user's home-directory profile had already been switched to 1.8, Hadoop uses the Java specified in hadoop-env.sh, which is why the AM container exited with code 1.
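For reference, the line to change lives in $HADOOP_CONF_DIR/hadoop-env.sh; a sketch (the JDK path below is only an example, substitute your own 1.8 install):

# hadoop-env.sh -- Hadoop reads JAVA_HOME from here, not from the shell profile
export JAVA_HOME=/usr/java/jdk1.8.0_45   # example path; point at your JDK 1.8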
After the change, restart all the daemons, check them with jps, and then rerun the job.
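For a single-node setup like this one, the restart looks roughly like this, assuming the standard sbin scripts:

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
jps   # expect something like NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager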
[hadoop@hadoop001 shell]$ ./log-yarn.sh
It fails again, this time with the following error:
15/01/12 01:32:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/12 01:32:15 INFO spark.SparkContext: Running Spark version 2.4.0
15/01/12 01:32:15 INFO spark.SparkContext: Submitted application: LocalServeApp
15/01/12 01:32:15 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/12 01:32:15 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/12 01:32:15 INFO spark.SecurityManager: Changing view acls groups to:
15/01/12 01:32:15 INFO spark.SecurityManager: Changing modify acls groups to:
15/01/12 01:32:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
15/01/12 01:32:15 INFO util.Utils: Successfully started service 'sparkDriver' on port 55831.
15/01/12 01:32:16 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/12 01:32:16 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/12 01:32:16 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
15/01/12 01:32:16 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
15/01/12 01:32:16 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-72816f55-fb2d-444a-a8ae-1c62119a11b5
15/01/12 01:32:16 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
15/01/12 01:32:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/01/12 01:32:16 INFO util.log: Logging initialized @1866ms
15/01/12 01:32:16 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
15/01/12 01:32:16 INFO server.Server: Started @1953ms
15/01/12 01:32:16 INFO server.AbstractConnector: Started ServerConnector@38453f9b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
15/01/12 01:32:16 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/jobs,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5cad8b7d{/jobs/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b02e036{/jobs/job,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e287667{/jobs/job/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2e6ee0bc{/stages,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4201a617{/stages/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@467f77a5{/stages/stage,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@df5f5c0{/stages/stage/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@308a6984{/stages/pool,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66b72664{/stages/pool/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a34b7b8{/storage,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58cd06cb{/storage/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3be8821f{/storage/rdd,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/storage/rdd/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/environment,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bae47a0{/environment/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@74a9c4b0{/executors,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@85ec632{/executors/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/executors/threadDump,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/executors/threadDump/json,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fd9b663{/static,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d829787{/,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71652c98{/api,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@117632cf{/jobs/job/kill,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fb68ec6{/stages/stage/kill,null,AVAILABLE,@Spark}
15/01/12 01:32:16 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop001:4040
15/01/12 01:32:16 INFO spark.SparkContext: Added JAR file:/home/hadoop/lib/g5-spark-1.0.jar_log at spark://hadoop001:55831/jars/g5-spark-1.0.jar_log with timestamp 1420997536357
15/01/12 01:32:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/01/12 01:32:17 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/01/12 01:32:17 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/12 01:32:17 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/12 01:32:17 INFO yarn.Client: Setting up container launch context for our AM
15/01/12 01:32:17 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/12 01:32:17 INFO yarn.Client: Preparing resources for our AM container
15/01/12 01:32:17 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
15/01/12 01:32:19 INFO yarn.Client: Uploading resource file:/tmp/spark-b2160c68-f14d-4e5e-bf01-f54add178cd4/__spark_libs__5568805464265044403.zip -> hdfs://hadoop001:9000/user/hadoop/.sparkStaging/application_1420997455428_0001/__spark_libs__5568805464265044403.zip
15/01/12 01:32:21 INFO yarn.Client: Uploading resource file:/tmp/spark-b2160c68-f14d-4e5e-bf01-f54add178cd4/__spark_conf__6800975015765015056.zip -> hdfs://hadoop001:9000/user/hadoop/.sparkStaging/application_1420997455428_0001/__spark_conf__.zip
15/01/12 01:32:21 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/12 01:32:21 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/12 01:32:21 INFO spark.SecurityManager: Changing view acls groups to:
15/01/12 01:32:21 INFO spark.SecurityManager: Changing modify acls groups to:
15/01/12 01:32:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
15/01/12 01:32:22 INFO yarn.Client: Submitting application application_1420997455428_0001 to ResourceManager
15/01/12 01:32:22 INFO impl.YarnClientImpl: Submitted application application_1420997455428_0001
15/01/12 01:32:22 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1420997455428_0001 and attemptId None
15/01/12 01:32:23 INFO yarn.Client: Application report for application_1420997455428_0001 (state: ACCEPTED)
15/01/12 01:32:23 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1420997542681
final status: UNDEFINED
tracking URL: http://hadoop001:8088/proxy/application_1420997455428_0001/
user: hadoop
15/01/12 01:32:24 INFO yarn.Client: Application report for application_1420997455428_0001 (state: ACCEPTED)
15/01/12 01:32:25 INFO yarn.Client: Application report for application_1420997455428_0001 (state: ACCEPTED)
15/01/12 01:32:26 INFO yarn.Client: Application report for application_1420997455428_0001 (state: ACCEPTED)
15/01/12 01:32:27 INFO yarn.Client: Application report for application_1420997455428_0001 (state: ACCEPTED)
15/01/12 01:32:28 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop001, PROXY_URI_BASES -> http://hadoop001:8088/proxy/application_1420997455428_0001), /proxy/application_1420997455428_0001
15/01/12 01:32:28 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
15/01/12 01:32:28 INFO yarn.Client: Application report for application_1420997455428_0001 (state: RUNNING)
15/01/12 01:32:28 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.2.65
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1420997542681
final status: UNDEFINED
tracking URL: http://hadoop001:8088/proxy/application_1420997455428_0001/
user: hadoop
15/01/12 01:32:28 INFO cluster.YarnClientSchedulerBackend: Application application_1420997455428_0001 has started running.
15/01/12 01:32:28 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51532.
15/01/12 01:32:28 INFO netty.NettyBlockTransferService: Server created on hadoop001:51532
15/01/12 01:32:28 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
15/01/12 01:32:28 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
15/01/12 01:32:28 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop001, 51532, None)
15/01/12 01:32:28 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop001:51532 with 366.3 MB RAM, BlockManagerId(driver, hadoop001, 51532, None)
15/01/12 01:32:28 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop001, 51532, None)
15/01/12 01:32:28 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop001, 51532, None)
15/01/12 01:32:29 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
15/01/12 01:32:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16a3cc88{/metrics/json,null,AVAILABLE,@Spark}
15/01/12 01:32:32 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.2.65:50696) with ID 1
15/01/12 01:32:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop001:34329 with 366.3 MB RAM, BlockManagerId(1, hadoop001, 34329, None)
15/01/12 01:32:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.2.65:50699) with ID 2
15/01/12 01:32:34 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/01/12 01:32:34 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop001:57341 with 366.3 MB RAM, BlockManagerId(2, hadoop001, 57341, None)
15/01/12 01:32:34 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 226.4 KB, free 366.1 MB)
15/01/12 01:32:34 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.4 KB, free 366.1 MB)
15/01/12 01:32:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop001:51532 (size: 21.4 KB, free: 366.3 MB)
15/01/12 01:32:34 INFO spark.SparkContext: Created broadcast 0 from textFile at LocalServeApp.scala:17
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:325)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp$.main(LocalServeApp.scala:37)
at www.ruozedata.bigdata.SparkCore02.LocalServeApp.main(LocalServeApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 46 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 51 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
... 53 more
15/01/12 01:32:35 INFO spark.SparkContext: Invoking stop() from shutdown hook
15/01/12 01:32:35 INFO server.AbstractConnector: Stopped Spark@38453f9b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
15/01/12 01:32:35 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop001:4040
15/01/12 01:32:35 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15/01/12 01:32:35 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/12 01:32:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
15/01/12 01:32:35 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
15/01/12 01:32:35 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/12 01:32:35 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/01/12 01:32:35 INFO memory.MemoryStore: MemoryStore cleared
15/01/12 01:32:35 INFO storage.BlockManager: BlockManager stopped
15/01/12 01:32:35 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/01/12 01:32:35 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/01/12 01:32:35 INFO spark.SparkContext: Successfully stopped SparkContext
15/01/12 01:32:35 INFO util.ShutdownHookManager: Shutdown hook called
15/01/12 01:32:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e9fc1e0b-9e8c-4829-a6e5-c80cac6f1179
15/01/12 01:32:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b2160c68-f14d-4e5e-bf01-f54add178cd4
Cause: while experimenting with compression earlier, I had set the compression codec to com.hadoop.compression.lzo.LzoCodec in the Hadoop configuration. Spark itself does not ship this codec, so hadoop-lzo-0.4.21-SNAPSHOT.jar needs to be copied into $SPARK_HOME/jars.
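To confirm where the codec is configured, grep the Hadoop config; in a setup like this it is typically the io.compression.codecs property in core-site.xml (an assumption about this cluster, hence the grep):

grep -A3 'io.compression.codecs' $HADOOP_CONF_DIR/core-site.xml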
(As an aside, warnings like
15/01/12 01:32:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
can be traced to their origin by opening the util.NativeCodeLoader class in the Hadoop source and searching it for "Unable to load native-hadoop".)
The fix is as follows:
[hadoop@hadoop001 common]$ cp hadoop-lzo-0.4.21-SNAPSHOT.jar $SPARK_HOME/jars/
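An alternative to copying the jar is to leave it in place and put it on Spark's classpath at submit time via the extraClassPath settings; a sketch, assuming the jar sits under Hadoop's share/hadoop/common directory (as the prompt above suggests):

LZO_JAR=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar
$SPARK_HOME/bin/spark-submit \
--master yarn \
--class www.ruozedata.bigdata.SparkCore02.LocalServeApp \
--name LocalServeApp \
--conf spark.driver.extraClassPath=$LZO_JAR \
--conf spark.executor.extraClassPath=$LZO_JAR \
/home/hadoop/lib/g5-spark-1.0.jar_log \
hdfs://hadoop001:9000/data/logs/input/secondhomework.txt hdfs://hadoop001:9000/data/logs/output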
[hadoop@hadoop001 shell]$ ./log-yarn.sh
15/01/12 02:10:33 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
15/01/12 02:10:33 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev f1deea9a313f4017dd5323cb8bbb3732c1aaccc5]
Check the output:
[hadoop@hadoop001 common]$ hdfs dfs -ls hdfs://hadoop001:9000/data/logs/output
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2015-01-12 02:10 hdfs://hadoop001:9000/data/logs/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 154 2015-01-12 02:10 hdfs://hadoop001:9000/data/logs/output/part-00000.lzo
-rw-r--r-- 1 hadoop supergroup 42 2015-01-12 02:10 hdfs://hadoop001:9000/data/logs/output/part-00001.lzo
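Since the part files are LZO-compressed, hdfs dfs -text (which decompresses through the configured codecs, so it works here because the codec is already on Hadoop's classpath) is the quickest way to inspect their contents; for example:

hdfs dfs -text hdfs://hadoop001:9000/data/logs/output/part-00000.lzo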