【Spark】due to AM Container for x exited with exitCode: -1000, "which is below the minimum allowed 1000"

While developing a Spark program that operates on Kudu, submitting it to the YARN cluster with spark-submit failed: the AM container exited with exit code -1000 because the submitting user's UID (978) was below YARN's minimum allowed value of 1000. The fix was to locate the `min.user.id` setting in the YARN configuration, lower it to 500, and restart the YARN service; after that the Spark job ran successfully and completed its Kudu operations.

        (The error in the title is truncated due to length limits; the full error message is shown below.)

        During development and testing I wrote a Spark program that operates on Kudu. After packaging it and uploading the jar to the cluster, submitting it with spark-submit failed with an error.

        The submit command was:

 spark-submit \
 --master yarn \
 --deploy-mode cluster \
 --queue spark \
 --driver-cores 1 \
 --driver-memory 1G \
 --executor-cores 1 \
 --executor-memory 1G \
 --num-executors 1 \
 --class kudu.spark.SparkKuduTest \
 Kudu-1.0-SNAPSHOT-jar-with-dependencies.jar

        The run failed with the following output:

[root@cdh01 ~]# spark-submit --master yarn --deploy-mode client --queue spark --driver-cores 1 --driver-memory 1G --executor-cores 1 --executor-memory 1G --num-executors 1 --class kudu.spark.SparkKuduTest Kudu-1.0-SNAPSHOT-jar-with-dependencies.jar
23/05/03 20:30:33 INFO security.UserGroupInformation: Login successful for user spark/spark@HADOOP.COM using keytab file /etc/security/keytab/spark.keytab. Keytab auto renewal enabled : false
23/05/03 20:30:33 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.2
23/05/03 20:30:33 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-6d3361b1-8525-4274-bca3-88c0d8e70dac/__driver_logs__/driver.log
23/05/03 20:30:33 INFO spark.SparkContext: Submitted application: SparkKudu
23/05/03 20:30:33 INFO spark.SecurityManager: Changing view acls to: root,spark
23/05/03 20:30:33 INFO spark.SecurityManager: Changing modify acls to: root,spark
23/05/03 20:30:33 INFO spark.SecurityManager: Changing view acls groups to: 
23/05/03 20:30:33 INFO spark.SecurityManager: Changing modify acls groups to: 
23/05/03 20:30:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root, spark); groups with view permissions: Set(); users  with modify permissions: Set(root, spark); groups with modify permissions: Set()
23/05/03 20:30:33 INFO util.Utils: Successfully started service 'sparkDriver' on port 45095.
23/05/03 20:30:33 INFO spark.SparkEnv: Registering MapOutputTracker
23/05/03 20:30:33 INFO spark.SparkEnv: Registering BlockManagerMaster
23/05/03 20:30:33 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/03 20:30:33 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/03 20:30:33 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-59fc6e85-710a-49d9-b50f-55959dc5b217
23/05/03 20:30:33 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
23/05/03 20:30:33 INFO spark.SparkEnv: Registering OutputCommitCoordinator
23/05/03 20:30:33 INFO util.log: Logging initialized @1686ms
23/05/03 20:30:33 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2018-09-05T05:11:46+08:00, git hash: 3ce520221d0240229c862b122d2b06c12a625732
23/05/03 20:30:33 INFO server.Server: Started @1747ms
23/05/03 20:30:33 INFO server.AbstractConnector: Started ServerConnector@232024b9{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23/05/03 20:30:33 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6aa648b9{/jobs,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f6bbeb0{/jobs/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b1637e1{/jobs/job,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64711bf2{/jobs/job/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@169da7f2{/stages,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c1e23ff{/stages/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ceb4bd2{/stages/stage,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56bc3fac{/stages/stage/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@df4b72{/stages/pool,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2ba45490{/stages/pool/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37ff4054{/storage,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@894858{/storage/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7af707e0{/storage/rdd,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@737edcfa{/storage/rdd/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ecedf21{/environment,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@74cf8b28{/environment/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@36c54a56{/executors,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3359c978{/executors/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ba63fe5{/executors/threadDump,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73386d72{/executors/threadDump/json,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f330eb9{/static,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@79c5636f{/,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1929425f{/api,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2617f816{/jobs/job/kill,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@676f0a60{/stages/stage/kill,null,AVAILABLE,@Spark}
23/05/03 20:30:33 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://cdh01.shokod.com:4040
23/05/03 20:30:33 INFO spark.SparkContext: Added JAR file:/root/Kudu-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://cdh01.shokod.com:45095/jars/Kudu-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1683117033813
23/05/03 20:30:33 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:33 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml
23/05/03 20:30:33 INFO security.YARNHadoopDelegationTokenManager: Attempting to load user's ticket cache.
23/05/03 20:30:35 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:35 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-144794697_1, ugi=spark/spark@HADOOP.COM (auth:KERBEROS)]] with renewer yarn/cdh02.shokod.com@HADOOP.COM
23/05/03 20:30:35 INFO hdfs.DFSClient: Created token for spark: HDFS_DELEGATION_TOKEN owner=spark/spark@HADOOP.COM, renewer=yarn, realUser=, issueDate=1683117035838, maxDate=1683721835838, sequenceNumber=21, masterKeyId=46 on ha-hdfs:nameserviceHA
23/05/03 20:30:35 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:35 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-144794697_1, ugi=spark/spark@HADOOP.COM (auth:KERBEROS)]] with renewer spark/spark@HADOOP.COM
23/05/03 20:30:35 INFO hdfs.DFSClient: Created token for spark: HDFS_DELEGATION_TOKEN owner=spark/spark@HADOOP.COM, renewer=spark, realUser=, issueDate=1683117035867, maxDate=1683721835867, sequenceNumber=22, masterKeyId=46 on ha-hdfs:nameserviceHA
23/05/03 20:30:35 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400035 for token HDFS_DELEGATION_TOKEN
23/05/03 20:30:36 INFO deploy.SparkHadoopUtil: Updating delegation tokens for current user.
23/05/03 20:30:36 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm43
23/05/03 20:30:36 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
23/05/03 20:30:36 INFO conf.Configuration: resource-types.xml not found
23/05/03 20:30:36 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
23/05/03 20:30:36 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
23/05/03 20:30:36 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
23/05/03 20:30:36 INFO yarn.Client: Setting up container launch context for our AM
23/05/03 20:30:36 INFO yarn.Client: Setting up the launch environment for our AM container
23/05/03 20:30:36 INFO yarn.Client: Preparing resources for our AM container
23/05/03 20:30:36 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
23/05/03 20:30:36 INFO yarn.Client: Uploading resource file:/tmp/spark-6d3361b1-8525-4274-bca3-88c0d8e70dac/__spark_conf__6944142613266959994.zip -> hdfs://nameserviceHA/user/spark/.sparkStaging/application_1683105646105_0007/__spark_conf__.zip
23/05/03 20:30:36 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
23/05/03 20:30:36 INFO spark.SecurityManager: Changing view acls to: root,spark
23/05/03 20:30:36 INFO spark.SecurityManager: Changing modify acls to: root,spark
23/05/03 20:30:36 INFO spark.SecurityManager: Changing view acls groups to: 
23/05/03 20:30:36 INFO spark.SecurityManager: Changing modify acls groups to: 
23/05/03 20:30:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root, spark); groups with view permissions: Set(); users  with modify permissions: Set(root, spark); groups with modify permissions: Set()
23/05/03 20:30:36 INFO yarn.Client: Submitting application application_1683105646105_0007 to ResourceManager
23/05/03 20:30:36 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:37 INFO impl.YarnClientImpl: Submitted application application_1683105646105_0007
23/05/03 20:30:37 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:38 INFO yarn.Client: Application report for application_1683105646105_0007 (state: ACCEPTED)
23/05/03 20:30:38 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: [Wed May 03 20:30:37 +0800 2023] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:9216, vCores:12> ; Queue's Absolute capacity = 40.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; 
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: spark
	 start time: 1683117036782
	 final status: UNDEFINED
	 tracking URL: http://cdh03.shokod.com:8088/proxy/application_1683105646105_0007/
	 user: spark
23/05/03 20:30:38 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:39 INFO yarn.Client: Application report for application_1683105646105_0007 (state: FAILED)
23/05/03 20:30:39 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: Application application_1683105646105_0007 failed 2 times due to AM Container for appattempt_1683105646105_0007_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2023-05-03 20:30:38.488]Application application_1683105646105_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is spark
main : requested yarn user is spark
Requested user spark is not whitelisted and has id 978,which is below the minimum allowed 1000

For more detailed output, check the application tracking page: http://cdh03.shokod.com:8088/cluster/app/application_1683105646105_0007 Then click on links to logs of each attempt.
. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: spark
	 start time: 1683117036782
	 final status: FAILED
	 tracking URL: http://cdh03.shokod.com:8088/cluster/app/application_1683105646105_0007
	 user: spark
23/05/03 20:30:39 INFO yarn.Client: Deleted staging directory hdfs://nameserviceHA/user/spark/.sparkStaging/application_1683105646105_0007
23/05/03 20:30:39 ERROR cluster.YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
23/05/03 20:30:39 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1683105646105_0007 failed 2 times due to AM Container for appattempt_1683105646105_0007_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2023-05-03 20:30:38.488]Application application_1683105646105_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is spark
main : requested yarn user is spark
Requested user spark is not whitelisted and has id 978,which is below the minimum allowed 1000

For more detailed output, check the application tracking page: http://cdh03.shokod.com:8088/cluster/app/application_1683105646105_0007 Then click on links to logs of each attempt.
. Failing the application.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:186)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:511)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2549)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:944)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:935)
	at kudu.spark.SparkKuduTest$.createTable(SparkKuduTest.scala:41)
	at kudu.spark.SparkKuduTest$.main(SparkKuduTest.scala:20)
	at kudu.spark.SparkKuduTest.main(SparkKuduTest.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/03 20:30:39 INFO server.AbstractConnector: Stopped Spark@232024b9{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23/05/03 20:30:39 INFO ui.SparkUI: Stopped Spark web UI at http://cdh01.shokod.com:4040
23/05/03 20:30:39 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
23/05/03 20:30:39 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
23/05/03 20:30:39 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
23/05/03 20:30:39 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/05/03 20:30:39 INFO cluster.YarnClientSchedulerBackend: Stopped
23/05/03 20:30:39 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/05/03 20:30:39 INFO memory.MemoryStore: MemoryStore cleared
23/05/03 20:30:39 INFO storage.BlockManager: BlockManager stopped
23/05/03 20:30:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
23/05/03 20:30:39 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
23/05/03 20:30:39 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/05/03 20:30:45 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Application application_1683105646105_0007 failed 2 times due to AM Container for appattempt_1683105646105_0007_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2023-05-03 20:30:38.488]Application application_1683105646105_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is spark
main : requested yarn user is spark
Requested user spark is not whitelisted and has id 978,which is below the minimum allowed 1000

For more detailed output, check the application tracking page: http://cdh03.shokod.com:8088/cluster/app/application_1683105646105_0007 Then click on links to logs of each attempt.
. Failing the application.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:186)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:511)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2549)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:944)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:935)
	at kudu.spark.SparkKuduTest$.createTable(SparkKuduTest.scala:41)
	at kudu.spark.SparkKuduTest$.main(SparkKuduTest.scala:20)
	at kudu.spark.SparkKuduTest.main(SparkKuduTest.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/03 20:30:45 INFO util.ShutdownHookManager: Shutdown hook called
23/05/03 20:30:45 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6d3361b1-8525-4274-bca3-88c0d8e70dac
23/05/03 20:30:45 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b12e58dd-27a8-4a2c-9958-b39f9044baad

        The key part of the error log:

Application application_1683105646105_0007 failed 2 times due to AM Container for appattempt_1683105646105_0007_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2023-05-03 20:30:38.488]Application application_1683105646105_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is spark
main : requested yarn user is spark
Requested user spark is not whitelisted and has id 978,which is below the minimum allowed 1000

For more detailed output, check the application tracking page: http://cdh03.shokod.com:8088/cluster/app/application_1683105646105_0007 Then click on links to logs of each attempt.
. Failing the application.

        From the output above, the spark user's UID (978) is below the minimum of 1000 that YARN allows, so I suspected a restriction in the YARN configuration. Searching the YARN configuration for ID-related settings turned up the minimum user ID property, `min.user.id` (default 1000), which matches the error. I lowered the threshold to 500 and restarted the YARN service; a sketch of the check and the change follows.
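        The commands below are a minimal sketch of how to confirm the cause and where the setting lives; the config file path is an assumption, and on a CDH cluster the property is normally changed through Cloudera Manager (YARN service, "Minimum User ID" / min.user.id) rather than by editing files by hand.

 # Confirm the UID of the submitting user (the diagnostics reported 978 for "spark").
 id -u spark

 # On a plain Hadoop install using the LinuxContainerExecutor, the limit is set in
 # container-executor.cfg (the path below is an assumption):
 grep -E 'min.user.id|allowed.system.users' /etc/hadoop/conf/container-executor.cfg
 # min.user.id=1000        -> lower to a value <= the spark user's UID (e.g. 500)
 # allowed.system.users=   -> or, alternatively, whitelist the user here
 # After changing the value, restart YARN (all NodeManagers) so it takes effect.

        Lowering min.user.id is the quick fix used here; an alternative is adding `spark` to `allowed.system.users`, which is what the "not whitelisted" part of the error refers to.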

 

        After restarting YARN, I re-ran the program on the cluster and it completed successfully.

        I also verified the results on the Kudu side and everything was correct, so the problem was fully resolved.
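        As a rough way to double-check from the Kudu side, the Kudu CLI can list the tables on the cluster and confirm that the table written by the job exists; the master address and port below are assumptions.

 # List tables known to the Kudu master (address/port are assumptions).
 kudu table list cdh01.shokod.com:7051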

 
