spark-shell fails with OutOfMemoryError

While running spark-shell, the error `OutOfMemoryError: unable to create new native thread` appears. It occurs because the user's thread count is approaching its limit. The fix is to check and raise the per-user thread limit with `ps` and `ulimit`, and to make the higher limit permanent by editing `/etc/profile`.


Error message:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360)
at org.apache.spark.rpc.netty.Dispatcher$$anonfun$1.apply$mcVI$sp(Dispatcher.scala:198)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.rpc.netty.Dispatcher.<init>(Dispatcher.scala:197)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:54)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:447)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:53)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:253)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)


Cause: the JVM throws this OutOfMemoryError when the operating system refuses to create another native thread. Most likely your user's thread count is close to the per-user limit (ulimit -u); it is not a Java heap problem.


Solution:

1. First, check the user's current thread count (replace <loginName> with your username):

ps -u <loginName> -L | wc -l
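
If the count is surprisingly high, it helps to see which processes account for the threads. A minimal sketch, assuming a Linux procps-style ps that supports the nlwp (thread count) column:

# list the user's top thread consumers, sorted by thread count (nlwp)
ps -u <loginName> -o pid,nlwp,cmd --sort=-nlwp | head -n 10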


2. Check the current user's thread limit:

ulimit -u
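
To see at a glance whether the count from step 1 is approaching this limit, the two checks can be combined into one line (a convenience only; replace <loginName> as before):

echo "threads: $(ps -u <loginName> -L | wc -l)  limit: $(ulimit -u)"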

3. Raise the user's thread limit temporarily (2048 here as an example; choose a value that fits your workload). This only affects the current shell session and processes started from it:

ulimit -u 2048


4. Set the thread limit permanently:

vim /etc/profile

Append at the end of the file:

ulimit -u 2048

Then run:

source /etc/profile

Now check the limit again:

ulimit -u

Output:

2048

The change has taken effect.
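
Note: on many distributions, an ulimit line in /etc/profile only applies to interactive login shells. If Spark is started from a service or a non-login session, the per-user limit is usually raised in /etc/security/limits.conf instead. A sketch, with <loginName> and 2048 as placeholders to adjust:

# /etc/security/limits.conf: the nproc item governs the max user processes/threads
<loginName>  soft  nproc  2048
<loginName>  hard  nproc  2048

A new login session is needed before a limits.conf change takes effect.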


At this point the problem should be resolved.




