How to fix "ERROR PythonRunner: Python worker exited unexpectedly (crashed)"

This article walks through a "Python worker exited unexpectedly" error hit while running PySpark in PyCharm. In the case analyzed here, the problem was triggered by a fairly large input dataset; the fix was to close PyCharm, reopen it, and rerun the job.

A while ago, a reader messaged me: their PySpark job running in PyCharm failed with ERROR PythonRunner: Python worker exited unexpectedly (crashed).

As a test, print(input_rdd.first()) printed its result without problems, but the call below, which triggers a full job, raised the error:

print(input_rdd.count())
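
The reader's full script was not shared, so the following is only a minimal sketch of the kind of code involved; the app name and file path are hypothetical placeholders, not taken from the original project.

from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    # Run Spark locally with two threads (a typical PyCharm test setup).
    conf = SparkConf().setMaster("local[2]").setAppName("web_analysis")
    sc = SparkContext(conf=conf)

    # Load a local text file into an RDD (the path is a placeholder).
    input_rdd = sc.textFile("data/access.log")

    # first() only fetches one record from one partition, so it prints fine.
    print(input_rdd.first())

    # count() scans every partition and pushes the whole dataset through the
    # Python worker; this is the call that crashed with the error above.
    print(input_rdd.count())

    sc.stop()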

ERROR PythonRunner: Python worker exited unexpectedly (crashed) means exactly what it says: the Python worker process exited unexpectedly (it crashed). The full log from the reader's run looked like this:

21/10/24 10:24:48 ERROR PythonRunner: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR PythonRunner: This may have been caused by a prior exception:
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

21/10/24 10:24:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "D:/Code/pycode/exercise/pyspark-study/pyspark-learning/pyspark-day04/main/01_web_analysis.py", line 28, in <module>
    print(input_rdd.first())
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1586, in first
    rs = self.take(1)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1566, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\context.py", line 1233, in runJob
    sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
	at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)


Process finished with exit code 1

As for how to fix it: searching around shows this error can have many different causes. In the case I helped debug, Spark was running locally on Windows, and the issue was on the tooling side rather than in the code: the input data was fairly large, and running the job inside PyCharm can make it crash.

Without further ado, here is the fix that worked for this reader, and it is simple: close PyCharm, reopen it, and run the job again. Note that if it still fails, close PyCharm and rerun once more.
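
If restarting PyCharm does not help either, one workaround that is often suggested for local Windows setups (this is an assumption on my part, not the fix used above) is to make sure the Spark Python workers use the same interpreter as the PyCharm project. A minimal sketch, to be placed before the SparkContext is created:

import os
import sys

# Point both the driver and the worker processes at the interpreter running
# this script (the PyCharm project interpreter). These are the standard
# PySpark environment variables and must be set before SparkContext is built.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable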
