PySpark reports the following error:
Caused by: java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:477)
at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala

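When PySpark fails with 'Connection reset by peer: socket write error', one possible cause is that the Python worker process finishes early, while the executor is still writing data to it. A workaround is to add code to the `process` method in `worker.py` so that the worker reads all the data sent by the executor, even if it does not actually use it. This is not an efficient long-term solution, but it avoids the write exception. After modifying the code, `pyspark.zip` under `python/lib` must be repackaged.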
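The idea of reading everything the executor sends can be illustrated with a small, self-contained sketch. This is not the code from Spark's `worker.py`: `counting_source`, `take_three`, `process`, and `run` are hypothetical names, and a plain generator stands in for the socket stream. It only shows the difference between a task that stops reading early and one that drains the remaining input afterwards.

```python
from itertools import islice


def counting_source(n, counter):
    """Hypothetical stand-in for the executor's stream: yields n records
    and counts how many the consumer actually pulled."""
    for i in range(n):
        counter["read"] += 1
        yield i


def take_three(iterator):
    """A task that stops early and leaves the rest of the input unread."""
    return islice(iterator, 3)


def process(iterator, func, drain):
    """Simplified, assumed shape of a per-task loop (not Spark's code)."""
    results = list(func(iterator))
    if drain:
        # Read and discard whatever the task left behind, so the
        # sending side can finish writing before the worker exits.
        for _ in iterator:
            pass
    return results


def run(drain):
    counter = {"read": 0}
    results = process(counting_source(10, counter), take_three, drain)
    return results, counter["read"]


if __name__ == "__main__":
    print(run(drain=False))  # the task reads only part of the input
    print(run(drain=True))   # the remaining records are consumed too
```

In both cases the task result is the same; only the number of records pulled from the source differs. That extra reading is the wasted work that makes the workaround inefficient.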