_pickle.PicklingError: Could not serialize object: Exception:

1. Full error message: _pickle.PicklingError: Could not serialize object: Exception: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

2. Cause: a PySpark RDD transformation was used incorrectly. An RDD was passed as an argument to distinct(), so Spark tried to serialize that RDD into the task closure, which SPARK-5063 forbids.

3. The faulty code:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("AppName")
sc = SparkContext(conf=conf)
# A quick review of RDD transformations:
RDD = sc.parallelize(
    ['HDFS', 'YARN', 'MapReduce', 'Hive', 'Pig', 'Mahout', 'HBase', 'Sqoop', 'Flume', 'ZooKeeper', 'Ambari'])
print(RDD.distinct(sc.parallelize(['HDFS', 'MapReduce'])).collect())  # wrong: distinct() accepts only an optional numPartitions, not another RDD
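
For context, the rule the error message cites (SPARK-5063) can be illustrated with a minimal sketch; the variable names here are illustrative, not from the original post:

rdd1 = sc.parallelize([1, 2, 3])
rdd2 = sc.parallelize([10, 20])
# Invalid: referencing rdd2 inside rdd1's transformation; the lambda's
# closure would have to be pickled together with rdd2, which fails.
# rdd1.map(lambda x: rdd2.count() * x)
# Valid: compute the needed value on the driver first, then use the plain result.
n = rdd2.count()
print(rdd1.map(lambda x: n * x).collect())  # [2, 4, 6]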

4. The corrected code:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("AppName")
sc = SparkContext(conf=conf)
# A quick review of RDD transformations (see the sketch after this block):
# rdd.distinct(); rdd.sample(withReplacement, fraction, seed); rdd.intersection(other); rdd.subtract(other)
RDD = sc.parallelize(
    ['HDFS', 'YARN', 'MapReduce', 'Hive', 'Pig', 'Mahout', 'HBase', 'Sqoop', 'Flume', 'ZooKeeper', 'Ambari'])
print(RDD.distinct().collect())
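
As a quick illustration of the other transformations listed in the comment above, a minimal sketch (output order may vary, since the data is distributed):

rdd_a = sc.parallelize(['HDFS', 'YARN', 'HDFS', 'Hive'])
rdd_b = sc.parallelize(['HDFS', 'Hive', 'Pig'])
# sample(withReplacement, fraction, seed): a random subset of rdd_a
print(rdd_a.sample(False, 0.5, 42).collect())
# intersection: the distinct elements present in both RDDs
print(rdd_a.intersection(rdd_b).collect())  # ['HDFS', 'Hive']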

5. Summary: PySpark's distinct transformation operates on a single RDD. The template is rdd.distinct(), which removes duplicate values from the RDD it is called on. Do not confuse it with the subtract transformation, which takes a second RDD as an argument; that confusion is exactly what caused my error.
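
A minimal sketch of the distinct/subtract contrast described above (element order in the output may vary):

rdd = sc.parallelize(['HDFS', 'MapReduce', 'HDFS', 'YARN'])
other = sc.parallelize(['HDFS', 'MapReduce'])
# distinct: deduplicate within one RDD; takes no RDD argument
print(rdd.distinct().collect())       # ['HDFS', 'MapReduce', 'YARN']
# subtract: remove elements that also appear in another RDD
print(rdd.subtract(other).collect())  # ['YARN']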
