Python Bits and Pieces: Use a Python Object as User Data of a C Callback

This post describes in detail how Python can interact with C through callback functions, covering the callback's type declaration, definition, argument passing, and registration. A worked example shows how to read and update a Python object handed to C as user data, demonstrating how callbacks carry state across the language boundary.

To exchange state between Python and C, we often need to pass a Python object into C as the user data of a callback, so that C hands it back to us when the callback fires.

In C, user data is conventionally typed as void *. The callback type is therefore declared in Python, using ctypes, as follows (note that the first argument to CFUNCTYPE is the return type; None stands for void):

from ctypes import CFUNCTYPE, c_void_p
item_cb = CFUNCTYPE(None, c_void_p, c_void_p)

It corresponds to this C callback type:

void (*item_cb)(void *data, void *user_data);

The C interface that registers the callback is:

void item_foreach(void *list, item_cb callback, void *user_data);
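Assuming item_foreach is exported from a shared library (the name libitems.so below is purely illustrative), the Python-side binding can declare the argument and return types so that ctypes checks each call:

from ctypes import CDLL

lib = CDLL("./libitems.so")   # hypothetical library exporting item_foreach
lib.item_foreach.restype = None
lib.item_foreach.argtypes = [c_void_p, item_cb, c_void_p]
item_foreach = lib.item_foreach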

On the Python side, the callback is defined as:

from ctypes import POINTER, byref, cast, py_object

def each_item(item, user_data):
    # Recover the Python object: user_data is the address of a py_object.
    boxed = cast(user_data, POINTER(py_object)).contents
    print(boxed.value)
    # The object that user_data points to can also be modified:
    boxed.value += 1
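The cast round trip is easier to see in isolation. Reusing the imports above (the names here are illustrative), this snippet pushes a Python object through a void *-sized address and back, exactly as each_item does:

obj = py_object([1, 2, 3])
addr = cast(byref(obj), c_void_p)                # what C receives: an opaque void *
back = cast(addr, POINTER(py_object)).contents   # what the callback recovers
assert back.value is obj.value                   # the very same object, no copy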

The code that registers the callback:

value = py_object(0)
callback = item_cb(each_item)   # keep a reference: the thunk must outlive the C call
item_foreach(list, callback, byref(value))   # "list" is the C list handle from elsewhere

After the call, each_item prints the current count for every item and increments it by one. Because value is kept as a named py_object rather than a temporary, the updated count remains visible afterwards as value.value. Keeping callback in a variable matters too: if the CFUNCTYPE object is garbage-collected while C still holds the function pointer, invoking the callback may crash the process.
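Since item_foreach lives in a C library that is not shown here, the following self-contained sketch emulates it in pure Python (item_foreach_sim is a made-up stand-in for the C loop), so the whole round trip can be run without compiling anything:

from ctypes import CFUNCTYPE, POINTER, byref, c_void_p, cast, py_object

item_cb = CFUNCTYPE(None, c_void_p, c_void_p)

def each_item(item, user_data):
    boxed = cast(user_data, POINTER(py_object)).contents
    print(boxed.value)
    boxed.value += 1

def item_foreach_sim(items, callback, user_data):
    # Stand-in for the C-side item_foreach: invokes the callback once per item.
    for _ in items:
        callback(None, user_data)

value = py_object(0)
callback = item_cb(each_item)
item_foreach_sim(range(3), callback, cast(byref(value), c_void_p))
print(value.value)   # 3: the callback incremented it once per item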

 
