2025-09-17 20:01:58.401840: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'
2025-09-17 20:01:58.401897: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-09-17 20:01:58.401939: W tensorflow/core/framework/op_kernel.cc:1827] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-09-17 20:01:58.401976: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
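The four warnings above form one cascade. `cuModuleLoadData` rejects the kernel image that TensorFlow's kernel generator (`kernel_gen`) tries to load, failing with `CUDA_ERROR_INVALID_PTX`, so no valid module handle exists. `cuModuleGetFunction` and `cuLaunchKernel` then fail with `CUDA_ERROR_INVALID_HANDLE`, and only that last error reaches Python as the `InternalError`. When triaging a long log, the first CUDA failure is therefore the one worth searching for. Below is a minimal sketch that extracts it with a regular expression. It assumes every failure line has the `'call(...)' failed with 'CUDA_ERROR_*'` shape seen here, and the helper names `cuda_failures` and `root_cause` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches TensorFlow runtime-wrapper messages of the form
#   'cuSomeCall(arg, ...)' failed with 'CUDA_ERROR_SOMETHING'
# The lazy `.*?\)'` stops at the first `)'`, which tolerates nested
# parentheses such as reinterpret_cast<CUstream>(stream).
_CUDA_FAILURE = re.compile(
    r"'(?P<call>cu\w+)\(.*?\)' failed with '(?P<code>CUDA_ERROR_\w+)'"
)


def cuda_failures(log_text: str) -> List[Tuple[str, str]]:
    """Return (driver call, error code) pairs in the order they appear."""
    return [(m["call"], m["code"]) for m in _CUDA_FAILURE.finditer(log_text)]


def root_cause(log_text: str) -> Optional[Tuple[str, str]]:
    """Return the first CUDA failure; later INVALID_HANDLE errors usually follow from it."""
    failures = cuda_failures(log_text)
    return failures[0] if failures else None


if __name__ == "__main__":
    log = (
        "tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' "
        "failed with 'CUDA_ERROR_INVALID_PTX'\n"
        "tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' "
        "failed with 'CUDA_ERROR_INVALID_HANDLE'\n"
    )
    print(root_cause(log))  # ('cuModuleLoadData', 'CUDA_ERROR_INVALID_PTX')
```

When run over the full log, `cuda_failures` also picks up the `cuLaunchKernel` text that is repeated inside the Python `InternalError` messages. That repetition is why the sketch reports the first match rather than the last.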
[2025-09-17 20:01:58,402][pinnstf2.utils.utils][ERROR] -
Traceback (most recent call last):
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/utils/utils.py", line 72, in wrap
    metric_dict, object_dict = task_func(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/train.py", line 122, in train
    model: PINNModule = hydra.utils.instantiate(cfg.model)(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/models/pinn_module.py", line 56, in __init__
    self.opt = optimizer()
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/optimizers/adam.py", line 62, in __init__
    super().__init__(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/optimizer.py", line 21, in __init__
    super().__init__(*args, **kwargs)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/optimizers/base_optimizer.py", line 158, in __init__
    iterations = backend.Variable(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/common/variables.py", line 153, in __init__
    initializer = self._convert_to_tensor(initializer, dtype=dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/core.py", line 69, in _convert_to_tensor
    return convert_to_tensor(value, dtype=dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/core.py", line 139, in convert_to_tensor
    return tf.cast(x, dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 5983, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
[2025-09-17 20:01:58,407][pinnstf2.utils.utils][INFO] - Output dir: /home/jfq/pinns-tf2-main/pinns-tf2-main/examples/yuanzhu/examples/yuanzhu/outputs/20-01-53
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/jfq/pinns-tf2-main/pinns-tf2-main/examples/yuanzhu/train.py", line 162, in main
    metric_dict, _ = pinnstf2.train(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/utils/utils.py", line 84, in wrap
    raise ex
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/utils/utils.py", line 72, in wrap
    metric_dict, object_dict = task_func(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/train.py", line 122, in train
    model: PINNModule = hydra.utils.instantiate(cfg.model)(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/pinnstf2/models/pinn_module.py", line 56, in __init__
    self.opt = optimizer()
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/optimizers/adam.py", line 62, in __init__
    super().__init__(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/optimizer.py", line 21, in __init__
    super().__init__(*args, **kwargs)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/optimizers/base_optimizer.py", line 158, in __init__
    iterations = backend.Variable(
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/common/variables.py", line 153, in __init__
    initializer = self._convert_to_tensor(initializer, dtype=dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/core.py", line 69, in _convert_to_tensor
    return convert_to_tensor(value, dtype=dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/keras/src/backend/tensorflow/core.py", line 139, in convert_to_tensor
    return tf.cast(x, dtype)
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/jfq/anaconda3/envs/pinn/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 5983, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
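The traceback shows the crash surfacing in the Keras `Adam` constructor. The failing op, however, is only a `Cast` used to initialize the optimizer's `iterations` variable, and it is plausibly just the first GPU kernel the run launches. That suggests the fault lies in the TensorFlow/CUDA/driver/GPU combination rather than in pinnstf2 or the `yuanzhu` config. `CUDA_ERROR_INVALID_PTX` signals that PTX JIT compilation failed. Commonly reported causes include a GPU whose compute capability is newer than the TensorFlow build covers and a driver that does not match the CUDA version TensorFlow was built against. Treat both as hypotheses to check, not conclusions.

The following is a minimal diagnostic sketch; the function name `gpu_report` is hypothetical. It gathers the relevant facts with `tf.sysconfig.get_build_info()`, `tf.config.list_physical_devices` and `tf.config.experimental.get_device_details`, then runs one small GPU `Cast` as a smoke test. It assumes that cast goes through the same generated kernels as the failing op.

```python
from typing import Any, Dict


def gpu_report() -> Dict[str, Any]:
    """Collect TensorFlow build/GPU facts relevant to CUDA_ERROR_INVALID_PTX."""
    report: Dict[str, Any] = {"tensorflow": None}
    try:
        import tensorflow as tf
    except ImportError:
        return report  # TensorFlow is not installed in this interpreter

    report["tensorflow"] = tf.__version__
    build = tf.sysconfig.get_build_info()
    # CUDA/cuDNN versions and GPU architectures this TF wheel was compiled for;
    # the keys are absent on CPU-only builds, hence .get().
    report["cuda_version"] = build.get("cuda_version")
    report["cudnn_version"] = build.get("cudnn_version")
    report["build_archs"] = build.get("cuda_compute_capabilities")

    gpus = tf.config.list_physical_devices("GPU")
    # Each entry is a dict such as {'device_name': ..., 'compute_capability': (major, minor)}.
    report["gpus"] = [tf.config.experimental.get_device_details(g) for g in gpus]

    if gpus:
        # Smoke test: a tiny Cast pinned to the first GPU, analogous to the op
        # that failed in the log (assumption: it hits the same kernel path).
        try:
            with tf.device("/GPU:0"):
                tf.cast(tf.constant([1.0, 2.0]), tf.float16).numpy()
            report["gpu_cast"] = "ok"
        except tf.errors.OpError as err:
            report["gpu_cast"] = err.message
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run it once normally, then once with `CUDA_VISIBLE_DEVICES=""` exported before Python starts. This separates a GPU-specific fault from a problem in the training code: with the GPU hidden, TensorFlow falls back to the CPU and no CUDA kernel is loaded at all.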