Linux memory manager and your big data

Disclaimer: When we have an issue and think it's the operating system, 99% of the time it turns out to be something else. We therefore caution against assuming that the problem is with your operating system, unless your use case closely overlaps with the following example.

It all started with one of our customers reporting performance issues with their CitusDB cluster. This customer designed their cluster such that their working set would fit into memory, but their query run times showed every indication that their queries were hitting disk. This naturally increased their query run times by 10-100x.

We started looking into this problem by first examining CitusDB's query distribution mechanism and then by checking the PostgreSQL instances on the machines. We found that neither was the culprit, and made the following observations:

  1. The customer's working set was one day's worth of query logs. Once they were done looking at a particular day, they started querying the next day's data.
  2. Their queries involved mostly sequential I/O. They didn't use indexes a lot.
  3. A day's data occupied more than 60% of the memory on each node (but way less than total available memory). They didn't have anything else using memory on their instances.

Our assumption going into this was that since each day's data easily fit into RAM, the Linux memory manager would eventually bring that day's data into the page cache. Once the customer started querying the next day's data (and only the next day's data), the new data would come into the page cache. At least, this is what a simple cache using the LRU eviction policy would do.

It turns out LRU has two shortcomings when used as a page replacement algorithm. First, an exact LRU implementation is too costly in this context. Second, the memory manager needs to account for frequency as well, so that a large file read doesn't evict the entire cache. Therefore, Linux uses a more sophisticated algorithm than LRU, and that algorithm doesn't play well with the workload we just described.
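
To see the difference in behavior, here is a toy simulation of an exact LRU page cache (illustrative Python, not kernel code; the page counts are made up to mirror the scenario above). With 100 pages of "memory" and two 62-page "days" of data, repeated scans of the second day displace the first, which is what we expected. But the same policy lets a single sequential scan of an oversized file evict the entire cache, which is the second shortcoming above.

from collections import OrderedDict

class LRUCache:
    """Toy page cache with exact LRU eviction, for illustration only."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page
            self.pages[page] = True

cache = LRUCache(capacity=100)                  # "memory" holds 100 pages
day1 = [("day1", i) for i in range(62)]         # each day is ~62% of memory
day2 = [("day2", i) for i in range(62)]

for _ in range(3):                              # warm up day 1
    for page in day1:
        cache.access(page)

cache.hits = cache.misses = 0
for _ in range(3):                              # then scan day 2 repeatedly
    for page in day2:
        cache.access(page)
print(cache.hits, cache.misses)                 # 124 hits, 62 misses: day 2 got cached

for i in range(150):                            # one scan of a file bigger than memory
    cache.access(("bigfile", i))
print(("day2", 0) in cache.pages)               # False: the single scan evicted everything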

To make this concrete, let's assume that you have a kernel newer than 2.6.31 (released in 2009) and that you're using an m2.4xlarge EC2 instance with 68 GB of memory. Let's also say that you have two days' worth of clickstream data. Each day's data takes up more than 60% of available memory, but on its own easily fits into RAM.

$ ls -lh clickstream.csv.*
-rw-rw-r-- ec2-user ec2-user 42G Nov 25 19:45 clickstream.csv.1
-rw-rw-r-- ec2-user ec2-user 42G Nov 25 19:47 clickstream.csv.2

Now, let's bring the first day's data into memory by running the "word count" command on the clickstream file several times. Note the time difference between these two runs. The first time we run the command, the Linux memory manager brings the file's pages into the page cache. On the next run, everything gets served from memory.

$ time wc -l clickstream.csv.1 
336006288 clickstream.csv.1

real	10m4.575s
...

$ time wc -l clickstream.csv.1 
336006288 clickstream.csv.1

real	0m18.858s
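
As a sanity check that the speedup really comes from the page cache, you can watch the kernel's Cached counter in /proc/meminfo grow by roughly the file's size after the first read. Here is a small helper sketch (our own addition, not part of the original experiment):

def cached_gib():
    """Return the current page cache size from /proc/meminfo, in GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Cached:"):
                return int(line.split()[1]) / 1024 ** 2   # value is reported in kB
    raise RuntimeError("no Cached line in /proc/meminfo")

print(cached_gib())   # run before and after the first wc; expect roughly 42 GiB of growth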

Then, let's switch over to the second day's clickstream file. We again run the word count command multiple times to bring the file into memory. An LRU-like policy here would evict the first day's data after several runs and bring the second day's data into memory. Unfortunately, no matter how many times you access the second file in this case, the Linux memory manager will never keep it in memory.

$ time wc -l clickstream.csv.2
336027448 clickstream.csv.2

real	9m50.542s

$ time wc -l clickstream.csv.2
336027448 clickstream.csv.2

real	9m52.265s

In fact, if you run into this scenario, the only way to bring the second day's data into memory is by manually flushing the page cache: writing 1 to /proc/sys/vm/drop_caches asks the kernel to drop its clean page cache pages. Obviously, this cure might be worse than the disease, but for our little experiment, it helps.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
1

$ time wc -l clickstream.csv.2
336027448 clickstream.csv.2

real	9m51.906s

$ time wc -l clickstream.csv.2
336027448 clickstream.csv.2

real	0m17.874s
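
Flushing the whole page cache is a blunt instrument: it also evicts every other application's hot pages. A gentler, though still advisory, alternative is to tell the kernel you are done with a specific file via posix_fadvise with the POSIX_FADV_DONTNEED hint, which asks it to drop only that file's clean cached pages. Here is a minimal sketch of that workaround (our suggestion, not something we used in the experiment above):

import os

def evict_from_page_cache(path):
    """Ask the kernel to drop this file's cached pages. Advisory only;
    dirty pages are skipped, so fsync first if the file was written."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)   # length 0 = to end of file
    finally:
        os.close(fd)

evict_from_page_cache("clickstream.csv.1")   # make room for the second day's data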

Taking a step back, the problem here lies with how Linux manages its page cache. The Linux memory manager keeps cached filesystem pages in two types of lists. One list holds recently accessed pages (the recency list), and the other holds pages that have been referenced multiple times (the frequency list). In kernel terms, these correspond to the inactive and active lists.

In current kernel versions, the memory manager splits available memory evenly between these two lists as a trade-off between protecting frequently used pages and detecting recently used ones. In other words, the kernel reserves 50% of available memory for the frequency list.

In the previous example, both lists start out empty. When first referenced, the first day's pages go into the recency list. On the second reference, they get promoted to the frequency list.

Next, when the user wants to work on the second day's data, that file is larger than the recency list, which is capped at 50% of available memory. Therefore, sequential scans over the file result in thrashing. Each filesystem page in the second file makes it into the recency list, but gets kicked out before the scan comes back around to it. As a result, none of the second file's pages stay in the recency list long enough for their reference counts to get incremented.
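
The toy simulation below captures this dynamic (a deliberate simplification, not the kernel's actual reclaim code). Pages enter the recency list; a page referenced again while still resident is promoted to the frequency list, which is capped at half of memory; eviction happens only when memory is full, and only from the recency list. Day 1 warms up and stays cached, day 2 thrashes with a zero hit rate, and flushing the cache (modeled by starting fresh) lets day 2 load normally.

from collections import OrderedDict

class TwoListCache:
    """Toy split page cache: a recency list plus a frequency list. This is a
    simplification of the kernel's inactive/active lists, not the real code."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.freq_cap = capacity // 2    # frequency list gets at most 50% of memory
        self.recency = OrderedDict()
        self.frequency = OrderedDict()
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.frequency:
            self.frequency.move_to_end(page)
            self.hits += 1
        elif page in self.recency:
            self.hits += 1
            del self.recency[page]       # second reference: promote the page
            if len(self.frequency) >= self.freq_cap:
                demoted, _ = self.frequency.popitem(last=False)
                self.recency[demoted] = True   # demote the coldest frequent page
            self.frequency[page] = True
        else:
            self.misses += 1
            if len(self.recency) + len(self.frequency) >= self.capacity:
                self.recency.popitem(last=False)   # reclaim only from the recency list
            self.recency[page] = True

def scan(cache, pages, passes):
    cache.hits = cache.misses = 0
    for _ in range(passes):
        for page in pages:
            cache.access(page)
    return cache.hits, cache.misses

cache = TwoListCache(capacity=100)            # "memory" holds 100 pages
day1 = [("day1", i) for i in range(62)]       # each day is ~62% of memory
day2 = [("day2", i) for i in range(62)]

print(scan(cache, day1, 3))                   # (124, 62): day 1 caches fully
print(scan(cache, day2, 10))                  # (0, 620): day 2 never gets a single hit

cache = TwoListCache(capacity=100)            # model flushing the page cache
print(scan(cache, day2, 3))                   # (124, 62): day 2 now caches fully

In this model, and we believe in the real kernel as well, the first day escaped thrashing only because it was read into an empty cache: with no memory pressure, none of its pages were evicted before they earned their second reference.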

Fortunately, this issue occurs only when all three of the conditions we outlined above hold (which makes it very infrequent), and it's getting fixed as we speak. If you're interested, you can read more about the original problem report and the proposed fix in the Linux kernel mailing lists.

For us, the really neat part was how easy it was to identify the problem. Since Citus extends PostgreSQL, once we saw the issue, we could quickly reproduce it on Postgres. We then posted our findings to the Linux mailing lists, and the community took over from there.

Got comments? Join the discussion on Hacker News.

在使用nnunet训练时,出现如下报错jzuser@vpc87-3:~/Work_dir/Gn/pystudy/NnuNet$ nnUNetv2_train 3 2d 0 nnUNetv2_train 3 2d 1 nnUNetv2_train 3 2d 2 nnUNetv2_train 3 2d 3 nnUNetv2_train 3 2d 4 Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None 2025-08-27 09:22:59.147944: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428e727680>) 2025-08-27 09:22:59.147944: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428fe062c0>) 2025-08-27 09:22:59.147944: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428f79c100>) 2025-08-27 09:22:59.147944: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428fe062c0>) 2025-08-27 09:22:59.147944: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428f79c100>) ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. ####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. 
warnings.warn( 2025-08-27 09:23:04.645125: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c1613700>) 2025-08-27 09:23:04.645125: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428e2d0ec0>) 2025-08-27 09:23:04.645125: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c1613700>) 2025-08-27 09:23:04.645125: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428e2d0ec0>) 2025-08-27 09:23:04.645125: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c1613700>) This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} 2025-08-27 09:23:07.150209: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c34041c0>) 2025-08-27 09:23:07.150209: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428e2d0ec0>) 2025-08-27 09:23:07.150209: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c34041c0>) 2025-08-27 09:23:07.150209: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x70428e2d0ec0>) 2025-08-27 09:23:07.150209: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c34041c0>) These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 09:23:09.655113: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c141d100>) 2025-08-27 09:23:09.655113: failed to log: (<class 'OSError'>, OSError(28, 'No space left on 
device'), <traceback object at 0x7042c2a95b80>) 2025-08-27 09:23:09.655113: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c141d100>) 2025-08-27 09:23:09.655113: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c2a95b80>) 2025-08-27 09:23:09.655113: failed to log: (<class 'OSError'>, OSError(28, 'No space left on device'), <traceback object at 0x7042c141d100>) 2025-08-27 09:23:09.655113: unpacking dataset... multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 581, in save format.write_array(fid, arr, allow_pickle=allow_pickle, File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 754, in write_array array.tofile(fp) OSError: [Errno 28] No space left on device During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar return list(itertools.starmap(args[0], args[1])) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/utils.py", line 17, in _convert_to_npy np.save(npz_file[:-4] + "_seg.npy", a['seg']) File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 579, in save with file_ctx as fid: OSError: [Errno 28] No space left on device """ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training nnunet_trainer.run_training() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1203, in run_training self.on_train_start() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 788, in on_train_start unpack_dataset(self.preprocessed_dataset_folder, unpack_segmentation=True, overwrite_existing=False, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/utils.py", line 33, in unpack_dataset p.starmap(_convert_to_npy, zip(npz_files, File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap return self._map_async(func, iterable, starmapstar, chunksize).get() File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get raise self._value OSError: [Errno 28] No space left on device Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. 
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 180, in run_training nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 65, in get_trainer_from_args nnunet_trainer = nnunet_trainer(plans=plans, configuration=configuration, fold=fold, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 159, in __init__ maybe_mkdir_p(self.output_folder) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/utilities/file_and_folder_operations.py", line 88, in maybe_mkdir_p os.makedirs(directory, exist_ok=True) File "/usr/lib/python3.10/os.py", line 225, in makedirs mkdir(name, mode) OSError: [Errno 28] No space left on device: '/home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_results/Dataset003_Liver/nnUNetTrainer__nnUNetPlans__2d/fold_1' Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 180, in run_training nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 65, in get_trainer_from_args nnunet_trainer = nnunet_trainer(plans=plans, configuration=configuration, fold=fold, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 159, in __init__ maybe_mkdir_p(self.output_folder) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/utilities/file_and_folder_operations.py", line 88, in maybe_mkdir_p os.makedirs(directory, exist_ok=True) File "/usr/lib/python3.10/os.py", line 225, in makedirs mkdir(name, mode) OSError: [Errno 28] No space left on device: '/home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_results/Dataset003_Liver/nnUNetTrainer__nnUNetPlans__2d/fold_2' Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. 
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 180, in run_training nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 65, in get_trainer_from_args nnunet_trainer = nnunet_trainer(plans=plans, configuration=configuration, fold=fold, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 159, in __init__ maybe_mkdir_p(self.output_folder) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/utilities/file_and_folder_operations.py", line 88, in maybe_mkdir_p os.makedirs(directory, exist_ok=True) File "/usr/lib/python3.10/os.py", line 225, in makedirs mkdir(name, mode) OSError: [Errno 28] No space left on device: '/home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_results/Dataset003_Liver/nnUNetTrainer__nnUNetPlans__2d/fold_3' Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 180, in run_training nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 65, in get_trainer_from_args nnunet_trainer = nnunet_trainer(plans=plans, configuration=configuration, fold=fold, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 159, in __init__ maybe_mkdir_p(self.output_folder) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/utilities/file_and_folder_operations.py", line 88, in maybe_mkdir_p os.makedirs(directory, exist_ok=True) File "/usr/lib/python3.10/os.py", line 225, in makedirs mkdir(name, mode) OSError: [Errno 28] No space left on device: '/home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_results/Dataset003_Liver/nnUNetTrainer__nnUNetPlans__2d/fold_4' jzuser@vpc87-3:~/Work_dir/Gn/pystudy/NnuNet$ nnUNetv2_train 3 2d 0 nnUNetv2_train 3 2d 1 nnUNetv2_train 3 2d 2 nnUNetv2_train 3 2d 3 nnUNetv2_train 3 2d 4 Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. 
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. ####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. warnings.warn( This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 11:19:51.857624: unpacking dataset... 2025-08-27 11:20:40.435668: unpacking done... 2025-08-27 11:20:40.436964: do_dummy_2d_data_aug: False 2025-08-27 11:20:40.438331: Creating new 5-fold cross-validation split... 2025-08-27 11:20:40.441395: Desired fold for training: 0 2025-08-27 11:20:40.441591: This split has 104 training and 27 validation cases. 
2025-08-27 11:20:40.498886: Unable to plot network architecture: 2025-08-27 11:20:40.500076: No module named 'hiddenlayer' 2025-08-27 11:20:40.552532: 2025-08-27 11:20:40.553642: Epoch 0 2025-08-27 11:20:40.554312: Current learning rate: 0.01 Exception in background worker 0: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 2: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 1: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 3: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File 
"/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 4: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file using pin_memory on device 0 Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training nnunet_trainer.run_training() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training train_outputs.append(self.train_step(next(self.dataloader_train))) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__ item = self.__get_next_item() File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the " RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. 
####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. warnings.warn( This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 11:20:48.364002: unpacking dataset... 2025-08-27 11:20:51.509973: unpacking done... 2025-08-27 11:20:51.510635: do_dummy_2d_data_aug: False 2025-08-27 11:20:51.511523: Using splits from existing split file: /home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_preprocessed/Dataset003_Liver/splits_final.json 2025-08-27 11:20:51.511727: The split file contains 5 splits. 2025-08-27 11:20:51.511775: Desired fold for training: 1 2025-08-27 11:20:51.511804: This split has 105 training and 26 validation cases. 
2025-08-27 11:20:51.521027: Unable to plot network architecture: 2025-08-27 11:20:51.521092: No module named 'hiddenlayer' 2025-08-27 11:20:51.526808: 2025-08-27 11:20:51.526880: Epoch 0 2025-08-27 11:20:51.526956: Current learning rate: 0.01 Exception in background worker 1: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 2: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 0: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file using pin_memory on device 0 Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File 
"/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training nnunet_trainer.run_training() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training train_outputs.append(self.train_step(next(self.dataloader_train))) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__ item = self.__get_next_item() File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the " RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. ####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. 
warnings.warn( This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 11:20:58.407072: unpacking dataset... 2025-08-27 11:21:01.513495: unpacking done... 2025-08-27 11:21:01.514885: do_dummy_2d_data_aug: False 2025-08-27 11:21:01.517129: Using splits from existing split file: /home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_preprocessed/Dataset003_Liver/splits_final.json 2025-08-27 11:21:01.517678: The split file contains 5 splits. 2025-08-27 11:21:01.517827: Desired fold for training: 2 2025-08-27 11:21:01.517945: This split has 105 training and 26 validation cases. 
2025-08-27 11:21:01.529522: Unable to plot network architecture: 2025-08-27 11:21:01.529716: No module named 'hiddenlayer' 2025-08-27 11:21:01.540922: 2025-08-27 11:21:01.541166: Epoch 0 2025-08-27 11:21:01.541448: Current learning rate: 0.01 Exception in background worker 2: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 3: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file using pin_memory on device 0 Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training nnunet_trainer.run_training() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training train_outputs.append(self.train_step(next(self.dataloader_train))) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__ item = self.__get_next_item() File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the " RuntimeError: One or more background workers are no longer alive. Exiting. 
Please check the print statements above for the actual error message Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. ####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. warnings.warn( This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 11:21:08.460438: unpacking dataset... 2025-08-27 11:21:11.615700: unpacking done... 2025-08-27 11:21:11.616486: do_dummy_2d_data_aug: False 2025-08-27 11:21:11.618074: Using splits from existing split file: /home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_preprocessed/Dataset003_Liver/splits_final.json 2025-08-27 11:21:11.618454: The split file contains 5 splits. 2025-08-27 11:21:11.618557: Desired fold for training: 3 2025-08-27 11:21:11.618626: This split has 105 training and 26 validation cases. 
2025-08-27 11:21:11.628197: Unable to plot network architecture: 2025-08-27 11:21:11.628319: No module named 'hiddenlayer' 2025-08-27 11:21:11.635873: 2025-08-27 11:21:11.636014: Epoch 0 2025-08-27 11:21:11.636152: Current learning rate: 0.01 Exception in background worker 1: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 3: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file Exception in background worker 2: No data left in file Traceback (most recent call last): File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__ return self.generate_train_batch() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch data, seg, properties = self._data.load_case(current_key) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r') File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load raise EOFError("No data left in file") EOFError: No data left in file using pin_memory on device 0 Traceback (most recent call last): File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module> sys.exit(run_training_entry()) File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights, File 
"/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training nnunet_trainer.run_training() File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training train_outputs.append(self.train_step(next(self.dataloader_train))) File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__ item = self.__get_next_item() File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the " RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message Using device: cuda:0 /home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:152: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None ####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. ####################################################################### /home/jzuser/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate. 
warnings.warn( This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 12, 'patch_size': [512, 512], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.767578125, 0.767578125], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 7], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True} These are the global plan.json settings: {'dataset_name': 'Dataset003_Liver', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 0.767578125, 0.767578125], 'original_median_shape_after_transp': [432, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5420.0, 'mean': 99.48007202148438, 'median': 101.0, 'min': -983.0, 'percentile_00_5': -15.0, 'percentile_99_5': 197.0, 'std': 37.13840103149414}}} 2025-08-27 11:21:18.424697: unpacking dataset... 2025-08-27 11:21:21.510880: unpacking done... 2025-08-27 11:21:21.511596: do_dummy_2d_data_aug: False 2025-08-27 11:21:21.513083: Using splits from existing split file: /home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_preprocessed/Dataset003_Liver/splits_final.json 2025-08-27 11:21:21.513473: The split file contains 5 splits. 2025-08-27 11:21:21.513583: Desired fold for training: 4 2025-08-27 11:21:21.513662: This split has 105 training and 26 validation cases. 
2025-08-27 11:21:21.521894: Unable to plot network architecture:
2025-08-27 11:21:21.522025: No module named 'hiddenlayer'
2025-08-27 11:21:21.530473:
2025-08-27 11:21:21.530618: Epoch 0
2025-08-27 11:21:21.530759: Current learning rate: 0.01

Exception in background worker 2: No data left in file
Traceback (most recent call last):
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
    item = next(data_loader)
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__
    return self.generate_train_batch()
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch
    data, seg, properties = self._data.load_case(current_key)
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case
    seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
  File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 460, in load
    raise EOFError("No data left in file")
EOFError: No data left in file

Exception in background worker 1: mmap length is greater than file size
Traceback (most recent call last):
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
    item = next(data_loader)
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__
    return self.generate_train_batch()
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/data_loader_2d.py", line 18, in generate_train_batch
    data, seg, properties = self._data.load_case(current_key)
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case
    data = np.load(entry['data_file'][:-4] + ".npy", 'r')
  File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 477, in load
    return format.open_memmap(file, mode=mmap_mode,
  File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 965, in open_memmap
    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/jzuser/.local/lib/python3.10/site-packages/numpy/_core/memmap.py", line 289, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size

[Background workers 0, 3, 4 and 5 die the same way, with identical tracebacks ending in EOFError: No data left in file, raised from nnunet_dataset.py line 86 while loading the corresponding ".npy" data file.]

using pin_memory on device 0
Traceback (most recent call last):
  File "/home/jzuser/.local/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 252, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training
    nnunet_trainer.run_training()
  File "/home/jzuser/.local/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1211, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/home/jzuser/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

How can this be fixed?
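Both failure modes point at the same underlying problem: the unpacked .npy / _seg.npy arrays under nnUNet_preprocessed are shorter on disk than their headers claim, which numpy's memory-mapped loader reports as either "No data left in file" or "mmap length is greater than file size". This typically happens when an earlier unpacking run was interrupted or the disk filled up mid-write. Below is a minimal diagnostic sketch, not an official nnU-Net tool; the path is the Dataset003_Liver location taken from the log above, so adjust it for your setup. It memory-maps every .npy file the same way the data loader does and lists the ones that fail:

# check_npy.py -- minimal sketch: find truncated .npy files that would
# crash nnU-Net's background workers. Path comes from the log; adjust it.
import os
import numpy as np

PREPROCESSED = "/home/jzuser/Work_dir/Gn/pystudy/NnuNet/nnUNet_preprocessed/Dataset003_Liver"

damaged = []
for dirpath, _, filenames in os.walk(PREPROCESSED):
    for name in filenames:
        if not name.endswith(".npy"):
            continue
        path = os.path.join(dirpath, name)
        try:
            # Same access pattern as nnunet_dataset.load_case: a truncated
            # file raises EOFError or ValueError here instead of at train time.
            np.load(path, mmap_mode="r")
        except (EOFError, ValueError) as exc:
            damaged.append(path)
            print(f"truncated: {path} ({exc})")

print(f"{len(damaged)} damaged file(s)")

Deleting the reported files should be safe, since nnU-Net regenerates the .npy arrays from the corresponding .npz archives during the "unpacking dataset..." phase of the next training run. If most files turn out to be damaged, check free disk space first and re-run preprocessing rather than unpacking onto a full disk again.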