nnU-Net V2 Code Walkthrough: Generating the splits_final.json File

nnU-Net V2 divides the dataset according to the splits_final.json file and supports custom dataset splits. If you do not define a custom split, nnU-Net V2 automatically generates a default splits_final.json file.

This article first goes through the official documentation to see how to define a custom split, and then reads the code that generates the default splits_final.json file.

The nnU-Net V2 Official Documentation

The documentation lives at nnUNet/documentation/manual_data_splits.md

How to generate custom splits in nnU-Net

Sometimes, the default 5-fold cross-validation split by nnU-Net does not fit a project. Maybe you want to run 3-fold cross-validation instead? Or maybe your training cases cannot be split randomly and require careful stratification. Fear not, nnU-Net has got you covered (it really can do anything ❤️).

The splits used by nnU-Net are generated in the do_split function of nnUNetTrainer. This function will first look for an existing split file and, if one does not exist, create one. So if you wish to influence the split, manually creating a split file that will then be recognized and used is the way to go!

The do_split function is located in nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py
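For orientation, here is a simplified sketch of that lookup-or-create logic (an approximation, not the verbatim implementation: how the case identifiers are gathered is simplified, and the special 'all' fold and error handling are omitted; self.preprocessed_dataset_folder_base and self.fold mirror the real trainer's attributes):

from batchgenerators.utilities.file_and_folder_operations import join, isfile, load_json, save_json
from nnunetv2.utilities.crossval_split import generate_crossval_split

def do_split(self):
    # Look for an existing splits_final.json first
    splits_file = join(self.preprocessed_dataset_folder_base, "splits_final.json")
    if not isfile(splits_file):
        # No split file yet -> build the default 5-fold split and save it
        case_identifiers = sorted(self.dataset.keys())  # simplified; the real code gathers identifiers differently
        splits = generate_crossval_split(case_identifiers, seed=12345, n_splits=5)
        save_json(splits, splits_file)
    else:
        # A splits_final.json exists (default or custom) -> it is used as-is
        splits = load_json(splits_file)
    # Pick the entry for the fold being trained
    tr_keys = splits[self.fold]['train']
    val_keys = splits[self.fold]['val']
    return tr_keys, val_keys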

The split file is located in the nnUNet_preprocessed/DATASETXXX_NAME folder. So the best practice is to first populate this folder by running nnUNetv2_plan_and_preprocess.

After the nnUNetv2_plan_and_preprocess command finishes, the nnUNet_preprocessed/DATASETXXX_NAME folder is in place.

If you have placed a custom splits_final.json file in the nnUNet_raw/DATASETXXX_NAME folder, it will be copied into nnUNet_preprocessed/DATASETXXX_NAME when the nnUNetv2_plan_and_preprocess command runs.

If you have not created this file, it will be created in the do_split function (when the nnUNetv2_train command runs).

Splits are stored as a .json file. They are a simple Python list. The length of that list is the number of splits it contains (5 in the default nnU-Net). Each list entry is a dictionary with keys 'train' and 'val'. The values are again simply lists with the training identifiers in each set. To illustrate this, I am just messing around with a Dataset002 file here:

In [1]: from batchgenerators.utilities.file_and_folder_operations import load_json

In [2]: splits = load_json('splits_final.json')

In [3]: len(splits)
Out[3]: 5

In [4]: splits[0].keys()
Out[4]: dict_keys(['train', 'val'])

In [5]: len(splits[0]['train'])
Out[5]: 16

In [6]: len(splits[0]['val'])
Out[6]: 4

In [7]: print(splits[0])
{'train': ['la_003', 'la_004', 'la_005', 'la_009', 'la_010', 'la_011', 'la_014', 'la_017', 'la_018', 'la_019', 'la_020', 'la_022', 'la_023', 'la_026', 'la_029', 'la_030'],
'val': ['la_007', 'la_016', 'la_021', 'la_024']}

If you are still not sure what the splits are supposed to look like, simply download some reference dataset from the Medical Decathlon, start a training (to generate the splits) and manually inspect the .json file with your text editor of choice!

Run the nnUNetv2_plan_and_preprocess command followed by nnUNetv2_train to see the default splits_final.json file being generated.

In order to generate custom splits, all you need to do is reproduce the data structure explained above and save it as splits_final.json in the nnUNet_preprocessed/DATASETXXX_NAME folder. Then use nnUNetv2_train etc. as usual. A minimal sketch follows.
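A minimal sketch, assuming a hypothetical 2-fold layout with made-up identifiers and a placeholder dataset folder name (nnUNet_preprocessed from nnunetv2.paths resolves the environment variable of the same name):

from batchgenerators.utilities.file_and_folder_operations import save_json, join
from nnunetv2.paths import nnUNet_preprocessed

# Hypothetical 2-fold custom split; replace the identifiers with your own cases
splits = [
    {'train': ['la_003', 'la_004', 'la_005'], 'val': ['la_007', 'la_009']},
    {'train': ['la_005', 'la_007', 'la_009'], 'val': ['la_003', 'la_004']},
]

# 'Dataset002_Heart' is a placeholder; use your own DATASETXXX_NAME folder
save_json(splits, join(nnUNet_preprocessed, 'Dataset002_Heart', 'splits_final.json'))

When training fold i, do_split then simply uses splits[i], so a file with n entries supports folds 0 through n-1.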

Generating the splits_final.json File

The code that actually generates it lives in the generate_crossval_split function in nnUNet/nnunetv2/utilities/crossval_split.py.

Parameters of the generate_crossval_split function:

| Parameter | Description | Default |
| --- | --- | --- |
| train_identifiers | The list of training-case identifiers to split (List[str]), e.g. ['la_003', 'la_004', 'la_005'] from the official example above | required, no default |
| seed | Random seed | 12345 |
| n_splits | Number of cross-validation folds | 5 |

The procedure:

# Initialize the list that will collect every fold's split
splits = []

sklearn's KFold does the actual splitting; instantiate it first:

kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)

Then loop over kfold's split method, which yields the index arrays of the elements in each fold:

for i, (train_idx, test_idx) in enumerate(kfold.split(train_identifiers)):
    # Use the indices to select this fold's training and validation identifiers
    train_keys = np.array(train_identifiers)[train_idx]
    test_keys = np.array(train_identifiers)[test_idx]
    
    splits.append({})
    
    # Store this fold's training and validation sets in the dict
    splits[-1]['train'] = list(train_keys)
    splits[-1]['val'] = list(test_keys)

Finally, the list of splits is returned:

return splits
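Assembling the snippets above into a self-contained, runnable version (imports added; the call at the bottom, with made-up identifiers, is only a usage illustration):

from typing import List
import numpy as np
from sklearn.model_selection import KFold

def generate_crossval_split(train_identifiers: List[str], seed=12345, n_splits=5) -> List[dict]:
    splits = []
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for i, (train_idx, test_idx) in enumerate(kfold.split(train_identifiers)):
        # Map the integer indices back to the string identifiers
        train_keys = np.array(train_identifiers)[train_idx]
        test_keys = np.array(train_identifiers)[test_idx]
        splits.append({})
        splits[-1]['train'] = list(train_keys)
        splits[-1]['val'] = list(test_keys)
    return splits

# Usage illustration: 6 hypothetical identifiers, 3 folds
print(generate_crossval_split(['la_003', 'la_004', 'la_005', 'la_007', 'la_009', 'la_010'], n_splits=3))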