nnUNet V2 Code: Data Preprocessing (Part 2)

Original article, last modified 2025-03-19 23:59:11

Licensed under CC 4.0 BY-SA

First published 2025-01-16 14:53:46

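For the preceding article, see the nnUNetv2_plan_and_preprocess command.

This article reads through the file nnUNet\nnunetv2\preprocessing\preprocessors\default_preprocessor.py.

The file contains a DefaultPreprocessor class and an example_test_case_preprocessing function (used for testing, skipped here).

The other functions used inside the DefaultPreprocessor class are explained in the second half of the article.

Data preprocessing is covered in two articles:

nnUNet V2 Code: Data Preprocessing (Part 1)

nnUNet V2 Code: Data Preprocessing (Part 2)

The ConfigurationManager and PlansManager classes mentioned in this article are covered in nnUNet V2 Code: Data Preprocessing (Part 1).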

DefaultPreprocessor

1. The __init__ function

Defines verbose, which decides whether extra information is printed.

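2. The run function

Parameters

dataset_name_or_id: dataset name or id
configuration_name: configuration name, for example 2d
plans_identifier: defaults to nnUNetPlans
num_processes: number of processes

Process

Set up the required file names, directory structure, and file paths; read nnUNetPlans.json; instantiate the PlansManager class; read the configuration (via the get_configuration function); and so on.

Finally, run the class method run_case_save across multiple processes.

The code is clearly structured, so it is not reproduced here.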

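3. The run_case_save function

This function mainly calls the class method run_case to obtain the required data and then saves it (data and seg are compressed and saved as an npz file here; the npy files are not written by this function). Its parameters are essentially the same as those of run_case. The code is clear and is not reproduced here.

The data variable in the code holds the medical image to be segmented, so this article says "image" instead of data for readability.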
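As a rough illustration of this saving step, the sketch below compresses an image array and a seg array into one npz file and stores the properties dictionary next to it. The helper name save_case, the file suffixes, and the use of pickle for properties are assumptions made for this example, not a copy of the nnUNet code.

```python
import os
import pickle
import tempfile

import numpy as np


def save_case(output_filename_truncated, data, seg, properties):
    """Hypothetical helper: arrays go into a compressed npz, metadata into a pkl."""
    np.savez_compressed(output_filename_truncated + ".npz", data=data, seg=seg)
    with open(output_filename_truncated + ".pkl", "wb") as f:
        pickle.dump(properties, f)


# Toy case: one channel, 4 x 8 x 8 voxels.
data = np.random.rand(1, 4, 8, 8).astype(np.float32)
seg = np.zeros((1, 4, 8, 8), dtype=np.int8)
properties = {"spacing": [3.0, 1.0, 1.0]}

with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "case_0000")
    save_case(prefix, data, seg, properties)

    # Read the arrays back to confirm the round trip.
    loaded = np.load(prefix + ".npz")
    loaded_data = loaded["data"]
    loaded_seg = loaded["seg"]
    loaded.close()

print(loaded_data.shape, loaded_seg.dtype)
```

Keeping both arrays in one compressed file means a case can be restored with a single np.load call, at the cost of decompression time on every read.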

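4. The run_case function

Parameters

image_files: paths of the images to be segmented
seg_file: path of the mask image
plans_manager: a PlansManager instance
configuration_manager: a ConfigurationManager instance
dataset_json: created when setting up the nnUNet_raw folder; its contents are written by the user

Process

Read the files, run the class method run_case_npy, and return the processed data (see the run_case_npy function for the processing steps). The code is clear and is not reproduced here.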

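5. The run_case_npy function

Parameters

properties: information about the medical image, for example the voxel spacing
The other parameters are the same as for run_case.

Process

The preprocessing steps in detail
First, copy the image and seg, and record whether a seg exists in the has_seg variable. The code is clear and is not reproduced here.

Next, the image, the seg, and the spacing of the original image are transposed using the forward transpose array (transpose_forward) determined in nnUNetPlans.json: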

data = data.transpose([0, *[i + 1 for i in plans_manager.transpose_forward]])
if seg is not None:
    seg = seg.transpose([0, *[i + 1 for i in plans_manager.transpose_forward]])
original_spacing = [properties['spacing'][i] for i in plans_manager.transpose_forward]
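To make the index arithmetic concrete, here is a small self-contained sketch on a toy array. The value of transpose_forward and the spacing values are assumed example values; in nnUNet the real transpose_forward is read from nnUNetPlans.json.

```python
import numpy as np

# Toy image with shape (channels, a, b, c) = (1, 4, 5, 6).
data = np.zeros((1, 4, 5, 6), dtype=np.float32)
properties = {"spacing": [0.7, 0.8, 3.0]}

# Assumed example value, not taken from a real plans file.
transpose_forward = [2, 0, 1]

# Axis 0 is the channel axis and stays in place, so every spatial index is shifted by 1.
axes = [0, *[i + 1 for i in transpose_forward]]
data_t = data.transpose(axes)
spacing_t = [properties["spacing"][i] for i in transpose_forward]

print(axes)          # [0, 3, 1, 2]
print(data_t.shape)  # (1, 6, 4, 5)
print(spacing_t)     # [3.0, 0.7, 0.8]
```

The spacing list has no channel entry, which is why it is indexed with transpose_forward directly while the array axes use i + 1.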

[Three figures illustrating the cropping step appeared here.]

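Look through the three figures above first; they make the cropping step easier to follow.

1️⃣ Get the size of the original image (shape_before_cropping).
2️⃣ Use the crop_to_nonzero function to remove the superfluous invalid region (the zero region in the first figure above) from the image (data) and seg, then set the invalid region that remains in seg to -1. These -1 values may be used during normalization and are removed during training.
3️⃣ Store the position of the cropped image within the original image (bbox) and the size of the cropped image in the properties dictionary: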

shape_before_cropping = data.shape[1:]
properties['shape_before_cropping'] = shape_before_cropping
data, seg, bbox = crop_to_nonzero(data, seg)
properties['bbox_used_for_cropping'] = bbox
properties['shape_after_cropping_and_before_resampling'] = data.shape[1:]
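The cropping idea can be sketched in plain NumPy as follows. This is a simplified stand-in written for this article, not the library's crop_to_nonzero: the name crop_to_nonzero_sketch is hypothetical, and details such as hole filling in the mask are left out.

```python
import numpy as np


def crop_to_nonzero_sketch(data, seg, nonzero_label=-1):
    """Simplified stand-in for crop_to_nonzero (assumes at least one nonzero voxel)."""
    # A voxel counts as valid if any channel is nonzero there.
    nonzero_mask = np.any(data != 0, axis=0)

    # Bounding box of the valid region as [[lo, hi], ...] per spatial axis.
    coords = np.argwhere(nonzero_mask)
    bbox = [[int(lo), int(hi) + 1] for lo, hi in zip(coords.min(0), coords.max(0))]
    slicer = tuple(slice(lo, hi) for lo, hi in bbox)

    data = data[(slice(None),) + slicer]
    seg = seg[(slice(None),) + slicer].copy()
    mask = nonzero_mask[slicer]

    # Background voxels inside the box that are still invalid get -1.
    seg[(seg == 0) & ~mask[None]] = nonzero_label
    return data, seg, bbox


data = np.zeros((1, 4, 6, 6), dtype=np.float32)
data[0, 1:3, 2:5, 1:4] = 1.0   # valid region
data[0, 1, 2, 1] = 0.0         # one zero voxel left inside the bounding box
seg = np.zeros((1, 4, 6, 6), dtype=np.int8)
seg[0, 2, 3, 2] = 1            # one foreground voxel

data_c, seg_c, bbox = crop_to_nonzero_sketch(data, seg)
print(data_c.shape)  # (1, 2, 3, 3)
print(bbox)          # [[1, 3], [2, 5], [1, 4]]
```

In this toy case exactly one voxel of seg_c ends up as -1: the zero voxel that the bounding box could not exclude.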

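Next, the target voxel spacing is read. The if statement checks whether the current configuration is 2d; in the 2d configuration the spacing between slices is left unchanged.

The image size after resampling is then computed from the voxel spacing (by the compute_new_shape function, whose code is clear):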

target_spacing = configuration_manager.spacing
 
if len(target_spacing) < len(data.shape[1:]):
    target_spacing = [original_spacing[0]] + target_spacing
new_shape = compute_new_shape(data.shape[1:], original_spacing, target_spacing)
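A minimal sketch of the shape computation, assuming each axis is scaled by old spacing / new spacing and rounded to whole voxels. The function name compute_new_shape_sketch and the example numbers are made up for illustration.

```python
import numpy as np


def compute_new_shape_sketch(old_shape, old_spacing, new_spacing):
    """Scale each axis by old_spacing / new_spacing and round to whole voxels."""
    assert len(old_shape) == len(old_spacing) == len(new_spacing)
    return np.array([int(round(o / n * s))
                     for o, n, s in zip(old_spacing, new_spacing, old_shape)])


shape_after_cropping = (20, 100, 100)
original_spacing = [3.0, 1.0, 1.0]
target_spacing = [0.5, 0.5]  # a 2d configuration only stores the in-plane spacing

# Prepend the original slice spacing so the slice axis is not resampled.
if len(target_spacing) < len(shape_after_cropping):
    target_spacing = [original_spacing[0]] + target_spacing

new_shape = compute_new_shape_sketch(shape_after_cropping, original_spacing, target_spacing)
print(target_spacing)      # [3.0, 0.5, 0.5]
print(new_shape.tolist())  # [20, 200, 200]
```

Because the first axis keeps its original spacing, its ratio is 1 and the number of slices stays at 20, while the in-plane axes double in size.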

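Next, the image is normalized, and both the image and seg are resampled. The normalization and resampling functions chosen earlier are looked up from nnUNetPlans.json and applied in turn. The nnUNet author stresses that normalization must happen before resampling. Finally, self.verbose decides whether extra information is printed: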

# normalize
data = self._normalize(data, seg, configuration_manager,
                        plans_manager.foreground_intensity_properties_per_channel)

old_shape = data.shape[1:]
data = configuration_manager.resampling_fn_data(data, new_shape, original_spacing, target_spacing)
seg = configuration_manager.resampling_fn_seg(seg, new_shape, original_spacing, target_spacing)
if self.verbose:
    print(f'old shape: {old_shape}, new_shape: {new_shape}, old_spacing: {original_spacing}, '
            f'new_spacing: {target_spacing}, fn_data: {configuration_manager.resampling_fn_data}')
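The image and seg go through separate resampling functions. One likely reason is that a label map must keep its integer class values, which interpolation would blur. The sketch below shows this with a pure-NumPy nearest-neighbour resampler; it is a stand-in written for this article, not the function nnUNet stores in configuration_manager.resampling_fn_seg.

```python
import numpy as np


def resample_nearest(arr, new_shape):
    """Nearest-neighbour resampling of a (C, ...) array to new_shape (pure NumPy)."""
    out = arr
    for axis, (old, new) in enumerate(zip(arr.shape[1:], new_shape), start=1):
        # Map every output index back to the closest source index along this axis.
        idx = np.minimum((np.arange(new) + 0.5) * old / new, old - 1).astype(int)
        out = np.take(out, idx, axis=axis)
    return out


seg = np.array([[[0, 1],
                 [2, 2]]], dtype=np.int8)  # shape (1, 2, 2)
seg_up = resample_nearest(seg, (4, 4))

print(seg_up.shape)       # (1, 4, 4)
print(np.unique(seg_up))  # [0 1 2]
```

The upsampled array contains only the original labels 0, 1, and 2; a linear interpolation of the same array would produce fractional values between classes.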

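⭐️⭐️ During normalization, nnU-Net V2 uses the cropping result above to decide whether the regions where seg is -1 are included. If cropping reduced the image to less than 3/4 of its original size, only the valid region (where seg is not -1) is normalized; otherwise, the whole image (data) is normalized.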

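The reason for this extra check: in some medical imaging tasks (for example the BraTS brain tumor segmentation task), the input images (such as MRI scans) contain meaningless invalid regions (background such as air), while the remaining regions (such as brain tissue) are the valid regions to be segmented. If the intensity distribution of the whole image takes part in normalization, the invalid regions can interfere strongly. Background voxels may have very low values (close to 0), while brain tissue voxels are comparatively high. Without masking out the background, the statistics used for normalization (such as the mean and standard deviation) are computed over the whole image and can be heavily skewed toward the background.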
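The effect can be reproduced on a toy example. The sketch below assumes z-score normalization and the 3/4 rule as described above; the names use_mask and zscore_sketch, and the intensity values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def use_mask(shape_before, shape_after, threshold=0.75):
    """Rule as described in the text: mask if cropping left less than 3/4 of the voxels."""
    return np.prod(shape_after) / np.prod(shape_before) < threshold


def zscore_sketch(image, seg, use_mask_for_norm):
    """Z-score one channel; optionally restrict the statistics to voxels with seg >= 0."""
    image = image.astype(np.float32)  # astype returns a copy, the input is untouched
    if use_mask_for_norm:
        mask = seg >= 0
        mean, std = image[mask].mean(), image[mask].std()
        image[mask] = (image[mask] - mean) / max(std, 1e-8)
    else:
        mean, std = image.mean(), image.std()
        image = (image - mean) / max(std, 1e-8)
    return image


# 8 background voxels (intensity 0, seg -1) followed by 4 tissue voxels (seg 0).
image = np.array([0.0] * 8 + [100.0, 110.0, 90.0, 100.0])
seg = np.array([-1] * 8 + [0] * 4)
tissue = seg >= 0

masked = zscore_sketch(image, seg, use_mask_for_norm=True)
unmasked = zscore_sketch(image, seg, use_mask_for_norm=False)

print(use_mask((100, 100, 100), (60, 100, 100)))       # True
print(masked[tissue].mean(), masked[tissue].std())     # tissue is centred with unit spread
print(unmasked[tissue].mean(), unmasked[tissue].std()) # tissue is shifted and compressed
```

With the mask, the tissue voxels end up with mean 0 and standard deviation 1. Without it, the many zero-valued background voxels pull the mean down and inflate the standard deviation, so the tissue values are pushed well above zero and squeezed into a narrow range.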

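Next, foreground voxels in seg are sampled and their coordinates are recorded. If a seg exists (has_seg is True), the LabelManager class is instantiated to handle the classes (the foreground classes the user defined in dataset.json, for example A: 1) and the foreground classes are retrieved. The _sample_foreground_locations function is then called to sample voxels from seg, and their coordinates are stored in properties['class_locations']. These coordinates are used during training to ensure that at least one third of the samples in a batch contain foreground.
Then the modify_seg_fn function (used by the nnUNet author for debugging and not covered here) is called to modify seg: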

if has_seg:
    label_manager = plans_manager.get_label_manager(dataset_json)
    collect_for_this = label_manager.foreground_regions if label_manager.has_regions \
        else label_manager.foreground_labels
 
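    # When using the ignore label we only want to sample from annotated regions, so we
    # also need to sample uniformly from all classes (including background)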
    if label_manager.has_ignore_label:
        collect_for_this.append(label_manager.all_labels)
 
    # No need to filter background in regions because it is already filtered in handle_labels
    properties['class_locations'] = self._sample_foreground_locations(seg, collect_for_this,
                                                                            verbose=self.verbose)
    seg = self.modify_seg_fn(seg, plans_manager, dataset_json, configuration_manager)
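The sampling step can be pictured with the simplified sketch below. It only handles plain integer labels and keeps at most num_samples coordinates per class; the function name, the default values, and the fixed seed are assumptions for this example and do not reproduce _sample_foreground_locations exactly.

```python
import numpy as np


def sample_foreground_locations_sketch(seg, classes, num_samples=10000, seed=1234):
    """For every class, keep at most num_samples voxel coordinates (simplified)."""
    rng = np.random.RandomState(seed)
    class_locations = {}
    for c in classes:
        coords = np.argwhere(seg == c)  # each row is a (channel, z, y, x) index
        if len(coords) > num_samples:
            keep = rng.choice(len(coords), num_samples, replace=False)
            coords = coords[keep]
        class_locations[c] = coords
    return class_locations


seg = np.zeros((1, 4, 8, 8), dtype=np.int8)
seg[0, 1, 2:6, 2:6] = 1  # 16 voxels of class 1
seg[0, 2, 0, 0] = 2      # a single voxel of class 2

locs = sample_foreground_locations_sketch(seg, classes=[1, 2, 3], num_samples=5)
for c, coords in locs.items():
    print(c, coords.shape)
# 1 (5, 4)
# 2 (1, 4)
# 3 (0, 4)
```

Storing a capped list of coordinates per class keeps the properties file small while still letting the data loader pick a patch centre that is guaranteed to lie on a given class. A class that does not occur in the case simply gets an empty coordinate array.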