deeplabv3 open-source project (3): Error: 2 root error(s) found. (0) Invalid argument: padded_shape[0]=168 is not...

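This article covers deeplabv3, a commonly used semantic segmentation network. It describes two problems encountered when training deeplabv3+ on custom data: an error when visualizing predictions and an error when evaluating predictions. For each, it analyzes the cause and gives a solution: adding a scale-and-pad step to the data, and keeping the dataset image size consistent with the size set for training.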

Preface

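deeplabv3 is currently one of the more commonly used neural networks for semantic segmentation, and the whole training project is open source. Testing with the released models, or training on your own data starting from them, both give fairly good results.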

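deeplabv3 open-source project explained (1): testing your own images with the released model
deeplabv3 open-source project explained (2): training and transfer learning on your own dataset
deeplabv3 open-source project (3): Error: 2 root error(s) found. (0) Invalid argument: padded_shape[0]=168 is not…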

1 Problem 1

I trained deeplabv3+ on my own data. Training ran normally, but visualizing the predictions raised the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
 (0) Invalid argument: padded_shape[0]=168 is not divisible by block_shape[0]=18
	 [[{{node aspp3_depthwise/depthwise/SpaceToBatchND}}]]
	 [[ArgMax/_4403]]
 (1) Invalid argument: padded_shape[0]=168 is not divisible by block_shape[0]=18
	 [[{{node aspp3_depthwise/depthwise/SpaceToBatchND}}]]
0 successful operations.
Cause

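During training, the project scales and pads each image according to the crop-size argument passed in (here --vis_crop_size="512,512"), so every processed image ends up with the same size as vis_crop_size and can be batched into the network correctly.
During testing, the image size is not adjusted this way, which produces the error above.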
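The error message itself comes from the SpaceToBatchND op, which requires every padded spatial dimension to be an integer multiple of the corresponding block size. The block size of 18 here is presumably the atrous rate of the aspp3 branch. A minimal sketch of that divisibility rule, using hypothetical helper names that are not part of TensorFlow or deeplab:

```python
def is_valid_for_space_to_batch(padded_shape, block_shape):
    """True when every padded spatial dimension is a multiple of its block size."""
    return all(p % b == 0 for p, b in zip(padded_shape, block_shape))


def next_divisible(size, block):
    """Smallest value >= size that is a multiple of block."""
    return -(-size // block) * block


# Values taken from the error message: padded_shape[0]=168, block_shape[0]=18.
print(is_valid_for_space_to_batch([168], [18]))  # False, since 168 % 18 == 6
print(next_divisible(168, 18))                   # 180
```

When the image has the size the graph was built for, the padding is computed so that this condition holds. An image of an unexpected size breaks that assumption.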

Solution

For testing, add a scale-and-pad step so that the image size matches vis_crop_size.
For validation, where you want the IOU on the validation set, both image and label need to be scaled and padded to vis_crop_size.

In the script ./research/deeplab/input_preprocess.py you can find the following code:
 # Randomly crop the image and label.
 if is_training and label is not None:
   processed_image, label = preprocess_utils.random_crop(
       [processed_image, label], crop_height, crop_width)
Add the following after it:
 else:
   # Scale factor that fits the image inside the crop while keeping its aspect ratio.
   rr = tf.minimum(tf.cast(crop_height, tf.float32) / tf.cast(image_height, tf.float32),
                   tf.cast(crop_width, tf.float32) / tf.cast(image_width, tf.float32))
   # resize_images expects an int32 size, so cast the scaled dimensions to tf.int32.
   newh = tf.cast(tf.cast(image_height, tf.float32) * rr, tf.int32)
   neww = tf.cast(tf.cast(image_width, tf.float32) * rr, tf.int32)
   processed_image = tf.image.resize_images(
       processed_image, (newh, neww), method=tf.image.ResizeMethod.BILINEAR, align_corners=True)
   # Pad at the bottom and right with the mean pixel, up to the crop size.
   processed_image = preprocess_utils.pad_to_bounding_box(
       processed_image, 0, 0, crop_height, crop_width, mean_pixel)
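To see which sizes this branch produces, the same arithmetic can be reproduced in plain Python. This is a minimal sketch, not part of the deeplab code: fit_within_crop is a hypothetical helper. It assumes the scaled size is truncated to an integer, and that padding goes at the bottom and right because the offsets passed to pad_to_bounding_box are 0, 0.

```python
def fit_within_crop(image_height, image_width, crop_height, crop_width):
    """Scale an image to fit inside the crop, keeping aspect ratio.

    Returns (new_height, new_width, pad_bottom, pad_right).
    """
    # Use the smaller ratio so that neither side exceeds the crop.
    scale = min(crop_height / image_height, crop_width / image_width)
    # Truncate to whole pixels (an assumption of this sketch).
    new_height = int(image_height * scale)
    new_width = int(image_width * scale)
    # Whatever is left is filled by padding at the bottom and right.
    pad_bottom = crop_height - new_height
    pad_right = crop_width - new_width
    return new_height, new_width, pad_bottom, pad_right


print(fit_within_crop(1024, 768, 512, 512))  # (512, 384, 0, 128)
```

If the label goes through the same step for validation, nearest-neighbour resizing is the usual choice, since interpolating class ids would produce values that are not valid classes.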
       

2 Problem 2

I trained deeplabv3+ on my own data. Training ran normally, but evaluating the predictions raised the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
 (0) Invalid argument: Incompatible shapes: [262144] vs. [250000]
	 [[node true_positives_1/LogicalAnd (defined at deeplab/eval.py:175) ]]
	 [[ConstantFoldingCtrl/mean_iou/confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Switch_0/_4462]]
 (1) Invalid argument: Incompatible shapes: [262144] vs. [250000]
	 [[node true_positives_1/LogicalAnd (defined at deeplab/eval.py:175) ]]
0 successful operations.
0 derived errors ignored.
Cause

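Suppose your input images are (512, 512).
For training, the project was given --vis_crop_size="500,500" to scale and pad the images, so the processed images match vis_crop_size and can be batched into the network correctly.
Running the evaluation code then fails. The numbers in the error message give the reason away: 250000 = 500*500 and 262144 = 512*512.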
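The two numbers in the message are the lengths of flattened tensors, so the image side behind each one can be recovered with a quick check. A small sketch, assuming square images; both function names are hypothetical:

```python
import math


def flattened_length(height, width):
    """Number of elements after flattening a height x width map."""
    return height * width


def square_side(num_elements):
    """Side length if num_elements is a perfect square, otherwise None."""
    side = math.isqrt(num_elements)
    return side if side * side == num_elements else None


print(flattened_length(512, 512))  # 262144
print(flattened_length(500, 500))  # 250000
print(square_side(262144))         # 512
print(square_side(250000))         # 500
```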

Solution

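Make sure the image size of your dataset matches the image size set for training.
Modifying the code can also solve this problem, but that is not covered here.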
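One way to check the dataset before running evaluation is to read the size of each file and compare it with the crop size. The sketch below is only an illustration under stated assumptions: it handles PNG files only (for example, label images stored as PNG), it reads the size straight from the PNG header with the standard library, and all function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Parse (height, width) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR stores width first, then height, as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return height, width


def read_png_size(path):
    """Read only the header of the file at path and return (height, width)."""
    with open(path, "rb") as f:
        return png_size(f.read(24))


def find_size_mismatches(sizes, crop_height, crop_width):
    """Return (name, size) pairs whose size differs from the crop size."""
    expected = (crop_height, crop_width)
    return [(name, size) for name, size in sizes.items() if size != expected]
```

With a list of file paths, here called label_paths as a placeholder, the sizes can be collected with `sizes = {p: read_png_size(p) for p in label_paths}` and passed to find_size_mismatches together with the crop size used for training.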