[Deep Learning Basics] Building a Simple Deep Learning Model on CIFAR-10 with Keras (Data Processing, Data Augmentation, and Various Losses)

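This article walks through the whole process of building a deep learning project with Keras. It covers the key steps of data reading, preprocessing, model building, and training and saving. It also discusses data augmentation, the choice of loss function, and network design, and gives a ResNet32 model as an example.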


How to Build a Deep Learning Project with Keras

Step One: Data reading

For CIFAR-10 this step is easy: Keras already packages the dataset and splits it into training data and test data.

from keras.datasets import cifar10, cifar100
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

For our own (DIY) datasets, we have to read and store the pictures one by one.

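import os

def get_data(dir):
    '''
    Return two lists: the image file names in dir/images and the
    label file names in dir/labels (e.g. dir = 'D:/').
    '''
    images = []
    labels = []
    # sort the names, because os.listdir returns them in arbitrary order
    images_files = sorted(os.listdir(os.path.join(dir, 'images')))
    labels_files = sorted(os.listdir(os.path.join(dir, 'labels')))
    for x in images_files:
        images.append(x)
    for y in labels_files:
        labels.append(y)
    return images, labels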
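`os.listdir` returns names in arbitrary order, so two separately listed folders are not guaranteed to line up, and one missing label file shifts every later pair. Below is a minimal sketch that pairs files by name instead. It assumes a hypothetical layout where `images/img_001.png` is labelled by `labels/img_001.<ext>`. `pair_by_stem` is an illustrative helper, not part of Keras.

```python
import os

def pair_by_stem(image_files, label_files):
    """Pair each image file with the label file that has the same stem.

    Hypothetical helper: assumes an image and its label share the file name
    (without extension). Images without a matching label are skipped.
    """
    # map "img_001" -> "img_001.png" for every label file
    labels_by_stem = {os.path.splitext(name)[0]: name for name in label_files}
    pairs = []
    for image_name in sorted(image_files):
        stem = os.path.splitext(image_name)[0]
        if stem in labels_by_stem:
            pairs.append((image_name, labels_by_stem[stem]))
    return pairs


images, labels = ['b.png', 'a.png', 'c.png'], ['a.png', 'b.png']
print(pair_by_stem(images, labels))  # [('a.png', 'a.png'), ('b.png', 'b.png')]
```

The pixels can then be loaded from each `(image, label)` pair with whatever image reader your project already uses.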

Step Two: Data processing

One-hot Encoding

In classification tasks we usually need to convert the integer class labels into one-hot vectors, which is what keras.utils.to_categorical does.

import keras

y_train = keras.utils.to_categorical(y_train, 10)  # 10 = number of classes
y_test = keras.utils.to_categorical(y_test, 10)    # 10 = number of classes
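To make the one-hot conversion concrete, here is a simplified NumPy re-implementation for illustration (not the Keras source). Each label index becomes a row of zeros with a single 1 in that class's column. The helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    # labels may have shape (N,) or (N, 1), as returned by cifar10.load_data()
    labels = np.asarray(labels, dtype=int).reshape(-1)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # put a 1 in the column of each sample's class
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

print(one_hot([2, 0], 3))  # rows: [0. 0. 1.] and [1. 0. 0.]
```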

Data Augmentation

There are six common measures:

Image Centralization
Image Normalization
Image Shifting
Image Scaling
Image Zooming
Image Flipping

Image Centralization

Also called zero-centering (mean subtraction).
In RGB format every pixel value is non-negative. For a neuron in the first layer, this means the gradients of all its weights share the same sign (all positive or all negative), so the weights can only move together in one direction per update, and convergence becomes very slow. After centralization, positive and negative inputs appear in roughly equal numbers. The gradients of different weights can then take different signs, which accelerates convergence.
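The sign argument follows from the chain rule. Take a single first-layer neuron $f = \sum_i w_i x_i + b$ with loss $L$:

$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial f}\, x_i .
$$

If every input $x_i > 0$, all $\partial L / \partial w_i$ carry the sign of the single scalar $\partial L / \partial f$. Subtracting the mean lets the $x_i$, and therefore the weight gradients, take both signs.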


import numpy as np

x_train = load_data(img_dir)  # read images with your own loader. x_train.shape = (5000,32,32,3)

# transform 2D images to 1D and convert to float. x_train.shape = (5000,3072)
x_train = np.reshape(x_train, (x_train.shape[0], -1)).astype('float32')

# calculate mean of every pixel over all images. mean_image.shape = (3072,)
mean_image = np.mean(x_train, axis=0)
x_train -= mean_image  # subtract mean_image (this fails on a uint8 array)
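Two practical details matter here. First, the subtraction must run on a float array: in NumPy, an in-place `-=` of a float mean on a uint8 array raises a casting error. Second, the mean image is normally computed on the training set only and then reused unchanged for the test set. The sketch below shows both, using random stand-in arrays in place of real images. `center` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# random stand-ins for real data: uint8 images of shape (N, 32, 32, 3)
x_train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
x_test = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)

def center(x_train, x_test):
    # flatten to (N, 3072) and convert to float before subtracting
    train = x_train.reshape(x_train.shape[0], -1).astype('float32')
    test = x_test.reshape(x_test.shape[0], -1).astype('float32')
    # the mean image comes from the training set only ...
    mean_image = train.mean(axis=0)
    # ... and the same mean is subtracted from both sets
    return train - mean_image, test - mean_image, mean_image

train_c, test_c, mean_image = center(x_train, x_test)
print(mean_image.shape)                            # (3072,)
print(np.abs(train_c.mean(axis=0)).max() < 1e-3)   # True
```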

Image Normalization

Normalization (standardization) subtracts the mean and then divides by the standard deviation, so that each channel has zero mean and unit variance.


If the samples have different scales, normalization brings them to the same input scale. For example, pictures from different sources may differ in brightness, contrast, saturation, and so on, and some values may be large enough to slow down convergence. Normalization speeds up convergence and can improve accuracy.

def image_normalization(x_train, x_test):
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    print(x_train[0][0][0])  # output => [59. 62. 63.]
    # in cifar10 the per-channel mean and standard deviation are known
    mean = [125.307, 122.95, 113.865]
    std  = [62.9932, 62.0887, 66.7048]
    for i in range(3):
        x_train[:,:,:,i] = (x_train[:,:,:,i] - mean[i]) / std[i]
        x_test[:,:,:,i] = (x_test[:,:,:,i] - mean[i]) / std[i]
    print(x_train[0][0][0])  # output => [-1.0526057  -0.98166007 -0.76253873]
    return x_train, x_test
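For each channel $c$, normalization computes $x' = (x - \mu_c) / \sigma_c$. For a custom dataset with no published constants, $\mu_c$ and $\sigma_c$ can be computed from the training images themselves. The sketch below takes the statistics over every pixel of every image, per channel, and checks that the result has roughly zero mean and unit standard deviation. The helper names and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def channel_stats(x):
    # x: (N, H, W, C); statistics over all images and pixel positions, per channel
    x = x.astype('float64')
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))

def standardize(x, mean, std):
    # broadcasting applies the per-channel mean and std along the last axis
    return (x.astype('float32') - mean) / std

rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)  # stand-in data
mean, std = channel_stats(x_train)
x_norm = standardize(x_train, mean, std)
print(np.round(x_norm.mean(axis=(0, 1, 2)), 3))  # close to 0 for every channel
print(np.round(x_norm.std(axis=(0, 1, 2)), 3))   # close to 1 for every channel
```

The same `mean` and `std` computed on the training set would then be applied to the test set.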

ImageDataGenerator Class in Keras

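With keras.preprocessing.image.ImageDataGenerator we can handle all of the cases above through different parameters. For example, featurewise_center handles centralization, featurewise_std_normalization handles normalization, width_shift_range/height_shift_range handle shifting, zoom_range handles zooming, and horizontal_flip/vertical_flip handle flipping.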

from keras.preprocessing.image import ImageDataGenerator
ImageDataGenerator(featurewise_center=False,
                   samplewise_center=False,
                   featurewise_std_normalization=False,
                   samplewise_std_normalization=False,
                   zca_whitening=False,
                   zca_epsilon=1e-6,
                   rotation_range=0.,
                   width_shift_range=0.,
                   height_shift_range=0.,
                   shear_range=0.,
                   zoom_range=0.,
                   channel_shift_range=0.,
                   fill_mode='nearest',
                   cval=0.,
                   horizontal_flip=False,
                   vertical_flip=False,
                   rescale=None,
                   preprocessing_function=None,
                   data_format=K.image_data_format())

datagen = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.125,
                             height_shift_range=0.125,
                             fill_mode='constant',cval=0.)

datagen.fit(x_train)  # computes data statistics; only required for featurewise_* or zca_whitening options
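To show roughly what this `datagen` does to each image, here is a simplified NumPy stand-in (not Keras's implementation). It applies a random horizontal flip, then a shift of up to 0.125 × 32 = 4 pixels, filling the uncovered area with 0 as `fill_mode='constant', cval=0.` does. One assumption to flag: this sketch uses integer shifts only, whereas Keras samples a continuous shift and interpolates. `shift_with_zeros` and `random_flip_shift` are hypothetical names.

```python
import numpy as np

def shift_with_zeros(image, dy, dx):
    # move the content down by dy and right by dx pixels (negative = up/left);
    # pixels that become uncovered are filled with 0, like fill_mode='constant', cval=0.
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def random_flip_shift(image, max_shift=4, rng=None):
    # max_shift=4 corresponds to a shift range of 0.125 on a 32-pixel image
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip: reverse the width axis
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)  # integer shifts only
    return shift_with_zeros(image, int(dy), int(dx))

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3)).astype('float32')
augmented = np.stack([random_flip_shift(img, rng=rng) for img in batch])
print(augmented.shape)  # (8, 32, 32, 3)
```

Zero filling keeps the image size fixed, so the augmented batch can be fed to the same network as the originals.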

datagen.flow takes data and label arrays and generates batches of augmented data:

flow(x=None,             # input data
     y=None,             # labels
     batch_size=32,      # default: 32
     shuffle=True,       # default: True
     sample_weight=None,
     seed=None,
     save_to_dir=None,
     save_prefix='',
     save_format='png',
     subset=None)

It returns an Iterator yielding tuples of (x, y) where x is a numpy array of image data (in the case of a single image input) or a list of numpy arrays (in the case with additional inputs) and y is a numpy array of corresponding labels. If ‘sample_weight’ is not None, the yielded tuples are of the form (x, y, sample_weight). If y is None, only the numpy array x is returned.
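The batching behaviour described above can be sketched in a few lines. This is a hypothetical, simplified stand-in for `flow`, not the Keras iterator. It shuffles once per pass over the data, yields `(x_batch, y_batch)` tuples with a smaller final batch, optionally applies a per-image `augment` function, and loops forever. That last point is why training code has to say how many steps make an epoch.

```python
import numpy as np

def simple_flow(x, y, batch_size=32, shuffle=True, augment=None, seed=None):
    """Hypothetical, simplified stand-in for ImageDataGenerator.flow."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:                                    # never stops on its own
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # the last batch may be smaller
            x_batch = x[idx]
            if augment is not None:
                x_batch = np.stack([augment(img) for img in x_batch])
            yield x_batch, y[idx]

gen = simple_flow(np.zeros((10, 4, 4, 3)), np.arange(10), batch_size=4, seed=0)
print([len(next(gen)[1]) for _ in range(4)])  # [4, 4, 2, 4]
```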

Step Three: Net building

Loss Functions

In segmentation tasks, the following loss functions are often used (mean IoU, at the end of the list, serves mainly as an evaluation metric).

BCE (Binary Cross Entropy)
Dice
Focal loss
Focal loss + Dice loss
BCE + Dice loss
Weighted BCE loss
Weighted BCE Dice loss
Mean IOU

1. BCE (Binary Cross Entropy)

BCE is the special case of softmax cross entropy with two classes, for example foreground vs. background.


In Keras, we can simply use keras.losses.binary_crossentropy to handle this.
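For labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$, binary cross entropy averaged over $N$ pixels is

$$
\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right].
$$

A direct NumPy sketch of this formula follows. The clipping constant `eps` is an assumption to keep `log(0)` out of the computation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # clip predictions so log(0) never occurs
    p = np.clip(np.asarray(y_pred, dtype='float64'), eps, 1 - eps)
    y = np.asarray(y_true, dtype='float64')
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
```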

2. Dice loss

In medical image segmentation we often use Dice as the loss function. It is based on the Dice coefficient, a statistic used to gauge the similarity of two samples.
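For two sets $A$ and $B$, the Dice coefficient is $\frac{2|A \cap B|}{|A| + |B|}$. For probability maps, the "soft" version replaces the intersection with a sum of products and adds a smoothing term $s$:

$$
\mathrm{Dice} = \frac{2\sum_i y_i p_i + s}{\sum_i y_i + \sum_i p_i + s}.
$$

The NumPy sketch below computes this for a single mask. The Keras `dice_coef` in this section computes it per sample and then averages over the batch. The example masks are made up for illustration.

```python
import numpy as np

def soft_dice(y_true, y_pred, smooth=1.0):
    # products replace the set intersection, so probabilities work too
    y_true = np.asarray(y_true, dtype='float64')
    y_pred = np.asarray(y_pred, dtype='float64')
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

mask = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(soft_dice(mask, pred, smooth=0.0))  # 2*1 / (2 + 1) = 0.666...
print(1 - soft_dice(mask, mask))          # 0.0: a perfect prediction gives zero Dice loss
```

The smoothing term also makes two empty masks score 1 instead of dividing 0 by 0.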


```python
from keras import backend as K

def dice_coef(y_true, y_pred, smooth=1):
    # per-sample intersection and sizes, summed over height, width and channels
    intersection = K.sum(y_true * y_pred, axis=[1,2,3])
    union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
    # smooth avoids 0/0 on empty masks; the result is averaged over the batch
    return K.mean( (2. * intersection + smooth) / (union + smooth), axis=0)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred, smooth=1)
```

3. Focal loss

> _The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000)._


While α balances the importance of positive/negative examples, it does not differentiate between easy/hard examples. Intuitively, the modulating factor γ reduces the loss contribution from easy examples and extends the range in which an example receives low loss.
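In the Focal Loss paper's notation, $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, with $\alpha_t$ defined the same way from $\alpha$ and $1 - \alpha$:

$$
\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t).
$$

The NumPy sketch below (summed over pixels, like the Keras version in this section) compares an easy positive ($p = 0.9$) with a hard one ($p = 0.1$). With $\gamma = 2$ and $\alpha = 0.25$, the easy example keeps only 0.25 × 0.1² of its cross-entropy loss, while the hard one keeps 0.25 × 0.9². The function name `focal_loss_np` is hypothetical.

```python
import numpy as np

def focal_loss_np(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    y = np.asarray(y_true, dtype='float64')
    p = np.clip(np.asarray(y_pred, dtype='float64'), eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)              # probability given to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return float(np.sum(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

easy, hard = focal_loss_np([1], [0.9]), focal_loss_np([1], [0.1])
bce_easy, bce_hard = -np.log(0.9), -np.log(0.1)
print(easy / bce_easy, hard / bce_hard)  # ratios ≈ 0.0025 and ≈ 0.2025
```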


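```python
import tensorflow as tf
from keras import backend as K

def focal_loss(y_true, y_pred):
    gamma = 2.
    alpha = 0.25
    # predicted probability on positive pixels (1 elsewhere, so those add almost nothing)
    pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
    # predicted probability on negative pixels (0 elsewhere)
    pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))

    # keep the logarithms finite
    pt_1 = K.clip(pt_1, 1e-3, .999)
    pt_0 = K.clip(pt_0, 1e-3, .999)

    # both sums must be in one expression, otherwise the second term is never returned
    return (-K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))
            - K.sum((1 - alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0)))
```

4. Focal loss + Dice loss

```python
def mixedLoss(y_true, y_pred, alpha):
    # alpha scales the focal term; -log(Dice coefficient) goes to 0 as the overlap approaches 1
    return alpha * focal_loss(y_true, y_pred) - K.log(dice_coef(y_true, y_pred))

# Keras losses take only (y_true, y_pred), so fix alpha first (10. is just an example value):
# model.compile(loss=lambda y_true, y_pred: mixedLoss(y_true, y_pred, alpha=10.), optimizer='adam')
```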

5. BCE + Dice loss

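from keras import backend as K
from keras.losses import binary_crossentropy

def bce_logdice_loss(y_true, y_pred):
    # 1 - dice_coef_loss is the Dice coefficient, so the second term is -log(Dice)
    return binary_crossentropy(y_true, y_pred) - K.log(1. - dice_coef_loss(y_true, y_pred))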

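6. Weighted BCE loss

def weighted_bce_loss(y_true, y_pred, weight):
    # one straightforward definition: per-pixel BCE scaled by weight,
    # normalized by the total weight
    epsilon = K.epsilon()
    y_pred = K.clip(y_pred, epsilon, 1. - epsilon)
    bce = -(y_true * K.log(y_pred) + (1. - y_true) * K.log(1. - y_pred))
    return K.sum(weight * bce) / K.sum(weight)

# the weighted Dice counterpart: pixels with a larger weight count more in the overlap
def weighted_dice_loss(y_true, y_pred, weight):
    smooth = 1.
    w, m1, m2 = weight, y_true, y_pred
    intersection = (m1 * m2)
    score = (2. * K.sum(w * intersection) + smooth) / (K.sum(w * m1) +
              K.sum(w * m2) + smooth)
    loss = 1. - K.sum(score)
    return loss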

7. Weighted BCE Dice loss

def weighted_bce_dice_loss(y_true, y_pred):
    y_true = K.cast(y_true, 'float32')
    y_pred = K.cast(y_pred, 'float32')
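    # averaged_mask is a local average of the mask: about 0.5 near object edges,
    # 0 or 1 far from them. padding='same' with strides=(1, 1) keeps it the size
    # of y_true, and an odd pool_size keeps each window centred on its pixel
    averaged_mask = K.pool2d(
            y_true, pool_size=(51, 51), strides=(1, 1), padding='same', pool_mode='avg')
    weight = K.ones_like(averaged_mask)
    w0 = K.sum(weight)
    # weight is largest (5) where averaged_mask is 0.5, i.e. at edges, and about 0.41 far away
    weight = 5. * K.exp(-5. * K.abs(averaged_mask - 0.5))
    w1 = K.sum(weight)
    # rescale so the total weight still equals the number of pixels
    weight *= (w0 / w1)
    loss = weighted_bce_loss(y_true, y_pred, weight) + dice_coef_loss(y_true, y_pred)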
    return loss

8. Mean IOU


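```python
import numpy as np
import tensorflow as tf
from keras import backend as K

# TensorFlow 1.x code: tf.to_int32, tf.metrics.mean_iou and K.get_session are not
# available at the top level of TensorFlow 2 (only through tf.compat.v1)
def mean_iou(y_true, y_pred):
    prec = []
    # binarize the predicted probabilities at thresholds 0.50, 0.55, ..., 0.95
    for t in np.arange(0.5, 1.0, 0.05):
        y_pred_ = tf.to_int32(y_pred > t)
        # mean IoU over 2 classes (background and foreground)
        score, up_opt = tf.metrics.mean_iou(y_true, y_pred_, 2)
        K.get_session().run(tf.local_variables_initializer())
        with tf.control_dependencies([up_opt]):
            score = tf.identity(score)
        prec.append(score)
    # average the scores over all thresholds
    return K.mean(K.stack(prec), axis=0)
```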
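For masks $A$ and $B$, $\mathrm{IoU} = |A \cap B| / |A \cup B|$. The NumPy sketch below mirrors the logic of `mean_iou` above so it can be checked without a TensorFlow session. At each threshold from 0.50 to 0.95 it binarizes the prediction, averages the IoU of the background and foreground classes, and then averages over thresholds. As an assumption modelled on how TensorFlow's mean IoU treats classes with an empty denominator, a class absent from both masks is left out of the mean. The function names are hypothetical.

```python
import numpy as np

def two_class_mean_iou(true_mask, pred_mask):
    ious = []
    for t_cls, p_cls in ((~true_mask, ~pred_mask), (true_mask, pred_mask)):
        union = np.logical_or(t_cls, p_cls).sum()
        if union > 0:  # a class absent from both masks is left out of the mean
            ious.append(np.logical_and(t_cls, p_cls).sum() / union)
    return float(np.mean(ious))

def mean_iou_np(y_true, y_pred):
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred, dtype='float64')
    scores = [two_class_mean_iou(y_true, y_pred > t) for t in np.arange(0.5, 1.0, 0.05)]
    return float(np.mean(scores))

y_true = [[1, 1], [0, 0]]
y_pred = [[1.0, 0.0], [0.0, 0.0]]
print(mean_iou_np(y_true, y_pred))  # foreground IoU 1/2, background IoU 2/3, mean 7/12
```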

Net Structure

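We use ResNet32 as an example.

import keras
from keras import optimizers, regularizers
from keras.layers import (Input, Conv2D, BatchNormalization, Activation, add,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model

weight_decay = 1e-4  # L2 factor for all kernels (example value)

def res_32(input_shape):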
    # input: 32x32x3 output: 32x32x16
    img_input = Input(input_shape)
    x = Conv2D(16, (3, 3), strides=(1, 1), padding='same',
               kernel_regularizer=keras.regularizers.l2(weight_decay),
               kernel_initializer="he_normal")(img_input)

    # res_block1 to res_block5 input: 32x32x16 output: 32x32x16
    for _ in range(5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(16, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(16, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)

        x = add([x, conv_2])

    # res_block6 input: 32x32x16 output: 16x16x32
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(32, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)

    projection = Conv2D(32, kernel_size=(1, 1), strides=(2, 2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])

    # res_block7 to res_block10 input: 16x16x32 output: 16x16x32
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])

    # res_block11 input: 16x16x32 output: 8x8x64
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)

    projection = Conv2D(64, kernel_size=(1, 1), strides=(2, 2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])

    # res_block12 to res_block15 input: 8x8x64 output: 8x8x64
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])

    # Dense input: 8x8x64 output: 64
    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 64 output: 10
    x = Dense(10, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    model = Model(inputs=img_input, outputs=x)
    # set optimizer (older Keras versions spell the argument lr=.1)
    sgd = optimizers.SGD(learning_rate=.1, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model
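The "32" counts the weighted layers. There is 1 stem convolution, then 3 stages × 5 residual blocks × 2 convolutions, then 1 dense layer, giving 6n + 2 with n = 5 (the 1×1 projection shortcuts are not counted). The spatial size halves at the first block of stages 2 and 3. This small calculation, with a hypothetical helper name, reproduces the shapes noted in the comments of `res_32`:

```python
def cifar_resnet_plan(n=5, input_size=32, widths=(16, 32, 64)):
    """Depth and per-stage output shapes of a CIFAR-style ResNet with n blocks per stage."""
    # 1 stem conv + (stages * n blocks * 2 convs) + 1 dense layer = 6n + 2 for 3 stages
    depth = 1 + len(widths) * n * 2 + 1
    shapes = []
    size = input_size
    for stage, width in enumerate(widths):
        if stage > 0:
            size //= 2  # the first block of stages 2 and 3 uses stride 2
        shapes.append((size, size, width))
    return depth, shapes

print(cifar_resnet_plan())  # (32, [(32, 32, 16), (16, 16, 32), (8, 8, 64)])
```

Changing `n` (for example n = 9 gives 56 layers) is how the deeper variants of the same family are built.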

Step Four: Model Fitting and Saving

We can use Model.fit or Model.fit_generator to fit our model in Keras. In TensorFlow 2.1 and later, fit also accepts generators and fit_generator is deprecated.

fit(x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose=1,
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_freq=1)

fit_generator(generator,
              steps_per_epoch=None,
              epochs=1,
              verbose=1,
              callbacks=None,
              validation_data=None,
              validation_steps=None,
              validation_freq=1,
              class_weight=None,
              max_queue_size=10,
              workers=1,
              use_multiprocessing=False,
              shuffle=True,
              initial_epoch=0)

Here is the model fitting and saving part.

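import math
import keras

# training settings (example values); x_train, y_train, x_test, y_test and datagen come from the steps above
batch_size = 128
epochs = 200
iterations = math.ceil(len(x_train) / batch_size)  # batches per epoch

def scheduler(epoch):
    # step decay for SGD starting at lr 0.1 (example schedule)
    if epoch < 81:
        return 0.1
    if epoch < 122:
        return 0.01
    return 0.001

cbks = [keras.callbacks.LearningRateScheduler(scheduler)]

resnet = res_32((32, 32, 3))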

# fits the model on batches with real-time data augmentation
resnet.fit_generator(datagen.flow(x_train, y_train,batch_size=batch_size),
                     steps_per_epoch=iterations,
                     epochs=epochs,
                     callbacks=cbks,
                     validation_data=(x_test, y_test))

# here's a more "manual" example with keras.model.fit
'''
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        resnet.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
'''
# save model
resnet.save('resnet_32.h5')
print("saving done")