How to Build a Deep Learning Project with Keras
Step One: Data reading
For CIFAR-10, this step is easy: Keras already packages the dataset and splits it into training data and test data.
from keras.datasets import cifar10, cifar100
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
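For our own datasets, we have to read the files ourselves. The function below collects the file names found in the images and labels folders.
import os

def get_data(data_dir='D:/'):
    '''
    Return two lists: image file names and label file names.
    '''
    images = []
    labels = []
    images_files = os.listdir(os.path.join(data_dir, 'images'))
    labels_files = os.listdir(os.path.join(data_dir, 'labels'))
    for x in images_files:
        images.append(x)
    for y in labels_files:
        labels.append(y)
    return images, labels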
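The two lists above are built independently, and os.listdir does not promise any particular order, so an image and its label are not guaranteed to sit at the same index. Here is a minimal sketch of one way to pair them, assuming each image and its label share a file stem. The helper name pair_by_stem and the .png/.txt extensions are hypothetical.

```python
import os
import tempfile


def pair_by_stem(data_dir):
    """Return (image_path, label_path) pairs whose file stems match."""
    image_dir = os.path.join(data_dir, 'images')
    label_dir = os.path.join(data_dir, 'labels')
    # Map each label's stem ("a" for "a.txt") to its file name.
    labels = {os.path.splitext(f)[0]: f for f in os.listdir(label_dir)}
    pairs = []
    for name in sorted(os.listdir(image_dir)):  # sorted => reproducible order
        stem = os.path.splitext(name)[0]
        if stem in labels:  # skip images that have no label file
            pairs.append((os.path.join(image_dir, name),
                          os.path.join(label_dir, labels[stem])))
    return pairs


# Tiny demo on a throwaway directory tree (assumed layout: images/ and labels/).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'images'))
    os.makedirs(os.path.join(root, 'labels'))
    for stem in ('b', 'a', 'c'):
        open(os.path.join(root, 'images', stem + '.png'), 'w').close()
    for stem in ('a', 'b'):
        open(os.path.join(root, 'labels', stem + '.txt'), 'w').close()
    demo_pairs = [(os.path.basename(img), os.path.basename(lab))
                  for img, lab in pair_by_stem(root)]
```

In the demo, image c.png has no label file, so it is left out of the result.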
Step Two: Data processing
One-hot Encoding
In many classification tasks, we need to convert the integer class labels into one-hot vectors. keras.utils.to_categorical does this.
import keras

y_train = keras.utils.to_categorical(y_train, 10)  # 10 = number of classes
y_test = keras.utils.to_categorical(y_test, 10)    # 10 = number of classes
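To see what the conversion produces, here is a small NumPy sketch of the same idea. It is only an illustration and not the Keras implementation; the name one_hot is hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows of length num_classes."""
    labels = np.asarray(labels, dtype='int64').reshape(-1)  # (n, 1) -> (n,)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


y = np.array([[3], [0], [9]])  # same (n, 1) layout as the cifar10 labels
y_one_hot = one_hot(y, 10)
```

Each row contains a single 1 at the index of its class, so np.argmax(y_one_hot, axis=1) recovers the original labels.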
Data Augmentation
There are six common measures:
- Image Centralization
- Image Normalization
- Image Shifting
- Image Scaling
- Image Zooming
- Image Flipping
Image Centralization
Also called zero-centering (zero-mean).
Raw RGB pixel values are all non-negative. When every input to a neuron is positive, the gradients of all its incoming weights share the same sign, so in each update the weights can only move up together or down together, and convergence becomes slow. After centralization, the inputs are roughly balanced between positive and negative values, so the weight gradients can take different signs, which accelerates convergence.

import numpy as np

x_train = load_data(img_dir)  # read images; x_train.shape = (5000, 32, 32, 3)
# flatten every image to a 1D float vector; x_train.shape = (5000, 3072)
x_train = np.reshape(x_train, (x_train.shape[0], -1)).astype('float32')
# mean of every pixel position over all images; mean_image.shape = (3072,)
mean_image = np.mean(x_train, axis=0)
x_train -= mean_image  # subtract the mean image (in-place, so the array must be float)
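The effect of the subtraction is easy to check on random data. The sketch below uses a small made-up array in place of real images; the shapes and the seed are assumptions for the demo only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for real images: 8 RGB images of 4x4 pixels with values in [0, 255].
images = rng.integers(0, 256, size=(8, 4, 4, 3)).astype('float32')

flat = images.reshape(images.shape[0], -1)   # (8, 48)
mean_image = flat.mean(axis=0)               # (48,): one mean per pixel position
centered = flat - mean_image                 # broadcast over the 8 images
```

After centering, every pixel position has a mean of about zero and the data contains both positive and negative values. The same mean_image should also be subtracted from the test data.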
Image Normalization
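To give every channel zero mean and unit variance, we subtract the mean value and then divide by the standard deviation. This standardizes the scale of the data; it does not by itself turn the distribution into a normal distribution.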

def image_normalization(x_train, x_test):
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    print(x_train[0][0][0])  # output => [59. 62. 63.]
    # for cifar10 the per-channel mean and standard deviation are known
    mean = [125.307, 122.95, 113.865]
    std = [62.9932, 62.0887, 66.7048]
    for i in range(3):
        x_train[:, :, :, i] = (x_train[:, :, :, i] - mean[i]) / std[i]
        x_test[:, :, :, i] = (x_test[:, :, :, i] - mean[i]) / std[i]
    print(x_train[0][0][0])  # output => [-1.0526057 -0.98166007 -0.76253873]
    return x_train, x_test
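For a dataset whose statistics are not known in advance, the per-channel mean and standard deviation can be computed from the training set. Here is a minimal sketch with random stand-in data; channel_stats and standardize are hypothetical helper names.

```python
import numpy as np


def channel_stats(x):
    """Per-channel mean and std of an (n, height, width, channels) array."""
    x = x.astype('float64')
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))


def standardize(x, mean, std):
    """Subtract the channel mean and divide by the channel std."""
    return ((x.astype('float64') - mean) / std).astype('float32')


rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(16, 8, 8, 3))  # stand-in for x_train
test = rng.integers(0, 256, size=(4, 8, 8, 3))    # stand-in for x_test

mean, std = channel_stats(train)        # statistics come from training data only
train_n = standardize(train, mean, std)
test_n = standardize(test, mean, std)   # reuse the training statistics
```

The test set is scaled with the training statistics, so both sets go through exactly the same transformation.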
ImageDataGenerator Class in Keras (see the Keras documentation)
With keras.preprocessing.image.ImageDataGenerator, we can handle all of the cases above by setting different parameters.
from keras.preprocessing.image import ImageDataGenerator
ImageDataGenerator(featurewise_center=False,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
zca_epsilon=1e-6,
rotation_range=0.,
width_shift_range=0.,
height_shift_range=0.,
shear_range=0.,
zoom_range=0.,
channel_shift_range=0.,
fill_mode='nearest',
cval=0.,
horizontal_flip=False,
vertical_flip=False,
rescale=None,
preprocessing_function=None,
data_format=K.image_data_format())
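# augmentation used here: random horizontal flips and shifts of up to 12.5%
datagen = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.125,
                             height_shift_range=0.125,
                             fill_mode='constant', cval=0.)
# fit() is only required for featurewise_center, featurewise_std_normalization
# or zca_whitening; it is harmless here
datagen.fit(x_train)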
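To make the two transformations in this configuration concrete, here is a rough NumPy sketch of a horizontal flip and a width shift with constant fill. It only illustrates the idea and is not how Keras implements it; the function names are hypothetical. With width_shift_range=0.125, a 32-pixel-wide image moves by up to roughly 4 pixels.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror a (height, width, channels) image left to right."""
    return image[:, ::-1, :]


def shift_width(image, pixels, cval=0.0):
    """Shift an image `pixels` columns to the right, filling with cval."""
    shifted = np.full_like(image, cval)
    width = image.shape[1]
    if pixels > 0:
        shifted[:, pixels:, :] = image[:, :width - pixels, :]
    elif pixels < 0:
        shifted[:, :width + pixels, :] = image[:, -pixels:, :]
    else:
        shifted[...] = image
    return shifted


image = np.arange(2 * 4 * 1, dtype='float32').reshape(2, 4, 1)
flipped = horizontal_flip(image)
moved = shift_width(image, 1)
```

The columns that become empty after the shift are filled with cval, which corresponds to fill_mode='constant' with cval=0.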
datagen.flow takes data and label arrays and generates batches of augmented data.
flow(x=None, #input data
y=None, #labels
batch_size=32, ##default: 32
shuffle=True, #default: True
sample_weight=None,
seed=None,
save_to_dir=None,
save_prefix='',
save_format='png',
subset=None)
It returns an Iterator yielding tuples of (x, y) where x is a numpy array of image data (in the case of a single image input) or a list of numpy arrays (in the case with additional inputs) and y is a numpy array of corresponding labels. If ‘sample_weight’ is not None, the yielded tuples are of the form (x, y, sample_weight). If y is None, only the numpy array x is returned.
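The behaviour described above can be imitated with a plain Python generator. This is a simplified sketch under the assumption that the only augmentation is a random horizontal flip; flow_sketch is a hypothetical name, not a Keras function.

```python
import numpy as np


def flow_sketch(x, y, batch_size=32, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) forever, flipping each image with p = 0.5."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    while True:  # like datagen.flow, this never stops on its own
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x_batch = x[idx].copy()
            flip = rng.random(len(idx)) < 0.5
            x_batch[flip] = x_batch[flip][:, :, ::-1, :]  # mirror the width axis
            yield x_batch, y[idx]


x = np.zeros((10, 4, 4, 3), dtype='float32')
y = np.arange(10)
batches = flow_sketch(x, y, batch_size=4, seed=0)
sizes = [next(batches)[0].shape[0] for _ in range(4)]
```

With 10 samples and a batch size of 4, the sketch yields batches of 4, 4 and 2 samples and then starts the next pass over the data, which is why a training loop has to decide for itself when an epoch ends.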
Step Three: Net building
Loss Function
In segmentation tasks, the following loss functions are often used.
- BCE (Binary Cross Entropy)
- Dice
- Focal loss
- Focal loss + Dice loss
- BCE + Dice loss
- Weighted BCE loss
- Weighted BCE Dice loss
- Mean IOU
1. BCE (Binary Cross Entropy)
BCE is the special case of softmax cross entropy for two classes. With a single sigmoid output p and a label y in {0, 1}, the loss is -[y log(p) + (1 - y) log(1 - p)].

In Keras, keras.losses.binary_crossentropy computes it.
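As a quick numerical check of what BCE measures, here is a NumPy sketch. It averages over all elements and clips the predictions with an assumed eps of 1e-7; the name binary_crossentropy_np is hypothetical and the Keras function may differ in such details.

```python
import numpy as np


def binary_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all elements."""
    y_true = np.asarray(y_true, dtype='float64')
    p = np.clip(np.asarray(y_pred, dtype='float64'), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


confident = binary_crossentropy_np([1, 0], [0.9, 0.1])
unsure = binary_crossentropy_np([1, 0], [0.5, 0.5])
```

Confident correct predictions give a small loss (-log 0.9 per element here), while predictions of 0.5 give log 2 per element.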
2. Dice loss
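In medical image segmentation, the Dice coefficient is often used as the loss function. It is a statistic that gauges the similarity of two samples: Dice = 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B is the ground truth. The loss is 1 - Dice.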
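Here is a minimal NumPy sketch of a soft Dice coefficient and the matching loss. The smoothing term smooth=1 follows the weighted variant shown later in this section; the function names are hypothetical and this is not necessarily the dice_loss used by the other listings.

```python
import numpy as np


def dice_coef_np(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient 2|A∩B| / (|A| + |B|) with a smoothing term."""
    y_true = np.asarray(y_true, dtype='float64').ravel()
    y_pred = np.asarray(y_pred, dtype='float64').ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)


def dice_loss_np(y_true, y_pred, smooth=1.0):
    """1 - Dice, so a perfect overlap gives a loss of 0."""
    return 1.0 - dice_coef_np(y_true, y_pred, smooth)


mask = np.array([[1, 1], [0, 0]])
perfect = dice_loss_np(mask, mask)
disjoint = dice_loss_np(mask, 1 - mask)
```

The smoothing term keeps the ratio defined when both masks are empty, at the cost of a non-zero Dice value (0.2 here) even for completely disjoint masks.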


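3. Focal loss
Focal loss rescales the cross entropy of each example: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. While α balances the importance of positive/negative examples, it does not differentiate between easy/hard examples. Intuitively, the modulating factor (1 - p_t)^γ reduces the loss contribution from easy examples and extends the range in which an example receives low loss.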

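def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.):
    # assumed reconstruction: the defaults and the pt_1 / pt_0 definitions are not given in the text
    pt_1 = y_true * y_pred + (1. - y_true)  # y_pred where y_true == 1, else 1
    pt_0 = (1. - y_true) * y_pred           # y_pred where y_true == 0, else 0
    pt_1 = K.clip(pt_1, 1e-3, .999)
    pt_0 = K.clip(pt_0, 1e-3, .999)
    return (-K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))
            - K.sum((1 - alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0)))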
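The effect of γ on easy and hard examples can be checked with a NumPy version of the same formula. This is a sketch only: alpha=0.25 and gamma=2 are assumed defaults, and focal_loss_np is a hypothetical name.

```python
import numpy as np


def focal_loss_np(y_true, y_pred, alpha=0.25, gamma=2.0):
    """Summed binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    y_true = np.asarray(y_true, dtype='float64')
    p = np.clip(np.asarray(y_pred, dtype='float64'), 1e-3, 0.999)
    p_t = np.where(y_true == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


easy = focal_loss_np([1], [0.9])   # well-classified positive
hard = focal_loss_np([1], [0.1])   # badly classified positive
plain_ce_ratio = np.log(0.1) / np.log(0.9)   # hard/easy ratio without focusing
focal_ratio = hard / easy
```

With plain cross entropy the hard example costs about 22 times as much as the easy one. With γ = 2 the ratio grows to well over a thousand, so the easy examples contribute very little to the total loss.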
4. Focal loss + Dice loss
def mixedLoss(y_true, y_pred, alpha):
    # assumes dice_loss returns 1 - Dice, as in bce_logdice_loss below, so the log term is -log(Dice)
    return alpha * focal_loss(y_true, y_pred) - K.log(1. - dice_loss(y_true, y_pred))
5. BCE + Dice loss
from keras.losses import binary_crossentropy

def bce_logdice_loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) - K.log(1. - dice_loss(y_true, y_pred))
6. Weighted BCE loss
def weighted_dice_loss(y_true, y_pred, weight):
    smooth = 1.
    w, m1, m2 = weight, y_true, y_pred
    intersection = (m1 * m2)
    score = (2. * K.sum(w * intersection) + smooth) / (K.sum(w * m1) +
                                                       K.sum(w * m2) + smooth)
    loss = 1. - K.sum(score)
    return loss
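The heading above names a weighted BCE loss, and the combined loss in the next section calls weighted_bce_loss(y_true, y_pred, weight), while the listing here shows the weighted Dice term. As a rough idea of what a per-pixel weighted BCE could compute, here is a NumPy sketch. Normalising by the sum of the weights is an assumption, and weighted_bce_np is a hypothetical name.

```python
import numpy as np


def weighted_bce_np(y_true, y_pred, weight, eps=1e-7):
    """Per-pixel BCE multiplied by `weight`, normalised by the total weight."""
    y_true = np.asarray(y_true, dtype='float64')
    weight = np.asarray(weight, dtype='float64')
    p = np.clip(np.asarray(y_pred, dtype='float64'), eps, 1.0 - eps)
    bce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.sum(weight * bce) / np.sum(weight))


y_true = np.array([1.0, 0.0])
y_pred = np.array([0.9, 0.5])          # second pixel is the poorly predicted one
uniform = weighted_bce_np(y_true, y_pred, [1.0, 1.0])
emphasised = weighted_bce_np(y_true, y_pred, [1.0, 3.0])
```

Giving the poorly predicted pixel a larger weight raises the loss, which is how a weight map can push the network to focus on difficult regions such as mask borders.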
7. Weighted BCE Dice loss
def weighted_bce_dice_loss(y_true, y_pred):
    y_true = K.cast(y_true, 'float32')
    y_pred = K.cast(y_pred, 'float32')
    # padding='same' with stride 1 keeps the averaged mask the same size as y_true
    averaged_mask = K.pool2d(
        y_true, pool_size=(50, 50), strides=(1, 1), padding='same', pool_mode='avg')
    # w0 = number of pixels, i.e. the total weight under uniform weighting
    weight = K.ones_like(averaged_mask)
    w0 = K.sum(weight)
    # largest weight where the local average is near 0.5, i.e. close to mask borders
    weight = 5. * K.exp(-5. * K.abs(averaged_mask - 0.5))
    w1 = K.sum(weight)
    # rescale so that the weights still sum to w0
    weight *= (w0 / w1)
    loss = weighted_bce_loss(y_true, y_pred, weight) + dice_loss(y_true, y_pred)
    return loss
8. Mean IOU
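Mean IoU (intersection over union) computes, for every class, the overlap between prediction and ground truth divided by their union, and then averages over the classes. Here is a minimal NumPy sketch on hard label maps. Skipping classes that appear in neither map is an assumption, and mean_iou_np is a hypothetical name.

```python
import numpy as np


def mean_iou_np(y_true, y_pred, num_classes):
    """Average of per-class IoU over the classes present in either array."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    ious = []
    for c in range(num_classes):
        true_c, pred_c = y_true == c, y_pred == c
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:          # class absent from both maps: skip it
            continue
        ious.append(np.logical_and(true_c, pred_c).sum() / union)
    return float(np.mean(ious))


truth = np.array([[0, 0], [1, 1]])
guess = np.array([[0, 1], [1, 1]])
score = mean_iou_np(truth, guess, num_classes=2)
```

In the demo, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. Because it works on hard labels, this form is better suited as a metric than as a loss to differentiate.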

Net Structure
Using ResNet32 as an example.
from keras import optimizers, regularizers
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          GlobalAveragePooling2D, Dense, add)
from keras.models import Model

def res_32(input_shape, weight_decay=1e-4):  # weight_decay default is an assumed value
    # input: 32x32x3 output: 32x32x16
    img_input = Input(input_shape)
    x = Conv2D(16, (3, 3), strides=(1, 1), padding='same',
               kernel_regularizer=regularizers.l2(weight_decay),
               kernel_initializer="he_normal")(img_input)
    # res_block1 to res_block5 input: 32x32x16 output: 32x32x16
    for _ in range(5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(16, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(16, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])
    # res_block6 input: 32x32x16 output: 16x16x32
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(32, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)
    # 1x1 projection so the shortcut matches the new size and depth
    projection = Conv2D(32, kernel_size=(1, 1), strides=(2, 2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])
    # res_block7 to res_block10 input: 16x16x32 output: 16x16x32
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])
    # res_block11 input: 16x16x32 output: 8x8x64
    b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    a0 = Activation('relu')(b0)
    conv_1 = Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a0)
    b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
    a1 = Activation('relu')(b1)
    conv_2 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                    kernel_regularizer=regularizers.l2(weight_decay),
                    kernel_initializer="he_normal")(a1)
    projection = Conv2D(64, kernel_size=(1, 1), strides=(2, 2), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
    x = add([projection, conv_2])
    # res_block12 to res_block15 input: 8x8x64 output: 8x8x64
    for _ in range(1, 5):
        b0 = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
        a0 = Activation('relu')(b0)
        conv_1 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a0)
        b1 = BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1)
        a1 = Activation('relu')(b1)
        conv_2 = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_regularizer=regularizers.l2(weight_decay),
                        kernel_initializer="he_normal")(a1)
        x = add([x, conv_2])
    # Dense input: 8x8x64 output: 64
    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)
    # input: 64 output: 10
    x = Dense(10, activation='softmax', kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    model = Model(inputs=img_input, outputs=x)
    # set optimizer
    sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model
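The shape comments in the network and the "32" in its name can be verified with a few lines of arithmetic. The sketch below assumes the usual counting convention (the first convolution, two 3x3 convolutions per residual block, and the final Dense layer, without the 1x1 projections); the names are hypothetical.

```python
import math


def same_conv_size(size, stride):
    """Spatial size after a conv with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)


# (number of residual blocks, filters, stride of the first block) per stage
stages = [(5, 16, 1), (5, 32, 2), (5, 64, 2)]

size = 32            # CIFAR-10 images are 32x32
weighted_layers = 1  # the first 3x3 convolution
shapes = []
for blocks, filters, stride in stages:
    size = same_conv_size(size, stride)   # only the first block may downsample
    weighted_layers += 2 * blocks         # two 3x3 convolutions per block
    shapes.append((size, size, filters))
weighted_layers += 1  # the final Dense layer
```

The three stages end at 32x32x16, 16x16x32 and 8x8x64, and the count of weighted layers is 1 + 30 + 1 = 32.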
Step Four: Model Fitting and Saving
We can use Model.fit or Model.fit_generator to fit our model in Keras. fit takes arrays that are already in memory, while fit_generator takes a generator such as datagen.flow.
fit(x=None,
y=None,
batch_size=None,
epochs=1,
verbose=1,
callbacks=None,
validation_split=0.0,
validation_data=None,
shuffle=True,
class_weight=None,
sample_weight=None,
initial_epoch=0,
steps_per_epoch=None,
validation_steps=None,
validation_freq=1)
fit_generator(generator,
steps_per_epoch=None,
epochs=1,
verbose=1,
callbacks=None,
validation_data=None,
validation_steps=None,
validation_freq=1,
class_weight=None,
max_queue_size=10,
workers=1,
use_multiprocessing=False,
shuffle=True,
initial_epoch=0)
Here is the model fitting and saving part.
# batch_size, iterations (steps per epoch), epochs and cbks (a list of callbacks)
# must be defined before this point
resnet = res_32((32, 32, 3))
# fit the model on batches with real-time data augmentation
resnet.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                     steps_per_epoch=iterations,
                     epochs=epochs,
                     callbacks=cbks,
                     validation_data=(x_test, y_test))
# here's a more "manual" example with model.fit
'''
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        resnet.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
'''
# save model
resnet.save('resnet_32.h5')
print("saving done")
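Since the generator never stops by itself, the number of batches per epoch has to be worked out from the dataset size. Here is a small sketch of that arithmetic, assuming the 50,000 CIFAR-10 training images; the batch size of 128 is a hypothetical choice for the demo.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


cifar_steps = steps_per_epoch(50000, 128)        # value for steps_per_epoch
manual_loop_steps = steps_per_epoch(50000, 32)   # batches before the manual break
```

The manual loop above breaks once batches >= len(x_train) / 32, which for 50,000 images happens after 1563 batches, the same number this helper returns.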