Char_Level_CNN_Model
Model Introduction
Parameter Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sent_input (InputLayer) (None, 1014) 0
_________________________________________________________________
embedding_14 (Embedding) (None, 1014, 128) 8960
_________________________________________________________________
conv1d_30 (Conv1D) (None, 1008, 256) 229632
_________________________________________________________________
activation_20 (Activation) (None, 1008, 256) 0
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, 336, 256) 0
_________________________________________________________________
conv1d_31 (Conv1D) (None, 330, 256) 459008
_________________________________________________________________
activation_21 (Activation) (None, 330, 256) 0
_________________________________________________________________
max_pooling1d_14 (MaxPooling (None, 110, 256) 0
_________________________________________________________________
conv1d_32 (Conv1D) (None, 108, 256) 196864
_________________________________________________________________
activation_22 (Activation) (None, 108, 256) 0
_________________________________________________________________
conv1d_33 (Conv1D) (None, 106, 256) 196864
_________________________________________________________________
activation_23 (Activation) (None, 106, 256) 0
_________________________________________________________________
conv1d_34 (Conv1D) (None, 104, 256) 196864
_________________________________________________________________
activation_24 (Activation) (None, 104, 256) 0
_________________________________________________________________
conv1d_35 (Conv1D) (None, 102, 256) 196864
_________________________________________________________________
activation_25 (Activation) (None, 102, 256) 0
_________________________________________________________________
max_pooling1d_15 (MaxPooling (None, 34, 256) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 8704) 0
_________________________________________________________________
dense_13 (Dense) (None, 1024) 8913920
_________________________________________________________________
dropout_9 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_14 (Dense) (None, 1024) 1049600
_________________________________________________________________
dropout_10 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_15 (Dense) (None, 4) 4100
=================================================================
Total params: 11,452,676
Trainable params: 11,452,676
Non-trainable params: 0
_________________________________________________________________
As the summary shows, the model has a large number of parameters, so it also demands a large dataset; training on a dataset of that size with a CPU alone is impractical, while a dataset that is too small will easily overfit this many parameters.
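The walkthrough below presents the model in fragments. For reference, here is a minimal sketch of the imports and hyperparameters those fragments rely on; all values are taken from this write-up, while batch_size, epochs and checkpoint_every appear later with values not given here:
from tensorflow.keras.layers import (Input, Embedding, Conv1D, Activation,
                                     MaxPooling1D, Flatten, Dense, Dropout)
from tensorflow.keras.models import Model

input_size = 1014      # characters per input sequence
alphabet_size = 69     # recognized characters; +1 row for the unknown token
embedding_size = 128
num_of_classes = 4
dropout_p = 0.1
# [filter_num, filter_size, pooling_size] per block; -1 means no pooling
conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1],
               [256, 3, -1], [256, 3, -1], [256, 3, 3]]
fully_connected_layers = [1024, 1024]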
Layer-by-Layer Walkthrough
Input Layer
input_size = 1014: each input to the network is a sequence of 1014 character indices. Longer texts are truncated and shorter ones are padded, so every row is exactly 1014 long. The input layer itself has no parameters.
# Input
inputs = Input(shape=(input_size,), name='sent_input', dtype='int64') # shape=(?, 1014)
Embedding Layer
There are 69 recognized characters: 'abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:\'"/\\|_@#$%^&*~+-=<>()[]{}'. Any character outside this set is mapped to a single unknown token (<UNK>), giving 70 character indices in total. With embedding_size = 128, this layer therefore has 128 * 70 = 8960 parameters.
# Embedding layer
conv = Embedding(alphabet_size+1, embedding_size, input_length=input_size)(inputs)
Note that the embedding output is assigned to the variable conv: the convolution blocks below are built in a loop that repeatedly reads and reassigns conv. If this line wrote to a variable named embedding while the loop used conv, each iteration would operate on the wrong tensor instead of chaining the Conv1D layers one after another, and the code would fail.
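As an aside, here is a minimal sketch of how raw text could be turned into these fixed-length index sequences; encode_text and the convention of reserving index 0 for padding and unknown characters are illustrative assumptions, not part of the original code:
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~+-=<>()[]{}"
char_to_index = {c: i + 1 for i, c in enumerate(alphabet)}  # 0 = padding / <UNK>

def encode_text(text, input_size=1014):
    # Map each character to its index; unknown characters fall back to 0
    indices = [char_to_index.get(c, 0) for c in text.lower()[:input_size]]
    # Pad short texts with trailing zeros so every row has input_size entries
    return np.array(indices + [0] * (input_size - len(indices)), dtype='int64')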
1D Convolution Layers
for filter_num, filter_size, pooling_size in conv_layers:
    conv = Conv1D(filter_num, filter_size)(conv)
    conv = Activation('relu')(conv)
    if pooling_size != -1:
        conv = MaxPooling1D(pool_size=pooling_size)(conv)
The six blocks are configured by conv_layers; the three values in each entry are filter_num, filter_size, and pooling_size, and pooling_size == -1 means that block performs no MaxPooling:
conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1], [256, 3, -1], [256, 3, -1], [256, 3, 3]]
Parameter counts, layer by layer (reproduced programmatically in the sketch after this list):
- Conv1D-1: input shape (None, 1014, 128); 256 filters, each of size (128, 7), so the parameter count is (128*7+1)*256 = 229632; output shape (None, 1014-7+1, 256) = (None, 1008, 256)
- Activation-1: ReLU, no parameters, shape unchanged
- MaxPooling-1: input shape (None, 1008, 256), pool size 3, no parameters, output shape (None, 336, 256)
- Conv1D-2: input shape (None, 336, 256); 256 filters of size (256, 7), so (256*7+1)*256 = 459008 parameters; output shape (None, 336-7+1, 256) = (None, 330, 256)
- Activation-2: ReLU, no parameters, shape unchanged
- MaxPooling-2: input shape (None, 330, 256), pool size 3, no parameters, output shape (None, 110, 256)
- Conv1D-3: input shape (None, 110, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 110-3+1, 256) = (None, 108, 256)
- Activation-3: ReLU, no parameters, shape unchanged
- Conv1D-4: input shape (None, 108, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 108-3+1, 256) = (None, 106, 256)
- Activation-4: ReLU, no parameters, shape unchanged
- Conv1D-5: input shape (None, 106, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 106-3+1, 256) = (None, 104, 256)
- Activation-5: ReLU, no parameters, shape unchanged
- Conv1D-6: input shape (None, 104, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 104-3+1, 256) = (None, 102, 256)
- Activation-6: ReLU, no parameters, shape unchanged
- MaxPooling-6: input shape (None, 102, 256), pool size 3, no parameters, output shape (None, 34, 256)
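The arithmetic above can be sanity-checked with a few lines of plain Python; this sketch assumes 'valid' padding with stride 1 for Conv1D and non-overlapping MaxPooling1D windows, which are the Keras defaults:
length, channels = 1014, 128
for filters, kernel, pool in conv_layers:
    params = (channels * kernel + 1) * filters  # weights + one bias per filter
    length = length - kernel + 1                # 'valid' convolution, stride 1
    if pool != -1:
        length //= pool                         # non-overlapping max pooling
    channels = filters
    print(f"{params} conv params -> block output (None, {length}, {channels})")
print(f"Flatten size: {length * channels}")     # 34 * 256 = 8704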
Flatten Layer
A single Flatten layer reshapes the output so it can feed the FC layers that follow. No parameters.
x = Flatten()(conv) # (None, 256*34 = 8704)
Fully Connected (FC) Layers and Output Layer
There are three FC layers. The first two both have output shape (None, 1024); the last layer's output size matches the size of the one-hot label vector, so the network can estimate, for each input character sequence, the probability that it belongs to each label. (Use softmax when the outputs should form a probability distribution over the classes; a sigmoid can be used instead for independent binary predictions.) During each training pass, every neuron in the fully connected layers is deactivated with probability dropout_p = 0.1.
# Fully connected layers
for dense_size in fully_connected_layers:
    x = Dense(dense_size, activation='relu')(x)  # dense_size == 1024
    x = Dropout(dropout_p)(x)
# Output Layer
predictions = Dense(num_of_classes, activation='softmax')(x)
Parameter counts:
- FC Layer 1: input_size = 8704, output_size = 1024, so the parameter count is 8704*1024 + 1024 = 8913920
- FC Layer 2: input_size = 1024, output_size = 1024, so the parameter count is 1024*1024 + 1024 = 1049600
- Dropout 1 and Dropout 2: no parameters
- Output Layer: input_size = 1024, output_size = 4, so the parameter count is 1024*4 + 4 = 4100
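The same weights-plus-bias rule used for the convolutions confirms these numbers:
# Dense parameters: input_size * output_size weights + one bias per output
assert 8704 * 1024 + 1024 == 8913920  # FC Layer 1
assert 1024 * 1024 + 1024 == 1049600  # FC Layer 2
assert 1024 * 4 + 4 == 4100           # Output layer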
Build Model
loss = 'categorical_crossentropy'
optimizer = 'adam'
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=optimizer, loss=loss) # Adam, categorical_crossentropy
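Calling model.summary() at this point prints the parameter table shown at the top, which is a quick way to verify the per-layer counts:
model.summary()  # reproduces the table in the Parameter Summary section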
Train Model
# Create callbacks
from tensorflow.keras.callbacks import TensorBoard
# Note: the batch_size, write_grads and embeddings_layer_names arguments
# accepted by older (TF 1.x / standalone Keras) versions of TensorBoard were
# removed in TensorFlow 2.x, so only the still-supported arguments are passed.
tensorboard = TensorBoard(log_dir='./logs',
                          histogram_freq=checkpoint_every,
                          write_graph=False,
                          embeddings_freq=checkpoint_every)
# Training
model.fit(training_inputs, training_labels,
          validation_data=(validation_inputs, validation_labels),
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[tensorboard])
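After training, evaluation and prediction follow the usual Keras pattern. A minimal sketch, assuming test_inputs and test_labels (hypothetical names) were encoded the same way as the training data:
import numpy as np

test_loss = model.evaluate(test_inputs, test_labels, batch_size=batch_size)
probabilities = model.predict(test_inputs)           # (num_samples, 4) softmax outputs
predicted_labels = np.argmax(probabilities, axis=1)  # most probable class per sample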