Char_Level_CNN_Model
Model Introduction
Parameter Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sent_input (InputLayer) (None, 1014) 0
_________________________________________________________________
embedding_14 (Embedding) (None, 1014, 128) 8960
_________________________________________________________________
conv1d_30 (Conv1D) (None, 1008, 256) 229632
_________________________________________________________________
activation_20 (Activation) (None, 1008, 256) 0
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, 336, 256) 0
_________________________________________________________________
conv1d_31 (Conv1D) (None, 330, 256) 459008
_________________________________________________________________
activation_21 (Activation) (None, 330, 256) 0
_________________________________________________________________
max_pooling1d_14 (MaxPooling (None, 110, 256) 0
_________________________________________________________________
conv1d_32 (Conv1D) (None, 108, 256) 196864
_________________________________________________________________
activation_22 (Activation) (None, 108, 256) 0
_________________________________________________________________
conv1d_33 (Conv1D) (None, 106, 256) 196864
_________________________________________________________________
activation_23 (Activation) (None, 106, 256) 0
_________________________________________________________________
conv1d_34 (Conv1D) (None, 104, 256) 196864
_________________________________________________________________
activation_24 (Activation) (None, 104, 256) 0
_________________________________________________________________
conv1d_35 (Conv1D) (None, 102, 256) 196864
_________________________________________________________________
activation_25 (Activation) (None, 102, 256) 0
_________________________________________________________________
max_pooling1d_15 (MaxPooling (None, 34, 256) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 8704) 0
_________________________________________________________________
dense_13 (Dense) (None, 1024) 8913920
_________________________________________________________________
dropout_9 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_14 (Dense) (None, 1024) 1049600
_________________________________________________________________
dropout_10 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_15 (Dense) (None, 4) 4100
=================================================================
Total params: 11,452,676
Trainable params: 11,452,676
Non-trainable params: 0
_________________________________________________________________
As the summary shows, the model has a large number of parameters, so it also demands a large dataset; training on a dataset of that size with a CPU alone is impractical, while a dataset that is too small will easily overfit this many parameters.
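The walkthrough below presents the model in fragments. For reference, here is a minimal sketch of the imports and hyperparameters those fragments rely on; all values are taken from this write-up, while batch_size, epochs and checkpoint_every appear later with values not given here:
from tensorflow.keras.layers import (Input, Embedding, Conv1D, Activation,
                                     MaxPooling1D, Flatten, Dense, Dropout)
from tensorflow.keras.models import Model

input_size = 1014      # characters per input sequence
alphabet_size = 69     # recognized characters; +1 row for the unknown token
embedding_size = 128
num_of_classes = 4
dropout_p = 0.1
# [filter_num, filter_size, pooling_size] per block; -1 means no pooling
conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1],
               [256, 3, -1], [256, 3, -1], [256, 3, 3]]
fully_connected_layers = [1024, 1024]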
Layer-by-Layer Walkthrough
Input Layer
input_size = 1014: each input to the network is a sequence of 1014 character indices. Longer texts are truncated and shorter ones are padded, so every row is exactly 1014 long. The input layer itself has no parameters.
# Input
inputs = Input(shape=(input_size,), name='sent_input', dtype='int64') # shape=(?, 1014)
Embedding Layer
There are 69 recognized characters: 'abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:\'"/\\|_@#$%^&*~+-=<>()[]{}'. Any character outside this set is mapped to a single unknown token (<UNK>), giving 70 character indices in total. With embedding_size = 128, this layer therefore has 128 * 70 = 8960 parameters.
# Embedding layer
conv = Embedding(alphabet_size+1, embedding_size, input_length=input_size)(inputs)
Note that the embedding output is assigned to the variable conv: the convolution blocks below are built in a loop that repeatedly reads and reassigns conv. If this line wrote to a variable named embedding while the loop used conv, each iteration would operate on the wrong tensor instead of chaining the Conv1D layers one after another, and the code would fail.
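As an aside, here is a minimal sketch of how raw text could be turned into these fixed-length index sequences; encode_text and the convention of reserving index 0 for padding and unknown characters are illustrative assumptions, not part of the original code:
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~+-=<>()[]{}"
char_to_index = {c: i + 1 for i, c in enumerate(alphabet)}  # 0 = padding / <UNK>

def encode_text(text, input_size=1014):
    # Map each character to its index; unknown characters fall back to 0
    indices = [char_to_index.get(c, 0) for c in text.lower()[:input_size]]
    # Pad short texts with trailing zeros so every row has input_size entries
    return np.array(indices + [0] * (input_size - len(indices)), dtype='int64')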
1D Convolution Layers
for filter_num, filter_size, pooling_size in conv_layers:
    conv = Conv1D(filter_num, filter_size)(conv)
    conv = Activation('relu')(conv)
    if pooling_size != -1:
        conv = MaxPooling1D(pool_size=pooling_size)(conv)
The six blocks are configured by conv_layers; the three values in each entry are filter_num, filter_size, and pooling_size, and pooling_size == -1 means that block performs no MaxPooling:
conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1], [256, 3, -1], [256, 3, -1], [256, 3, 3]]
Parameter counts, layer by layer (reproduced programmatically in the sketch after this list):
- Conv1D-1: input shape (None, 1014, 128); 256 filters, each of size (128, 7), so the parameter count is (128*7+1)*256 = 229632; output shape (None, 1014-7+1, 256) = (None, 1008, 256)
- Activation-1: ReLU, no parameters, shape unchanged
- MaxPooling-1: input shape (None, 1008, 256), pool size 3, no parameters, output shape (None, 336, 256)
- Conv1D-2: input shape (None, 336, 256); 256 filters of size (256, 7), so (256*7+1)*256 = 459008 parameters; output shape (None, 336-7+1, 256) = (None, 330, 256)
- Activation-2: ReLU, no parameters, shape unchanged
- MaxPooling-2: input shape (None, 330, 256), pool size 3, no parameters, output shape (None, 110, 256)
- Conv1D-3: input shape (None, 110, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 110-3+1, 256) = (None, 108, 256)
- Activation-3: ReLU, no parameters, shape unchanged
- Conv1D-4: input shape (None, 108, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 108-3+1, 256) = (None, 106, 256)
- Activation-4: ReLU, no parameters, shape unchanged
- Conv1D-5: input shape (None, 106, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 106-3+1, 256) = (None, 104, 256)
- Activation-5: ReLU, no parameters, shape unchanged
- Conv1D-6: input shape (None, 104, 256); 256 filters of size (256, 3), so (256*3+1)*256 = 196864 parameters; output shape (None, 104-3+1, 256) = (None, 102, 256)
- Activation-6: ReLU, no parameters, shape unchanged
- MaxPooling-6: input shape (None, 102, 256), pool size 3, no parameters, output shape (None, 34, 256)
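The arithmetic above can be sanity-checked with a few lines of plain Python; this sketch assumes 'valid' padding with stride 1 for Conv1D and non-overlapping MaxPooling1D windows, which are the Keras defaults:
length, channels = 1014, 128
for filters, kernel, pool in conv_layers:
    params = (channels * kernel + 1) * filters  # weights + one bias per filter
    length = length - kernel + 1                # 'valid' convolution, stride 1
    if pool != -1:
        length //= pool                         # non-overlapping max pooling
    channels = filters
    print(f"{params} conv params -> block output (None, {length}, {channels})")
print(f"Flatten size: {length * channels}")     # 34 * 256 = 8704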
Flatten Layer
A single Flatten layer reshapes the output so it can feed the FC layers that follow. No parameters.
x = Flatten()(conv) # (None, 256*34 = 8704)
Fully Connected (FC) Layers and Output Layer
There are three FC layers. The first two both have output shape (None, 1024); the last layer's output size matches the size of the one-hot label vector, so the network can estimate, for each input character sequence, the probability that it belongs to each label. (Use softmax when the outputs should form a probability distribution over the classes; a sigmoid can be used instead for independent binary predictions.) During each training pass, every neuron in the fully connected layers is deactivated with probability dropout_p = 0.1.
# Fully connected layers
for dense_size in fully_connected_layers:
    x = Dense(dense_size, activation='relu')(x)  # dense_size == 1024
    x = Dropout(dropout_p)(x)
# Output Layer
predictions = Dense(num_of_classes, activation='softmax')(x)
Parameter counts:
- FC Layer 1: input_size = 8704, output_size = 1024, so the parameter count is 8704*1024 + 1024 = 8913920
- FC Layer 2: input_size = 1024, output_size = 1024, so the parameter count is 1024*1024 + 1024 = 1049600
- Dropout 1 and Dropout 2: no parameters
- Output Layer: input_size = 1024, output_size = 4, so the parameter count is 1024*4 + 4 = 4100
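The same weights-plus-bias rule used for the convolutions confirms these numbers:
# Dense parameters: input_size * output_size weights + one bias per output
assert 8704 * 1024 + 1024 == 8913920  # FC Layer 1
assert 1024 * 1024 + 1024 == 1049600  # FC Layer 2
assert 1024 * 4 + 4 == 4100           # Output layer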
Build Model
loss = 'categorical_crossentropy'
optimizer = 'adam'
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=optimizer, loss=loss) # Adam, categorical_crossentropy
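Calling model.summary() at this point prints the parameter table shown at the top, which is a quick way to verify the per-layer counts:
model.summary()  # reproduces the table in the Parameter Summary section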
Train Model
# Create callbacks
from tensorflow.keras.callbacks import TensorBoard
# Note: the batch_size, write_grads and embeddings_layer_names arguments
# accepted by older (TF 1.x / standalone Keras) versions of TensorBoard were
# removed in TensorFlow 2.x, so only the still-supported arguments are passed.
tensorboard = TensorBoard(log_dir='./logs',
                          histogram_freq=checkpoint_every,
                          write_graph=False,
                          embeddings_freq=checkpoint_every)
# Training
model.fit(training_inputs, training_labels,
          validation_data=(validation_inputs, validation_labels),
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[tensorboard])
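After training, evaluation and prediction follow the usual Keras pattern. A minimal sketch, assuming test_inputs and test_labels (hypothetical names) were encoded the same way as the training data:
import numpy as np

test_loss = model.evaluate(test_inputs, test_labels, batch_size=batch_size)
probabilities = model.predict(test_inputs)           # (num_samples, 4) softmax outputs
predicted_labels = np.argmax(probabilities, axis=1)  # most probable class per sample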