Char_Level_CNN_Model: Model Introduction

This article dissects a character-level convolutional neural network (Char_Level_CNN_Model) layer by layer, covering the InputLayer, the EmbeddingLayer, a stack of 1D convolution layers, and the fully connected layers. With 11,452,676 parameters in total, the model places high demands on dataset size to avoid overfitting. The article also walks through building and training the model, using a TensorBoard callback to monitor the training process.


Model Introduction

Parameter Overview

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sent_input (InputLayer)      (None, 1014)              0         
_________________________________________________________________
embedding_14 (Embedding)     (None, 1014, 128)         8960      
_________________________________________________________________
conv1d_30 (Conv1D)           (None, 1008, 256)         229632    
_________________________________________________________________
activation_20 (Activation)   (None, 1008, 256)         0         
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, 336, 256)          0         
_________________________________________________________________
conv1d_31 (Conv1D)           (None, 330, 256)          459008    
_________________________________________________________________
activation_21 (Activation)   (None, 330, 256)          0         
_________________________________________________________________
max_pooling1d_14 (MaxPooling (None, 110, 256)          0         
_________________________________________________________________
conv1d_32 (Conv1D)           (None, 108, 256)          196864    
_________________________________________________________________
activation_22 (Activation)   (None, 108, 256)          0         
_________________________________________________________________
conv1d_33 (Conv1D)           (None, 106, 256)          196864    
_________________________________________________________________
activation_23 (Activation)   (None, 106, 256)          0         
_________________________________________________________________
conv1d_34 (Conv1D)           (None, 104, 256)          196864    
_________________________________________________________________
activation_24 (Activation)   (None, 104, 256)          0         
_________________________________________________________________
conv1d_35 (Conv1D)           (None, 102, 256)          196864    
_________________________________________________________________
activation_25 (Activation)   (None, 102, 256)          0         
_________________________________________________________________
max_pooling1d_15 (MaxPooling (None, 34, 256)           0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 8704)              0         
_________________________________________________________________
dense_13 (Dense)             (None, 1024)              8913920   
_________________________________________________________________
dropout_9 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_14 (Dense)             (None, 1024)              1049600   
_________________________________________________________________
dropout_10 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_15 (Dense)             (None, 4)                 4100      
=================================================================
Total params: 11,452,676
Trainable params: 11,452,676
Non-trainable params: 0
_________________________________________________________________

As the summary shows, the model has a large number of parameters, so it also demands a large dataset; training on a dataset of that size with a CPU would be quite impractical, while a dataset that is too small will easily cause the model to overfit.

Layer-by-Layer Walkthrough

Input Layer

input_size = 1014 means each input sample is a sequence of 1014 characters; the input layer itself has no parameters. Longer sequences are truncated and shorter ones are padded, so every row is exactly 1014 long.

# Input 
inputs = Input(shape=(input_size,), name='sent_input', dtype='int64')  # shape=(?, 1014)
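The post does not show its preprocessing step, but the pad-or-truncate rule can be sketched as below; encode_text and char_to_index are hypothetical names introduced here for illustration, not part of the original code.

# Hypothetical preprocessing sketch (not from the original post).
# char_to_index maps each alphabet character to an index 1..69;
# 0 is used here for padding and unknown characters.
def encode_text(text, char_to_index, input_size=1014):
    indices = [char_to_index.get(ch, 0) for ch in text.lower()]
    if len(indices) >= input_size:
        return indices[:input_size]                      # too long: truncate
    return indices + [0] * (input_size - len(indices))   # too short: pad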

Embedding Layer

There are 69 character classes to recognize, 'abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:\'"/\\|_@#$%^&*~+-=<>()[]{}', plus one <UNK> token standing in for anything unrecognized, for 70 indices in total. With embedding_size = 128, this layer therefore has 70 * 128 = 8960 parameters.

# Embedding layer
conv = Embedding(alphabet_size+1, embedding_size, input_length=input_size)(inputs)

Note that the embedding output is deliberately stored in a variable named conv: the convolution layers below are built in a loop that repeatedly reads and overwrites conv. If the embedding output were kept in a separate embedding variable and the loop applied each Conv1D to it, every iteration would branch off the embedding output instead of chaining consecutive conv layers, and the model would fail with shape errors.
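A one-line sanity check of that figure (a sketch; alphabet_size = 69 and embedding_size = 128 are taken from the text above):

# One embedding vector of size 128 per input index; index 0 is the extra <UNK>/padding slot.
alphabet_size, embedding_size = 69, 128
print((alphabet_size + 1) * embedding_size)  # 8960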

1D Convolution Layers

for filter_num, filter_size, pooling_size in conv_layers:
    conv = Conv1D(filter_num, filter_size)(conv) 
    conv = Activation('relu')(conv)
    if pooling_size != -1:
        conv = MaxPooling1D(pool_size=pooling_size)(conv) 

Each of the six entries in conv_layers holds three values, filter_num, filter_size, and pooling_size; pooling_size == -1 means that layer does no MaxPooling:

conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1], [256, 3, -1], [256, 3, -1], [256, 3, 3]]

Parameter counts and shapes, layer by layer (each number is reproduced by the short loop after this list):
  • Conv1D-1: input shape (None, 1014, 128); 256 filters, each of size (128, 7), giving (128*7 + 1) * 256 = 229632 parameters; output shape (None, 1014-7+1, 256) = (None, 1008, 256)
  • Activation-1: ReLU, no parameters, shape unchanged
  • MaxPooling-1: input shape (None, 1008, 256), pool size 3, no parameters; output shape (None, 336, 256)
  • Conv1D-2: input shape (None, 336, 256); 256 filters, each of size (256, 7), giving (256*7 + 1) * 256 = 459008 parameters; output shape (None, 336-7+1, 256) = (None, 330, 256)
  • Activation-2: ReLU, no parameters, shape unchanged
  • MaxPooling-2: input shape (None, 330, 256), pool size 3, no parameters; output shape (None, 110, 256)
  • Conv1D-3: input shape (None, 110, 256); 256 filters, each of size (256, 3), giving (256*3 + 1) * 256 = 196864 parameters; output shape (None, 110-3+1, 256) = (None, 108, 256)
  • Activation-3: ReLU, no parameters, shape unchanged
  • Conv1D-4: input shape (None, 108, 256); 256 filters of size (256, 3), again 196864 parameters; output shape (None, 108-3+1, 256) = (None, 106, 256)
  • Activation-4: ReLU, no parameters, shape unchanged
  • Conv1D-5: input shape (None, 106, 256); 256 filters of size (256, 3), again 196864 parameters; output shape (None, 106-3+1, 256) = (None, 104, 256)
  • Activation-5: ReLU, no parameters, shape unchanged
  • Conv1D-6: input shape (None, 104, 256); 256 filters of size (256, 3), again 196864 parameters; output shape (None, 104-3+1, 256) = (None, 102, 256)
  • Activation-6: ReLU, no parameters, shape unchanged
  • MaxPooling-6: input shape (None, 102, 256), pool size 3, no parameters; output shape (None, 34, 256)
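All six layers follow the same rules: a Conv1D with kernel size k and c input channels has (c*k + 1) * filter_num parameters (the +1 is the bias), a 'valid' convolution shrinks the length from L to L - k + 1, and MaxPooling1D divides it by pool_size (floored). A minimal sketch mirroring the model-building loop reproduces every number in the list:

# Verify the per-layer parameter counts and output shapes with the formulas above.
conv_layers = [[256, 7, 3], [256, 7, 3], [256, 3, -1], [256, 3, -1], [256, 3, -1], [256, 3, 3]]
length, channels = 1014, 128  # output of the embedding layer
for filter_num, filter_size, pooling_size in conv_layers:
    params = (channels * filter_size + 1) * filter_num   # weights + bias per filter
    length = length - filter_size + 1                    # 'valid' convolution
    if pooling_size != -1:
        length = length // pooling_size                  # MaxPooling1D
    print(params, (None, length, filter_num))
    channels = filter_num
# Prints 229632, then 459008, then 196864 four times; final shape is (None, 34, 256).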

Flatten Layer

A Flatten layer reshapes the 3D tensor into a vector so that the FC layers can follow. It has no parameters.

x = Flatten()(conv) # (None, 256*34 = 8704)

Fully Connected (FC) Layers and Output Layer

There are three FC layers. The first two both output (None, 1024); the last layer's output size matches the length of the one-hot label vector, so the model can assign each character-sequence input a probability of belonging to each label. (softmax is used when the classes are mutually exclusive and the outputs should form a probability distribution; sigmoid would be the choice for binary or multi-label prediction.) During each training step, every neuron in the FC layers is deactivated with probability dropout_p = 0.1.

# Full Connected Layer
for dense_size in fully_connected_layers:
    x = Dense(dense_size, activation='relu')(x) # dense_size == 1024
    x = Dropout(dropout_p)(x)
# Output Layer
predictions = Dense(num_of_classes, activation='softmax')(x)
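To see why softmax fits this mutually exclusive 4-class setup, the toy computation below (using example logits, not actual model outputs) turns raw scores into probabilities that sum to 1:

import numpy as np
logits = np.array([2.0, 1.0, 0.1, -1.0])        # example raw scores for the 4 labels
probs = np.exp(logits) / np.exp(logits).sum()   # softmax
print(probs, probs.sum())                       # probabilities; the sum is 1.0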
Predicted parameter counts:
  • FC Layer 1: input_size = 8704, output_size = 1024, so 8704 * 1024 + 1024 = 8913920 parameters
  • FC Layer 2: input_size = 1024, output_size = 1024, so 1024 * 1024 + 1024 = 1049600 parameters
  • Dropout 1 and Dropout 2: no parameters
  • Output Layer: input_size = 1024, output_size = 4, so 1024 * 4 + 4 = 4100 parameters
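Summing every trainable layer reproduces the total reported by model.summary():

# Embedding + six Conv1D layers + three Dense layers
total = 8960 + 229632 + 459008 + 4 * 196864 + 8913920 + 1049600 + 4100
print(total)  # 11452676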

Build Model

loss = 'categorical_crossentropy'  # no trailing comma, which would make this a tuple
optimizer = 'adam'
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=optimizer, loss=loss)  # Adam, categorical_crossentropy

Train Model

# Create callbacks
from tensorflow.keras.callbacks import TensorBoard

# The post never defines these hyperparameters; the values below are assumptions for illustration.
batch_size, epochs, checkpoint_every = 128, 10, 100

# write_grads, batch_size, and embeddings_layer_names were removed from the TF 2.x
# TensorBoard callback, so only the surviving arguments are passed here.
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=checkpoint_every,
                          write_graph=False, write_images=False,
                          embeddings_freq=checkpoint_every)

# Training
model.fit(training_inputs, training_labels,
          validation_data=(validation_inputs, validation_labels),
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[tensorboard])
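Once training starts, the logged curves can be inspected by pointing TensorBoard at the log directory (tensorboard --logdir ./logs) and opening the printed local URL in a browser.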