Keras: Building Common Models

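This article shows how to use the Keras Sequential model for a range of deep learning tasks, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks, with detailed code examples.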

The Sequential model is a linear stack of layers.

You can create a Sequential model by passing a list of layers to the constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

You can also add layers one at a time with the .add() method:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

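Specifying the input shape

The model needs to know what input shape to expect. For this reason, the first layer in a Sequential model (and only the first, because the following layers can infer their shapes automatically) needs to receive information about its input shape. There are several ways to do this:

* Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None means that any positive integer may be expected). The batch dimension is not included in input_shape.

* Some 2D layers, such as Dense, support specifying their input shape via the input_dim argument, and some 3D temporal layers support the input_dim and input_length arguments.

* If you need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will expect every batch of inputs to have the shape (32, 6, 8).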
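
As a rough illustration of the rules above, the sketch below shows how the batch dimension is prepended to input_shape and why only the first layer needs shape information. The helper names full_batch_shape and dense_output_shape are hypothetical and are not part of Keras; they only mimic the shape bookkeeping for Dense layers.

```python
def full_batch_shape(input_shape, batch_size=None):
    """Prepend the batch dimension; None means any batch size is accepted."""
    return (batch_size,) + tuple(input_shape)


def dense_output_shape(batch_shape, units):
    """A Dense layer only changes the last axis of its input shape."""
    return tuple(batch_shape[:-1]) + (units,)


# batch_size=32 combined with input_shape=(6, 8)
print(full_batch_shape((6, 8), batch_size=32))  # (32, 6, 8)

# Later layers derive their shapes from the previous layer's output.
shape = full_batch_shape((784,))
for units in (32, 10):
    shape = dense_output_shape(shape, units)
    print(shape)
```

Once the first shape is known, every later shape follows from the layer definitions, which is why the remaining layers need no shape arguments.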

The following two snippets are equivalent:

model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model = Sequential()
model.add(Dense(32, input_dim=784))

Compilation

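Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

* An optimizer. This can be the string identifier of an existing optimizer (such as rmsprop or adagrad) or an instance of the Optimizer class.

* A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse) or a custom objective function.

* A list of metrics. For a classification problem you will usually set this to metrics=['accuracy']. A metric can be the string identifier of an existing metric or a custom metric function.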
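
To make the loss and metric arguments more concrete, here is a minimal NumPy sketch of what categorical_crossentropy and accuracy measure. It is an illustration under simplifying assumptions (one-hot labels, probabilities as predictions), not the Keras implementation, and the function names are chosen here only for readability.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the label."""
    hits = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(hits))


y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The optimizer lowers the loss during training, while the metrics are only reported and do not influence the weight updates.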

# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

Training

Keras models are trained on NumPy arrays of input data and labels. To train a model, you will typically use the fit function.

# For a single-input model with 2 classes (binary classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
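
The batch_size argument controls how the data is sliced within each epoch. The sketch below mimics that slicing in plain NumPy; iterate_minibatches is a hypothetical helper, and it leaves out the shuffling that fit performs by default.

```python
import numpy as np


def iterate_minibatches(data, labels, batch_size):
    """Yield consecutive (data, labels) slices of at most batch_size samples."""
    for start in range(0, len(data), batch_size):
        stop = start + batch_size
        yield data[start:stop], labels[start:stop]


data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

batches = list(iterate_minibatches(data, labels, batch_size=32))
print(len(batches))          # 32 batches per epoch
print(batches[-1][0].shape)  # the last batch holds the remaining 8 samples
```

With 1000 samples and batches of 32, one epoch consists of 31 full batches plus one smaller batch.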

# For a single-input model with 10 classes (categorical classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
import keras
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
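
The one-hot conversion used above can be reproduced in a few lines of NumPy. The sketch below is only meant to show what the encoded labels look like; one_hot is a hypothetical stand-in, not the Keras utility itself.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class ids into rows with a single 1 at the class index."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


encoded = one_hot([[0], [2], [1]], num_classes=3)
print(encoded)
```

Each row matches the shape of the softmax output, which is what categorical_crossentropy expects as a target.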


Some useful examples

Multilayer perceptron (MLP) for binary classification

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((100, 20))
y_test = np.random.randint(2, size=(100, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
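
The Dropout(0.5) layers in this model randomly silence units during training only. A minimal NumPy sketch of the usual "inverted dropout" scheme is shown below; the function and its arguments are illustrative and are not the Keras layer.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training:
        return x  # at inference time inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((4, 8))
train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False, rng=rng)
print(train_out)  # entries are either 0.0 or 2.0
```

Rescaling the surviving units keeps the expected activation the same in training and inference, so no correction is needed when the model is evaluated.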

VGG-like convnet

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD

# Generate dummy data
x_train = np.random.random((100, 100, 100, 3))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
x_test = np.random.random((20, 100, 100, 3))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(20, 1)), num_classes=10)

model = Sequential()
# input: 100x100 images with 3 channels -> (100, 100, 3) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

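# Written for older Keras 2.x; newer releases spell lr as learning_rate and no longer accept decay.
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)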
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.fit(x_train, y_train, batch_size=32, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=32)
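
It can help to trace how the 100x100 input shrinks on its way to the Flatten layer. The sketch below assumes the layer defaults used above (no padding and stride 1 for the 3x3 convolutions, non-overlapping 2x2 pooling); the helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool(size, pool_size=2):
    """Output length of non-overlapping max pooling."""
    return size // pool_size


size = 100
trace = [size]
for op in ("conv", "conv", "pool", "conv", "conv", "pool"):
    size = conv_valid(size) if op == "conv" else pool(size)
    trace.append(size)

flattened = size * size * 64  # 64 filters in the last convolution
print(trace)      # [100, 98, 96, 48, 46, 44, 22]
print(flattened)  # 30976
```

The Dense(256) layer therefore receives 30976 inputs per image, which is where most of this model's parameters sit.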

Sequence classification with LSTM

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

model = Sequential()
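# max_features (the vocabulary size) and the x_train/y_train/x_test/y_test arrays are assumed to be defined elsewhere.
model.add(Embedding(max_features, output_dim=256))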
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
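
The Embedding layer at the front of this model is essentially a trainable lookup table. The NumPy sketch below shows the lookup step only; vocab_size is a hypothetical stand-in for max_features, and the random matrix stands in for weights that would be learned during training.

```python
import numpy as np

vocab_size = 1000   # hypothetical stand-in for max_features
output_dim = 256
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, output_dim))

# Two sequences of five integer token ids each.
token_ids = np.array([[4, 10, 7, 0, 0],
                      [1, 2, 3, 4, 5]])

# Indexing with an integer array replaces every id by its row vector.
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (2, 5, 256)
```

The resulting (batch, timesteps, 256) tensor is the 3D input that the LSTM layer consumes.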

Sequence classification with 1D convolutions

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D, MaxPooling1D

model = Sequential()
# seq_length and the x_train/y_train/x_test/y_test arrays are assumed to be defined elsewhere.
model.add(Conv1D(64, 3, activation='relu', input_shape=(seq_length, 100)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
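
GlobalAveragePooling1D is what turns the variable-length convolutional output into a fixed-size vector for the final Dense layer. A minimal NumPy sketch of the operation, using a small made-up tensor, could look like this:

```python
import numpy as np


def global_average_pooling_1d(x):
    """Average over the time axis: (batch, steps, channels) -> (batch, channels)."""
    return x.mean(axis=1)


x = np.arange(2 * 3 * 4, dtype="float64").reshape(2, 3, 4)
pooled = global_average_pooling_1d(x)
print(pooled.shape)  # (2, 4)
print(pooled[0])     # [4. 5. 6. 7.]
```

Because the time axis is averaged away, the output size depends only on the number of filters in the last Conv1D layer, not on seq_length.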

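Stacked LSTM for sequence classification

In this model, we stack 3 LSTM layers on top of each other, making the model capable of learning higher-level temporal representations.

The first two LSTMs return their full output sequences, but the last one only returns the last step in its output sequence, thus dropping the temporal dimension (i.e. converting the input sequence into a single vector).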

[Figure: stacked LSTM]

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))

# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))

model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))
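
The effect of return_sequences on the shapes flowing through the stack can be traced with a little bookkeeping. The helper below is hypothetical and only tracks shapes; it does not compute any LSTM output.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Shape bookkeeping only: keep the time axis if return_sequences is set."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        return (batch, timesteps, units)
    return (batch, units)


shape = (None, 8, 16)  # (batch_size, timesteps, data_dim)
shape = lstm_output_shape(shape, 32, return_sequences=True)
first = shape
shape = lstm_output_shape(shape, 32, return_sequences=True)
shape = lstm_output_shape(shape, 32)
last = shape
print(first, last)  # (None, 8, 32) (None, 32)
```

An LSTM layer needs 3D input, so every LSTM that feeds another LSTM has to keep the time axis; only the last one can drop it before the Dense classifier.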

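Same stacked LSTM model, made stateful

A stateful recurrent model is one in which the internal states (memories) obtained after processing a batch of samples are reused as the initial states for the samples of the next batch. This makes it possible to process longer sequences while keeping computational complexity manageable.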

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32

# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))
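
The idea behind carrying state across batches can be checked on a toy example. The sketch below uses a plain tanh recurrent cell in NumPy rather than an LSTM, with made-up sizes and random weights, and shows that processing a sequence in two chunks with the state carried over matches processing it in one pass.

```python
import numpy as np


def run_rnn(x, w, u, h0):
    """Run a tanh recurrent cell over x of shape (batch, timesteps, features)."""
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ w + h @ u)
    return h


rng = np.random.default_rng(0)
batch, timesteps, features, units = 4, 16, 3, 5
x = rng.normal(size=(batch, timesteps, features))
w = rng.normal(size=(features, units))
u = rng.normal(size=(units, units))
h0 = np.zeros((batch, units))

# Whole sequence in one pass.
h_full = run_rnn(x, w, u, h0)

# Same sequence split into two consecutive chunks, state carried over.
h_mid = run_rnn(x[:, :8], w, u, h0)
h_chunked = run_rnn(x[:, 8:], w, u, h_mid)

print(np.allclose(h_full, h_chunked))  # True
```

This is also why the stateful model above is trained with shuffle=False: sample i of one batch must be the continuation of sample i in the previous batch for the carried state to be meaningful.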
