Deep Learning
Key Points
- Types of learning.
- Semi-supervised learning: some of the data has labels, some does not.
- Transfer learning: learning that also makes use of data not directly related to the task.
- Unsupervised learning: learning without labels.
- Structured learning: both the input and the output are structured objects. The output can be an image, speech, a sentence, etc., which makes the problem more complex.
- Reinforcement learning: learning from evaluative feedback (reward), which is closer to how humans actually learn.
- Learning-rate tuning: decrease the learning rate as the number of epochs grows.
- Adagrad: adapts the learning rate per parameter.
$$w_{t+1} = w_{t}-\frac{\eta_{t}}{\sigma_{t}}g^{t}, \qquad \sigma_{t}= \sqrt{\frac{1}{t+1}\sum_{i=0}^{t}(g^{i})^2}$$
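A minimal NumPy sketch of this Adagrad update for a single scalar parameter; the quadratic toy loss and the names (`adagrad`, `grad_fn`) are illustrative assumptions, not from the notes.

```python
import numpy as np

def adagrad(grad_fn, w0, eta=1.0, steps=100, eps=1e-8):
    w = w0
    sum_sq = 0.0                               # running sum of squared gradients
    for t in range(steps):
        g = grad_fn(w)
        sum_sq += g ** 2
        eta_t = eta / np.sqrt(t + 1)           # decaying learning rate eta_t
        sigma_t = np.sqrt(sum_sq / (t + 1))    # RMS of past gradients sigma_t
        w = w - (eta_t / (sigma_t + eps)) * g  # equivalent to eta * g / sqrt(sum_sq)
    return w

# toy example: minimize (w - 3)^2, whose gradient is 2 * (w - 3)
print(adagrad(lambda w: 2 * (w - 3), w0=0.0))  # approaches 3
```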
- Backpropagation: the chain rule, applied layer by layer (a small sketch follows).
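A tiny sketch of the chain rule on a two-layer scalar "network"; the sigmoid/square-loss choice and all variable names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# forward pass: x -> a = sigmoid(w1 * x) -> y = w2 * a, loss = (y - t)^2
x, t = 0.5, 1.0
w1, w2 = 0.3, -0.2
a = sigmoid(w1 * x)
y = w2 * a
loss = (y - t) ** 2

# backward pass: apply the chain rule factor by factor
dL_dy = 2 * (y - t)
dL_dw2 = dL_dy * a                 # dy/dw2 = a
dL_da = dL_dy * w2                 # dy/da  = w2
dL_dw1 = dL_da * a * (1 - a) * x   # da/dz = a(1-a), dz/dw1 = x

print(dL_dw1, dL_dw2)
```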
- Keras: highly integrated, not very flexible; a high-level API that is easy to learn and use. e.g.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# first layer: flattened 28x28 input, 500 hidden units
model.add(Dense(input_dim=28 * 28, units=500, activation='sigmoid'))
# second layer
model.add(Dense(units=500, activation='relu'))
# output layer: 10 classes
model.add(Dense(units=10, activation='softmax'))
# model configuration
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
# train the model
model.fit(train_x, train_y, batch_size=100, epochs=20)
# evaluate and predict
model.evaluate(validation_x, validation_y)
result = model.predict(test_x)
- Mini-batch gradient descent: the batch size is usually a power of 2; a batch that is too large can hurt performance.
Mini-batch is faster than one-example-at-a-time stochastic gradient descent because the examples in a batch are first stacked into a matrix and computed together (matrix optimization); see the sketch below.
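A minimal sketch of mini-batch iteration with NumPy; the helper name and batch size are illustrative assumptions. Each yielded batch is a matrix, so the forward pass can be done as one matrix multiplication.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=128, shuffle=True):
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        # the whole batch is processed as one matrix operation,
        # which is why mini-batch beats one-example-at-a-time SGD
        yield X[batch], y[batch]
```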
- Vanishing gradient: as the network gets deeper, layers near the output have larger gradients and update quickly, while layers near the input update much more slowly. This is caused by the chain rule (repeated multiplication of small derivatives).
Solution: ReLU.
$$\sigma(z) = \max(z, 0)$$
- Fast to compute
- Equivalent to an infinite number of sigmoid functions stacked together
- Alleviates the vanishing-gradient problem
- Has a biological motivation
- Maxout: a learnable activation function; ReLU is a special case of Maxout (see the sketch below).
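A small NumPy sketch of a maxout unit with two linear pieces; when one piece is fixed to zero, the unit reduces to ReLU. The weights and names are illustrative.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: max over k linear pieces.
    W has shape (k, d), b has shape (k,), x has shape (d,)."""
    return np.max(W @ x + b, axis=0)

x = np.array([1.0, -2.0])
W = np.array([[0.5, 0.3],    # learnable piece 1
              [0.0, 0.0]])   # piece 2 fixed to zero ...
b = np.array([0.1, 0.0])
print(maxout(x, W, b))       # ... so this behaves like ReLU(0.5*x1 + 0.3*x2 + 0.1)
```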
- RMSProp: like Adagrad, but replaces the sum of all past squared gradients with an exponentially decaying moving average, so the effective learning rate does not shrink to zero.
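A minimal RMSProp update sketch matching the description above; the decay rate `alpha`, the toy gradient, and the function name are illustrative assumptions.

```python
import numpy as np

def rmsprop(grad_fn, w0, eta=0.1, alpha=0.9, steps=300, eps=1e-8):
    w, sigma_sq = w0, 0.0
    for t in range(steps):
        g = grad_fn(w)
        # exponential moving average of squared gradients (vs. Adagrad's full sum)
        sigma_sq = alpha * sigma_sq + (1 - alpha) * g ** 2
        w = w - eta * g / (np.sqrt(sigma_sq) + eps)
    return w

print(rmsprop(lambda w: 2 * (w - 3), w0=0.0))  # ends up close to 3
```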
- Dropout: randomly drop a fraction of the neurons during training to prevent overfitting. Do not apply dropout at test time.
It can be viewed as an ensemble method: each mini-batch effectively trains a different sub-network (a different set of neurons).
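A hedged Keras sketch of dropout between fully connected layers; the layer sizes and the 0.5 rate are illustrative, and Keras's `Dropout` layer is automatically disabled at test time.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(input_dim=28 * 28, units=500, activation='relu'))
model.add(Dropout(0.5))          # randomly drop 50% of units during training only
model.add(Dense(units=500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))
```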
- Images.
- CNN: convolutional neural network
- Pooling layer: downsamples the feature maps
- Flatten: flatten the feature maps, then use fully connected layers for the output
- Core CNN parameters:
- Stride: how far the convolution window moves at each step
- Padding: whether to add a border of values around the input, so the stride divides the spatial size evenly
- Filter size: height, width, and number of channels
Number of parameters in a convolution layer:
$$f_1 \times x \times x \times f_2$$
where $f_1$ is the number of channels in the previous layer, $f_2$ the number of channels in the next layer, and $x$ the kernel height/width.
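A quick check of the parameter-count formula with Keras; the input size and filter count are made up, and note that Keras also adds one bias per filter, which the formula above omits.

```python
from keras.models import Sequential
from keras.layers import Conv2D

# 3-channel input, 64 output channels, 3x3 kernels:
# weights = 3 * 3 * 3 * 64 = 1728, plus 64 biases -> 1792 parameters
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 input_shape=(32, 32, 3)))
model.summary()
```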
- Speech recognition
- As a classification problem: input -> acoustic feature, output -> state
- Each state has a stationary distribution for acoustic features
- Common models include HMM and GMM
- Phoneme: the basic unit of pronunciation
- The lower layers detect the manner of articulation
- All the phonemes share the results from the same set of detectors
- Use parameters effectively
- End-to-end learning: no need to design the hand-crafted intermediate stages.
- Semi-supervised learning:
- Transductive learning: the unlabeled data is the testing data
- Inductive learning: the unlabeled data is not the testing data
- Why do this? Collecting labeled data is hard, which is why semi-supervised learning is needed.
- Graph-based similarity measures: build a graph over labeled and unlabeled points; labels propagate along high-similarity edges.
- PCA:
- Derivation
$$z_1= w^1\cdot x, \qquad \bar{z}_1 = w^{1}\cdot \bar{x}$$
$$\begin{aligned} Var(z_1) &= \sum_{z_1}(z_1-\bar{z}_1)^2 \\ & = \sum_{x}(w^1\cdot x-w^1\cdot\bar{x})^2\\ & = \sum\big(w^1\cdot(x-\bar{x})\big)^2\\ & = \sum (w^1)^{T}(x-\bar{x})(x-\bar{x})^{T}w^1\\ & = (w^1)^{T}Cov(x)\,w^1 \end{aligned}$$
So the problem becomes the following optimization, where $S = Cov(x)$:
$$\text{maximize} \quad (w^1)^{T}Sw^1 \quad \text{s.t.} \quad (w^1)^{T}w^1=1$$
Solve it with Lagrange multipliers:
$$g(w^1) = (w^1)^{T}Sw^1-\alpha\big((w^1)^{T}w^1-1\big)$$
$$\begin{cases} \frac{\partial g(w^1)}{\partial w_1^1}=0 \\ \frac{\partial g(w^1)}{\partial w_2^1}=0\\ \dots \end{cases} \Rightarrow Sw^1 = \alpha w^1 \Rightarrow (w^1)^{T}Sw^1 = \alpha$$
Maximizing $\alpha$ means that $w^1$ is the eigenvector of $S$ corresponding to its largest eigenvalue. Similarly, $w^2$ is the eigenvector corresponding to the second-largest eigenvalue.
- PCA decorrelates the features:
$$z=Wx, \qquad Cov(z)=D \ (\text{diagonal})$$
$$\begin{aligned} Cov(z) &= \sum(z-\bar{z})(z-\bar{z})^{T}=WSW^{T} \\ & = WS[w^1 \dots w^{K}] \\ & = W[Sw^1 \dots Sw^{K}] \\ & = W[\lambda_1w^1 \dots \lambda_{K}w^{K}] \\ & = [\lambda_1Ww^1\dots \lambda_{K}Ww^{K}]\\ & = [\lambda_1e^1 \dots \lambda_{K}e^{K}] = D \end{aligned}$$
- PCA looks like a neural network with one hidden layer and a linear activation function (see the sketch below).
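A minimal NumPy sketch of the eigen-decomposition view of PCA derived above; the random data and the number of components are placeholders.

```python
import numpy as np

def pca(X, k=2):
    """Rows of X are samples. Returns the top-k principal directions and projections."""
    X_centered = X - X.mean(axis=0)
    S = np.cov(X_centered, rowvar=False)    # covariance matrix Cov(x)
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1][:k]   # largest eigenvalues first
    W = eigvecs[:, order]                   # columns are w^1, ..., w^k
    return W, X_centered @ W                # projections z for each sample

X = np.random.randn(100, 5)
W, Z = pca(X, k=2)
print(W.shape, Z.shape)  # (5, 2) (100, 2)
```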
More methods:
- CBOW: predict a word from its context. Skip-Gram: predict the context from the word.
- LLE (Locally Linear Embedding): represent each point as a weighted combination of its neighbors, then find low-dimensional points that preserve those reconstruction weights.
- Laplacian Eigenmaps, a graph-based approach:
$$\text{Loss}=\sum_{x^{r}}C(y^{r},\hat{y}^{r})+\lambda S$$
$$S = \frac{1}{2}\sum_{i,j}w_{i,j}(y^{i}-y^{j})^2=y^{T}Ly, \qquad L = D-W \ \ (\text{graph Laplacian matrix})$$
For unsupervised learning,
$$S=\frac{1}{2}\sum_{i,j}w_{i,j}(z^{i}-z^{j})^2$$
$$\text{span}\{z^1,z^2,\dots,z^{m}\}=R^{m}$$
Spectral clustering: clustering on z
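A small NumPy sketch of building the graph Laplacian $L = D - W$ from a toy similarity matrix and evaluating the smoothness term $S = y^{T}Ly$; the weights and labels are made up.

```python
import numpy as np

# toy symmetric similarity (weight) matrix W over 3 nodes
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
D = np.diag(W.sum(axis=1))      # degree matrix
L = D - W                       # graph Laplacian

y = np.array([1.0, 0.8, -0.2])  # labels / embeddings on the nodes
S = y @ L @ y                   # equals 0.5 * sum_ij w_ij (y_i - y_j)^2
print(S)
```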
- t-SNE (T-distributed Stochastic Neighbor Embedding):
A good method for visualizing high-dimensional points.
Excellent tutorial on t-SNE:
https://github.com/oreillymedia/t-SNE-tutorial
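A hedged sketch using scikit-learn's `TSNE` (scikit-learn is not mentioned in the notes; the random data is a placeholder):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.randn(200, 50)                            # placeholder high-dimensional points
X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(X_2d.shape)                                       # (200, 2), ready to scatter-plot
```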
- Autoencoder
- Text retrieval
Vector space model: represent the query or the document as a bag-of-words vector
- Use an autoencoder to compress it into a 2-dimensional vector
- Similar image search
- Encode each image into a 256-dimensional vector
- Compute similarity with a distance metric on the codes (a minimal autoencoder sketch follows this list)
- Generative models:
- PixelRNN
- Variational Autoencoder (VAE)
- Generative Adversarial Network (GAN), a kind of mimicry
GANs are notoriously hard to train.
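A minimal Keras autoencoder sketch for the retrieval / similar-image-search ideas above; the 256-dimensional code follows the notes, while the other layer sizes and names are illustrative assumptions.

```python
from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(784,))
h = Dense(512, activation='relu')(inp)
code = Dense(256, activation='relu')(h)        # 256-d code used for similarity search
h2 = Dense(512, activation='relu')(code)
out = Dense(784, activation='sigmoid')(h2)     # reconstruct the input

autoencoder = Model(inp, out)
encoder = Model(inp, code)                     # encoder alone maps inputs to codes
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(train_x, train_x, epochs=20, batch_size=100)
# codes = encoder.predict(images); compare codes with a distance metric
```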
- Transfer learning: use data that is only loosely related to the task to help training
- Tasks: speech recognition, image recognition, text analysis, etc.
- Fine-tuning a model:
- Speech: copy the last layers
- Images: copy the first layers
- Multitask learning: share some layers while performing different tasks
- Domain-adversarial training
- Zero-shot learning
- Structured learning:
- Three problems
- Evaluation: what does F(x, y) look like?
- Inference: how to solve the arg max problem
- Training: given training data, how to find F(x, y)
- Example 1: object detection: use a CNN to output a bounding box and its label
- Structured SVM:
https://blog.youkuaiyun.com/yjw123456/article/details/105010218
Similarities and differences with the ordinary SVM: both are quadratic programming problems, but a structured SVM has far more constraints and is solved with the cutting-plane method.
- Sequence labeling:
Example: POS tagging (label the part of speech of every word in a sentence)
- HMM:
- CRF (Conditional Random Field)
- RNN (Recurrent Neural Network)
- LSTM (see the sketch below)
Needs about four times as many parameters as an ordinary neuron (one set per gate).
LSTM helps with the vanishing-gradient problem because:
- memory and input are added (the cell state is updated additively)
- the forget gate is usually open, so the stored memory is rarely wiped out
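A hedged Keras sketch of a small LSTM sequence classifier; the sequence length, feature size, and unit counts are illustrative assumptions.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# 20 timesteps, 10 features per step; each LSTM unit carries 4 sets of weights
# (input, forget, output gates + cell input), hence ~4x a plain neuron's parameters
model.add(LSTM(units=32, input_shape=(20, 10)))
model.add(Dense(units=2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
```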
- Bagging: train several models on bootstrap resamples of the data and average (or vote on) their outputs; mainly reduces variance.
- Boosting: train models sequentially, each one focusing on the examples the previous ones got wrong; mainly reduces bias.
- Stacking: feed the outputs of several base models into a meta-model that learns how to combine them (a sketch of all three follows below).
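A hedged scikit-learn comparison of the three ensemble styles (scikit-learn and the specific estimators are assumptions, not from the notes; the synthetic data is a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier

X, y = make_classification(n_samples=500, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20)   # parallel, bootstrap samples
boosting = AdaBoostClassifier(n_estimators=20)                           # sequential, reweights hard examples
stacking = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier()), ('lr', LogisticRegression())],
    final_estimator=LogisticRegression())                                # meta-learner on base outputs

for name, clf in [('bagging', bagging), ('boosting', boosting), ('stacking', stacking)]:
    print(name, clf.fit(X, y).score(X, y))
```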
- Deep reinforcement learning
- Step 1: define the model (a neural network as the actor/policy)
- Step 2: define the goodness of the model (the expected reward)
- Step 3: pick the best model
After computing the expected reward, optimize it with gradient ascent (i.e., gradient descent on its negative).
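For reference, the usual policy-gradient (REINFORCE) form of "compute the expected reward, then follow its gradient" is the following; this is the standard textbook expression, assumed rather than taken from the notes:
$$\nabla \bar{R}_\theta \approx \frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{T_n} R(\tau^{n})\,\nabla \log p_\theta(a_t^{n}\mid s_t^{n})$$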
- Summary